Welcome back to this week’s edition of the Power BI blog series. This week, we look at how to view links in a webpage.
Using links on webpages is a great way to offer additional information without overcomplicating a page. Extracting those links in way that can be refreshed at the click of a button, is something that Power BI is great at. To show this in action, we will start by creating a ‘Blank Query’ from the ‘Get data’ dropdown:
This takes us to the Power Query Editor:
The Blank Query has been given the name Query1. We start by renaming this to SumProduct Links.
To extract the links, we will be using two M functions, WebBrowserContents() and Html.Table():
WebBrowserContents()
Web.BrowserContents(url as text, optional options as nullable record) as text
This returns the Html for the specified url as viewed by a web browser. An optional record parameter, options, may be provided to specify additional properties. The record can contain the following fields:
- WaitFor: this specifies a condition to wait for before downloading the HTML, in addition to waiting for the page to load (which is always done). This can be a record containing Timeout and / or Selector fields. If only a Timeout is specified, the function will wait the amount of time specified before downloading the HTML. If both a Selector and Timeout are specified, and the Timeout elapses before the Selector exists on the page, an error will occur. If a Selector is specified with no Timeout, a default Timeout of 30 seconds is applied.
Html.Table()
Html.Table(html as any, columnNameSelectorPairs as list, optional options as nullable record) as table
This returns a table containing the results of running the specified columnNameSelectorPairs (CSS selectors) against the provided html. An optional record parameter, options, may be provided to specify additional properties. The record can contain the following fields:
We will combine these functions to achieve our goal, but to show what each one does, we will use Web.BrowserContents() first.
The M code we have used is:
= Web.BrowserContents(“https://www.sumproduct.com/”)
Before we can see the contents of the famous website, sumproduct.com, we need to specify the credentials:
We can use ‘Anonymous’ access, since no username or password is required. The connection is made with the website and the contents appear in the Power Query Editor:
The next step is to extract the links from this Html code. To do this we use the correct CSS selectors. We don’t claim to be experts in CSS, however, lists of selectors and their purposes can be Googled!
The first issue to solve, is how to enter another step. We can right-click on the Source step in the ‘APPLIED STEPS’ window:
We choose ‘Insert Step After’, and we can access the Formula bar:
We can now enter the Hmtl.Table() step:
The M code we have used here is:
= Html.Table(Source, {{“Link”, “a[href^=””http””]”, each [Attributes][href]}})
The CSS selector used here searches all the <a> elements (as shown in the source step earlier) where the href attribute begins ‘http’, signifying a hyperlink to another page.
Check back next week for more Power BI tips and tricks!
Be the first to comment