Importing data from web pages using Power BI

Sometimes the data we need for our data models can only found on web pages. The challenge with web pages is that they are often poorly structured and contain a mix of content types such as tables, lists and images, this makes importing data challenging so is often omitted in most ETL tools. Fortunately importing data from websites is made pretty easy in Power BI. In this post, we will import a list of countries along with its population

URLhttps://en.wikipedia.org/wiki/List_of_countries_by_population_(United_Nations)

As with all query in Power BI, we begin with the [Get Data] button in the Power BI ribbon.

Image1.png
Get Data menu

By selecting the [Web] option in the [Get Data] menu a dialogue box appears where the URL of the web page can be entered. You are also able to switch to the [Advanced] option where you will be able to provide additional values such as timeout duration and request header parameters.

Image2.PNG
Get Data From Web dialogue box

After configuring the URL of the page and any advanced options press [OK], Power BI will then download the content of the website. once the web page has been downloaded the query navigator will appear with a list of all the tables found on the page, there you will be able to browse each table to view its content as well as access the entire document (conveniently called Document).

Image3
Query navigator with list of available data sets

In this example, we see three objects in the query navigator, one being the document and the other two are tables. One of these contains the information we are looking for while the other is the appendix at the bottom of the page that is of little use to us so will be ignored. The table we are interested is called [Countries and areas ranked by population in 2016] (yours may differ). After selecting this tables (remember you can select more than one if you like), you can then the table and its content into your data model by pressing the [Load] button, this loads the table into the data model.

Image4.PNG
Table loaded from web page into data model

As is often the case with information coming from web pages, the data is poorly structured and requires some additional cleansing to make it useful. Some of the transformations I made to clean this data set were

  • Removed the first row as this contained a summary row.
  • Removed unneeded columns such as [rank] and [Population(1 July 2015) ], I did this by selecting the columns I wanted to keep, then selecting the [Remove other columns] option in the ribbon to remove the remaining columns.
  • Finally, I gave each column a more appropriate name.

My final data set looks like this

Image5.PNG
Country table with population

You can reproduce these steps by pasting the following M code into the advanced editor.

let
Source = Web.Page(Web.Contents("https://en.wikipedia.org/wiki/List_of_countries_by_population_(United_Nations)")),
Data0 = Source{0}[Data],
#"Changed Type" = Table.TransformColumnTypes(Data0,{{"Rank", type text}, {"Country or area", type text}, {"UN continental #(lf)region[1]", type text}, {"UN statistical #(lf)region[1]", type text}, {"Population #(lf)(1 July 2016)[2]", type text}, {"Population #(lf)(1 July 2015)[2]", type text}, {"Change", type text}}),
#"Removed Top Rows" = Table.Skip(#"Changed Type",1),
#"Removed Other Columns" = Table.SelectColumns(#"Removed Top Rows",{"Country or area", "UN continental #(lf)region[1]", "UN statistical #(lf)region[1]", "Population #(lf)(1 July 2016)[2]"}),
#"Renamed Columns" = Table.RenameColumns(#"Removed Other Columns",{{"Country or area", "Country"}, {"UN continental #(lf)region[1]", "Continent"}, {"UN statistical #(lf)region[1]", "Region"}, {"Population #(lf)(1 July 2016)[2]", "Population"}})
in
#"Renamed Columns"

Conclusion

In this post, we saw how easy it was to import data from a web-page into our data model making it easy to enrich our data using publically available information from the internet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s