Search This Blog

Sunday, May 27, 2018

Power BI with Mapbox and Volcanic Ring of Fire

(2018-May-27) With recent Hawaiian volcano activity, it would be interesting to see how other similar events have shaped our Earth planet. The NOAA's National Centers for Environmental Information (NCEI) provides public access to the Volcano Location Database - https://www.ngdc.noaa.gov/nndc/struts/form?t=102557&s=5&d=5. And currently, it contains information about 1571 known volcanic eruptions.







I was able to extract this dataset into my Power BI data model and decided to use the new custom map visualization created by Mapbox that became available with this monthly Power BI update. 

The working experience with Mapbox is similar to other custom and native map visualization, however, it has one new additional feature that would require you to register and get your personal Mapbox access token.

As soon as you place this visualization control into your report canvas and then link it with your geo dataset, the following initial screen will show up:




Then after the initial steps of registration, you are able to get this access token to place into the Power BI Viz format panel:





Going back to the actual volcanic map, I found the proof, that the Ring of Fire does exist and it's around the Pacific Ocean.



And here is the actual Power BI report with the volcanic interactive map using Mapbox (which currently is only supported in Chrome and Firefox) :



Happy data adventures!

Sunday, May 20, 2018

Power BI with Maps: Choose Your Destination

(2018-May-20) A childhood dream to travel around the world fueled by reading Gulliver's Travels stories and Robinson Crusoe attempts to survive on a deserted island. Those books were filled with geographical description and map locations. I think I always loved working with maps and I still do! There is something that could captivate your mind and propel your imagination for traveling when you immerse yourself in maps.

Power BI itself a great tool that could feed your appetite for an adventure using built-in and custom map visualizations.

The built-in (or native) map visualizations are available right away when you install the Power BI desktop for the first time and Microsoft has provided sufficient information for you to get started working with them:
- Map
- Filled Map
- ArgGIS Map
- Shape Map

And there is another set of custom map visualization that we can use in building data analytics in Power BI (not a complete list):
- Flow map by Weiwei Cui
- Heatmap by Weiwei Cui
- Globe Map by Microsoft 
- Icon Map by Altius
- Mapbox Visual by Mapbox

Since there is more information about the built-in map visualizations then I'd like to share my experience working with the custom ones.



But before I do this, let me spend a few minutes describing how the Shape Map (still in Preview at the moment) works.

Shape Map
https://docs.microsoft.com/en-us/power-bi/desktop-shape-map

Once you enable this Preview feature in your desktop version of the Power BI then can you use a predefined set of Shapes:

or you can free your mind and upload a custom shapefile with Canadian provinces and Northen Territories, which I did from this site http://mapstarter.com/





Flow map by Weiwei Cui
https://appsource.microsoft.com/en-us/product/power-bi-visuals/WA104380901
https://weiweicui.github.io/PowerBI-Flowmap/

Flow map is a special type of network visualization that portrays movements of objects among geo-locations. 

I built a simple dataset with the list of Provinces and Territories capitals and linked them all to Ottawa (Canadian capital). The component itself has a very rich set of formatting settings which I liked the most; so the final visualization could look totally different by changing one of them.




Heatmap by Weiwei Cui

https://appsource.microsoft.com/en-us/product/power-bi-visuals/WA104381072
https://weiweicui.github.io/PowerBI-Heatmap

Heat maps are a type of visualization to show data density on a map. They are particularly helpful when you have a lot of (e.g., tens of thousands of) data points on the map and are mainly interested in their overall distribution. Technically, in a heatmap, data points are aggregated locally and mapped to colors (either gradient or quantile), so that we can make better sense of the density of the data from the colors while still being able to see and use the map. 

I already blogged about the use of the heatmaps in Power BI - http://datanrg.blogspot.com/2018/01/when-its-cold-outside-perhaps-power-bi.html and we can always use the built-in ArgGIS heat maps option as well to see alternative ways for your dataset density.



Globe Map by Microsoft 

https://appsource.microsoft.com/en-us/product/power-bi-visuals/WA104380799

One the best features of the GlobeMap is that it allows you to rotate the Globe and see it from different angles, and this is right within your Power BI report.

Globe Map is a 3D Map that makes the map exploration experience more immersive and magical. It provides the sense of connection to the data with the physical world. This, combined with our spatial ability, brings a new perspective to the data when presented as 3D objects. Globe Map also supports heat map on the spatial map. You can use a second measure for heat intensity and draw immediate attention to the right areas.

I used the Berkley Earth datasets - http://berkeleyearth.org/data/ for this 3D reporting which was already featured in my previous blog post - http://datanrg.blogspot.com/2016/02/excel-power-map-vs-power-bi-globe-map.html 



Icon Map by Altius
https://appsource.microsoft.com/en-us/product/power-bi-visuals/WA104381497

The Icon Map visual shows data-bound images & lines or circles on a map and allows you to render images and lines or circles on the map. 



Mapbox Visual by Mapbox

https://appsource.microsoft.com/en-us/product/power-bi-visuals/WA104381472
https://github.com/mapbox/mapboxgl-powerbi
https://www.mapbox.com/help/power-bi/

The Mapbox makes heatmaps, circles, and clusters using big location data.
Your location data is more than just points on a map. With the Mapbox Visual for Power BI, quickly create beautiful custom map visuals that answer your business question with fast performance.



And here is the actual Power BI report with some of the custom map visualizations:



Happy data adventures!

Saturday, March 31, 2018

Geometry Objects in SQL Server using Latitude/Longitude coordinates

(2018-Mat-31) Support for spatial geometry functions was initially introduced in SQL Server 2008 and then it was greatly improved in Denali (SQL Server 2012) version of the product - https://docs.microsoft.com/en-us/sql/relational-databases/spatial/spatial-data-sql-server

Back then, when I was reading about the spatial features in SQL Server, I wondered where I could use this functionality with the creation of geo points, line polygons; however, working with geometry objects a bit more helped me to see some real use cases where this could be applicable.

Let's say you have the London Tower Bridge and you want to know when a taxi cab with your very important guest crosses this bridge. By creating a geo polygon for the bridge you can use spatial functions in SQL Server to check if a taxi cab GPS tracker coordinates intersect with the bridge polygon geometry object - STIntersects (geometry Data Type).





First, I've located my map objects in the QGIS tool using Google Satelite layer:




Then I created a polygon that would resemble the bridge area over the Thames River:




Then by extracting the nodes of my polygon, I can see each individual geo points coordinates:





And then after migrating this dataset into my dbo.LondonBridgeCoordinates SQL Server table, I can use this script to create the very same geometry object in my database:


-- POLYGON definition
DECLARE @coords nvarchar(max);

-- POLYGON CREATION FROM LON/LAT COORDINATES
WITH polygon
AS (SELECT
  id,
  latitude,
  longitude,
  CONCAT(LTRIM(STR(longitude, 22, 6)), ' ', LTRIM(STR(latitude, 22, 6))) AS coords
FROM [dbo].[LondonBridgeCoordinates]),
-- LON/LAT CONCATENATION
polygon_coordinates
AS (SELECT
  STUFF((SELECT
    ', ' + coords
  FROM polygon p
  ORDER BY p.id
  FOR xml PATH (''), TYPE)
  .value('(./text())[1]', 'VARCHAR(MAX)'), 1, 2, '') AS polygon_coords,
  (SELECT
    CONCAT(LTRIM(STR(longitude, 22, 6)), ' ', LTRIM(STR(latitude, 22, 6))) AS coords
  FROM [dbo].[LondonBridgeCoordinates]
  WHERE id = 1)
  AS first_coordinate)

SELECT  @coords = (polygon_coords) FROM polygon_coordinates
-- LET’S SEE HOW A POLYONG WOULD LOOK LIKE
SELECT  geometry ::STPolyFromText('POLYGON((' + @coords + '))', 4326).MakeValid()






I can also save the output of the STPolyFromText function as a database object and use it with all other available spatial functions in SQL Server. 

Please let me know if you can find other uses besides checking geo objects intersection. There are so many possibilities!

Saturday, February 10, 2018

Power BI with different Network Visualizations

(2018-Feb-10) A few days ago Microsoft provided another update for its desktop version of the Power BI data analytical tool - https://powerbi.microsoft.com/en-us/blog/power-bi-desktop-february-2018-feature-summary/. Among all the major updates in this release, there was one new custom network visualization added by ZoomCharts. It wasn't the first network visualization available in the Power BI and I played a little bit with some of them in the past.

Currently, there are 5 network visual controls that you can add to your Power BI reports:
- Network Navigator Chart by Microsoft
- Force-Directed Graph by Microsoft
- Journey Chart by MAQ Software
- Social Network Graph by Arthur Graus
- Network Visualization by ZoomCharts

I used an open dataset from one of my previous reports based on the City of Toronto Lobbyist Registry and for testing various network visualizations I've used a portion of this data that describes all the communications between registered lobbyist and city officials.





Here is a data model of the original Power BI report and for this test case I only used the highlighted la_communication table:


Most of the network visualizations in Power BI share a similar concept, your dataset needs to have a least one descriptive attribute for a Source and another one as a Target for your network relationship. Let's take a look how all those five visual components are similar and different to each other.



Network Navigator Chart
https://appsource.microsoft.com/en-us/product/power-bi-visuals/WA104380795?&tab=Overview












Network Navigator lets you explore node-link data by panning over and zooming into a force-directed node layout (which can be precomputed or animated live).

From an initial overview of all nodes, you can use simple text search to enlarge matching nodes in ways that guide subsequent navigation. Network nodes can also be color-coded based on additional attributes of the dataset and filtered by linked visuals.

You can set source and target nodes using your dataset, it would also help if you add additional attributes to set different colors for your nodes. The difficult part is that with a very large dataset, this visualization becomes very slow to process and animate all the relationships between the nodes.



Force-Directed Graph
https://appsource.microsoft.com/en-us/product/power-bi-visuals/WA104380764?&tab=Overview













This custom visual implements a D3 force layout diagram with curved paths. The thickness of the path represents the weight of the relationship between the nodes.


Since the relationship and interconnection between a large set of entities could be very complex, the visual positions the nodes in such a way that there are few crossings as possible, making the exploration experience easy, fun. Users can also adjust the layout manually by simply dragging the nodes around.

This visualization is my favorite, it has its own animation engine. Relationships between a source and target nodes are marked with arrows which help to identify who initiates a communication and who is a recipient.


Journey Chart
https://appsource.microsoft.com/en-us/product/power-bi-visuals/WA104380989?&tab=Overview











Journey Chart by MAQ Software enables users to display complex, multi-stage lead paths in a clean, uncluttered visual. Nodes represent categories and vertices represent relationships between categories. The bigger the node or vertex, the larger the value. Customizable node colors make distinguishing between categories easy.

This visualization doesn't have a source or target nodes, it rather creates a metric journey between several categories (two minimum); it also has a root node and then it shows the first level of descriptive category and a relationship is then built using next level category. It's fun to play with this chart, you can move nodes and unfold its vertices; I would highly recommend this visual chart for your own data discovery.


Social Network Graph
https://appsource.microsoft.com/en-us/product/power-bi-visuals/WA104381236?&tab=Overview













Super easy network visualization to use; it has standard source and target nodes that could be linked to you dataset attributes. Along with basic node relationships, you can assign graphics node representations (aka photos) using URL based links. However, if you don't have those links then your report's network graph looks like a bunch of empty animated circles because the current version of this visualization doesn't support text label settings, only tooltip messages are available.


Network Visualization
https://appsource.microsoft.com/en-us/product/power-bi-visuals/WA104381419?tab=Overview











ZoomCharts Network Visual enables tabular data visualization, exploration, and filtering using network layout. Clicking on any node acts as a filter for the rest of the dashboard, making it easy to drill-through data and boosting Power BI productivity.

In one way this network chart resembles the Journey chart by MAQ Software since it allows you to list all of the chart nodes by listing them in the order, like categories. From the other angle, it works like two other Microsoft mainstream network visualizations: you can see how nodes relationships are constructed and metric values define the size of each node. Plus the ZoomCharts network visualization has additional user controls that you can adjust right within your Power BI report, which is super cool! The one thing that I didn't like about this chart was that I couldn't hide or remove legend of my nodes (Lobbyist and POH Office name in my dataset case).

And, if you have reached to this point of reading this blog post, here is the actual Power BI report, fully interactive and accessible. I will let you decide which network Power BI visualizations you would like to use, and I already have my own preferences :-)




(2018-Feb-11) Update: On the set of unfiltered 13177 records of Source to Target pairs, the fastest network visualization in Power BI became the visual control by ZoomCharts and the slowest one was Force-Directed Graph by Microsoft.

Here is a brief summary of testing each of the individual custom visualizations. 

- Network Navigator - 20 seconds
- Force-Directed Graph - more than 300 seconds and I still didn't get the final network finished rendering (it works well with smaller datasets though).
- Journey Chart by MAQ Software - 120 seconds
- Social Network Graph - 40 seconds
- Network Visualization - 15 seconds

Sunday, February 4, 2018

Power BI with Azure Data Lake - where the water doesn't run dry

(2018-Feb-04) I've had a case working with very large data files as a source for my heat map visualizations in Power BI, where the main file was slightly above 600 MB - http://datanrg.blogspot.com/2018/01/when-its-cold-outside-perhaps-power-bi.html. Not a big deal when you have only a few such data files that you can directly be connected in the Power BI Query Editor. However, it's a whole different story, when you need to maneuver between several hundreds of thousands of such large data files.

A quick side thought though, why don't we try to use the Azure Data Lake for storing such files and then using them for our Power BI data modeling:



Here are some of the highlights of the Azure Data Lake Store:
Built for Hadoop: A Hadoop Distributed File System for the Cloud
Unlimited storage: No fixed limits on file size, account size, or the number of files
Performance Tuned for Big Data: Optimized for massive throughput to query and analyze any amount of data
All Data: Store data in its native format without prior transformation 

So here is my story of putting the Azure Data Lake to work with my original data file.
First, I created my instance of the Azure Data Lake store and placed my 629 MB data file in the root folder of its HDFS repository:




Then I went straight to my Power BI desktop environment and connected to my Azure Data Lake Store that I had just created:



For the first time, it will ask for your credential to authentication yourself with your Azure subscription and then it will pass you to explore your data files repository in Azure using the Power BI Query Editor, which is really groovy by itself!





In this testing exercise, I'm only interested in my sourcing data.txt file, and by clicking the Binary link I immerse myself into the very structure of this file, which is basically a formation of 7 columns and ~15 million rows of data.

Here is a quick look at how the initial table extract looks in M language:

let
    Source = DataLake.Contents("https://rayisdls.azuredatalakestore.net"),
    #"data txt" = Source{[Name="data.txt"]}[Content],
    #"Imported CSV" = Csv.Document(#"data txt",[Delimiter="#(tab)", Columns=7, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Changed Type" = Table.TransformColumnTypes(#"Imported CSV",{{"Column1", Int64.Type}, {"Column2", Int64.Type}, {"Column3", type number}, {"Column4", type number}, {"Column5", type number}, {"Column6", Int64.Type}, {"Column7", Int64.Type}}),
    #"Renamed Columns" = Table.RenameColumns(#"Changed Type",{{"Column1", "Station ID"}, {"Column2", "Series Number"}, {"Column3", "Date"}, {"Column4", "Temperature"}, {"Column5", "Uncertainty"}, {"Column6", "Observations"}, {"Column7", "Time of Observation"}})
in
    #"Renamed Columns"

It's worthwhile to mention how easy the M language is in converting HDFS data files into a tabular stucture with this Csv.Document function.

And then to transform my Date column value as a year and decimal fraction of the midpoint of the time period being represented, I just copied my previously used M script for this from my previous Power BI report:




Here is the final script that I used in my Azure-based dataset:


let
    Source = DataLake.Contents("https://rayisdls.azuredatalakestore.net"),
    #"data txt" = Source{[Name="data.txt"]}[Content],
    #"Imported CSV" = Csv.Document(#"data txt",[Delimiter="#(tab)", Columns=7, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Changed Type" = Table.TransformColumnTypes(#"Imported CSV",{{"Column1", Int64.Type}, {"Column2", Int64.Type}, {"Column3", type number}, {"Column4", type number}, {"Column5", type number}, {"Column6", Int64.Type}, {"Column7", Int64.Type}}),
    #"Renamed Columns" = Table.RenameColumns(#"Changed Type",{{"Column1", "Station ID"}, {"Column2", "Series Number"}, {"Column3", "Date"}, {"Column4", "Temperature"}, {"Column5", "Uncertainty"}, {"Column6", "Observations"}, {"Column7", "Time of Observation"}}),
    #"Removed Columns" = Table.RemoveColumns(#"Renamed Columns",{"Time of Observation", "Observations", "Uncertainty", "Series Number"}),
    #"Added Custom" = Table.AddColumn(#"Removed Columns", "Custom", each [Date]),
    #"Renamed ColumnsB" = Table.RenameColumns(#"Added Custom",{{"Custom", "Year"}}),
    #"Inserted Round Down" = Table.AddColumn(#"Renamed ColumnsB", "Round Down", each Number.RoundDown([Year]), Int64.Type),
    #"Added Custom1" = Table.AddColumn(#"Inserted Round Down", "Custom", each ( [Date] - [Round Down] ) * 12 + 0.5),
    #"Changed TypeB" = Table.TransformColumnTypes(#"Added Custom1",{{"Custom", Int64.Type}}),
    #"Removed Columns1" = Table.RemoveColumns(#"Changed TypeB",{"Year"}),
    #"Renamed Columns1" = Table.RenameColumns(#"Removed Columns1",{{"Round Down", "Year"}, {"Custom", "Month"}}),
    #"Removed Columns2" = Table.RemoveColumns(#"Renamed Columns1",{"Date"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Removed Columns2",{{"Year", type text}, {"Month", type text}}),
    #"Added Custom2" = Table.AddColumn(#"Changed Type1", "Custom", each [Year]&"-"&[Month]&"-01"),
    #"Changed Type2" = Table.TransformColumnTypes(#"Added Custom2",{{"Custom", type datetime}}),
    #"Renamed Columns2" = Table.RenameColumns(#"Changed Type2",{{"Custom", "Date"}}),
    #"Final Change" = Table.TransformColumnTypes(#"Renamed Columns2",{{"Year", Int64.Type}, {"Month", Int64.Type}})
in
    #"Final Change"

then I was able to get to the very same table that I had previously used in my Power BI heat map visualizations.



Basically, I just replaced my originat data table with the very same structure but being sourced from the Azure Data Lake Store, and Power BI data integration experience was only slightly changed at the very beginning. 

And while you have your data files in this lake then your data waters won't run dry!

Saturday, January 20, 2018

Get Data experience from Excel to Power BI and SQL Server Analysis Services 2017

(2018-Jan-20) It has been a long journey for the Excel data integration tool Power Query from its early version as an external add-in component in 2010 and 2013 Excel to the built-in feature of the Excel 2016. 

Power Query for Excel provides a seamless experience for data discovery, data transformation and enrichment for data analysts, BI professionals and other Excel users. With Power Query, you can combine data from multiple data sources and shape it in order to prepare the data for further analysis in tools like Excel and Power Pivot, or visualization in tools like Power View and Power Map.

It only takes to click New Query button of the Data ribbon to start working with Power Query in Excel, and then the whole experience of selecting different data source begins:



Currently, Excel Power Query allows creating 30 different types of data source connections.

Limiting your data discovery experience with the Excel tool, you tend to ask yourself about the existing worksheets and data models constraints that may prevent you to work with larger file datasets or connecting to a wider range of server databases.

Some of the existing limitations of data modeling in Excel 2016 are:

A) Total number of rows and columns on a worksheet:
  • 1,048,576 rows by 16,384 columns

B) Maximum limits of memory storage and file size for Data Model workbooks:
  • 32-bit environment is subject to 2 gigabytes (GB) of virtual address space, shared by Excel, the workbook, and add-ins that run in the same process. A data model’s share of the address space might run up to 500 – 700 megabytes (MB), but could be less if other data models and add-ins are loaded.
  • 64-bit environment imposes no hard limits on file size. Workbook size is limited only by available memory and system resources
Making a further step into the right direction of the data modeling and data discovery experience with Power BI, you just have to say, where is my data, and in a natural way it gives the option but choosing the Get Data path:




Currently, Power BI allows creating 79 different types of data source connections.


Power BI data modeling limitations:

A) Power BI Desktop

  • there is no restriction on how large a desktop Power BI file can be locally

B) Power BI Service Online (Free and Pro licenses):

  • workspace limit is 10 GB
  • dataset limit is 1 GB

C) Power BI Service Online (Premium license):

  • workspace limit is 100 TB
  • dataset limits are (they may change in future):
    • P1 SKU < 3 GB
    • P2 SKU < 6 GB
    • P3 SKU < 10 GB

The exciting thing is that with SQL Server 2017 Analysis Services for tabular data models of 1400 compatibility level and higher, Microsoft has introduced the very same Get Data experience:



And even Data Query editor with M languages capabilities is now available in Visual Studio, isn't this amazing! 



What's new in SQL Server 2017 Analysis Services

Currently, Visual Studio Data Tools for SSAS Tabular allows creating 35 different types of data source connections. But even if the Power BI experience has a lot more variety of various data types to connect, SSAS Tabular now has more advantages of migrating a self-service Power BI models into an organization analytics platform. 

And from the data size limitations perspective, the sky is the limit, or Azure, to put it correctly :-)