
Saturday, January 20, 2018

Get Data experience from Excel to Power BI and SQL Server Analysis Services 2017

(2018-Jan-20) It has been a long journey for the Excel data integration tool Power Query, from its early versions as an external add-in component for Excel 2010 and 2013 to a built-in feature of Excel 2016.

Power Query for Excel provides a seamless experience for data discovery, data transformation and enrichment for data analysts, BI professionals and other Excel users. With Power Query, you can combine data from multiple data sources and shape it in order to prepare the data for further analysis in tools like Excel and Power Pivot, or visualization in tools like Power View and Power Map.

It only takes a click on the New Query button of the Data ribbon to start working with Power Query in Excel, and then the whole experience of selecting different data sources begins:



Currently, Excel Power Query allows creating 30 different types of data source connections.
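Under the hood, each of these connections is just a generated query in the Power Query formula language, M. Here is a minimal sketch (the file path and column names below are made-up placeholders, not part of any real workbook) of what a simple CSV import shaped in the Query Editor could look like:

    // load a delimited file, promote the first row to headers, set column types
    let
        Source   = Csv.Document(File.Contents("C:\Data\Sales.csv"), [Delimiter = ",", Encoding = 65001]),
        Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
        Typed    = Table.TransformColumnTypes(Promoted, {{"OrderDate", type date}, {"Amount", type number}})
    in
        Typed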

If you limit your data discovery experience to Excel alone, you soon start asking yourself about the worksheet and data model constraints that may prevent you from working with larger datasets or connecting to a wider range of server databases.

Some of the existing limitations of data modeling in Excel 2016 are:

A) Total number of rows and columns on a worksheet:
  • 1,048,576 rows by 16,384 columns

B) Maximum limits of memory storage and file size for Data Model workbooks:
  • 32-bit environment is subject to 2 gigabytes (GB) of virtual address space, shared by Excel, the workbook, and add-ins that run in the same process. A data model’s share of the address space might run up to 500 – 700 megabytes (MB), but could be less if other data models and add-ins are loaded.
  • 64-bit environment imposes no hard limits on file size. Workbook size is limited only by available memory and system resources
Taking a further step in the right direction of data modeling and data discovery with Power BI, you just have to say where your data is, and it naturally offers you the options through the Get Data path:




Currently, Power BI allows creating 79 different types of data source connections.


Power BI data modeling limitations:

A) Power BI Desktop

  • there is no restriction on how large a desktop Power BI file can be locally

B) Power BI Service Online (Free and Pro licenses):

  • workspace limit is 10 GB
  • dataset limit is 1 GB

C) Power BI Service Online (Premium license):

  • workspace limit is 100 TB
  • dataset limits are (they may change in future):
    • P1 SKU < 3 GB
    • P2 SKU < 6 GB
    • P3 SKU < 10 GB

The exciting thing is that with SQL Server 2017 Analysis Services for tabular data models of 1400 compatibility level and higher, Microsoft has introduced the very same Get Data experience:



And even the Query Editor with M language capabilities is now available in Visual Studio; isn't this amazing!
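In a 1400-level tabular project, the same M language defines what each table's partition loads. A minimal sketch, assuming a local SQL Server instance and an AdventureWorksDW database (both names are placeholders):

    // an M partition expression in a 1400 compatibility level tabular model
    let
        Source      = Sql.Database("localhost", "AdventureWorksDW"),
        DimCustomer = Source{[Schema = "dbo", Item = "DimCustomer"]}[Data]
    in
        DimCustomer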



What's new in SQL Server 2017 Analysis Services

Currently, SQL Server Data Tools for SSAS Tabular in Visual Studio allows creating 35 different types of data source connections. But even though the Power BI experience offers a far wider variety of data sources to connect to, SSAS Tabular now has the added advantage of easing the migration of self-service Power BI models into an organizational analytics platform.

And from the data size limitations perspective, the sky is the limit, or Azure, to put it correctly :-)

Saturday, January 13, 2018

When it’s cold outside, perhaps Power BI heat maps can warm you up

(2018-Jan-13) You never know how the weather will behave, especially when the temperature changes dramatically from +10 to -10 within a day. During the Christmas and New Year holidays, it was very cold here in Canada. So I thought to explore some open datasets that might contain historical temperature records and, if possible, geo-coordinates to locate those thermal data points.

About two years ago I created a blog post "Excel Power Map vs. Power BI Globe Map visualization" based on the Berkeley Earth datasets - http://berkeleyearth.org/data/. At that time I was probing the 3D Globe Power BI visualization, the one that you can rotate right within your report.

This time I thought to check and compare heat map visualizations in Power BI. Currently, there are two ways to see variously heated or cooled geographic areas in Power BI: either the standard ArcGIS Map visualization or the custom Power BI Heatmap visualization from the Office store - https://appsource.microsoft.com/en-us/product/power-bi-visuals/WA104381072?src=office&tab=Overview

The Berkeley Earth dataset for this exercise came in the form of multiple flat files:


I only used the following fields from the data.txt file:
- Station ID, 
- Date, 
- Temperature (C)
  
And from the site_detail.txt file I extracted the following fields to provide more geo details for each weather station that had temperature data points available:
- Station ID, 
- Station Name, 
- Latitude, 
- Longitude, 
- Country

And with some minor data transformation, I was able to build my Power BI data model with two tables:
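Just for reference, here is a hedged M sketch of how such a flat file could be shaped in the Power BI Query Editor; the file path, the whitespace-delimited layout and the column positions are my assumptions about the Berkeley Earth files rather than an exact copy of my queries:

    // read data.txt, drop the % comment header lines, then split by whitespace
    let
        Raw    = Lines.FromBinary(File.Contents("C:\Data\BerkeleyEarth\data.txt")),
        Data   = List.Select(Raw, each not Text.StartsWith(_, "%")),
        Parsed = Table.FromList(Data, Splitter.SplitTextByWhitespace(),
                   {"Station ID", "Series Number", "Date", "Temperature (C)"}),
        Kept   = Table.SelectColumns(Parsed, {"Station ID", "Date", "Temperature (C)"})
    in
        Kept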



Just a side note on the data extract technique: SQL Server Management Studio (SSMS) version 17.3 or later now has a new Import Flat File feature along with the common Import and Export Data wizard; you can read more about it here - https://docs.microsoft.com/en-us/sql/relational-databases/import-export/import-flat-file-wizard



So, after populating the data (~15.5 million temperature records covering the years 1763 through 2017, from close to 43 thousand weather stations around the globe), I was able to place it into my Power BI report and test both types of heat map visualizations. I then additionally filtered my dataset to show only the Canadian weather station landscape and chose the year 2017 for the most recent data.
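Such a cut can be made with report-level filters or earlier in Power Query; here is a hedged sketch of the latter, assuming the two model tables are named Stations and Temperatures and the Date column is typed as a date:

    // keep only 2017 readings taken at Canadian weather stations
    let
        CanadianIDs  = List.Buffer(Table.SelectRows(Stations, each [Country] = "Canada")[Station ID]),
        Readings2017 = Table.SelectRows(Temperatures,
                         each Date.Year([Date]) = 2017 and List.Contains(CanadianIDs, [Station ID]))
    in
        Readings2017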



I could also change colors for both visualizations, either by manual selection or by specifying a color palette in the edit mode:







The ArcGIS Map visualization has a lot of detailed settings that you can choose and apply to your various geo-analytics scenarios (and not only heat map ones); however, the custom Heatmap visualization brings even more flexibility for adjusting your visualization than ArcGIS Map offers.

So, there are no winners or losers, and I can easily use them both!

Friday, December 29, 2017

Power BI Sankey Chart: a case to review your Budget books

(2017-Dec-29) A Sankey diagram (chart), named after Matthew Henry Phineas Riall Sankey, is a good visualization tool for information flow within a system. It can also identify the main players (contributors) at different stages of this flow, where those stages can be described textually and the flow thickness can be attributed to a numeric value in your dataset.

A usual use case for Sankey charts could be population migration between countries, as shown in the Microsoft Power BI demo - https://powerbi.microsoft.com/en-us/blog/visual-awesomeness-unlocked-sankey-diagram/. Or, I found a very fascinating case showing voter flow between the first and second rounds of the recent presidential election in France - French election results: Macron’s victory in charts via @financialtimes



I have been thinking about another case where Sankey diagrams could be used as well: budget analysis, with multiple revenue flows contributing to an overall budget and then spending streams coming out of the same budget.

So I took the recent City of Toronto Budget 2018 open dataset: https://www.toronto.ca/city-government/budget-finances/city-budget/, which naturally comes with good pie charts for the separate revenue and spending categories. Here is how my two initial datasets looked in Power BI:



Then, for the Revenue dataset, I renamed its Category column to Source and added one more custom column, Destination, with the value "Budget". For the Spending dataset, I added one custom column, Source, with the value "Budget" and renamed its Category column to Destination. This way the revenue streams flow into the budget, and then from the budget they flow out to the spending destinations. Here is how my final Sankey Budget dataset looked:



This dataset could then be used for the Sankey Power BI visualization, which mainly requires a Source, a Destination, and a Weight for the flow.
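In Power Query terms, those renames and custom columns come down to a few table operations. A minimal sketch, assuming the two source queries are named Revenue and Spending and the numeric column is called Amount (the real column names in the Toronto dataset may differ):

    // turn Revenue and Spending into Source/Destination/Weight flow records
    let
        RevenueFlows  = Table.AddColumn(Table.RenameColumns(Revenue, {{"Category", "Source"}}),
                          "Destination", each "Budget", type text),
        SpendingFlows = Table.AddColumn(Table.RenameColumns(Spending, {{"Category", "Destination"}}),
                          "Source", each "Budget", type text),
        SankeyBudget  = Table.Combine({
                          Table.SelectColumns(RevenueFlows,  {"Source", "Destination", "Amount"}),
                          Table.SelectColumns(SpendingFlows, {"Source", "Destination", "Amount"})})
    in
        SankeyBudget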



And now it looks like a budget book with revenue and spending data streams shown together.
Here is my actual report in Power BI that shows all this information along with the regular pie charts; please let me know what you think.


Sunday, November 19, 2017

Lost in Big Data, perhaps Small Data could help

(2017-Nov-19) I don't remember how I actually found the book "Small Data: The Tiny Clues that Uncover Huge Trends". Most likely, it came as one of the numerous suggestions from Amazon when I was purchasing other database-related books. And I don't regret reading this "technology-agnostic" work by Martin Lindstrom, a marketing consultant often hired by multinational corporations to investigate certain difficulties those companies experience, potentially helping them build their brands by performing so-called small data mining. He even calls himself a "forensic investigator of emotional DNA", conducting many personal interviews with customers and trying to understand the decisions those customers make consciously and, in most cases, unconsciously. Very interesting research work and practicality are applied to each of his data investigation pursuits.



Recently, I read the interview "Big Data: Insights Or Illusions?" with Christian Madsbjerg about the use of biased assumptions drawn from big data because of "intellectual laziness and insecurity". Another article, "Hadoop Has Failed Us, Tech Experts Say", states that "It’s (Hadoop) better than a data warehouse in that you have all the raw data there, but it’s a lot worse in that it’s so slow." And in another section, they outline that, "Unless you have a large amount of unstructured data like photos, videos, or sound files that you want to analyze, a relational data warehouse will always outperform a Hadoop-based warehouse".

I really enjoyed reading Martin Lindstrom's "Small Data" book, where along with his intricate data discovery techniques he tried to find a balance between big and small data: the automation of our lives with big data computing versus the sometimes unnoticed small data points that better describe who we are. I like this quote from the book: "Big data and small data are partners in a dance, a shared quest for balance".

Tuesday, October 31, 2017

Book from a library. Learning never stops for data professionals.

[2017-Oct-31] If you have never read the "Iacocca: An Autobiography" book by Lee Iacocca, I highly recommend it. I first read it when I was 17 years old, when I walked into a local library in search of any computer-related books. By accident, I found it on a bookshelf, read a couple of pages, and was told that I couldn't borrow the book but was allowed to come and read it in the library. My next week was filled with personal stories from Lee Iacocca's book; I even carried a notebook to take notes.




It was an interesting read about Lee Iacocca's early life, his relationship with his father, and how Lee began his career at Ford and then continued his passionate work at the Chrysler auto company. Going back to Lee's father, the book is filled with his quotes and phrases; no wonder I had a notebook to write them down. And there was one phrase of his that planted deep roots in my memory, where he said, "When times get tough, get into a food business because people will always need to eat".




During the recent Power BI World Tour in Toronto (https://www.pbiusergroup.com/pbiworldtour/locations/wt-toronto, Oct 4-5, 2017) I retold my story with the book and slightly paraphrased this quote during my Power BI presentation, saying that we need to get ourselves into the data business because people will always need information to use. You could be a data analyst crunching big numbers or a basketball coach going through a very detailed report filled with statistics from a previous game. And I believe that the ability to work with data will always be needed.

Data can be found everywhere, and we can convert almost everything into a form of measurable facts. Power BI can help us connect to this ever-expanding world of data and enrich our information discoveries through the existing data connectors (file-based datasets, different databases, Azure and other online services). And this list of supported data connectors for Power BI keeps growing.



But I would also encourage you to become more technology-agnostic IT professionals by spending time mastering more sustainable skills: data modeling, effective data visualization, data science, etc. Because we need to get ourselves into the data business; that's why learning for data professionals never stops!

Happy data adventures!

Friday, September 1, 2017

Macros in ER/Studio Data Architect are your data modeling lifebelt

[2017-Sep-01] There is a well-known phrase, "the rescue of a drowning man is the drowning man's own job", from one of the best books by Ilf and Petrov, "The Twelve Chairs". And if you have ever used macros in ER/Studio during your data modeling, then you may have seen how they can be your lifebelt among a large volume of database design tasks and operations.

ER/Studio Data Architect is a great software tool with extensive data modeling capabilities across multiple relational and big data platforms, and its macros can automate repetitive tasks and save you time.



My favorite macros are:

1) Import Columns From Excel
2) Import/Export Domains From Excel
3) Name Foreign Constraints
4) Name Primary Constraints

The first two macros are great time savers for designing tables with a large number of columns, and the other two can help you keep your entity object names aligned with a single naming convention.

Let's briefly take a look at each of those macros.

Import Columns From Excel
Your Excel files with a list of entity attributes should comply with the following structure:
1. Entity Name
2. Table Name
3. Attribute Name
4. Column Name
5. Data type
6. Table Definition
7. Attribute Definition
8. Nullability - 'NULL' or 'NOT NULL'
9. Primary Key - 'True' or 'False'

Then, when you execute this macro, a dialog box appears for you to upload your file. Once you finish this task, ER/Studio can add or update attributes of an existing entity in your data model or create a new entity.


Import/Export Domains From Excel
This macro exports the domains and their properties to Excel. Each record in the Excel spreadsheet represents a domain from the data dictionary for all or selected entities. The domain properties can be further updated and imported back into an existing model or a new model using the "Import Domains From Excel" macro.





Name Foreign Constraints
This macro will prompt the user with a dialog to specify the naming convention for foreign constraints. It will then name all the constraints using the parent and child table names. It will also make sure each name is unique by appending an index to the last characters of duplicate names.



Name Primary Constraints
This macro will name all primary key constraints using the given naming convention: the table name with either a prefix or a suffix.


I hope you will find your lifebelt next time you do your data modeling in ER/Studio Data Architect!

Tuesday, August 8, 2017

Data modeling in Power BI using XML file with Parent and Children nodes

[2017-Aug-08] Support for XML structured files as a data source has long been available in Excel; it was then introduced into the Power Query data extraction engine of the same office application, which later migrated into the Power BI data modeling tool.

My recent experience working with XML data files at the last Toronto Open Data user group meeting - https://www.meetup.com/opentoronto/events/240911274/ - challenged me to look at XML files from a different angle.

A request was made to analyze the Toronto Lobbyist Registry Open Data Set, a very common scenario in which an XML data file holds parent (or master) node elements along with a subset of children (or nested) nodes.

Here is an example of this Toronto Lobbyist Registry open XML data file:


And here is how this file looked in the Power BI Query Editor:


Basically, the ROW parent node had the following list of elements:
- SMNumber
- Status
- Type
- SubjectMatter
- Particulars
- InitialApprovalDate
- EffectiveDate
- ProposedStartDate
- ProposedEndDate

and several sub-nodes, which are shown in the Power BI Query Editor as tables:
- Registrant
- Communications
- etc.

My challenge was to build a data model that would enable analysis of the main dataset elements along with their subset data elements. It wasn't that intuitive, but it eventually proved that Power BI is a very decent data modeling environment.

So, first, I made my parent node a base table with SMNumber as its Primary Key (PK).
Then, for each of the sub-node datasets:
1) I had to create a replica of the base table; let's take the Communications table as an example.
2) Then I removed every column except the actual Communications data column, and I also kept the SMNumber, which now became a Foreign Key (FK) to my base table's PK.


3) Then I expanded the Communications table columns and gave a proper name to the new table within my data model, which now showed all the evidence of a potential one-to-many relationship between the base table and this new table:


After repeating steps 1-2-3 for the other sub-nodes of my XML-sourced main table, I was then able to create a stable data model to accommodate my future reporting visualizations and data discovery:


One single XML file with parent and children nodes had been transformed into multiple tables, where Power BI took care of empty sub-node elements (please see the null value from the 2nd step of my child node transformation). In addition to that, referential integrity between the tables was in place, and I could start working on my visualizations.
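Expressed in M, steps 1-2-3 for the Communications child table come down to a select-and-expand pattern. A hedged sketch: the file path, the drill-down step and the expanded column names below are placeholders, since the exact ones depend on the XML file's shape:

    // start from the same XML source as the base table (step 1)
    let
        Source     = Xml.Tables(File.Contents("C:\Data\lobbyactivity.xml")),
        Rows       = Source{0}[Table],
        // keep only the key and the nested Communications table (step 2)
        KeyAndComm = Table.SelectColumns(Rows, {"SMNumber", "Communications"}),
        // expand the nested table into columns of the new table (step 3)
        Expanded   = Table.ExpandTableColumn(KeyAndComm, "Communications",
                       {"POH_Office", "POH_Name"}, {"POH_Office", "POH_Name"})
    in
        Expanded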


I understand that the XML data source for Power BI is just a very small member of the family of other supported data sources, and Microsoft keeps adding new, more complicated data sources to stream your data analytics from. However, it's just another proof of how many blades this Power BI Swiss Army knife can have :)

Happy data adventures!