Friday, April 10, 2015

Big Data Hackathon in Microsoft Canada: Power Map will find your way!

Big Data, what a term! Some people instantly think of a huge, sometimes cumbersome volume of information; others define it with three or four Vs (volume, velocity, variety, and variability). Debates and other social and technical events are organized to promote the various technologies that work with big data sets.

One of these events, the Big Data Hackathon in Mississauga on March 13-14, 2015, was organized by Microsoft Canada. It invited data scientists, developers and data enthusiasts to pick a specific data science problem and solve it with Microsoft Big Data and BI tools, competing in two categories: Data Modeling and Data Visualization. And our team won the Data Visualization category (http://blogs.technet.com/b/cansql/archive/2015/01/28/big-data-hackathon-in-mississauga-bdh.aspx)!
I didn’t stay for the final demo: I was a bit tired and left early Saturday afternoon after working all night on a PowerPivot model and a Power Map tour.

Freedom through responsibility: this could be one of the phrases that best characterize the approach taken during this Big Data Hackathon. All teams were free to select any publicly available data (http://blogs.technet.com/b/cansql/archive/2015/03/12/microsoft-big-data-hackathon-resources.aspx) and use the HDInsight and Power BI tool set to build a winning data science scenario.

Our team decided to work with the Toronto Parking Tickets data and combine it with the Toronto Green P Parking locations data set, to try to prove that the number of parking tickets would be lower in areas near public parking locations:
http://www1.toronto.ca/wps/portal/contentonly?vgnextoid=ca20256c54ea4310VgnVCM1000003dd60f89RCRD
http://www1.toronto.ca/wps/portal/contentonly?vgnextoid=b0993228eb22a310VgnVCM1000003dd60f89RCRD
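
To make the hypothesis concrete, here is a minimal Python sketch of one way it could be tested. It assumes both data sets have already been geocoded to (latitude, longitude) pairs, which is itself an assumption, since the ticket data identifies locations by street address. It counts tickets issued within a radius of any Green P lot using the haversine great-circle distance; the function names and the 250 m threshold are hypothetical, not what our team actually used.

```python
import math

# Mean Earth radius in metres, a common approximation for haversine distances.
EARTH_RADIUS_M = 6_371_000


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def split_by_proximity(tickets, lots, radius_m=250.0):
    """Return (near, far) ticket counts, where "near" means within radius_m of any lot.

    tickets and lots are lists of (lat, lon) tuples (assumed already geocoded);
    radius_m is a hypothetical threshold, not a value taken from the project.
    """
    near = 0
    for t_lat, t_lon in tickets:
        # Brute force: compare this ticket against every parking lot.
        if any(haversine_m(t_lat, t_lon, l_lat, l_lon) <= radius_m for l_lat, l_lon in lots):
            near += 1
    return near, len(tickets) - near
```

For example, `split_by_proximity([(43.6535, -79.3830), (43.7000, -79.4000)], [(43.6532, -79.3832)])` returns `(1, 1)`: the first point is under 50 m from the lot, the second roughly 5 km away. The brute-force loop compares every ticket with every lot, which is fine for a sketch; a spatial index would be the natural next step for a full year of tickets.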

The parking ticket data files were moved to our Azure Blob storage, an HDInsight cluster was created in Azure, and all the data files were organized into Hive tables to be queried further for our data insights.
Power Queries were built to bring in both the Hive tables’ data and the JSON files with parking locations; the data sets were then loaded into a PowerPivot model and used as a source for Power BI and Power Map visualizations. During a preliminary comparison of the parking ticket data with parking locations, we discovered a few “hot” spots with a high volume of parking infractions in December 2014, mostly downtown. One such hot spot, with the street address “34 Little Norway Cres”, was right across from Billy Bishop Airport.
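
Finding such hot spots boils down to counting tickets per address. Below is a minimal Python sketch, assuming each ticket is a dict with a `date_of_infraction` field in YYYYMMDD form and a street address in `location2`; both field names are assumptions about the file layout and are exposed as parameters. The project itself did this step with Hive and Power Query, not Python.

```python
from collections import Counter


def top_hot_spots(ticket_rows, n=5, date_prefix=None,
                  location_key="location2", date_key="date_of_infraction"):
    """Return the n addresses with the most tickets as (address, count) pairs.

    ticket_rows: iterable of dicts, one per ticket. The default key names are
    assumptions about the ticket file layout. date_prefix (e.g. "201412") keeps
    only tickets whose date string starts with it.
    """
    counts = Counter()
    for row in ticket_rows:
        if date_prefix and not row[date_key].startswith(date_prefix):
            continue
        # Normalize spacing and case so "34 Little Norway Cres" and
        # "34 LITTLE NORWAY CRES" are counted as the same address.
        counts[row[location_key].strip().upper()] += 1
    return counts.most_common(n)
```

With a few made-up rows, two December tickets at "34 Little Norway Cres" (in mixed case) and one each in December and November at "1 Example St", `top_hot_spots(rows, n=2, date_prefix="201412")` returns `[("34 LITTLE NORWAY CRES", 2), ("1 EXAMPLE ST", 1)]`.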

I spent a few hours searching for any available flight-related information and finally found it in the NAV Canada monthly statistics. All that was left was to build additional Power Queries, filter the data for Billy Bishop Airport only, and combine the monthly tables into one data set for the whole year:
i. http://www.statcan.gc.ca/pub/51-007-x/2013001/t001-eng.htm
ii. http://www.statcan.gc.ca/pub/51-007-x/2013002/t001-eng.htm
iii. http://www.statcan.gc.ca/pub/51-007-x/2013003/t001-eng.htm
iv. http://www.statcan.gc.ca/pub/51-007-x/2013004/t001-eng.htm
v. http://www.statcan.gc.ca/pub/51-007-x/2013005/t001-eng.htm
vi. http://www.statcan.gc.ca/pub/51-007-x/2013006/t001-eng.htm
vii. http://www.statcan.gc.ca/pub/51-007-x/2013007/t001-eng.htm
viii. http://www.statcan.gc.ca/pub/51-007-x/2013008/t001-eng.htm
ix. http://www.statcan.gc.ca/pub/51-007-x/2013009/t001-eng.htm
x. http://www.statcan.gc.ca/pub/51-007-x/2013010/t001-eng.htm
xi. http://www.statcan.gc.ca/pub/51-007-x/2013011/t001-eng.htm
xii. http://www.statcan.gc.ca/pub/51-007-x/2013012/t001-eng.htm
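
The twelve table URLs differ only in the issue number (2013001 to 2013012), so they can be generated rather than typed. The Python sketch below does that and mimics the filter-and-append step we did in Power Query. It assumes each month's table has already been parsed into (airport name, aircraft movements) rows; that row shape and the airport label passed in are assumptions, so check how Billy Bishop is actually named in the source tables.

```python
def monthly_table_urls():
    """Build the twelve monthly table URLs listed above (issues 2013001..2013012)."""
    return [
        f"http://www.statcan.gc.ca/pub/51-007-x/2013{month:03d}/t001-eng.htm"
        for month in range(1, 13)
    ]


def combine_airport_months(monthly_tables, airport):
    """Keep one airport's rows from each month and stack them into one year.

    monthly_tables maps month number (1-12) to a list of (airport_name, movements)
    rows already parsed from that month's table (an assumed structure).
    Returns a list of (month, movements) pairs in month order.
    """
    year = []
    for month in sorted(monthly_tables):
        for name, movements in monthly_tables[month]:
            if name == airport:
                year.append((month, movements))
    return year
```

For example, `combine_airport_months({1: [("Airport A", 100), ("Airport B", 50)], 2: [("Airport A", 120)]}, "Airport A")` returns `[(1, 100), (2, 120)]`.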

The final Power Map tour video walked through all the steps of the data analysis, leading up to an attempt to find a correlation between parking infractions and air flights. To my big disappointment, parking infractions in the airport area were not really correlated with the number of flights, or perhaps there were some data quality issues; but the whole data journey was worth the try!
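
One simple way to put a number on "not really correlated" is the Pearson correlation coefficient between monthly ticket counts at the airport hot spot and monthly flight counts. Below is a minimal Python sketch, assuming both series are already aligned month by month (same months, same order); the function name is hypothetical, and this is not necessarily how the correlation was assessed in our Power Map tour.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Covariance and standard deviations without the 1/n factors, which cancel out.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)
```

A value near 1 or -1 means a strong linear relationship; a value near 0, as in `pearson_r([1, 2, 3, 4], [1, 3, 3, 1])`, means none. On Python 3.10+ `statistics.correlation` computes the same coefficient. With only twelve monthly points, even a moderate r should be read cautiously.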

The Power Map tour video was also posted to YouTube:

https://www.youtube.com/watch?v=3UMtaYoYHiY

My key learnings from this Big Data Hackathon were:
- The amazing volume of publicly available data sets that can be used for helpful and meaningful insights; I need to work with HDInsight technology more.
- Power Query is an awesome tool that lets you take X steps forward in modifying your data set and Y steps back as if nothing ever happened; and the set of Power Query functions is a whole new world to explore.
- Power Map, being powered by Bing Maps, can geolocate data points not only by latitude, longitude or street address; I liked how the “Other” category worked as well, letting you specify a geographic location by its name. That’s how I located “Billy Bishop Airport” just by this phrase in quotation marks.

I really enjoyed this Microsoft Big Data Hackathon: a great time to learn and practice HDInsight on Azure and Power BI technologies. On top of that, we won a prize, and it’s not every night you get to spend at the Microsoft office working with these technologies :-)