Big Data, what a term! Some people instantly think of a huge, sometimes cumbersome volume of information, while others define it with three or four Vs (volume, velocity, variety, and variability). Debates and other social and technical events are organized to promote the various technologies that work with big data sets.
One of these events, the Big Data Hackathon in Mississauga on March 13-14, 2015, was organized by Microsoft Canada. It invited data scientists, developers and data enthusiasts to pick a specific data science problem and solve it with Microsoft Big Data and BI tools in two categories (Data Modelling and Data Visualization). And our team won in the Data Visualization category (https://learn.microsoft.com/en-ca/archive/blogs/cansql/big-data-hackathon-a-story-from-a-winning-team)!
I didn’t stay for the final demo; I was a bit tired and left early Saturday afternoon after working all night on a PowerPivot model and a Power Map tour.
Freedom through responsibility: this could be one of the terms that characterize the approach taken during this Big Data Hackathon. All the teams were free to select any publicly available data and use the HDInsight and Power BI tool set for a winning data science scenario - https://learn.microsoft.com/en-ca/archive/blogs/cansql/microsoft-big-data-hackathon-resources-vancouver.
Our team decided to work with the Toronto Parking Ticket information and combine it with the Toronto Green P Parking location data set, to test whether the number of parking tickets would be lower in areas near public parking locations:
http://www1.toronto.ca/wps/portal/contentonly?vgnextoid=ca20256c54ea4310VgnVCM1000003dd60f89RCRD
http://www1.toronto.ca/wps/portal/contentonly?vgnextoid=b0993228eb22a310VgnVCM1000003dd60f89RCRD
The parking ticket data files were moved to our Azure Blob storage, an HDInsight cluster was created in Azure, and all the data files were organized into Hive tables that we could then query for insights.
Power Queries were built to bring in both the Hive table data and the JSON files with parking locations; the data sets were then loaded into the PowerPivot model and used as a source for Power BI and Power Map visualizations. During a preliminary comparison of parking ticket data with parking locations, we discovered a few “hot” spots with a high volume of parking infractions at the end of December 2014, mostly downtown, and one such hot spot had the street address “34 Little Norway Cres”, right across from the Billy Bishop Airport.
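To make the hot-spot idea a bit more concrete, here is a minimal Python sketch of that kind of count. It is not what we ran at the hackathon (that was done with Hive, Power Query and PowerPivot); the field names `date` and `location` and the sample rows are my own assumptions for illustration, not the real column names of the Toronto data set.

```python
from collections import Counter
from datetime import date
from typing import Iterable, List, Tuple


def top_hot_spots(
    tickets: Iterable[dict], start: date, end: date, n: int = 3
) -> List[Tuple[str, int]]:
    """Count tickets per street address within [start, end], return the n busiest."""
    counts = Counter(
        # Normalize the address so small spelling differences are counted together.
        t["location"].strip().upper()
        for t in tickets
        if start <= t["date"] <= end
    )
    return counts.most_common(n)


# Made-up sample rows, for illustration only (hypothetical field names).
sample = [
    {"date": date(2014, 12, 29), "location": "34 Little Norway Cres"},
    {"date": date(2014, 12, 30), "location": "34 little norway cres "},
    {"date": date(2014, 12, 30), "location": "1 Example St"},
    {"date": date(2014, 6, 1), "location": "1 Example St"},
]

hot_spots = top_hot_spots(sample, date(2014, 12, 25), date(2014, 12, 31))
print(hot_spots)
```

With real data the interesting part is the date window: narrowing it to the last days of December is what made the downtown addresses stand out for us.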
I spent a few hours searching for any available flight-related information and finally found it in the NAV Canada monthly statistics. All that was needed was to build additional Power Queries, filter the data for the Billy Bishop Airport only and combine it into one data set for the whole year:
i. http://www.statcan.gc.ca/pub/51-007-x/2013001/t001-eng.htm
ii. http://www.statcan.gc.ca/pub/51-007-x/2013002/t001-eng.htm
iii. http://www.statcan.gc.ca/pub/51-007-x/2013003/t001-eng.htm
iv. http://www.statcan.gc.ca/pub/51-007-x/2013004/t001-eng.htm
v. http://www.statcan.gc.ca/pub/51-007-x/2013005/t001-eng.htm
vi. http://www.statcan.gc.ca/pub/51-007-x/2013006/t001-eng.htm
vii. http://www.statcan.gc.ca/pub/51-007-x/2013007/t001-eng.htm
viii. http://www.statcan.gc.ca/pub/51-007-x/2013008/t001-eng.htm
ix. http://www.statcan.gc.ca/pub/51-007-x/2013009/t001-eng.htm
x. http://www.statcan.gc.ca/pub/51-007-x/2013010/t001-eng.htm
xi. http://www.statcan.gc.ca/pub/51-007-x/2013011/t001-eng.htm
xii. http://www.statcan.gc.ca/pub/51-007-x/2013012/t001-eng.htm
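The twelve links above follow one pattern, so the "one query per month, then append" step can be sketched in a few lines of Python. This is only an illustration of the idea, not the Power Query code we used; the row fields `airport` and `movements` and the sample numbers are assumptions of mine, and the sketch does not download or parse the actual pages.

```python
from typing import Dict, Iterable, List

# URL pattern taken from the twelve links listed above (issues 001 to 012).
BASE_URL = "http://www.statcan.gc.ca/pub/51-007-x/2013{month:03d}/t001-eng.htm"


def monthly_urls() -> List[str]:
    """Build the twelve monthly table URLs."""
    return [BASE_URL.format(month=m) for m in range(1, 13)]


def combine_for_airport(
    monthly_rows: Dict[int, Iterable[dict]], airport: str
) -> List[dict]:
    """Keep only rows for one airport and append all months into one list."""
    combined = []
    for month in sorted(monthly_rows):
        for row in monthly_rows[month]:
            # Case-insensitive match on the (hypothetical) airport name field.
            if airport.lower() in row["airport"].lower():
                combined.append(
                    {
                        "month": month,
                        "airport": row["airport"],
                        "movements": row["movements"],
                    }
                )
    return combined


# Made-up rows standing in for two already-parsed monthly tables.
parsed = {
    2: [{"airport": "Billy Bishop Airport", "movements": 90}],
    1: [
        {"airport": "Billy Bishop Airport", "movements": 100},
        {"airport": "Some Other Airport", "movements": 500},
    ],
}

year = combine_for_airport(parsed, "billy bishop")
print(monthly_urls()[0])
print(year)
```

Tagging each row with its month before appending is the important bit, since the individual monthly tables do not carry that information in their rows.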
The final Power Map tour video showed all the steps of the data analysis, leading up to an attempt to find a correlation between parking infractions and flights. To my big disappointment, parking infractions in the area of the airport were not much correlated with the number of flights, or perhaps there were some data quality issues; but this whole data journey was worth trying!
The Power Map tour video was also posted to YouTube:
https://www.youtube.com/watch?v=3UMtaYoYHiY
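For anyone who wants to repeat the correlation check outside of Power Map, a plain Pearson coefficient over two monthly series is one simple way to do it. This is a sketch under my own assumptions: at the hackathon the comparison was visual, and the numbers below are made up, not our results.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Made-up monthly values, for illustration only.
flights = [100.0, 120.0, 140.0, 160.0]
tickets = [30.0, 10.0, 40.0, 20.0]

r = pearson(flights, tickets)
print(round(r, 3))
```

A value near 0 means the two series barely move together, which is roughly what our tour suggested; a value near 1 or -1 would have been the interesting finding.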
My key learnings from this Big Data Hackathon were:
- There is an amazing volume of publicly available data sets that can be used for helpful and meaningful insights; I need to work with HDInsight technology more.
- Power Query is an awesome tool that lets you take X steps forward in modifying your data set and Y steps back as if nothing ever happened; and the set of Power Query functions is a whole new world to explore.
- Power Map, being powered by Bing Maps, made it possible to geo-locate data points not only by latitude, longitude or street address; I liked how the “other” category worked as well, by specifying a particular geo location using its name. That’s how I located “Billy Bishop Airport”, just by putting this phrase in quotation marks.
I really enjoyed this Microsoft Big Data Hackathon: it was a great time to learn and practice Azure HDInsight and Power BI technologies. Additionally, we won a prize, and you don’t get to spend every night at the Microsoft office working with those technologies :-)

