
Friday, March 31, 2017

SSIS Web Service Task: misleading role of WSDL file

[2017-Mar-31] You can call one of your web service methods in an SSIS package using a .NET script task, or you can simply use the existing Web Service task. The latter option is easier: it is UI based, doesn't take much time to set up, and its input/output parameters are intuitive to navigate.

There is one gotcha with the use of the WSDL file, though, that is not fully documented but is worth mentioning. When you initially set up a Web Service task in SSIS, you provide a web server URL for your HTTP connection (let's just take this for example: http://www.webservicex.net/Statistics.asmx?WSDL), and then you point the task to a local WSDL web service reference file with all the available methods inside. Then you test your ETL package, it works well, and all is merry and bright. However, if you think that by changing the HTTP connection for your web service (let's say from Dev to Prod) you will be calling the Production version of your web service method, that's actually not the case: you will still be calling your Dev environment. I learned this the hard way at my recent BI project, testing a web service call in the UAT environment but, for reasons unknown at the time, still getting results from the original Dev environment.

<wsdl:service name="Statistics">
     <wsdl:port name="StatisticsSoap" binding="tns:StatisticsSoap">
         <soap:address location="http://www.webservicex.net/Statistics.asmx"/>
     </wsdl:port>
</wsdl:service>

It turns out that the SSIS Web Service task uses the web server address stored in the WSDL file's soap:address tag, regardless of the server location your HTTP connection specifies.

My way to resolve this problem was to create a separate WSDL file for my UAT environment web service and make it part of the SSIS environment configuration, i.e. each environment's HTTP web service connection is coupled with a corresponding WSDL file whose soap:address points to that environment, which prevents mixed-up results during SSIS control flow execution. And don't forget to make both of those settings configurable.
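For illustration, the UAT copy of the WSDL file differs from the Dev one only in its soap:address location, which is the address the Web Service task will actually call (the UAT host name below is hypothetical):

<wsdl:service name="Statistics">
     <wsdl:port name="StatisticsSoap" binding="tns:StatisticsSoap">
         <!-- hypothetical UAT endpoint; this is the address the Web Service task really calls -->
         <soap:address location="http://uat-webservices.mycompany.local/Statistics.asmx"/>
     </wsdl:port>
</wsdl:service>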

Please let me know how you had to resolve a similar issue with the SSIS Web Service task after deploying your ETL package to a different environment.




Tuesday, February 21, 2017

External SQL files vs. Stored procedures in SSIS development

[2017-Feb-21] I'm creating this particular blog post mostly for myself, to remember it and use it as a reference in future discussions on the subject.

One of my clients has an existing SSIS solution that contains multiple Execute SQL Tasks configured to read complicated SQL code from external files. Such an architecture is not very sustainable, and my recommendation was to replace those external SQL code files with a set of stored procedures on the SQL Server side.

The client's follow-up questions, asking me to outline the benefits of stored procedures vs. external files, along with the preferable way to use hard-coded SQL vs. stored procedure calls in SSIS packages, led me to write this post.

Stability and Security issues:
- Keeping data extract logic in external SQL files makes SSIS data flow tasks vulnerable to undesirable changes in those SQL scripts or, even worse, to the deletion of the files, either of which may fail the whole data flow process. By moving the same SQL extract logic into stored procedures you rely instead on the very stable SQL Server engine to execute your set-based data logic.
- Security-wise, external SQL files require an extra safekeeping methodology to protect them from harmful changes: the folder holding those scripts has to be properly secured, and an additional security policy has to be defined and maintained. With the SQL logic in SQL Server stored procedures you don't need to rely on anything external to protect your data flow task: the existing application service account that already has read/write access to the corresponding databases only needs to be granted EXECUTE permission on the new stored procedures.

Essentially, porting SQL code from external files into stored procedures also reduces your data-driven application's exposure to threats such as SQL injection, since the load logic is no longer assembled from files that can be tampered with. When both your data and your complex data load logic live inside the SQL Server environment, you feel safe and secure!

Recommendations for using SQL code in SSIS:
- There is nothing inherently right or wrong about hard-coding SQL within an SSIS package; in some cases it's a valid approach for running simple SQL commands against the database engine. However, when the logic of those SQL commands becomes complicated, it's recommended to wrap the code into SQL Server stored procedures and then simply execute them within your SSIS control/data flows (a minimal sketch follows below).
- A stored procedure is also a better option from a development perspective, with a rich set of tools to design and test your data extract code. Plus, with any further changes you won't need to redeploy your SSIS packages; only the code of the stored procedure has to be updated.
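As a minimal, hypothetical sketch (all object names below are made up for illustration), the code sitting in an external .sql file could be wrapped into a procedure, the existing SSIS service account granted EXECUTE on it, and the Execute SQL Task reduced to a one-line call:

-- Wrap the logic from the external .sql file into a stored procedure
CREATE PROCEDURE dbo.usp_LoadDailySales
    @LoadDate DATE
AS
BEGIN
    SET NOCOUNT ON;

    -- the complex extract logic that used to live in the external file
    INSERT INTO dbo.FactDailySales (SaleDate, ProductId, SalesAmount)
    SELECT s.SaleDate, s.ProductId, SUM(s.SalesAmount)
    FROM staging.Sales AS s
    WHERE s.SaleDate = @LoadDate
    GROUP BY s.SaleDate, s.ProductId;
END
GO

-- the existing SSIS service account only needs EXECUTE permission
GRANT EXECUTE ON dbo.usp_LoadDailySales TO [ssis_service_account];
GO

-- SQLStatement of the SSIS Execute SQL Task (OLE DB connection) becomes a one-liner;
-- the ? marker maps to a package parameter or variable
EXEC dbo.usp_LoadDailySales @LoadDate = ?;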

So, using stored procedures in SSIS development is in most cases a win-win solution for a developer: it saves both development and deployment time for your SSIS solution!


Sunday, January 29, 2017

How to compare two SSAS databases with a custom MDX query

[2017-Jan-29] It came as a request from one of the customers I had worked for as a consultant: quickly compare two similar SSAS databases located in different environments. So I quickly ran an MDX query with some key metrics against each of the SSAS databases and presented the results to the customer. Further along the way, they asked me if I could turn it into a configurable tool with options to change the SSAS servers and database names and possibly adjust the main filtering criteria.

I thought that it would be possible and decided to create an Excel workbook with two OLE DB data connections to the SSAS databases, custom MDX queries, and additional configurable parameters to adjust those data connections.

There is a very good article by Gilbert Quevauvilliers that explains how to base an Excel data connection on a custom MDX query (Excel – Using a Custom MDX Query in Excel Table), something that is otherwise not very intuitive. With that covered, I also needed the full list of my cube metrics to build the MDX query (manually retyping them was not an option). SSAS dynamic management views are very handy for this; with a lot of information about SSAS DMVs available, I'd recommend the article SSAS: Using DMV Queries to get Cube Metadata by @BennyAustin, which already includes some basic metadata extract examples.
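For example, a short DMV query like this one, run in an MDX query window against your SSAS database, returns the measure names and unique names that you can paste straight into an MDX query (a minimal sketch; add a CUBE_NAME filter if your database hosts several cubes):

SELECT [MEASURE_NAME], [MEASURE_UNIQUE_NAME], [MEASUREGROUP_NAME]
FROM $SYSTEM.MDSCHEMA_MEASURES
WHERE [MEASURE_IS_VISIBLE]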

In order to show this use case I've used the [Adventure Works Internet Sales Tabular Sample Database] and deployed it to my test SQL Server 2016 SSAS environment. You can adjust the test case and use your own SSAS database in either multidimensional or tabular mode, because the MDX data extract logic works in both cases.
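The compare query itself can stay fairly simple. Here is a minimal MDX sketch with a couple of key metrics (the measure and hierarchy names are taken from the Adventure Works Internet Sales tabular sample and the default cube name Model; swap in your own). In the workbook, the query text also carries a period placeholder token that the macro shown further below replaces for each environment:

SELECT
    { [Measures].[Internet Total Sales],
      [Measures].[Internet Total Units] } ON COLUMNS,
    NON EMPTY [Date].[Calendar Year].MEMBERS ON ROWS
FROM [Model]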



Then I created this Excel file (SSAS Compare.xlsm) with all the dynamic SSAS data connections and MDX queries needed to support my data compare results; make sure to enable macro settings in your Excel.



Basically, it's all done through two SSAS OLE DB connections within this Excel file, for which I change the connection string to SSAS and the command text of the MDX query.




So then it was very easy to create a macro within this Excel workbook and let the configuration magic work:


    '' This fragment runs inside the workbook macro (e.g. behind a "Compare" button).
    '' strMDX_Filter is assumed to be set elsewhere in the macro to the placeholder
    '' period token used inside the MDX query text.

    '' Read the MDX query template from the "MDX Query" sheet
    ActiveWorkbook.Sheets("MDX Query").Select
    strMDX_Query = Range("A1").Value

    ActiveWorkbook.Sheets("Compare").Select

    '' Period settings
    strMDX_Period_1 = Range("B7").Value
    strMDX_Period_2 = Range("E7").Value

    '' Server settings
    strSSAS_Server_1 = Range("B5").Value
    strSSAS_Database_1 = Range("B6").Value
    strSSAS_Server_2 = Range("E5").Value
    strSSAS_Database_2 = Range("E6").Value

    '' Point the first SSAS connection to environment 1 and inject its period filter
    With ActiveWorkbook.Connections("SSAS_Environment_1").OLEDBConnection
        .Connection = "OLEDB;Provider=MSOLAP.5;Integrated Security=SSPI;Persist Security Info=True;Initial Catalog=" + strSSAS_Database_1 + ";Data Source=" + strSSAS_Server_1 + ";MDX Compatibility=1;Safety Options=2;MDX Missing Member Mode=Error"
        .CommandText = Replace(strMDX_Query, strMDX_Filter, strMDX_Period_1)
    End With

    '' Point the second SSAS connection to environment 2 and inject its period filter
    With ActiveWorkbook.Connections("SSAS_Environment_2").OLEDBConnection
        .Connection = "OLEDB;Provider=MSOLAP.5;Integrated Security=SSPI;Persist Security Info=True;Initial Catalog=" + strSSAS_Database_2 + ";Data Source=" + strSSAS_Server_2 + ";MDX Compatibility=1;Safety Options=2;MDX Missing Member Mode=Error"
        .CommandText = Replace(strMDX_Query, strMDX_Filter, strMDX_Period_2)
    End With

    '' Refresh both connections to pull the two result sets side by side
    ActiveWorkbook.Connections("SSAS_Environment_1").Refresh
    ActiveWorkbook.Connections("SSAS_Environment_2").Refresh

    MsgBox "Compare has been finished successfully.", vbInformation, "SSAS Measure Compare"

As a result, I was able to dynamically configure the SSAS servers and databases; I also had my MDX query tested against both SSAS data connections, and I added a filtering criterion to enhance the scenario. Feel free to change this file (SSAS Compare.xlsm) in any way you want to accommodate your own compare test case.

And have a happy data adventure!

Friday, November 4, 2016

Open Data is like a soup

[2016-Nov-04] Open data is like a soup: some people like it hot (made with all the latest updates); some people like it spicy (probably experimenting with various data representation forms). And only on some special occasions would you prefer a c-old soup over a hot one (where old data analysis is driven by a search for historical patterns or insights).

I'm not new to working with data, and in the corporate world most datasets are highly structured and regulated. With open data... it's a bit different: fewer regulatory norms and sometimes more difficulty working with the data (it's open, so whom do you complain to about it?).

I've seen cases where a data custodian made datasets available to the public with only old data (very old, 10 years and more), while at the same time presenting their own visualizations using data that might not have existed moments ago. In other cases, open data was made available in the form of Excel files which you could open and read, but aggregating a single geo region across several years would take almost a hundred of those files; plus, their data format was not very structured and numerous data massage moves were necessary. Good examples of open data management also exist, where data owners responsibly control the availability and freshness of their data.

Going back to the soup analogy, this whole idea of working with data, and especially with open data, reminded me of the ending scene of the classic movie "Coming to America" with Eddie Murphy.




In case this YouTube video gets removed, here is the text of the scene:
"Wait a minute, wait a minute, wait stop right there! Listen: Stop right there, man. A man goes into a restaurant. You listenin'? A man goes into a restaurant, and he sits down, he's having a bowl of soup and he says to the waiter, waiter come taste the soup. Waiter says: Is something wrong with the soup? He says: Taste the soup. He says: Is there something wrong with the soup? Is the soup too hot? He says: Will you taste the soup? What's wrong, is the soup too cold? Will you just taste the soup?! Allright, I'll taste the soup - where's the spoon?? Aha. Aha! ..."

Last week at the Open Data Toronto meetup (https://www.meetup.com/opentoronto/events/234322841/) there was an opportunity to review various open data sets and data visualizations built on them over the course of the last 1.5 years. Different vendor tools along with various visualization techniques were used to present data insights: from simple tables or charts to morphing maps and word clouds.

And then I realized that only when an open data inquiry intersects with the right data exploration tool do you get that Aha moment; no more explanations are needed and the data visualization speaks for itself. This really motivates me to explore other open datasets, try different tools and visualization techniques, and read books about effective data analysis.

Otherwise my data soup may get cold without a right data spoon :-)

Happy data adventures!

Tuesday, October 18, 2016

Power BI with Narrative Science: Look Who's Talking (Part 2: Predicting the Future)

[2016-Oct-18] I once read a book about the importance of using visual aids in teaching. It used the well-known illustration that the human optic nerve is thicker than the auditory one, which results in the brain consuming more information by seeing things than by hearing or listening to them. I still believe in this: a picture is worth a thousand words.

However, we now tend to put more and more visual elements into our business reports, and then we spend more time trying to understand the story those elements may tell us, rather than getting a direct conversation or explanation.


In one of my previous blog posts I already talked about the Narrative Science company and their product for enhancing visual business reports with textual narratives (Power BI with Narrative Science: Look Who's Talking?). Basically, the Narrative Science component takes numeric report measures and analytically generates a story that business users can read.


Recently they released an update for the Power BI Narrative Science component (Narratives for Power BI 1.2 Release Notes) and added a few more features that enable more complex narrative customization and data analytics. Here is a brief list of those new features:
  1. Story Inclusion Thresholds
  2. Custom Formatting
  3. Predictive Analytics
  4. Additional Measure Relationships

And Predictive Analytics excited me the most; yes, I wanted to know the future! So I needed to test this new Narrative Science functionality, with the idea that it could enhance my customers' Power BI reports.







There are a few prerequisites for your dataset, though, before you can start working with the Trend Analytics functionality of Narrative Science:

- it has to be a time series of one or more measured metrics

- it also requires at least 30 time intervals before textual predictions can be populated, where possible.



I took an open dataset that tracks watermain breaks within the City of Toronto (Watermain Breaks) for the last 26 years (1990-2015), with the following data columns:
- BREAK_DATE - Date of watermain break reported
- BREAK_YEAR - Year of watermain break reported
- XCOORD - Easting in MTM NAD 27(3 degree) Projection & Lat and Long
- YCOORD - Northing in MTMNAD 27(3 degree) Projection & Lat and Long

Then I quickly aggregated this data and placed it into a column chart:
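However you build that aggregate (in Power Query, in DAX, or in plain SQL if you stage the CSV in a database first), the shape of the data is simply a break count per period. A minimal SQL sketch under that staging assumption (the table name dbo.WatermainBreaks is made up for illustration):

SELECT BREAK_YEAR,
       COUNT(*) AS BREAK_COUNT
FROM dbo.WatermainBreaks
GROUP BY BREAK_YEAR
ORDER BY BREAK_YEAR;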


With the Narrative Science component and my (Break Date, Break Count) dataset I received a very decent textual story of the 26 years of watermain break history:


However, along the way I received the following note about an existing limitation of working with my dataset, which I believe will be fixed in future releases:



The other thing that I couldn't solve with my dataset was the actual prediction of future results. No matter how hard I tried to change the data aggregation level or the filtering options, I kept getting this message alongside the otherwise helpful narratives: "A prediction could not be made because [metric name] did not have a good linear trend".

I assume that the Trend Analytics uses a linear extrapolation algorithm and that it is very particular about the historical data. However, I do believe that this Predictive Analytics functionality of Narrative Science will be enhanced in the future too, because to me, seeing that the number of breaks is greater in the winter months because of low temperatures is a good trend :-).

Overall, I'm very happy to see all these new features introduced in the recent update of Narrative Science for Power BI, and I will be recommending that my customers start using it more in their reports.


Because you don't have to say a word; Narrative Science does it for you :-)



Happy data adventures!

Friday, September 30, 2016

Historical Reports vs. Predictive analytics: Rear view mirror look vs. Forward-looking view. Really?

[2016-Sep-30] Working with reports based on historical data is like driving a car while looking in the rear-view mirror. I heard this saying again when I went to the Microsoft Data Science Summit in Atlanta (https://ignite.microsoft.com/microsoftdatascience) this past week, which gave me a second thought to consider: how do we define a historical fact, and what separates the current time frame from anything else (i.e. history)?

Then I read a good article, "Myth: Managing by historical data is like driving while looking in the rear view mirror" by Allan Wille, and realized that I'm not alone in this camp of doubters about the real meaning of "ineffective" rear-view-mirror driving vs. "better" forward-looking driving for business metrics analytics.

In this article Allan outlined three types of business performance analytics:
1) Current performance (real-time or near-real-time)
2) Timeframe-appropriate historical performance data
3) Predictive or conditional alerts (based on goals, history, or environmental factors).

And I would go even further: I believe that all the reporting we currently possess with our modern technology is based on historical data. Stream analytics is based on near-real-time events, but they still happened a few moments ago; they're already history. Predictive analytics, which aims to predict future events, is based on analytical models that are usually trained and tested on historical datasets. Even in cognitive analysis we have to use a large data repository, collected by the vendors, in order to make image or speech recognition work.

My middle school physics teacher planted this seed of doubt a while ago when she confronted our class with the idea that a car speedometer never shows the real-time vehicle speed, because there is always latency between the different components of the mechanism. The speed measurement I observe at any given moment may simply be a representation of the actual speed a few moments earlier.

So, if I want to become a data scientist, I will start by defining what I mean by a historical fact, at least for myself. Otherwise I may still consider the car in my rear-view mirror as history while it overtakes me and starts showing its rear lights in my front view instead.




Wednesday, August 31, 2016

Power BI with Narrative Science: Look Who's Talking?

[2016-Aug-31] Someone said a picture is worth a thousand words, and it's true that in most cases a visual representation of a fact is absorbed better than a verbal or textual version of it. However, sometimes when looking at a "picture" or an analytical report, it is also beneficial to have some narratives along the way.

It's just like when I take my family to an art gallery to look at various paintings: it helps to read the exhibit notes on the museum walls or the tiny descriptions beneath each painting; and then it comes, that "Aha" moment, after which I can tell myself, that's what he (or she) meant by that piece of art.

In the business world of analytical reports we understand this very well; we don't just design and create reports filled with all sorts of visual elements, we provide report titles and metric descriptions, and sometimes we add a text box with words describing what users can see and interact with.

Having worked for more than 5 years in the marketing department of one company, I've seen many PowerPoint slides with graphs and tables where additional value was placed on what marketing managers could derive from the visual elements of their slides and put into verbal statements beside them. I know that this could sometimes be a very time-consuming task; I wish they had had some help preparing those slides.

There is one company that actually provides that help: you prepare your dataset, create your report visualization, and after adding their external component it begins telling textual stories about your data, which users can see live and interactive. Isn't that amazing?

This company is Narrative Science (https://www.narrativescience.com/) and it recently released a report component for the Power BI that does exactly this (https://powerbi.microsoft.com/en-us/blog/get-natural-language-narratives-in-power-bi-reports/).



This component only requires a dataset with numeric measures and corresponding textual dimension attributes; its internal engine then creates textual narratives in the form of free text or a set of bullet points. And the great thing is, if you start interacting with your report visuals (filters, slicers, parameters, etc.), the Narrative Science component will reflect those changes and update its narratives in real time.



At the recent Toronto Open Data book club we discussed the Learning Opportunities Index that is calculated for all the elementary and secondary schools of the Toronto District School Board. Here is their 2014 dataset that I used to create this Power BI report (http://www.tdsb.on.ca/research/Research/LearningOpportunitiesIndex.aspx). The higher the LOI score for a particular school, the more that school needs additional support to provide better learning opportunities for its students.

If you look at the top right corner of the report, there is a text box with the footnote "Powered by Narrative Science"; what you see inside is the product of this component, and I didn't write any of those words.

Here is a slightly changed version of this text box after I clicked the bar corresponding to Ward 11, which holds the lowest LOI score value (which is a very good value, by the way).



and here is a data set that I had used for those narratives:






















where only the School name and the associated LOI Score were used:











After this I just had to set the visual attributes for the Narrative Science component, where Type can take four different values (Discrete, Continuous, Percent of Whole, Scatterplot), Structure can take the form of a Paragraph or a Bullet list, and Verbosity can be High, Medium, or Low.












And the final step would be to adjust the Language settings (all other settings I left unchanged):















Now all numeric measures are properly displayed on my bar chart, the report map component correctly displays school locations, and the Narrative Science component explains and talks about it all at the same time. I strongly believe there is a great opportunity to use this component in various analytical scenarios, and users will appreciate it very much, because it does help when someone tells you the story behind the data you are trying to visually analyze.

And now you don't have to guess who's talking, it's Narrative Science! :-)

Happy data adventures!