External SQL files vs. Stored procedures in SSIS development

[2017-Feb-21] I'm creating this particular blog post more for myself to remember and use it as a reference to related discussions in future.

One of my client's existent SSIS solution contains a use of multiple SQL execute tasks which are configured to use external files with a complicated SQL code. Such solution architecture design is not very sustainable and my recommendation was to replace those external SQL code files with a set of stored procedures on a SQL Server side.

This client further questions to outline benefits of using stored procedures vs. external files, along with preferable way to use hard coded SQL code vs. stored procedure calls in SSIS packages has led me to write this.

Stability and Security issues:
- By keeping data extract logic in external SQL files will make SSIS data flows tasks more vulnerable to a risk of undesirable SQL scripts change or even worse to a risk of deletion those files which may fail the whole data flow process. And by depositing the same SQL extract logic into stored procedures you now rely on a very stable SQL Server engine to execute your expected set based data mechanism.
- Security wise, externals SQL files requires an extra safekeeping methodology to prevent those files from harmful changes: folder with those scripts will have to be properly secured and additional security policy will have to be defined and maintained. With SQL logic in SQL Server stored procedure you don’t need to rely on something external to protect your data flow task: an existent application service account that already has read/write access to corresponding databases in SQL server will only need to be granted with execute permission for new SQL Server procedures.

Basically, porting SQL code from external files into stored procedures on a SQL Server will prevent your data driven application from a so called "SQL injection" security threat. When both your data and complex data load logic lives in a SQL Server environment, you feel safe and secure!

Recommendations for using SQL code in SSIS:
- There is no right or wrong of using a hard coded SQL code within a SSIS package, in some case it’s valid approach to run some simple SQL commands against a database engine. However when a logic of those SQL commands becomes complicated, it’s recommended to wrap this code into SQL Server procedures and then just execute them within your SSIS control/data flows.
- SQL Server procedure becomes then a better option from a development prospective with a rich set of tools to design and test your data extract code.  Plus, with any further changes you won’t need to re-deploy your SSIS packages and only a code of your stored procedure will have to be updated. 

So using stored procedures for SSIS development in most cases it’s a win-win solution for a developer: it saves development and deployment time for your SSIS solution!


Comments

  1. I'll argue that unless youre using BIML to generate tour SSIS packages, that the inevitable 'search for every piece of code that hits table X' will be easier to do with all that code in stored procs and functions.
    Finding things like that in plain Dtsx files sucks. While grepping through text file Sql scripts isnt that bad, now you have to find all the dependencies using those files, too. It's just easier doing this if the code is all in the database, especially with all the tools that help with this, and SSIS just runs the procs (or powershell script, etc).
    security, speed... those kind of go out the window too.

    Bossman should listen to those who've already gone through that phase...

    ReplyDelete
    Replies
    1. At leaat Bossman isnt insisting on doing all the data flows in command script tasks "because c# is just easier"

      Delete
    2. At leaat Bossman isnt insisting on doing all the data flows in command script tasks "because c# is just easier"

      Delete
  2. Thanks Corey for your comments, the use of the SQL stored codebase logic on a server was my recommendation in this post as well.

    ReplyDelete
  3. I always create an interfacing database on the remote server if it contains the source data, and wrap the extracting logic in store procedures. It's very efficient to let remote server work out initial transformation logic that might require full set of source data. Sometimes, you may need to save interim tables on the remote server and only return small aggregated data set. To be more efficient even on DW server, I also use SQL join statements in stored procedure instead of lookup components in SSIS. MERGE statement instead of SCD component.

    ReplyDelete

Post a Comment