Including reference data into a database deployment process or passing lookup tables to Production

(2022-Jun-20) As it takes a village to raise a child, it also takes a team to develop a good SQL Server database project. Previous efforts to maintain manual or custom SQL DDL scripts to create a database have fallen into oblivion. Git-based development practice and automation to deploy incremental database changes to target environments have become the norm. A development team now can easily segregate their efforts to deliver database solutions faster and more consistently using the SQL Server Database projects in Visual Studio - https://visualstudio.microsoft.com/vs/features/ssdt/.

Photo by Semiha Özkan: https://www.pexels.com/photo/monochrome-photo-of-people-on-a-library-11557276/

But what about actual data in your database tables, especially tables that hold reference or lookup lists of your solution. Could they become part of the automated deployment process along with the database schema object updates?

If it wasn't the case, there are several possible ways to bring data table changes to a target environment.

First, manually create DML SQL scripts for each table change and execute them during your next release deployment; this process can also be automated. Pros: table changes are scripted and a release manager has control and visibility of what goes to Production as the next release, Cons: creating those scripts is still a manual process and requested changes are unique to each data change use case, i.e. further updates for the same table will lead to multiple update scripts.

Second, an individual user with elevated or privileged access can run those DML SQL scripts to make data updates in a Production environment. Pros: no automation efforts are required. Cons: It’s hard to justify all the benefits of letting a human being make data changes in the Production environment, it’s still a very high risk for any mistakes to happen. I’ve worked in similar situations where a DBA was given a script to run; who knew how much trust that person had towards another person’s SQL writings that he was about to execute.

There might be some other ways to migrate or transfer “lookup” table data changes to a higher environment, like placing them in persistent file storage and having a regular ETL process to update target tables using the content of referenced files. Pros: easy process to update the files with the expected changes; Cons: efforts to develop and update an ELT pipeline that will loading data files are still required.

Is there another way that would let developers define both schema and data of lookup tables, include them all into the existing git-based continuous integration (CI) process and pass them towards the target deployment environment via continuous delivery (CD)? The main goal is to have full control of the data changes to make and not to freak out of the “rise of machines” with the implemented CI/CD workflow:- )

SQL Server project has just the thing, its Post-Deployment script (https://docs.microsoft.com/en-us/sql/ssdt/how-to-specify-predeployment-or-postdeployment-scripts?view=sql-server-ver16) that would allow defining a set of SQL commands to run at the very end of DACPAC-based database schema deployment.

Benefits of using Post-deployment scripts are:

SQL documents you make for post-deployment are included in the same SQL Server project, also saved into your git repository and become part of the same deployment DACPAC file.
Development team doesn’t access the target database deployment environment since they only update lookup table changes via git-based code.
It’s a best practice to automate the deployment of your SQL Server project code, thus removing the need to have a human manual interaction with the target database data.

Here is an example to make it real:

Let’s say I have a lookup table with breakfast options:

I’m not sure if people would line up in a breakfast bar for these :-)

This script will only add new records if they don’t exist in my target deployment environment, updates existing records if their git-based values are different from previously deployed changes and delete target table records if I decide to remove some of them from my script file.

-- [dbo].[lk_food_breakfast] TABLE

MERGE INTO [dbo].[lk_food_breakfast] as t
    USING (
        VALUES (1, 'apple')
              ,(2, 'banana')
              ,(3, 'cucumber')
    ) s (id, breakfast_item)
        ON t.id = s.id
WHEN MATCHED AND (
        HASHBYTES('SHA2_256', IsNull(t.breakfast_item,''))
        <> 
        HASHBYTES('SHA2_256', IsNull(s.breakfast_item,''))
    )
    THEN 
        UPDATE
            SET 
                breakfast_item = s.breakfast_item
WHEN NOT MATCHED BY TARGET
    THEN
        INSERT 
        (
            id, breakfast_item
        )
        VALUES 
        (
            id, breakfast_item
        )
WHEN NOT MATCHED BY SOURCE
    THEN 
        DELETE;

But what if I need to include several lookup tables for post-deployment, Microsoft doesn’t allow to have multiple post-deployment scripts only. This could be resolved by combining multiple MERGE statements into my post-deployment script. Or use this post-deployment script and reference additional SQL script files with MERGE statements in the required order.

Here is an example of configuring a MERGE SQL script for imaginary lunch options:

-- [dbo].[lk_food_lunch] TABLE

MERGE INTO [dbo].[lk_food_lunch] as t
    USING (
        VALUES (1, 'apple puree')
              ,(2, 'banana pie')
              ,(3, 'cucumber salad')
    ) s (id, lunch_item)
        ON t.id = s.id
WHEN MATCHED AND (
        HASHBYTES('SHA2_256', IsNull(t.lunch_item,''))
        <> 
        HASHBYTES('SHA2_256', IsNull(s.lunch_item,''))
    )
    THEN 
        UPDATE
            SET 
                lunch_item = s.lunch_item
WHEN NOT MATCHED BY TARGET
    THEN
        INSERT 
        (
            id, lunch_item
        )
        VALUES 
        (
            id, lunch_item
        )
WHEN NOT MATCHED BY SOURCE
    THEN 
        DELETE;

Here is my final solution that contains both individual look data table scripts along with the main post-deployment script that will execute them all.

This code-based approach to configure lookup tables’ data as post-deployment scripts of your SQL Server project (1) helps the development team to stay focused on their development tasks, (2) allows existing CI/CD pipelines to run this deployment process automatically and (3) help your DBA and Audit teams to have their piece of mind with the established access controls.

Data Adventures

Search This Blog

Including reference data into a database deployment process or passing lookup tables to Production

Comments

Post a Comment