(2018-Nov-20) After working and testing the functionality of variables within Azure Data Factory pipelines, I realized it's worth exploring the existing system variables. They could basically become my toolbox for collecting and storing control flow metrics of my pipelines.
Looking at the official Microsoft resource, System variables supported by Azure Data Factory, you're given a modest selection of system variables that you can analyze and use both at the pipeline and pipeline trigger level. Currently, you have three ways to monitor Azure Data Factory: visually, with the help of Azure Monitor, or by using code to retrieve those metrics.
But here is how I want to monitor the control flow of my pipeline in Azure Data Factory:
This is the same data ingestion pipeline from my previous blog post - Story of combining things together - that builds a list of files from Blob storage and then copies data from those files to a SQL database in Azure. My intention is to collect and store event information for all the completed tasks, such as Get Metadata and Copy Data.
Here is the current list of pipeline system variables at my disposal:
@pipeline().DataFactory - Name of the data factory the pipeline run is running within
@pipeline().Pipeline - Name of the pipeline
@pipeline().RunId - ID of the specific pipeline run
@pipeline().TriggerType - Type of the trigger that invoked the pipeline (Manual, Scheduler)
@pipeline().TriggerId - ID of the trigger that invoked the pipeline
@pipeline().TriggerName - Name of the trigger that invoked the pipeline
@pipeline().TriggerTime - Time of the trigger run that invoked the pipeline. The trigger time is the actual fired time, not the scheduled time.
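To see how these can be used together, here is a minimal sketch of an expression that strings the pipeline-scoped values into one pipe-delimited line (the format simply mirrors the logging approach used further below; it is not required):

@concat(pipeline().DataFactory,'|'
    ,pipeline().Pipeline,'|'
    ,pipeline().RunId,'|'
    ,pipeline().TriggerType,'|'
    ,pipeline().TriggerId,'|'
    ,pipeline().TriggerName,'|'
    ,pipeline().TriggerTime)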
And after digging a bit more and testing pipeline activities, I've discovered additional metrics that I can retrieve at the level of each individual task:
PipelineName,
JobId,
ActivityRunId,
Status,
StatusCode,
Output,
Error,
ExecutionStartTime,
ExecutionEndTime,
ExecutionDetails,
Duration
Here is my final pipeline in ADF that can populate all these metrics into my custom logging database table:
And this is how I made it work:
1) First, I created the dbo.adf_pipeline_log table in my Azure SQL database:
2) Then I used an [Append Variable] activity as the "On Completion" outcome of the "Get Metadata" activity, with the following expression to populate a new array-type var_logging variable:

var_logging = @concat('Metadata Store 01|Copy|'
    ,pipeline().DataFactory,'|'
    ,activity('Metadata Store 01').Duration,'|'
    ,activity('Metadata Store 01').Error,'|'
    ,activity('Metadata Store 01').ExecutionDetails,'|'
    ,activity('Metadata Store 01').ExecutionEndTime,'|'
    ,activity('Metadata Store 01').ExecutionStartTime,'|'
    ,activity('Metadata Store 01').JobId,'|'
    ,activity('Metadata Store 01').Output,'|'
    ,pipeline().Pipeline,'|'
    ,activity('Metadata Store 01').ActivityRunId,'|'
    ,activity('Metadata Store 01').Status,'|'
    ,activity('Metadata Store 01').StatusCode)

where each of the system variables is concatenated and separated with the pipe character "|".
I did a similar thing to populate the very same var_logging variable in the ForEach container where the actual data copy operation occurs:
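Here is a sketch of what that second Append Variable expression could look like; the Copy Data activity name 'Copy Data Blob CSV' is only an assumption, so substitute your own activity name. The field order mirrors the first expression so the same positional split works later:

@concat('Copy Data Blob CSV|Copy|'
    ,pipeline().DataFactory,'|'
    ,activity('Copy Data Blob CSV').Duration,'|'
    ,activity('Copy Data Blob CSV').Error,'|'
    ,activity('Copy Data Blob CSV').ExecutionDetails,'|'
    ,activity('Copy Data Blob CSV').ExecutionEndTime,'|'
    ,activity('Copy Data Blob CSV').ExecutionStartTime,'|'
    ,activity('Copy Data Blob CSV').JobId,'|'
    ,activity('Copy Data Blob CSV').Output,'|'
    ,pipeline().Pipeline,'|'
    ,activity('Copy Data Blob CSV').ActivityRunId,'|'
    ,activity('Copy Data Blob CSV').Status,'|'
    ,activity('Copy Data Blob CSV').StatusCode)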
3) And then I used a final task to populate my dbo.adf_pipeline_log table with data from the var_logging variable by calling a stored procedure:
The whole trick is to split each text line of the var_logging variable into another array of values, split by the "|" character. Then, knowing the position of each individual system variable value, I can map them to their appropriate stored procedure parameters / columns in my logging table (e.g. @split(item(),'|')[0] for the ActivityTask).
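For illustration, a trimmed sketch of that Stored Procedure activity is shown below. The activity and linked service names are assumptions, and only the first few parameters are listed; the remaining ones follow the same positional pattern, where the index matches the order of the fields in the @concat expression from step 2:

{
    "name": "Log Activity Run",
    "type": "SqlServerStoredProcedure",
    "linkedServiceName": {
        "referenceName": "AzureSqlDatabase_LS",
        "type": "LinkedServiceReference"
    },
    "typeProperties": {
        "storedProcedureName": "[dbo].[sp_adf_pipeline_log_update]",
        "storedProcedureParameters": {
            "ActivityTask": {
                "value": { "value": "@split(item(),'|')[0]", "type": "Expression" },
                "type": "String"
            },
            "ActivityType": {
                "value": { "value": "@split(item(),'|')[1]", "type": "Expression" },
                "type": "String"
            },
            "DataFactory": {
                "value": { "value": "@split(item(),'|')[2]", "type": "Expression" },
                "type": "String"
            },
            "Status": {
                "value": { "value": "@split(item(),'|')[12]", "type": "Expression" },
                "type": "String"
            }
        }
    }
}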
This provided me the flexibility to see both Completed and Failed activity runs (to test a failed activity, I had to temporarily rename the target table of my Copy Data task). I can now read this data and get additional insights from the SQL Server table.
Let me know what you think about this, and have a happy data adventure!
Thank you! Very helpful blog with a more complete list of data factory pipeline system variables than I've seen anywhere else. Am working on building a similar logging approach for our data pipelines. This is going to save me time!
This is awesome! Just what I am needing to do; can you please upload your table script and stored proc? It would save me a ton of time... thanks! Mike
CREATE TABLE [dbo].[adf_pipeline_log](
[id] [int] IDENTITY(1,1) NOT NULL,
[DataFactory] [varchar](100) NULL,
[Pipeline] [varchar](100) NULL,
[JobId] [varchar](100) NULL,
[RunId] [varchar](100) NULL,
[ActivityTask] [varchar](100) NULL,
[ActivityType] [varchar](100) NULL,
[Status] [varchar](100) NULL,
[StatusCode] [varchar](100) NULL,
[Output] [varchar](1000) NULL,
[Error] [varchar](1000) NULL,
[ExecutionStartTime] [datetime] NULL,
[ExecutionEndTime] [datetime] NULL,
[ExecutionDetails] [varchar](1000) NULL,
[Duration] [time](7) NULL,
[ActivityTaskLogTime] [datetime] NULL,
CONSTRAINT [PK_adf_pipeline_log] PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[adf_pipeline_log] ADD CONSTRAINT [DF_adf_pipeline_log_ActivityTaskEventTime] DEFAULT (getdate()) FOR [ActivityTaskLogTime]
GO
CREATE PROCEDURE [dbo].[sp_adf_pipeline_log_update]
(
-- Input parameters
@DataFactory [varchar](100),
@Pipeline [varchar](100),
@JobId [varchar](100),
@RunId [varchar](100),
@ActivityTask [varchar](100),
@ActivityType [varchar](100),
@Status [varchar](100),
@StatusCode [varchar](100),
@Output [varchar](1000),
@Error [varchar](1000),
@ExecutionStartTime [datetime],
@ExecutionEndTime [datetime],
@ExecutionDetails [varchar](1000),
@Duration [time](7)
)
AS
BEGIN
SET NOCOUNT ON
BEGIN TRY
-- Adding a logging event
INSERT INTO [dbo].[adf_pipeline_log]
(
DataFactory,
Pipeline,
JobId,
RunId,
ActivityTask,
ActivityType,
[Status],
StatusCode,
[Output],
Error,
ExecutionStartTime,
ExecutionEndTime,
ExecutionDetails,
Duration
)
VALUES
(
@DataFactory,
@Pipeline,
@JobId,
@RunId,
@ActivityTask,
@ActivityType,
@Status,
@StatusCode,
@Output,
@Error,
@ExecutionStartTime,
@ExecutionEndTime,
@ExecutionDetails,
@Duration
);
END TRY
BEGIN CATCH
-- Output few key error statistics in the case of exception or termination
SELECT
ERROR_NUMBER() AS ERR_NUMBER,
ERROR_STATE() AS ERR_STATE,
ERROR_LINE() AS ERR_LINE,
ERROR_PROCEDURE() AS ERR_PROCEDURE,
ERROR_MESSAGE() AS ERR_MESSAGE;
END CATCH
END
GO
What data are you taking from the metadata activity?
My "Get Metadata" activity task in this particular case was to read a list of files from my blob storage container, which I then process in further tasks.
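For reference, a trimmed sketch of such a Get Metadata activity is below (the dataset name here is an assumption); the childItems field is what returns the list of files in the container:

{
    "name": "Metadata Store 01",
    "type": "GetMetadata",
    "typeProperties": {
        "dataset": {
            "referenceName": "BlobStore01_DS",
            "type": "DatasetReference"
        },
        "fieldList": [ "childItems" ]
    }
}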
In the stored proc parameters @split(item(),'|')[0], where does item() come from?
It comes from the ForEach loop container.
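In other words, item() resolves to the current element of the ForEach "items" collection. A trimmed sketch (the ForEach name is an assumption, inner activities omitted) could be:

{
    "name": "ForEach Logging Record",
    "type": "ForEach",
    "typeProperties": {
        "items": {
            "value": "@variables('var_logging')",
            "type": "Expression"
        },
        "activities": []
    }
}

So each item() is one of the pipe-delimited strings appended to var_logging earlier in the pipeline.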
This is awesome. One doubt which is not related to the above-mentioned process: how can we avoid certain values in a JSON output from the Get Metadata task before sending it to the ForEach loop?
Save the output of the Get Metadata task into a variable, then replace or remove some of the values that you don't need. You can also use the Filter activity for this.
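A trimmed sketch of such a Filter activity, assuming you only want to keep CSV files from the Get Metadata childItems output (the activity name and file extension are just examples):

{
    "name": "Filter CSV Files",
    "type": "Filter",
    "typeProperties": {
        "items": {
            "value": "@activity('Metadata Store 01').output.childItems",
            "type": "Expression"
        },
        "condition": {
            "value": "@endswith(item().name, '.csv')",
            "type": "Expression"
        }
    }
}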
Hello, I am trying to create an email alert using a Web activity and below is the code. I would like to make the activity name, i.e. Copy_cg4, dynamic in the code below. How do I get the activity name dynamically?
ReplyDelete{"DataFactoryName":"@{pipeline().DataFactory}","PipelineName":"@{pipeline().Pipeline}","Subject":"@{activity('Copy_cg4').Status}","ErrorMessage":"The ADF pipeline has @{activity('Copy_cg4').Status}","EmailTo":"supriya.d@te.com"}
I don't know if this is possible; it would require some testing. But if the error message contains some activity metadata information (which it should), then I would try to parse that text and extract the activity name from it.
Hi Rayis,
When I try to use @pipeline().Pipeline as a parameter to a notebook, I see random numbers as a suffix. Is there a way to pass the pipeline name alone?
Check your code; @pipeline().Pipeline should not add any additional information to the actual name of your pipeline. Otherwise, if the length of that number is constant, you can also remove that portion before it goes to your ADB notebook as a parameter value.
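For example, if the unwanted suffix really is a fixed-length string appended to the name (say, 9 characters; adjust to your case), a hedged expression sketch would be:

@substring(pipeline().Pipeline, 0, sub(length(pipeline().Pipeline), 9))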
Hi Rayis, do you have some idea of how to get the status from Databricks executions?
Hi Fran, the only status I think you can get in ADF is whether your notebook execution was successful or not; if it's not successful, it will provide you with a Databricks URL for the failed job. More detailed logging could be done with the help of Log Analytics, where additional details could be collected about your Databricks runs if you direct their logs to a Log Analytics workspace.
Hi Rayis, how can I pass a table name?
Like we pass the pipeline name: select '@{pipeline().Pipeline}' as pipeline_name
Statically or dynamically?
Your table name is a custom parameter/variable, and you have to define how it will be sourced. Once this is defined, you can easily update the schema of the logging table and add your table name as a new column.
How do I read the activity name of the previous activity in the next activity?
Check the JSON code of your pipeline; if an activity depends on another activity, then you will be able to see the predecessors' names ("dependsOn").
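For example, a trimmed fragment of an activity definition showing just the dependency metadata could look like this (the activity names are assumptions):

{
    "name": "Copy Data Blob CSV",
    "type": "Copy",
    "dependsOn": [
        {
            "activity": "Metadata Store 01",
            "dependencyConditions": [ "Succeeded" ]
        }
    ]
}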
Hello Rayis,
Thanks a lot for your articles, they are very helpful.
Could you please re-upload the global pipeline screenshot? It seems the picture is broken.
I would like to understand your global logic, and more specifically how many times you call the "[dbo].[sp_adf_pipeline_log_update]" stored procedure.
Thanks!
Hi Jerome, you can find the whole pipeline code in my public GitHub repository: https://github.com/NrgFly/Azure-DataFactory/blob/master/Samples/pipeline/bdata_adf_logging_pl.json. I think this will be better than a screenshot.
Hi, I would like to extract the pipeline duration and store it in SQL. Can I follow the same procedure?
There is a Duration column in the table already. Additionally, you can do this by creating a computed column in your SQL table with an expression for this new column: ExecutionEndTime - ExecutionStartTime.
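A hedged T-SQL sketch of such a computed column (the column name DurationSeconds is mine) could be:

-- Hypothetical computed column capturing the activity run duration in seconds
ALTER TABLE [dbo].[adf_pipeline_log]
    ADD [DurationSeconds] AS DATEDIFF(SECOND, [ExecutionStartTime], [ExecutionEndTime]);
GO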
But here you capture the duration of only the metadata activity, right?
I would like to get the end-to-end pipeline duration.
You can also connect your ADF instance to a log analytics workspace in azure to collect all the logs. This will provide with all the timings that you're looking for, both for activities and pipelines.
I could not get the correct time when I did end time - start time. I'm planning to do it with the REST API; any ideas?
You can check the "billableDuration" output attribute from your Web activity, if that is how you call your REST API.