Passing JSON arrays between pipelines in Azure Data Factory

(2021-Feb-15) This post is a continuation of my personal experience working with arrays (and JSON structures in general) in Azure Data Factory (ADF). It just so happens that the more I work with JSON in ADF, the more interesting cases I get a chance to discover.

In retrospect, these are some of my previous discoveries:
This next post came out of an error message during my attempt to pass a hard-coded array value between pipelines. Strangely, this use case worked well in the pipeline that was already deployed in ADF; however, I was getting an error message while trying to test and execute the very same pipeline in Debug mode.

Photo by Zen Chung from Pexels

To refresh my memory of what works well with JSON arrays in Data Factory, let's say we have a JSON list of imaginary breakfast items: ["Egg","Toast","Coffee"]

(1) I can set a default value for both an array-type parameter and an array-type variable by simply passing the JSON text value as-is, with no problem:
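As a sketch, the underlying pipeline JSON with such defaults might look like the following (the pipeline, parameter, and variable names are my own illustration, and the exact casing of the "type" fields may differ from what the ADF UI exports):

```json
{
    "name": "BreakfastPipeline",
    "properties": {
        "parameters": {
            "BreakfastItems": {
                "type": "Array",
                "defaultValue": ["Egg", "Toast", "Coffee"]
            }
        },
        "variables": {
            "BreakfastList": {
                "type": "Array",
                "defaultValue": ["Egg", "Toast", "Coffee"]
            }
        }
    }
}
```

Notice that the default values here are plain JSON arrays, with no expression syntax needed.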


(2) In a similar way, I can pass the same JSON text ["Egg","Toast","Coffee"] to a pipeline parameter before its execution. It even has its own automatic JSON validator that notifies me if my JSON is not valid, and this functionality is very nice!

(3) However, I can't set an array variable value in the ADF Set Variable activity by using the same JSON text structure ["Egg","Toast","Coffee"]. First, it gives me a warning message, "Expression of type: 'String' does not match the filed: 'value'", which is a bit confusing.
Then, when I try to execute this Set Variable activity in ADF, it fails with the error message, "The variable of type 'Array' cannot be initialized or updated with value of type 'String'. The variable only supports values of types 'Array'."

(4) The same error message, "The variable of type 'Array' cannot be initialized or updated with value of type 'String'", will appear if I try to pass this JSON value to an array parameter of a sub-pipeline, even if it visually matches that parameter's default value. Super strange!


The only way to resolve my cases #3 and #4 of referencing the JSON value ["Egg","Toast","Coffee"] is to wrap it in the ADF json() function: @json('["Egg","Toast","Coffee"]'). This removes all the error messages and results in a correctly passed JSON array.
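In the underlying pipeline JSON, this fix looks roughly like the following two activities, a Set Variable activity and an Execute Pipeline activity passing the wrapped value (the activity, variable, parameter, and pipeline names here are illustrative, not from an actual deployment):

```json
[
    {
        "name": "Set BreakfastList",
        "type": "SetVariable",
        "typeProperties": {
            "variableName": "BreakfastList",
            "value": {
                "value": "@json('[\"Egg\",\"Toast\",\"Coffee\"]')",
                "type": "Expression"
            }
        }
    },
    {
        "name": "Run SubPipeline",
        "type": "ExecutePipeline",
        "typeProperties": {
            "pipeline": {
                "referenceName": "SubPipeline",
                "type": "PipelineReference"
            },
            "parameters": {
                "BreakfastItems": {
                    "value": "@json('[\"Egg\",\"Toast\",\"Coffee\"]')",
                    "type": "Expression"
                }
            }
        }
    }
]
```

The key point is that the value is an Expression whose result is an array, rather than a string literal that merely looks like one.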

Yes, I know that properly constructed arrays can be referenced as variables when you call your sub-pipelines, or that hard-coded JSON can be replaced by streaming data that is properly extracted, formatted, and referenced in sub-pipeline parameters.

However, I'm still confused that the original JSON value is allowed, and even automatically validated, when setting pipeline parameters prior to execution or when setting default array variable values, yet is not allowed in Set Variable activities or when passing array parameters between pipelines.

This is one of the "gotchas" that I need to remember while working with JSON values in ADF pipelines.


Comments

  1. At the start of my pipelines I have a lookup function to read settings from a database.

    This ends up within the "firstRow" as JSON and is late schema (I can have whatever fields or name/value pairs and add more at any time without changing the code in the data factory).

    To set this as a variable or pass it up to another pipeline, I do this:
    @string(activity('LookupPipelineSettings').output.firstRow)

    To read a single value I do this:
    @json(variables('PIPELINE_SETTINGS')).source_constr_secret
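    Put together, the commenter's round-trip pattern might look like this as a Set Variable activity in the pipeline JSON (the activity name is my own sketch; the variable and activity names inside the expression are taken from the comment):

```json
{
    "name": "Set PIPELINE_SETTINGS",
    "type": "SetVariable",
    "typeProperties": {
        "variableName": "PIPELINE_SETTINGS",
        "value": {
            "value": "@string(activity('LookupPipelineSettings').output.firstRow)",
            "type": "Expression"
        }
    }
}
```

    The object is serialized with string() so it can live in a String variable or parameter, and individual values are later read back by re-parsing it, e.g. @json(variables('PIPELINE_SETTINGS')).source_constr_secret.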

