Working with Arrays in Azure Data Factory

(2019-June-06) I remember that I had a professor at my university who would often encourage his students to learn and get more experienced with simple things first. Learn the basics in and out and then move forward to more complicated concepts and practices; that was his motto, and he really tried to share this idea with us.

Simple things sometimes can be overlooked as well. With the addition of Variables in Azure Data Factory Control Flow (there were not available there at the beginning), Arrays have become one of those simple things to me. 

Image by Magnascan from Pixabay

Currently, there are 3 data types supported in ADF variables: String, Boolean, and Array. The first two are pretty easy to use: Boolean for logical binary results and String for everything else, including the numbers (no wonder there are so many conversion functions in Azure Data Factory that we can use).

I've also blogged about using Variables in Azure Data Factory:
Setting Variables in Azure Data Factory Pipelines
Append Variable activity in Azure Data Factory: Story of combining things together  
System Variables in Azure Data Factory: Your Everyday Toolbox 
Azure Data Factory: Extracting array first element

Going back to my memory flashback of the professor guidance for learning and using simple things, I've finally realized that they are worth to get more experienced with! Why? Because arrays are everywhere in the Control Flow of Azure Data Factory:
(1) JSON output most of the activity tasks in ADF can be treated as multiple level arrays
(2) Collections that are required for the "ForEach" activity can be outsourced from the preceding (1) activity outputs
(3)  "Set Variable" and "Append Variable" activity could be used to store receding (1) activity outputs for further data transformation
(4) You can create Arrays manually by transforming existing linear values or setting them with hard-coded values (fixed collections).

Don't forget about various functions and expressions to support your work with Arrays in Azure Data Factory (https://docs.microsoft.com/en-us/azure/data-factory/control-flow-expression-language-functions):
intersection Returns a single array or object with the common elements between the arrays or objects passed to it.
union Returns a single array or object with all of the elements that are in either array or object passed to it.
first Returns the first element in the array or string passed in.
last Returns the last element in the array or string passed in.
skip Returns the elements in the array starting at index Count.
length Returns the number of elements in an array or string.
jsonConvert the parameter to a JSON type value.
arrayConvert the parameter to an array.
createArrayCreates an array from the parameters. 
rangeGenerates an array of integers starting from a certain number, and you define the length of the returned array.

Just to show a quick example of some simple operations with arrays, I had created this ADF pipeline with 4 main components:



(1) Lookup task to read a CSV file with 2 columns of Syllabic music notes:

Where the JSON output of this activity task contains 7 elements:


(2) Set Variable task converts a text string of "C-D-E-F-G-A-B"

into an array variable Notes_Alpabet using this expression:
@split(variables('Notes_Alphabet_String'),'-')



(3) Then looping through a collection of array elements of the (1) activity task output:
@activity('Lookup Notes_Syllabic').output.value

I then append a combination of Syllabic and Alphabet music notes into the Notes_Combined array variable

using this expression
@concat(item().ID,'-',item().Note,'-',variables('Notes_Alphabet')[add(int(item().ID),-1)])
important part is that I can locate the Notes_Alphabet variable element with the index of the Lookup Notes_Syllabic collection index: 
variables('Notes_Alphabet')[add(int(item().ID),-1)] where item().ID value comes from the first column of my sourcing file.

(4) As a result, I'm copying the content of the array Notes_Combined variable into another array Notes_Combined_View variable for Debug purpose:


And this helped me to see how both Syllabic and Alphabet music notes correspond to each other.

The code of this ADF pipeline can be found here:
https://github.com/NrgFly/Azure-DataFactory/blob/master/Samples/pipeline/adf_arrays_sample_pl.json

And I hope you will find this blog post helpful in your journey to explore simple things of the Azure Data Factory!

Comments

  1. I accidentally deleted a comment, but I still had its copy in my email message: "Hello. Any idea how to specify default values for a parameter or a variable in a pipeline to be used later on in a ForEach element?
    Basically, trying to define a list of tables from where to pull data from. I know about the Lookup elements which can read data from a storage account (Data Lake, blob, etc) files or from a DB somewhere. But as a default value to test my pipeline?

    Thanks, Tudor "

    ReplyDelete
    Replies
    1. And this is my answer to this question, sorry Tudor, if it took long to answer it. I play a little with Array variable, and realized that we can actually set default values for Array variables in Data Factory. There are two requirements: (1) list of values need to wrapped in square brackets and (2) values need to be separated with coma. These are two examples for Text and Numeric array list:
      - ["C","D","E","F","G","A","B"]
      - [1,2,3,4,5,6,7]
      And then you can literally use value of this variable in a ForEach loop container. I've just tested it, it works.
      Thanks for a very good question!

      Delete
  2. Hi Rayis,

    I'm currently doing a parameterized pipeline that includes sql queries in the json parameter file. However, when the parameter value pass it to foreach parameters it passes as string and not an expressions since it was already an expression for getting the value of parameter from the pipeline. Hope you could help me with this. By the way, you're blogs are really amazing!

    Thank you!

    ReplyDelete
    Replies
    1. If I understand your use-case correctly, then I would either pass the output of json parameter file reading activity with your SQL value as an expression for my next SQL task to execute. Or save the SQL value as a variable and use this variable value expression in my SQL task in ADF.

      Delete
    2. Even SQL queries will be stored in a json file (parameter file), it will be read by a lookup activity and pass to foreach activity to get all the values for parameters. But the SQL queries when it was passed in another foreach activity it consider as string and not expression.

      sqlQuery: select * from @item().table_name

      but when it passes in sqlReaderQuery of copy activity, the value is still the same:

      sqlReaderQuery: select * from @item().table_name

      Thus, the error.

      Delete
    3. Change value of your SQL variable to this:
      @concat('select * from ', item().table_name).

      Delete

Post a Comment