Transforming JSON data with the help of Azure Data Factory - Part 3

(2020-Apr-06) Traditionally I would use data flows in Azure Data Factory (ADF) to flatten (transform) incoming JSON data for further processing. Recently I've found a very simple but very effective way to flatten incoming JSON data stream that may contain a flexible structure of data elements, and this won't require using data flow transformation steps.

Photo by Rodolfo Clix from Pexels

Part 1Transforming JSON to CSV with the help of Azure Data Factory - Mapping Data Flows
Part 2: Transforming JSON to CSV with the help of Azure Data Factory - Wrangling Data Flows

Here is my story :-)

Let's say I have the following JSON file that I want to parse one element (event) at the time:



A simple ADF pipeline can be created to read the content of this file and a stored procedure to call by passing individual JSON data elements as parameters for this procedure.

The output of the "Get JSON data" Lookup activity task is passed to the "ForEach container" with the following expression: @activity('Get JSON data').output.firstRow.events

Within my "ForEach" container I have also placed a Stored Procedure task and set 4 data elements from my incoming data stream as values for corresponding parameters.


However this approach will not work for all my incoming JSON events, it actually failed for the last one, since it didn't have both "stop_time" and "last_update" data elements.


Error message:


An easy way to fix this problem is to add missing data elements with empty values for the last event record, however, when we don't have control over incoming data, we need to adjust our data processing steps.

Solution:
We can check if "stop_time" and "last_update" data elements exist in the @item iteration dataset. If they don't exist, then we can replace them with other default values. 

I'm not aware if there are built-in operators in ADF to do this, however, it still can be done by converting @item output into a string and then do a simple text search within this converted text line.

My initial stop_time expression "@item().stop_time" can be replaced with "@if(contains(string(item()),'"stop_time":'),item().stop_time,null)",
and initial last_update expression "@item().stop_timecan be replaced with "@if(contains(string(item()),'"last_update":'),item().last_update,utcNow())"

After this quick fix, the whole ADF pipeline ran successfully.



So, using a simple data conversion with the help of String ADF function we can flatten JSON data into a string, and that explains the title of my blog post :-) 

Comments

  1. You could probably also write it like this:
    @coalesce(item()?.last_update,utcNow())

    Notice the question mark for the null check. I've never seen it documented specifically for ADF, but it does work.

    ReplyDelete
    Replies
    1. Yes, I've tested the coalesce function as well, it worked in most cases, but there was one where it still failed. And I looked for a more stable solution to extract and search if a particular data element existed in incoming data stream. Thanks for your comment!

      Delete
  2. Hi there, thanks for the post. Just a quick question, is there a way to read json data stored in the database?

    ReplyDelete
    Replies
    1. Technically you can store JSON value in a table column and then parse it either manually within the control flow or using Flatten transformation in Mapping Data Flows.

      Delete
  3. Great article! Thank you for sharing such usefull imformation, But i want to know more about Informatica Read Json please share more imformative article with us.

    ReplyDelete
    Replies
    1. Hi James, I would suggest to contact Informatica professionals on this.

      Delete
  4. Thank you for the useful article, is there a way to insert null in dataflow sink (Json output)?

    ReplyDelete
    Replies
    1. If by null in JSON you mean an empty string value (""), then you can define this value in your ADF as ('') and then pass it your your output sink.

      Delete

Post a Comment