(2020-Apr-06) Traditionally I would use data flows in Azure Data Factory (ADF) to flatten (transform) incoming JSON data for further processing. Recently I've found a very simple but effective way to flatten an incoming JSON data stream that may contain a flexible structure of data elements, and it doesn't require any data flow transformation steps.
Part 1: Transforming JSON to CSV with the help of Azure Data Factory - Mapping Data Flows
Part 2: Transforming JSON to CSV with the help of Azure Data Factory - Wrangling Data Flows
Here is my story :-)
Let's say I have the following JSON file that I want to parse one element (event) at a time:
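For illustration, here is a minimal sketch of what such a file could look like; the field names (event_id, start_time, stop_time, last_update) are my assumptions, inferred from the expressions used later in this post, and the last event intentionally omits the "stop_time" and "last_update" elements:

  {
    "events": [
      {
        "event_id": 1,
        "start_time": "2020-04-01T10:00:00Z",
        "stop_time": "2020-04-01T11:00:00Z",
        "last_update": "2020-04-01T11:05:00Z"
      },
      {
        "event_id": 2,
        "start_time": "2020-04-02T09:30:00Z"
      }
    ]
  }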
A simple ADF pipeline can be created to read the content of this file and call a stored procedure, passing individual JSON data elements as parameters to that procedure.
The output of the "Get JSON data" Lookup activity task is passed to the "ForEach container" with the following expression: @activity('Get JSON data').output.firstRow.events
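In pipeline JSON terms, the wiring between the two activities looks roughly like this (a sketch only; the activity names match the ones above, and the inner activities are omitted):

  {
    "name": "ForEach container",
    "type": "ForEach",
    "dependsOn": [
      {
        "activity": "Get JSON data",
        "dependencyConditions": [ "Succeeded" ]
      }
    ],
    "typeProperties": {
      "items": {
        "value": "@activity('Get JSON data').output.firstRow.events",
        "type": "Expression"
      },
      "activities": [ ]
    }
  }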
Within my "ForEach" container I have also placed a Stored Procedure task and set 4 data elements from my incoming data stream as values for corresponding parameters.
However, this approach will not work for all my incoming JSON events; it actually failed for the last one, since that event didn't have the "stop_time" and "last_update" data elements.
Error message:
An easy way to fix this problem would be to add the missing data elements with empty values to the last event record; however, when we don't have control over the incoming data, we need to adjust our data processing steps instead.
Solution:
We can check whether the "stop_time" and "last_update" data elements exist in the @item() iteration dataset, and if they don't, replace them with default values.
I'm not aware of any built-in operators in ADF to do this; however, it can still be done by converting the @item() output into a string and then doing a simple text search within that converted text line.
My initial stop_time expression "@item().stop_time" can be replaced with "@if(contains(string(item()),'"stop_time":'),item().stop_time,null)",
and my initial last_update expression "@item().last_update" can be replaced with "@if(contains(string(item()),'"last_update":'),item().last_update,utcNow())".
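Putting both fixes together, the two final parameter expressions look like this (the same expressions as above, with line breaks added for readability):

  @if(
      contains(string(item()), '"stop_time":'),
      item().stop_time,
      null
  )

  @if(
      contains(string(item()), '"last_update":'),
      item().last_update,
      utcNow()
  )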
After this quick fix, the whole ADF pipeline ran successfully.
So, using a simple data conversion with the help of the ADF string() function, we can flatten JSON data into a string, and that explains the title of my blog post :-)
Photo by Rodolfo Clix from Pexels
Thanks for this post :)
You could probably also write it like this:
@coalesce(item()?.last_update,utcNow())
Notice the question mark for the null check. I've never seen it documented specifically for ADF, but it does work.
Yes, I've tested the coalesce function as well; it worked in most cases, but there was one where it still failed, so I looked for a more stable solution to check whether a particular data element existed in the incoming data stream. Thanks for your comment!
Hi there, thanks for the post. Just a quick question: is there a way to read JSON data stored in the database?
Technically you can store a JSON value in a table column and then parse it either manually within the control flow or by using the Flatten transformation in Mapping Data Flows.
Great article! Thank you for sharing such useful information. But I want to know more about Informatica Read JSON; please share more informative articles with us.
Hi James, I would suggest contacting Informatica professionals about this.
Thank you for the useful article. Is there a way to insert null in a data flow sink (JSON output)?
If by null in JSON you mean an empty string value (""), then you can define this value in your ADF as ('') and then pass it to your output sink.
DeleteHI, by using web activity we can able to pass one more REST API link sequentially
ReplyDeleteYes, I blogged about using REST API as a source in ADF recently, I hope it will be helpful to you as well: https://datanrg.blogspot.com/2023/04/using-azure-data-factory-to-read-and.html
Delete