Transforming JSON to CSV with the help of Flatten task in Azure Data Factory

(2020-Mar-19) Recently, Microsoft introduced a new Flatten task to the existing set of powerful transformations available in the Azure Data Factory (ADF) Mapping Data Flows - https://docs.microsoft.com/en-us/azure/data-factory/data-flow-flatten.

What this new task does is help you transform/transpose/flatten your JSON structure into a denormalized, flattened dataset that you can load into a new or existing flat database table.

2020-Mar-26 Update:
Part 2: Transforming JSON to CSV with the help of Flatten task in Azure Data Factory - Part 2 (Wrangling data flows)

I like the analogy of the Transpose function in Excel, which rotates a vertical set of data pairs (name : value) into a table with column names and values for the corresponding objects. And when this vertical JSON structure contains several similar sets (an array), the ADF Mapping Data Flows Flatten transformation does a really good job of transforming it into a table with several rows (records).


Let's use this JSON data file as an example:

{
 "id": "0001",
 "type": "donut",
 "name": "Cake",
 "ppu": 0.55,
 "batters":
  {
   "batter":
    [
     { "id": "1001", "type": "Regular" },
     { "id": "1002", "type": "Chocolate" },
     { "id": "1003", "type": "Blueberry" },
     { "id": "1004", "type": "Devil's Food" }
    ]
  },
 "topping":
  [
   { "id": "5001", "type": "None" },
   { "id": "5002", "type": "Glazed" },
   { "id": "5005", "type": "Sugar" },
   { "id": "5007", "type": "Powdered Sugar" },
   { "id": "5006", "type": "Chocolate with Sprinkles" },
   { "id": "5003", "type": "Chocolate" },
   { "id": "5004", "type": "Maple" }
  ]
}

and create a simple ADF mapping data flow to Flatten this JSON file into a CSV sink dataset.

At a high level, my data flow will have 4 components:
1) Source connection to my JSON data file
2) Flatten transformation to transpose my Cake to Toppings
3) Further Flatten transformation to transpose my Cake > Toppings to Batters
4) Sink to output the flattened result into a CSV file


(1) Source connection to my JSON data file
The connection to my JSON file is simple; however, it's interesting to see how the consumed JSON file appears in the Data Preview tab, which shows one row with several array objects.
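
For reference, here is roughly what the source definition behind this connection looks like in data flow script (a sketch only - the SourceJson stream name is my own, and the actual script and projection are generated by the ADF UI once the JSON schema is imported):

 source(output(
     id as string,
     type as string,
     name as string,
     ppu as double,
     batters as (batter as (id as string, type as string)[]),
     topping as (id as string, type as string)[]
   ),
   allowSchemaDrift: true,
   validateSchema: false) ~> SourceJson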




(2) Flatten transformation to transpose my Cake to Toppings
My next Flatten transformation task transposes the JSON topping array (7 objects with id and type attributes) into 7 flattened rows, where the JSON batter objects are still visible as an array within each row.
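
In data flow script terms, a Flatten transformation is a foldDown with an unroll over the array being flattened. A sketch of this step could look like the following (the toppingId/toppingType output names and the idea of passing the batter array through as a column are my own choices - your generated script may differ):

 SourceJson foldDown(unroll(topping),
   mapColumn(
     id,
     type,
     name,
     ppu,
     toppingId = topping.id,
     toppingType = topping.type,
     batter = batters.batter
   ),
   skipDuplicateMapInputs: false,
   skipDuplicateMapOutputs: false) ~> FlattenToppings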


(3) Further Flatten transformation to transpose my Cake > Toppings to Batters
All 7 records that came out of the previous Flatten transformation can now be used as input for the next Flatten transformation. This unrolls the batter array and adds 2 more fields from the batter JSON subset (id, type) to each output row.
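
A sketch of this second Flatten step, assuming the batter array was carried through from the previous step as a column named batter (again, the exact mappings depend on how the first Flatten was configured):

 FlattenToppings foldDown(unroll(batter),
   mapColumn(
     id,
     type,
     name,
     ppu,
     toppingId,
     toppingType,
     batterId = batter.id,
     batterType = batter.type
   ),
   skipDuplicateMapInputs: false,
   skipDuplicateMapOutputs: false) ~> FlattenBatters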


(4) Sink to output the flattened result into a CSV file
The Data Preview in my Sink doesn't differ from the output of the previous transformation, so I decided to challenge Data Factory and replaced my initial JSON file, which contained a single object, with another file containing several similar objects:
[
 {
  "id": "0001",
  "type": "donut",
  "name": "Cake",
  "ppu": 0.55,
  "batters":
   {
    "batter":
     [
      { "id": "1001", "type": "Regular" },
      { "id": "1002", "type": "Chocolate" },
      { "id": "1003", "type": "Blueberry" },
      { "id": "1004", "type": "Devil's Food" }
     ]
   },
  "topping":
   [
    { "id": "5001", "type": "None" },
    { "id": "5002", "type": "Glazed" },
    { "id": "5005", "type": "Sugar" },
    { "id": "5007", "type": "Powdered Sugar" },
    { "id": "5006", "type": "Chocolate with Sprinkles" },
    { "id": "5003", "type": "Chocolate" },
    { "id": "5004", "type": "Maple" }
   ]
 },
 {
  "id": "0002",
  "type": "donut",
  "name": "Raised",
  "ppu": 0.55,
  "batters":
   {
    "batter":
     [
      { "id": "1001", "type": "Regular" }
     ]
   },
  "topping":
   [
    { "id": "5001", "type": "None" },
    { "id": "5002", "type": "Glazed" },
    { "id": "5005", "type": "Sugar" },
    { "id": "5003", "type": "Chocolate" },
    { "id": "5004", "type": "Maple" }
   ]
 },
 {
  "id": "0003",
  "type": "donut",
  "name": "Old Fashioned",
  "ppu": 0.55,
  "batters":
   {
    "batter":
     [
      { "id": "1001", "type": "Regular" },
      { "id": "1002", "type": "Chocolate" }
     ]
   },
  "topping":
   [
    { "id": "5001", "type": "None" },
    { "id": "5002", "type": "Glazed" },
    { "id": "5003", "type": "Chocolate" },
    { "id": "5004", "type": "Maple" }
   ]
 }
]

The source file Data Preview correctly showed me 3 rows. The outputs of the subsequent Flatten transformations grew accordingly, and my final output file contained all 41 expected records (7×4 + 5×1 + 4×2 topping/batter combinations across the three donuts)!
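
For completeness, the sink at the end of the chain is the simplest part. A minimal sketch of its data flow script, assuming a delimited text (CSV) dataset and the FlattenBatters stream name from my earlier sketches:

 FlattenBatters sink(allowSchemaDrift: true,
   validateSchema: false,
   skipDuplicateMapInputs: true,
   skipDuplicateMapOutputs: true) ~> SinkCsv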



Well done, Microsoft team!

I really like this visual way to transform (flatten) a source JSON stream into a CSV file.

Comments

  1. I am new to this and I cannot get past step 2 with the JSON file you provided. I had to manually add the topping.id and topping.type mappings in the Flatten settings, and in Data Preview it tells me those 2 columns are empty.

    1. I've saved the data flow from the blog post in my personal github repository: https://github.com/NrgFly/Azure-DataFactory/blob/master/Samples/dataflow/wdf_json_file.json

  2. Could you do a similar step-by-step demo to help convert a CSV file to a JSON structure with array objects?

    1. I don't have any plans to create such a demo; however, it would be an interesting case to work on.

  3. Can we do the same in a mapping data flow? Because when I try to do it, I am not successful.

    1. The whole purpose of this post was to show that it is possible in ADF Mapping Data Flows. Could you be more specific (code repository, file you are trying to process) about where you have not been successful?

  4. Question: is this data flow linked to a pipeline in order to execute the transformations?

    1. You create a data flow first, and then you create a pipeline in the same data factory to execute this data flow task.

    2. Rayis, thank you. I have another question: to read multiple JSON files and generate CSV files, do you need to use a ForEach activity, or is there a setting in the data flow?

    3. Technically, Mapping Data Flow supports a file mask (wildcard) definition for your source dataset, so multiple files with a similar structure, e.g. from the same folder, can be read at once.
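
      For example, a source in data flow script can use the wildcardPaths option to pick up all matching files from a folder at once (a rough sketch - the 'input/donuts/*.json' path and the SourceJsonFiles stream name are just placeholders for your own container, folder and naming):

      source(allowSchemaDrift: true,
        validateSchema: false,
        ignoreNoFilesFound: false,
        wildcardPaths:['input/donuts/*.json']) ~> SourceJsonFiles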

  5. The most grateful blog! Great site! It looks great! Continue in the same spirit!

  6. Suppose I have a text-delimited file containing three columns - EmpId, EmpName and Detail. The Detail column contains a JSON string. Let's suppose this is the JSON:

    {
    "id": "0001",
    "type": "Permanent",
    "name": "ABC"
    }

    Would it be possible to create an output flat file using a data flow in such a scenario? E.g. an output file containing the columns EmpId, EmpName, id, type, name.

    1. Yes, this should be possible. You can try using the Copy Data activity in the ADF control flow with a JSON dataset as the source and CSV as the sink destination, or use the mapping data flow method from this blog post.

    2. I am trying to use Flatten in a data flow as described in the above approach. I have created a text-delimited dataset with pipe (|) as the delimiter. This acts as the source in my scenario. As expected, my data source projection shows EmpId as int, EmpName as string and Detail as string.

      Now when I try to use Flatten, I am unable to choose any column in "Unroll by", as all my columns are in string format. Is there a workaround for it?

    3. I don't think you need the '|' character for your file, since a JSON file is not a delimited text file but a structure with different elements. Please look at the examples of JSON-formatted files supported in ADF and adjust your source file accordingly - https://docs.microsoft.com/en-us/azure/data-factory/format-json#json-file-patterns

    4. That's the problem I am facing: I am not receiving a JSON file but a text-delimited file containing three columns - EmpId, EmpName and Detail

      This is the data that I am receiving in the source file:

      EmpId|EmpName|Detail
      100|Emp1|{"id": "0001","type": "Permanent","name": "ABC"}

      Although the 'Detail' column contains JSON data, the dataset treats it as a string

    5. This is a good case. I thought that I could use a Wrangling Data Flow in ADF to expand the one JSON column, but I received the following error: "The Power Query Spark Runtime does not support the function Table.ExpandRecordColumn.". Let me think about how else to convert a text column into JSON and then expand it in ADF. You can always use a Databricks notebook, pass a filename as a parameter, and call it from your ADF pipeline.

    6. Thank you Rayis! If you find a solution to this issue using a data flow then please let me know, otherwise I will use Databricks.

    7. I've played a little in ADF mapping data flows and couldn't make one column be treated as a JSON element. I think it's a current limitation of ADF: it takes a dataset as either CSV or JSON, but not mixed. I've blogged about comparing Databricks with ADF Mapping Data Flows for this - http://datanrg.blogspot.com/2020/07/transforming-json-data-with-help-of.html. The other thing you can try in ADF data flows is to manually parse your 'Detail' column into separate text columns; this will require a lot of string functions and may only work easily if your JSON structure is not too complicated, as sketched below.
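
      As an illustration of that string-function approach (a sketch only, not a tested solution - SourceCsv, ParseDetail and the Detail* column names are placeholders), a Derived Column with regexExtract expressions could pull the individual values out of the 'Detail' string for a flat JSON like the sample above:

      SourceCsv derive(
        DetailId = regexExtract(Detail, '"id": *"([^"]*)"', 1),
        DetailType = regexExtract(Detail, '"type": *"([^"]*)"', 1),
        DetailName = regexExtract(Detail, '"name": *"([^"]*)"', 1)
      ) ~> ParseDetail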

  7. Hi! Great tutorial, this is exactly what I was looking for to continue a particular job I am working on. However, I am noticing this is not working if I read a folder with several .json files in it (more than 500k). Will this be the best approach for such a project?

    1. Try to test this approach using only one file first, then check if there are any performance issues with bigger files. If it still doesn't work, then I would suggest using Databricks for the file transformation.

  8. Hi Rayis, thanks for helping us with different scenarios and examples of reading and flattening JSON files. I have a huge JSON file with 2 GB of data. My pipeline works fine with small JSON files of the same pattern, but when I initiate the 2 GB file it fails reading at the source itself. Could you please help me with this? Thank you. Ravi

    1. This could be a good case for the Microsoft support team. You can explain the case where similar pipelines work well, but a similarly structured (larger) JSON file fails to be processed. Their technical support team can make further recommendations based on the error message you're getting. I haven't worked with such huge JSON files in Data Factory; I was thinking that a Spark notebook approach, where you have better control, would work better. But I will let the Microsoft team advise otherwise.

