(2019-Oct-27) The creation or deletion of files in your Azure Storage account can initiate a data ingestion process and support your event-driven data platform architecture.
Image by Lars_Nissen_Photoart from Pixabay
Microsoft recently introduced an additional change to file event triggers in Azure Data Factory (ADF). This change gives you a bit more control over which files are used, or not used, in your data ingestion process.
Before we get into more details on how to use this new "Ignore empty blobs" feature, let's briefly review possible scenarios of using file event triggers in your data processing workflow.
Ingest new data in batches
A batch of incoming source data may come as a set of files, and sometimes those files are archives with other files inside them. The files usually don't arrive all at once: it is a sequential process, and there may be delays between the first and the last file of the set. To orchestrate a synchronized data ingestion process and load those files as a complete set, your data provider generates an additional flag-file (or end-file) to indicate that the upload for a particular batch has finished. Only after this flag-file is received does your data ingestion process start.
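The flag-file logic described above can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: the flag-file name `_END`, the folder layout, and both helper functions are hypothetical and are not part of ADF or of any particular vendor's convention.

```python
from pathlib import PurePosixPath

# Hypothetical flag-file name agreed with the data provider (an assumption).
FLAG_FILE_NAME = "_END"


def batch_folder_if_complete(arrived_blob_path, flag_file_name=FLAG_FILE_NAME):
    """Return the batch folder if the arrived blob is the flag-file, else None."""
    path = PurePosixPath(arrived_blob_path)
    if path.name == flag_file_name:
        return str(path.parent)
    return None


def files_to_ingest(all_blob_paths, batch_folder, flag_file_name=FLAG_FILE_NAME):
    """List the data files of a completed batch, excluding the flag-file itself."""
    folder = PurePosixPath(batch_folder)
    return sorted(
        p
        for p in all_blob_paths
        if PurePosixPath(p).parent == folder
        and PurePosixPath(p).name != flag_file_name
    )


# Example: data files arrive first, the flag-file arrives last.
arrived = [
    "landing/2019-10-27/part1.csv",
    "landing/2019-10-27/part2.csv",
    "landing/2019-10-27/_END",
]
folder = batch_folder_if_complete(arrived[-1])
if folder is not None:
    batch_files = files_to_ingest(arrived, folder)
```

In ADF itself, the same idea is usually expressed by pointing the trigger's "blob path ends with" filter at the flag-file name, so that only the flag-file starts the pipeline.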
Ingest new data as it comes
With this approach, you build your data ingestion framework to react to each incoming data file that arrives at a particular location. As soon as a new file arrives, it triggers your process to ingest just that new data file into your data store.
This new ADF trigger change is helpful because it places control on this particularly reactive process of loading new data files. If your data vendor sends you an empty file, by mistake or for any other reason, and you have set the "Ignore empty blobs" setting to "Yes", your data ingestion pipeline won't be triggered, and you don't have to build special logic to handle empty source files in your pipeline. Empty files won't be loaded at all.
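For those who keep their triggers in source control, here is a rough sketch of how this setting could look in a trigger definition, built as a Python dictionary and printed as JSON. The property names (`BlobEventsTrigger`, `blobPathBeginsWith`, `ignoreEmptyBlobs`, `events`, `scope`) follow the ADF trigger JSON as I understand it; the helper function, the trigger name, the pipeline name, the container and folder, and the placeholder resource ID are all made up for illustration. Please compare it with the JSON exported from your own factory before relying on it.

```python
import json


def build_blob_event_trigger(name, storage_account_resource_id, container,
                             folder, pipeline_name, ignore_empty_blobs=True):
    """Build a blob event trigger definition as a plain dictionary (a sketch)."""
    return {
        "name": name,
        "properties": {
            "type": "BlobEventsTrigger",
            "typeProperties": {
                # Fire only for blobs created under this container/folder.
                "blobPathBeginsWith": f"/{container}/blobs/{folder}/",
                # The "Ignore empty blobs" setting: zero-byte blobs are skipped.
                "ignoreEmptyBlobs": ignore_empty_blobs,
                "events": ["Microsoft.Storage.BlobCreated"],
                "scope": storage_account_resource_id,
            },
            "pipelines": [
                {
                    "pipelineReference": {
                        "referenceName": pipeline_name,
                        "type": "PipelineReference",
                    }
                }
            ],
        },
    }


# All argument values below are hypothetical placeholders.
trigger = build_blob_event_trigger(
    name="trg_new_file",
    storage_account_resource_id="<storage-account-resource-id>",
    container="landing",
    folder="incoming",
    pipeline_name="pl_ingest_file",
)
trigger_json = json.dumps(trigger, indent=2)
print(trigger_json)
```

Setting `ignore_empty_blobs=False` in this sketch corresponds to choosing "No" in the trigger settings, in which case empty files would start the pipeline like any other file.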
Working with ADF triggers has become a bit easier! :-)
Hi,
My pipeline is failing even though my file is 125 B in size when "Ignore empty blobs" is set to true. When I set "Ignore empty blobs" to false, my pipeline succeeds.
I have different environments, such as dev and pre-prod.
In dev, "Ignore empty blobs" set to true works fine, but not in pre-prod.
What could be the reason?
If the same code with the same blob files behaves differently and you don't have any other environment-specific configurations in your ADF instances, then connect with MSFT to troubleshoot.
In my case, and I haven't shared this in my blog post, the "Ignore empty blobs" feature was developed by MSFT as a fix to their blob triggers. We had our ADF solution working in Production for months; after one of the 'silent' updates to ADF by MSFT, all our jobs failed.
MSFT initially told us that this was a built-in feature and that empty files shouldn't trigger ADF pipelines. They then reversed the change in their ADF code base specifically for our company to enable our pipelines.
After some lengthy talks at a senior level, MSFT agreed to fix this issue and eventually introduced this "Ignore empty blobs" feature. Kudos to their dev team and customer support.