Logical crossroads in Azure Data Factory (IF and Filter operations)

(2021-Jan-19) I was raised listening to and reading fairy tales in which the main character would reach a crossroads with a large stone that had directions written on it: turn right and you will lose your horse, turn left and you will lose your life, walk straight and you will find your happiness.

Also, growing up in a small Ukrainian industrial city situated close to a railroad hub, I was always fascinated by the many colorful rail traffic lights, and I tried to imagine where the myriad of rail tracks would lead the trains on them.

Image by MichaelGaida from Pixabay

Similarly, Azure Data Factory (ADF) provides several ways to control, direct, and filter your pipeline workflows; each of them is conditioned on, and constrained by, whatever is written on my “crossroad stone”.

The ADF IF activity (If Condition) gives me two containers for sets of activities; which container's activities are executed is decided by whether a condition evaluates to True or False (the Switch activity allows more than two outcomes).
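To make the two containers concrete, here is a minimal sketch in Python. The dictionary mirrors the general shape of an If Condition activity definition (`expression`, `ifTrueActivities`, `ifFalseActivities`); the activity names, the `fileCount` parameter, and the helper function are my own assumptions for illustration, and the child activities are stubs rather than complete definitions.

```python
import json

# Hypothetical If Condition activity. Names and the "fileCount" parameter
# are made up; the child activities are stubs without their own settings.
if_activity = {
    "name": "CheckFileCount",
    "type": "IfCondition",
    "typeProperties": {
        "expression": {
            "value": "@greater(pipeline().parameters.fileCount, 0)",
            "type": "Expression",
        },
        "ifTrueActivities": [{"name": "CopyFiles", "type": "Copy"}],
        "ifFalseActivities": [{"name": "WaitForFiles", "type": "Wait"}],
    },
}


def activities_to_run(activity, condition_result):
    """Return the names in the container selected by the condition result."""
    props = activity["typeProperties"]
    branch = "ifTrueActivities" if condition_result else "ifFalseActivities"
    return [child["name"] for child in props.get(branch, [])]


print(json.dumps(if_activity, indent=2))
print(activities_to_run(if_activity, True))   # ['CopyFiles']
print(activities_to_run(if_activity, False))  # ['WaitForFiles']
```

Only one of the two lists ever runs for a given evaluation, which is exactly the "turn left or turn right" writing on the stone.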

The ADF Filter activity acts as a funnel for your incoming flow (i.e. a set of data elements, an array, etc.), and its sifted output can be referenced, directly or indirectly, by any other activity in your pipeline that runs after the Filter activity.
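The funnel behaviour can be imitated in a few lines of Python. This is only a sketch of the semantics, not ADF code: the `run_filter` helper and the file names are hypothetical, and the output keys follow the shape I understand the Filter activity to return (`ItemsCount`, `FilteredItemsCount`, `Value`). In a real pipeline a later activity would typically read the result with an expression such as `@activity('FilterCsvFiles').output.value`, where `FilterCsvFiles` is an assumed activity name.

```python
def run_filter(items, condition):
    """Mimic the shape of a Filter activity's output for a list of items."""
    kept = [item for item in items if condition(item)]
    return {
        "ItemsCount": len(items),          # how many elements came in
        "FilteredItemsCount": len(kept),   # how many passed the condition
        "Value": kept,                     # the sifted array itself
    }


# Hypothetical input: file names that an earlier activity might have listed.
files = ["sales_2021.csv", "readme.txt", "stock_2021.csv"]

# In ADF the condition would be an expression such as
# @endswith(item(), '.csv'); here it is a plain Python predicate.
output = run_filter(files, lambda name: name.endswith(".csv"))
print(output["Value"])
```

The input array is left untouched; downstream activities simply choose to read the filtered `Value` instead.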

However, with the Filter activity we can also direct the flow of actions, by letting different filtering conditions lead your pipeline execution one way or another.

Conversely, when your IF activity has only one condition, it starts acting as a Filter: in this role, it no longer directs a flow but filters each element of my flow array.

Ponderous and heavy as that thought process might be, and remembering how elegantly Precedence Constraints were designed in SSIS (a predecessor to ADF), I find that ADF Activity dependencies are still not ideal, since they don’t represent the actual flow of your pipeline activities. In addition, IF containers don’t allow loop containers (ForEach or Until) inside them. Try to iterate over your conditioned array of elements inside the IF activity and you can’t, unless that iteration is moved into another pipeline. Also, nested loops are still not possible in Azure Data Factory.

Frustrated and confused, I found comfort in the thought that the Filter operation in ADF is much better both for refining your flow of data elements and for controlling the execution of that flow: the filtered output is still available at the same level of your pipeline (you don’t need to rely on the compartmentalized settings of the IF or similar constructs), and this allows me to have a natural iteration process.
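As a rough illustration of that "same level" point, the sketch below builds a Filter activity and a ForEach activity as siblings in one pipeline, with the loop depending on the filter and iterating over its output. It is written as Python dictionaries that mirror pipeline JSON; the activity names and the `fileNames` parameter are assumptions of mine, and the inner activities of the loop are deliberately left empty.

```python
import json

# Hypothetical Filter activity: keeps only the .csv entries of an assumed
# pipeline parameter called "fileNames".
filter_activity = {
    "name": "FilterCsvFiles",
    "type": "Filter",
    "typeProperties": {
        "items": {
            "value": "@pipeline().parameters.fileNames",
            "type": "Expression",
        },
        "condition": {
            "value": "@endswith(item(), '.csv')",
            "type": "Expression",
        },
    },
}

# Hypothetical ForEach activity at the same pipeline level: it runs after
# the filter succeeds and loops over the filtered array.
for_each_activity = {
    "name": "ProcessEachCsv",
    "type": "ForEach",
    "dependsOn": [
        {"activity": "FilterCsvFiles", "dependencyConditions": ["Succeeded"]}
    ],
    "typeProperties": {
        "items": {
            "value": "@activity('FilterCsvFiles').output.value",
            "type": "Expression",
        },
        "activities": [],  # inner activities omitted in this sketch
    },
}

pipeline_activities = [filter_activity, for_each_activity]
print(json.dumps(pipeline_activities, indent=2))
```

Because the loop sits next to the filter rather than inside an IF container, no extra pipeline is needed just to iterate over the conditioned elements.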

This approach is still not perfect, but it will easily allow me to process an array of arrays if needed :-)