Using Durable Functions in Azure Data Factory - Support for long running processes in Azure Functions

(2020-Oct-14) OK, here is my problem: I have an Azure Data Factory (ADF) workflow that includes an Azure Function call to perform external operations and return an output result, which in turn is used further down my ADF pipeline. So (1) my ADF workflow depends on the output result of the Azure Function call, and (2) the execution time of that call is another factor to consider: if it hits 230 seconds or more, the ADF Azure Function activity will fail with a time-out error message and my workflow is screwed.

Image by Ichigo121212 from Pixabay 

I either have to hope that my Azure Function calls in a data factory pipeline will stay within 230 seconds, or I need to make a change and replace a generic Azure Function call with something else, something more stable and reliable.

The time of 230 seconds is the maximum amount of time that an HTTP triggered function can take to respond to a request, and Microsoft recommends either refactoring your serverless code or using Durable Functions, which is an extension of Azure Functions - https://docs.microsoft.com/en-us/azure/data-factory/control-flow-azure-function-activity#timeout-and-long-running-functions

Back in April of 2020, I blogged about the use of Azure Functions in Data Factory pipelines - http://datanrg.blogspot.com/2020/04/using-azure-functions-in-azure-data.html. There I described possible variations of using Web, Webhook, and Azure Function activities to execute your Function App code, as well as my frustration with the 230-second time limit.

So, I decided to check if a Durable Function could be a remedy for a long-running process that Azure Data Factory tries to govern. The official documentation describes Durable Functions as “stateful functions in a serverless compute environment… they let you define stateful workflows by writing orchestrator functions and stateful entities by writing entity functions using the Azure Functions programming model”. I’m still confused by this definition, and I hope I am not the only one. But for me, the term “durable” means that a function should provide stable execution of long-running processes and support reliable orchestration of my serverless Function App code.

The first thing I did was search online to see if anyone else had already shared their pain points and possible solutions for using Durable Functions in Azure Data Factory:

The first two ADF posts gave me some confidence that Durable Functions could be used in ADF. However, they only provided some screenshots, with no code examples and no pattern for passing input to a Durable Function and processing its output at the end, which was critical to my real project use case; I still give credit to both authors for sharing this information. The third post is one of many very detailed and well-written posts about Durable Functions, but it didn’t contain the ADF information or the PowerShell code for my Function App that I was looking for. So, this was my leap of faith: to explore further and possibly create the ADF solution with Durable Functions that I needed.

Initial Information and Tutorial for Azure Durable Functions
Microsoft provides some very good examples and tutorials to start working with Durable Functions in the Azure Portal - https://docs.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-overview?tabs=powershell. You have a way to create three types of Durable Functions or components; all of them will be necessary to build a single durable Function App workflow:
- Starter: to “start” a durable function “orchestrator”
- Orchestrator: to “orchestrate” execution of an “activity” function
- Activity: actual serverless code of your function app that you want to perform

Then you can create sample durable functions in your Azure Function App:

The sample code is a simple solution that writes out the names of different cities:









You can even test this whole solution and see the results. First, you will need to trigger the Starter function. In a few seconds it will provide you with a JSON output that contains several URIs; one of them is the statusQueryGetUri, which reports the current status of the orchestration execution.
You can paste the value of this URI into your web browser to see how your durable function progresses over time.

First, its status will be "Pending", then it will change to "Running" and finally if all things go well, your durable function will be "Completed" and it also will return the output result back.
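
Instead of refreshing the browser, the same check could be scripted. The following is only a minimal sketch, assuming the status endpoint returns a JSON document with a runtimeStatus field; the function name Wait-DurableOrchestration and its parameters are made up for this illustration and are not part of the Durable Functions SDK.

```powershell
# Hypothetical helper (not part of any SDK): polls a statusQueryGetUri
# until the orchestration is no longer Pending or Running.
function Wait-DurableOrchestration {
    param(
        [Parameter(Mandatory)] [string] $StatusUri,
        [int] $PollSeconds = 5,
        [int] $MaxPolls = 60,
        # The status lookup is injectable so the loop can be tried without a real Function App
        [scriptblock] $GetStatus = { param($Uri) Invoke-RestMethod -Method Get -Uri $Uri }
    )

    for ($i = 0; $i -lt $MaxPolls; $i++) {
        $status = & $GetStatus $StatusUri
        if ($status.runtimeStatus -notin 'Pending', 'Running') {
            return $status   # Completed, Failed, Terminated, ...
        }
        Start-Sleep -Seconds $PollSeconds
    }
    throw "Orchestration still not finished after $MaxPolls polls."
}

# Usage against a real function app (the URI comes from the Starter response):
# $final = Wait-DurableOrchestration -StatusUri $started.statusQueryGetUri
# $final.runtimeStatus
```

The MaxPolls cap is a design choice of this sketch so that a stuck orchestration cannot keep the script waiting forever.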


It all looked good, but this still wasn't my use case. I really needed to be able to pass a JSON request with a parameter value to my Durable Function and have its output depend on that input parameter. I needed to make a change to my durable function code, and I hadn't even started working in a data factory environment yet.

Changes made to my Azure Durable Functions
Alright, I made the following changes in my proof of concept Durable Function (my real use-case Azure Function business logic is far more complex).

DurableFunctions-Starter
First, I’ve added an HTTP request as an input parameter for my orchestration function. This $Request would also include a JSON message body to incorporate the whole amalgam of incoming parameters.
using namespace System.Net

param($Request, $TriggerMetadata)

$FunctionName = $Request.Params.FunctionName
#$InstanceId = Start-NewOrchestration -FunctionName $FunctionName
$InstanceId = Start-NewOrchestration -FunctionName $FunctionName -Input $Request
Write-Host "Started orchestration with ID = '$InstanceId'"

$Response = New-OrchestrationCheckStatusResponse -Request $Request -InstanceId $InstanceId
Push-OutputBinding -Name Response -Value $Response

DurableFunctions-Orchestrator
Second, I made a total makeover of the orchestration function by replacing the references to hard-coded parameters with the Request body ($Context.Input.Body) that is passed from my initial DurableFunctions-Starter function. It actually took a while to understand how to get to the Input reference from the incoming $Context parameter (time-saving tip :-)
using namespace System.Net

param($Context)

$output = @()

$output += Invoke-ActivityFunction -FunctionName 'DurableFunctions-Activity' -Input $Context.Input.Body

$output

DurableFunctions-Activity
The final change took place in the activity function, where now instead of outputting the incoming $name parameter, it takes the new $Request parameter (you can keep $name or rename it any way you want to, just don't forget to change this in the function bindings, i.e. the supporting function.json file).

The current Activity function now performs some processing: it takes the timezone element from the incoming $Request parameter and then generates the $response with the current UTC time converted to the requested timezone. Again, this isn't the real project use case of my Azure Function that caused me to write this blog post; I'm just using this proof-of-concept code to show what can be done, and the rest of the Azure Function/Durable Function coding is up to us to explore.
using namespace System.Net

param($Request)

$timezone = $Request.timezone

if ($timezone) {
    $timelocal = Get-Date
    $timeuniversal = $timelocal.ToUniversalTime()

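    # Two-argument overload: $timeuniversal has Kind = Utc, so it is converted from UTC
    # regardless of the host's local time zone setting
    $converted_time = [System.TimeZoneInfo]::ConvertTimeBySystemTimeZoneId($timeuniversal, $timezone)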
    Write-Host "Requested TimeZone: $timezone"
    Write-Host "Local Time: $timelocal"
    Write-Host "UTC Time: $timeuniversal"
    Write-Host "Converted Time: $converted_time"
    $response = "{""Response"":"""+$converted_time+"""}"
}
else {
    $response = "{ ""Response"":""Please pass a request body""}"
}

# Outputting the result back to the Orchestrator
Write-Output $response
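
As a variation on the activity code above, the JSON response could be built with ConvertTo-Json instead of string concatenation, and an unknown time zone ID could be caught rather than failing the activity. This is only a sketch under my own assumptions: the function name New-ActivityResponse, the injectable $UtcNow parameter, the date format and the error text are all made up for illustration and are not what the original function uses.

```powershell
# Hypothetical variation of the activity logic, wrapped in a function so it can be tried locally
function New-ActivityResponse {
    param(
        [string] $TimeZoneId,
        # Injectable clock (assumption of this sketch) so that the result is reproducible
        [datetime] $UtcNow = [datetime]::UtcNow
    )

    if (-not $TimeZoneId) {
        return @{ Response = 'Please pass a request body' } | ConvertTo-Json -Compress
    }

    try {
        # A DateTime with Kind = Utc is converted from UTC to the requested zone
        $converted = [System.TimeZoneInfo]::ConvertTimeBySystemTimeZoneId($UtcNow, $TimeZoneId)
        $text = $converted.ToString('yyyy-MM-dd HH:mm:ss', [cultureinfo]::InvariantCulture)
    }
    catch {
        # Unknown or invalid time zone ID on this host
        $text = "Unknown time zone: $TimeZoneId"
    }

    # ConvertTo-Json takes care of quoting and escaping
    @{ Response = $text } | ConvertTo-Json -Compress
}

New-ActivityResponse -TimeZoneId 'Eastern Standard Time'
```

Which time zone IDs are recognized depends on the host operating system, so the catch branch may be hit for IDs that work elsewhere.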


Finally, whether you're developing your code in Visual Studio Code or directly in the Azure Portal (like me :-), it helps to test its functionality before attempting to use it in Azure Data Factory. Let’s do it.

I go to the DurableFunctions-Starter function, pass the JSON request body ‘{"timezone": "Eastern Standard Time"}’ and click the Run button.
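
Outside the portal, the same test request could probably be sent from a PowerShell session. The sketch below only builds the request; it assumes the default "api" route prefix, the Starter's orchestrators/{FunctionName} route, and a function-level key passed in the code query parameter. The helper name New-StarterRequest, the app name my-function-app and the key placeholder are hypothetical.

```powershell
# Hypothetical helper: builds the parameters for calling the Starter function over HTTP
function New-StarterRequest {
    param(
        [Parameter(Mandatory)] [string] $FunctionAppName,
        [Parameter(Mandatory)] [string] $FunctionKey,
        [Parameter(Mandatory)] [hashtable] $Body,
        [string] $OrchestratorName = 'DurableFunctions-Orchestrator'
    )

    $key = [uri]::EscapeDataString($FunctionKey)
    @{
        Method      = 'Post'
        # Assumed URL shape: default "api" prefix plus the Starter's route
        Uri         = "https://${FunctionAppName}.azurewebsites.net/api/orchestrators/${OrchestratorName}?code=${key}"
        ContentType = 'application/json'
        Body        = ($Body | ConvertTo-Json -Compress)
    }
}

$request = New-StarterRequest -FunctionAppName 'my-function-app' -FunctionKey '<function key>' -Body @{ timezone = 'Eastern Standard Time' }

# Uncomment to call a real Function App:
# $started = Invoke-RestMethod @request
# $started.statusQueryGetUri
```

Returning a hashtable and splatting it into Invoke-RestMethod keeps the URL construction separate from the network call, so the request can be inspected before it is sent.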

As a result, a few seconds later I get the following JSON response from my Starter durable function:

The Orchestration durable function returns the following output with the JSON Response value.

The Activity durable function provided me with more details on how it had processed the incoming "Eastern Standard Time" timezone parameter.

But, more than that, I can now use the highlighted "statusQueryGetUri" and check the status of my Durable function processing along with the returned output result in a browser:


This Uri can be used in my data factory workflow logic, more ADF notes are coming...

Working model of an ADF pipeline to execute Azure Durable Functions
This is a working model of how I would create an ADF process to execute Azure Durable Function and process the function output result.
1) Prepare JSON request body message
2) Trigger a durable function execution
3) Check the status of my durable function execution and move further when the status is “Completed”
4) Process the returned output result if needed

Now, I would like to inject a few more details to explain how it works:
(1) Prepare JSON request body message
This could be a simple Set Variable activity that defines a String variable holding your HTTP JSON request body. In my test case it was a simple text: ‘{"timezone": "Eastern Standard Time"}’.

(2) Trigger a durable function execution
Here comes the interesting part: I thought that, as with a regular Azure Function, I could just use its name, which works with no issues there. However, a Durable Function is a different thing: referencing my DurableFunctions-Starter by name won’t work, and the call will fail with a Function App not found error message. Looking at the existing binding information of my Starter durable function, I can use its route parameter, which I need to construct with the full name of the durable function to run (i.e. the Orchestrator, because a Starter durable function can call multiple Orchestrator functions if needed).


So, instead of using DurableFunctions-Starter as the function name to trigger, I had to set it to orchestrators/DurableFunctions-Orchestrator in my ADF Azure Function activity.

(3) Check the status of my durable function
This should be easy since I already know that the output of my (2) Azure Function call will produce the "statusQueryGetUri" which I can check with a simple Web call. If it's not "Completed" yet, then I can wait a few seconds/minutes and repeat this web call again.

The web activity is configured to execute the GET method for the following URL: @activity('Azure Durable Function call').output.statusQueryGetUri. The statusQueryGetUri value will be different every time you execute your Azure Function, so a dynamic setting is very helpful.

This cycle of checking the function call status, waiting, and checking again is repeated until the runtimeStatus returned by the web activity is no longer “Pending” or “Running”. That leads to the following expression in my Until activity setting: @not(or(equals(activity('Get Current Function Status').output.runtimeStatus, 'Pending'), equals(activity('Get Current Function Status').output.runtimeStatus, 'Running'))).

(4) Process the returned output result
As a result of a successful ADF pipeline run, I can extract my Durable Function output Response value from what comes out of the "Get Current Function Status" activity task, and this expression @activity('Get Current Function Status').output.output.Response returns the Response with a converted time based on the initially requested time-zone:
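
The same extraction can be tried outside ADF against a saved status response. This is a sketch only: the helper name Get-OrchestrationResponse and the sample payload are made up, and it assumes the status document exposes the function result as an object under output, which is what the ADF expression above implies.

```powershell
# Hypothetical helper: pulls the Response value out of a status query result
function Get-OrchestrationResponse {
    param([Parameter(Mandatory)] [string] $StatusJson)

    $status = $StatusJson | ConvertFrom-Json
    if ($status.runtimeStatus -ne 'Completed') {
        throw "Orchestration ended with status '$($status.runtimeStatus)'."
    }

    # In ADF the first "output" is the web activity's output; here $status already is that document
    $status.output.Response
}

# Made-up sample payload, trimmed to the two fields used here
$sample = '{"runtimeStatus":"Completed","output":{"Response":"sample converted time"}}'
Get-OrchestrationResponse -StatusJson $sample
```

Throwing on any status other than Completed is a choice made for this sketch; the Until expression in the pipeline also exits on statuses such as Failed, so a similar check may be worth adding before the output is used.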


Mission accomplished!

Closing notes
I think this has been the longest journey to create a blog post for me so far: a couple of weekends and a few more nights to create a working prototype of durable functions which can be executed in my Azure Data Factory workflow, and I can also pass incoming JSON formatted parameters and process returned results at the end.

I’m happy that it worked in the end. I’m also glad that this leap of faith was worth it: the struggle with some not-so-well-documented concepts of durable functions and how they can be incorporated into Azure Data Factory, along with many hours of trial and error, eventually resulted in a successful “green” outcome :-)

Feel free to share this post and send me your comments if your experience with Durable Functions was different than mine. 

After finishing writing this blog post, I can remember this whole journey of using Durable Functions in Azure Data Factory better: long-running Azure Function code can be supported in ADF! Well done, Microsoft, good job! :-)

Part 1: Using Azure Functions in Azure Data Factory

Part 3: Using Azure Durable Functions with Azure Data Factory - HTTP Long Polling

Comments

  1. You should use the webhook activity of ADF, it will be much cheaper. Durable functions already provide a webhook which can be called from ADF. See this blog post: https://datanrg.blogspot.com/2020/04/using-azure-functions-in-azure-data.html

    1. Thanks, Anonymous, for sharing this blog post! This is the first time I've been recommended a blog post that I wrote myself :-) Durable functions in my case gave me an option to have Azure Function activities running for hours, and ADF wouldn't complain about it. Having just a webhook activity in ADF was still limited to 1 minute, correct me if I'm wrong.

    2. D'oh! Yeah sorry, realized that after posting, had the wrong link... I'm amazed that you are the only one with the solution to this, even the MS docs show your first implementation.

      I've managed to do this using a Logic App with the WebHook activity, so I've been trying to find out if the same could be possible with Durable Functions, to avoid using the loop.

    3. No worries :-) We learn from each other. Where in MS docs did you see my implementation?

    4. The MS docs only mention the implementation, light weight (barely 5 lines) as usual:
      https://docs.microsoft.com/en-us/azure/data-factory/control-flow-azure-function-activity#timeout-and-long-running-functions

    5. Oh, yes, I struggled with this a lot and very glad that the approach of Durable Function use case worked for me, which was still in preview for the PowerShell function. But it's stable now in my work project.

  2. The Until activity will not work inside a ForEach loop, so in my scenario this is not working, as I have a ForEach loop around the Azure Function.

    1. You can always create a sub-pipeline to work around the current lack of nested loop functionality in ADF.

  3. by any chance can someone please direct me of achieving this using python language pls . SOS

    1. https://docs.microsoft.com/en-us/azure/azure-functions/durable/quickstart-python-vscode

  4. Rayis, thank you for taking the time to write this up. Amazing walkthrough and super helpful! I can finally say my least favorite number is 230 :p

    1. I feel your pain :-) and I am very glad that my blog post has been helpful to you as well!

  5. Hey! I got a question when it comes to the Orchestrator function.

    I would like to include an IF statement in there based off of a specific value that comes in the body.

    Sample of the code I want to achieve:

    using namespace System.Net

    param($Context)

    $ActivityName = $Context.Input.Body.ActivityName

    IF($ActivityName -eq "Activity1"){
    Invoke-ActivityFunction -FunctionName $ActivityName -Input $Context.Input.Body
    }
    IF($ActivityName -eq "Activity2"){
    Invoke-ActivityFunction -FunctionName $ActivityName -Input $Context.Input.Body
    }

    Otherwise, don't run anything.

    When I try to get the output of $ActivityName or try to get any of the values from the body, I get this error:

    (Failed, Id=d1f3cd94-793d-489a-87f2-96498b97f794, Duration=10ms)Unable to cast object of type 'System.String' to type 'Microsoft.Azure.WebJobs.DurableOrchestrationContext'.

    If I just leave out the IF statements and have the Orchestrator run one function every time, there are no issues and the activity runs with no problem.

    I am currently using the Azure Portal for this and was wondering if you came across this at all.

    1. Thanks, Roman for your question, and it's good that you were able to find a solution for your question.

  6. Hi! You are the only one with a solution for calling a durable function in azure function activity. The problem is that I am using orchestrators/ as you have explained in this tutorial and still I get the "Not Found Error"

    1. Yes, I feel your pain. This is what worked for me. Create a PowerShell Function App. Add a new Function using Azure portal, follow a template solution for Durable Function. Get the code as it is, test and understand how it works, and then adjust according to your needs.
