Extracting data from Common Data Services (Microsoft Dynamics 365) using Azure Data Factory

(2020-Mar-30) Little did I know that I could use Azure Data Factory (ADF) to extract data from the Common Data Services (CDS). This data platform enables you to securely store and manage data that can be consumed by other business applications. You are totally free to rely on standard CDS datasets (entities), or to challenge yourself and create additional custom entities with cloud-based storage options for your data.

Microsoft also has a very good set of documentation to fill some of your knowledge gaps (those gaps are mine too :-) in using Azure Data Factory with CDS. This blog post is more a way to save my notes, so that I can later remind myself of some additional aspects of creating data integration projects with CDS.

I would also like to mention my coworker Dennian Clarke, who helped me find my way around the myriad of standard and system CDS entities and showed me how to dynamically extract data from CDS using FetchXML queries.

Common Data Services (CDS) entity names:
If you have never worked with the Microsoft Dynamics 365 CRM system, think of it as a scalable and secure relational database with tables (called entities). You have a UI to access and manage your data entities; in addition, data connectors are available if you want to use CRM data in your external applications.

Each CDS entity can be described by several attributes; some of them are:
  • SchemaName: Typically, a Pascal cased version of the logical name. i.e. Account
  • CollectionSchemaName: A plural form of the Schema name. i.e. Accounts
  • LogicalName: All lower-case version of the schema name. i.e. account
  • LogicalCollectionName: All lower-case version of the collection schema name. i.e. accounts
  • EntitySetName: Used to identify collections with the Web API. By default, it is the same as the logical collection name.
  • DisplayName: Typically, the same as the schema name, but can include spaces. i.e. Account
  • DisplayCollectionName: A plural form of the Display name. i.e. Accounts
  • Description: A short sentence describing the entity 
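The relationships between these naming attributes can be illustrated with a minimal sketch for the standard Account entity (an assumption-laden illustration only; real entities don't always pluralize by simply adding "s"):

```python
# How the naming attributes relate for the standard Account entity,
# based on the conventions listed above. This is an illustration of the
# naming pattern, not a call to any real Dynamics 365 metadata API.
schema_name = "Account"

account_metadata = {
    "SchemaName": schema_name,                             # "Account"
    "CollectionSchemaName": schema_name + "s",             # "Accounts"
    "LogicalName": schema_name.lower(),                    # "account"
    "LogicalCollectionName": (schema_name + "s").lower(),  # "accounts"
    "EntitySetName": (schema_name + "s").lower(),          # defaults to the logical collection name
    "DisplayName": schema_name,                            # what the ADF connector shows
}

for attribute, value in account_metadata.items():
    print(f"{attribute}: {value}")
```
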
And that's where I was confused the most; let me give a few examples.
If I want to extract the CRM system [role] entity via the ADF data connector, then I need to look for [Security Role]:


And if the point of my interest is to check the internal system [audit] dataset, then this is only possible if I search for the [Auditing] entity in the ADF data connector:


Strange world, I know :-), and the XrmToolBox is a great tool for explaining why you see different names for entities. Go ahead and install it, and don't forget to enable the "Metadata Browser" plugin there.

It will show you that whoever worked on the ADF data connector for CDS used the "DisplayName" attribute for referencing entities.




One more tip: if you try to search for particular CDS entities in Power BI, then you will need to use the "EntitySetName" attribute value to locate them; it took me some time to understand this as well :-(

The rest is easy!

(A) Creating CDS Linked Service in ADF
To create a linked service to CDS Dynamics online using Office365 authentication, you will need to set several properties: (1) Service Uri, (2) Authentication type, (3) User name, (4) Password.
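Under the hood, such a linked service is just a JSON definition. A minimal sketch might look like the following (the linked service name, service URI, user name, and password are placeholders; referencing the password from Azure Key Vault is a better practice than a plain secure string):

```json
{
    "name": "CdsLinkedService",
    "properties": {
        "type": "CommonDataServiceForApps",
        "typeProperties": {
            "deploymentType": "Online",
            "serviceUri": "https://<organization-name>.crm.dynamics.com",
            "authenticationType": "Office365",
            "username": "<user@yourtenant.com>",
            "password": {
                "type": "SecureString",
                "value": "<password>"
            }
        }
    }
}
```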




(B) Creating CDS entity dataset in ADF
Then you have a choice: either create a direct reference to a CDS entity (table) at the dataset level in Azure Data Factory by selecting its name in a dropdown menu (don't forget that "DisplayName" values populate this list)...
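For reference, the resulting dataset is also a small JSON definition; a sketch might look like this (the dataset and linked service names are my assumptions, and note that "entityName" takes the entity's logical name, e.g. "audit", even though the UI dropdown shows display names):

```json
{
    "name": "CdsAuditingDataset",
    "properties": {
        "type": "CommonDataServiceForAppsEntity",
        "linkedServiceName": {
            "referenceName": "CdsLinkedService",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "entityName": "audit"
        }
    }
}
```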

(C) Submitting dynamic FetchXML queries to CDS via the Copy Data ADF activity


Or get more flexibility and submit a FetchXML query to your CDS Dynamics 365 data repository. In this particular case, I'm attempting to extract all system [audit] entity records created in March 2020 and later.
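A FetchXML query for that scenario could look like the sketch below (the attribute list is illustrative; adjust it to the columns you actually need):

```xml
<fetch>
  <entity name="audit">
    <attribute name="auditid" />
    <attribute name="createdon" />
    <attribute name="operation" />
    <attribute name="objecttypecode" />
    <filter type="and">
      <condition attribute="createdon" operator="on-or-after" value="2020-03-01" />
    </filter>
  </entity>
</fetch>
```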

As I said at the beginning of this blog post, little did I know about Azure Data Factory's support for Common Data Service (CDS) connectivity to Dynamics 365 data. It's flexible and it works!

My next request for such flexibility is to have support for the Common Data Model (CDM) as a sink in Azure Data Factory. I hope it will become a (Private Preview > Public Preview > General Availability) reality one day! :-)

Comments

  1. Is it possible to use Azure Data Factory in place of an SSIS package for integration? I haven't messed around with it yet but I wonder how it compares to SSIS for ETL.

    ReplyDelete
    Replies
    1. I haven't tested it yet, but you can execute SSIS packages within your Azure Data Factory as well: https://docs.microsoft.com/en-us/azure/data-factory/how-to-invoke-ssis-package-ssis-activity

      Delete
  2. Is it possible to update the XML query to use a dynamic value (from a variable) for the createdon field?

    ReplyDelete
    Replies
    1. Absolutely, the Query is an open field, and it should allow you to pass variables or expressions instead of hard-coded values.
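      As a sketch, a pipeline variable (StartDate here is an assumed name) could be spliced into the query with an ADF expression such as:

      ```
      @concat('<fetch><entity name="audit"><attribute name="auditid" /><filter type="and"><condition attribute="createdon" operator="on-or-after" value="', variables('StartDate'), '" /></filter></entity></fetch>')
      ```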

      Delete
    2. Hi Rayis Imayev: Can you please give an example of passing data flow parameters to FetchXML? We are using this FetchXML query:
      Data flow parameters: startdate and enddate with values in a format like "2015-01-01"




      We see the below error:
      "The date-time format for $startdate is invalid, or value is outside the supported range."

      Delete
    3. (1) Get your hard-coded XML query working first to make sure that you can pass it to your CDS store. (2) Then paste the query content, without any changes, into a variable and use this variable as the source for your Copy Data task. (3) Then adjust the value of this variable and construct it with the help of additional parameters. It's less a CDS XML task and more about working correctly with variables in ADF. Please check my other blogs on that.

      Delete
  3. Is it possible to delete/clear Dataverse data before the Copy Data activity? Because I think that when copying data each time, the pipeline will insert and add the same rows.

    ReplyDelete
    Replies
    1. You can always add steps to your incremental ETL pipeline, whether to bring in new data only or to delete some of the data in your target environment before running the Copy Data activity. You always have a choice.

      Delete
  4. Can you please share how I can delete Dataverse data in ADF, and which activity to use?

    ReplyDelete
    Replies
    1. You can create a Logic App to modify data in CDS and then call this Logic App via a Web Activity task in ADF if you want to.
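      As a rough sketch, such a Web Activity inside a pipeline definition might look like this (the activity name, URL, and body are placeholders for your own Logic App HTTP trigger):

      ```json
      {
          "name": "CallCleanupLogicApp",
          "type": "WebActivity",
          "typeProperties": {
              "url": "https://<your-logic-app-http-trigger-url>",
              "method": "POST",
              "body": {
                  "entity": "account"
              }
          }
      }
      ```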

      Delete
  5. Hello, I have a ForEach activity with a Copy Data activity inside, and I want continuous insertion into my target table. But what I discovered is that once the ForEach switches to the second item, it replaces the records that already exist in my target table.

    ReplyDelete
    Replies
    1. Then it's less about the CDS dataset and more about using the ADF framework to create your ETL pipeline. You need to design the logic for how your data needs to flow from source to destination and what should happen with the data that already exists in your destination environment. Once all these logical rules are defined, you can decide which tools to use to create your ETL process; if it can be ADF, so be it; if it can be something else, it's totally your choice.

      Delete
  6. Hello there! We're trying to move from SSIS to ADF for Dynamics integration. When inserting records into Dynamics using SSIS (account entity), we receive as a response the newly created records with the GUID assigned by Dynamics (which we save in a staging database). Is there any way in ADF to receive that response as well, using the Copy Activity or Dataflow? Many thanks!!

    ReplyDelete
    Replies
    1. Hello, I haven't worked on the data push scenario to CDS using ADF. The MSFT documentation doesn't provide much information about the output properties of the CDS sink operations. It's hard to tell whether any operational IDs are returned without trying. I would suggest running a simple write operation from ADF to CDS first and validating the output of the Copy Data activity task.

      Delete
  7. Hi, when I am using a ForEach loop to bring in multiple tables with one Copy activity, many of the columns are missing and new false columns are being generated. Any help on this?

    ReplyDelete
    Replies
    1. Each of the CDS tables is a unique entity; I would be very careful about building a generic ETL approach and looping through it in a ForEach container. Check individual CDS tables, validate their schemas with your Office 365 admins, and decide how to build your queries.

      Delete

Post a Comment