(2025-July-06) Real-time, high-speed streams of events very often come from IoT devices, social media logs, website user interactions, and financial transactions. Working with streaming datasets can therefore place you in a different category of data engineering professionals: you understand the difference between Lambda and Kappa architectures, Kafka is more than a writer to you, and topics are not only found in human conversations.
In my post last week (https://datanrg.blogspot.com/2025/06/salesforce-cdc-data-integration.html), I talked about Salesforce Change Data Capture (CDC) event data streaming, where the initial event destination was file storage in Azure. But what if we anticipate a higher volume of incoming Salesforce source data or the addition of a new data feed? This could create the need for an alternative method of managing incoming events.
Image by Florian Kurz from Pixabay
Microsoft Fabric provides an easy way to incorporate streaming data flows into your organizational data ecosystem with the help of the Eventstream component. It allows you to visually create a streaming data flow by defining source data connections and destinations, along with data transformations if required by your business logic.
I don’t want to get into the territory of Microsoft’s clear and well-structured documentation on Eventstream in Fabric, so I’ll let you explore and learn more about it directly - https://learn.microsoft.com/en-us/fabric/real-time-intelligence/event-streams/overview?tabs=enhancedcapabilities
However, there are a few things I want to point out to help you decide whether to try Eventstream for your new or existing data streaming scenarios.
1) Existing Change Data Capture (CDC) source data connectors
If you’re familiar with the concept of CDC, you’ll be pleasantly surprised by the number of built-in CDC connectors that Fabric Eventstream offers for easy integration. Had there been a Salesforce CDC connector in Fabric Eventstream, I wouldn’t have had to build the Python-based middleware within an Azure Function App that I described in my previous blog post.
2) Custom endpoint backed by Event Hub
As soon as a new Eventstream is created, an Event Hub is provisioned to support a custom endpoint connector within your streaming flow. While you can also access and connect to other external Event Hubs in Azure, among many other options (25 in total as I’m writing this post), the custom endpoint acts like your own invisible event hub repository, with all the bells and whistles of a regular Azure Event Hub. It has its own adjustable retention period, as well as the option to use the Apache Kafka endpoint on the Eventstream item. This enables you to connect and consume streaming events through the Kafka protocol without having to set up a Kafka cluster yourself.
For me, this is the biggest addition that makes Eventstream a very appealing offering for my new data engineering workloads!
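To make that concrete, here is a minimal consumer sketch, assuming the azure-eventhub Python SDK; the connection string and consumer group are placeholders you would copy from the custom endpoint's details in Fabric, not values from my setup:

import logging

from azure.eventhub import EventHubConsumerClient

logging.basicConfig(level=logging.INFO)

# Placeholder values: copy the real connection string and consumer group
# from the custom endpoint's details pane in the Eventstream item.
CONNECTION_STR = (
    "Endpoint=sb://<namespace>.servicebus.windows.net/;"
    "SharedAccessKeyName=<keyname>;SharedAccessKey=<key>;EntityPath=<hubname>"
)
CONSUMER_GROUP = "$Default"


def on_event(partition_context, event):
    # Print the raw event body for each event received from the stream.
    print(f"Partition {partition_context.partition_id}: {event.body_as_str()}")


client = EventHubConsumerClient.from_connection_string(
    CONNECTION_STR, consumer_group=CONSUMER_GROUP
)

with client:
    # starting_position="-1" reads from the beginning of the retained stream.
    client.receive(on_event=on_event, starting_position="-1")

In other words, the custom endpoint can be consumed with the same SDK calls you would use against any standalone Azure Event Hub.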
Eventstream Retention Settings
Each Eventstream can have its own retention policy, allowing you to retain incoming data for a period between 1 and 90 days (1 day is the current default setting). Just click Settings at the Eventstream level and then choose Retention to specify your desired retention period.
Apache Kafka Protocol Support
Support for the Apache Kafka protocol in Eventstream is provided automatically. In addition to the usual Event Hub connectivity via SAS Key, for example:
Endpoint=sb://{cdc_fabric_eh_namespace}.servicebus.windows.net/;SharedAccessKeyName={cdc_fabric_eh_keyname};SharedAccessKey={cdc_fabric_eh_key};EntityPath={cdc_fabric_eh_hubname}
you can simply switch to the Kafka tab to get your Eventstream Kafka endpoint information.
The ability to connect to the same event repository via Event Hub, AMQP, or Kafka protocols is similar to using different charging adapters, providing multiple ways to establish a reliable flow, depending on your needs.
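As a sketch of the Kafka route, here is how publishing to the same endpoint might look with the confluent-kafka Python client; the bootstrap server, connection string, and hub name in angle brackets are placeholders taken from the Kafka tab, and this is an illustration under those assumptions rather than a definitive recipe:

import json

from confluent_kafka import Producer

conf = {
    # The Kafka endpoint listens on port 9093 of the Event Hubs namespace.
    "bootstrap.servers": "<namespace>.servicebus.windows.net:9093",
    "security.protocol": "SASL_SSL",
    "sasl.mechanism": "PLAIN",
    # Event Hubs Kafka endpoints authenticate with the literal user name
    # "$ConnectionString" and the full connection string as the password.
    "sasl.username": "$ConnectionString",
    "sasl.password": (
        "Endpoint=sb://<namespace>.servicebus.windows.net/;"
        "SharedAccessKeyName=<keyname>;SharedAccessKey=<key>;EntityPath=<hubname>"
    ),
}

producer = Producer(conf)
producer.produce(
    topic="<hubname>",  # the Event Hub (entity) name acts as the Kafka topic
    value=json.dumps({"source": "demo", "message": "hello over the Kafka protocol"}),
)
producer.flush()

The appeal is that producers and consumers written for Kafka can point at the Eventstream endpoint with only a configuration change, no cluster required.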
3) Streaming Data Insights
There is a saying about three things people like to watch: how a river flows, how a fire burns, and how other people count their money. Putting aside the humour in this observation, the idea of an ongoing stream, like an event flow, naturally sparks curiosity to watch it visually. Sometimes this is simply to validate the incoming data or to confirm that any subsequent transformations are processing and forwarding the data correctly.
Data Insights In
Data Insights Out
Salesforce CDC Events Streaming to Fabric
import json
import logging
import os
from typing import Any, Dict, List

from azure.eventhub import EventHubProducerClient, EventData

logger = logging.getLogger(__name__)

# Assumed here to come from the Function App's application settings.
EH_CONNECTION_STR = os.environ["EH_CONNECTION_STR"]


def publish_events(events: List[Dict[str, Any]]) -> None:
    """Safely publish events to Azure Event Hub without duplication."""
    if not events:
        logger.info("No events to publish")
        return

    producer = EventHubProducerClient.from_connection_string(EH_CONNECTION_STR)
    try:
        event_batch = producer.create_batch()
        for event_body in events:
            try:
                serialized = json.dumps(event_body)
                payload_size = len(serialized.encode("utf-8"))
                logger.info("Event payload size: %d bytes", payload_size)
                event_data = EventData(body=serialized)
                try:
                    event_batch.add(event_data)
                except ValueError:
                    # Batch is full, send it
                    logger.info("Sending full batch with %d events", len(event_batch))
                    producer.send_batch(event_batch)
                    # Try adding to a new batch
                    event_batch = producer.create_batch()
                    try:
                        event_batch.add(event_data)
                    except ValueError:
                        logger.critical(
                            "Event too large (%d bytes) for empty batch. Skipping.",
                            payload_size,
                        )
                        continue  # Skip this event
            except Exception as e:
                logger.error("Error processing event: %s", str(e))
                continue

        # Send any remaining events
        if len(event_batch) > 0:
            logger.info("Sending final batch with %d events", len(event_batch))
            producer.send_batch(event_batch)
    except Exception as e:
        logger.error("Publishing failed: %s", str(e))
        raise
    finally:
        producer.close()
        logger.info("Producer connection closed.")
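For illustration only, a call to this function might look like the snippet below; the field names are made up to resemble a Salesforce CDC payload and are not taken from the actual events:

# Hypothetical invocation with illustrative payloads; field names are made up.
sample_events = [
    {"ChangeEventHeader": {"entityName": "Account", "changeType": "UPDATE"}, "Name": "Acme Corp"},
    {"ChangeEventHeader": {"entityName": "Contact", "changeType": "CREATE"}, "LastName": "Smith"},
]
publish_events(sample_events)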
Sometimes, when you complete an interaction with a service provider, whether it’s your local bank or an insurance company, you’re given the option to fill out a survey or provide feedback on the service you received. Very often, part of this feedback format includes a question about your willingness to recommend their service to others.
In the same way, I can imagine giving feedback to myself: Will I be using Eventstream in the future? Maybe. Would I recommend it to others? Probably. The only major constraint is the requirement to have an existing Fabric workspace; otherwise, a similar streaming architecture could be implemented using a combination of separately connected components. The thing I liked most about Eventstream in Fabric is the ability to create custom endpoints, giving you built-in Event Hubs in your stream flow that also support the Kafka protocol if needed.
Please also check out Microsoft's good set of tutorials for designing solutions with Eventstream in Fabric. Good technology is nothing without a good set of use cases, and when you can learn from others, it's gold!