The Microsoft Azure Incubations team recently unveiled Drasi, an innovative event-driven, open-source data processing system that takes a novel approach to tracking and processing rapidly changing data.

To track and process rapidly changing data, developers frequently poll the database to respond to changes. For example, when multiple temperature sensors’ telemetry data is ingested into a time-series database, a query must be run on a regular basis to track it. Similarly, an e-commerce application may need to trigger an event when a large transaction occurs. To track changes in both scenarios, the original database source must be polled on a regular basis. This puts additional strain on the database, which is already under stress from the frontend application’s intensive read/write operations.
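The polling pattern described above can be sketched as follows. This is a minimal illustration, not anything from Drasi: the `transactions` table, the `poll_large_transactions` helper, and the threshold value are all hypothetical, and an in-memory SQLite database stands in for the production source.

```python
import sqlite3


def poll_large_transactions(conn, last_seen_id, threshold):
    """Run one polling pass against the source database.

    Returns (new_last_seen_id, large_rows). Every call issues a query,
    whether or not anything has changed since the previous pass.
    """
    rows = conn.execute(
        "SELECT id, amount FROM transactions WHERE id > ? ORDER BY id",
        (last_seen_id,),
    ).fetchall()
    large = [(row_id, amount) for row_id, amount in rows if amount >= threshold]
    new_last_seen_id = rows[-1][0] if rows else last_seen_id
    return new_last_seen_id, large


def make_demo_db():
    """Build a hypothetical source table with one small and one large transaction."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transactions (id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany(
        "INSERT INTO transactions (amount) VALUES (?)", [(20.0,), (5000.0,)]
    )
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    last_seen, large = poll_large_transactions(conn, 0, 1000.0)
    print(large)
    # A second pass finds nothing new but still costs the source a query.
    last_seen, large = poll_large_transactions(conn, last_seen, 1000.0)
    print(large)
```

In a real application this function would run on a timer, so the source pays for a query on every interval even when no row has changed.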

Drasi takes a different approach: rather than querying the source, it tracks changes by monitoring the logs generated by the database system. Every operation on a table, such as an insert, update, or delete, generates a log entry that is stored externally. Because read operations are performed on these log entries rather than on the tables themselves, the source database is less burdened.
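To make the log-based idea concrete, here is a small sketch of a consumer that reads change entries from a log instead of querying tables. It is an analogy only: `ChangeEvent`, `ChangeLogReader`, and the list-backed log are hypothetical names for illustration and do not reflect Drasi's internals or any database's actual log format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ChangeEvent:
    """One hypothetical log entry describing a single table operation."""
    op: str                  # "insert", "update", or "delete"
    table: str
    before: Optional[dict]   # row state before the change (None for inserts)
    after: Optional[dict]    # row state after the change (None for deletes)


class ChangeLogReader:
    """Reads entries appended to a log since the last read.

    The reader only touches the log; it never queries the source tables.
    """

    def __init__(self, log: List[ChangeEvent]):
        self._log = log
        self._offset = 0

    def read_new(self) -> List[ChangeEvent]:
        events = self._log[self._offset:]
        self._offset = len(self._log)
        return events


if __name__ == "__main__":
    log: List[ChangeEvent] = []
    reader = ChangeLogReader(log)
    log.append(ChangeEvent("insert", "orders", None, {"id": 1, "amount": 5000}))
    for event in reader.read_new():
        print(event.op, event.table, event.after)
```

The offset plays the role of a log position: the consumer remembers how far it has read, so each entry is processed once and the source tables are never scanned.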

Drasi has three key components:

Sources

A source connects to one of the data sources within your systems and continuously monitors it for critical changes. It watches system metrics, database updates, or application logs and collects pertinent information in real time.

Continuous Queries

Drasi employs continuous queries in place of manual, point-in-time queries, in order to continuously assess incoming changes in accordance with predetermined criteria. These queries, which are written in Cypher Query Language, integrate data from multiple sources without needing prior collation.

Reactions

Drasi runs registered automated reactions when a change satisfies a continuous query. These reactions can trigger a workflow that sends alerts, updates other systems, or takes corrective action.
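The interplay between continuous queries and reactions can be sketched in a few lines of Python. This is an analogy under stated assumptions, not Drasi's API: real Drasi queries are written in Cypher and reactions are separately deployed components, whereas the `ContinuousQuery` class, its predicate, and the "added"/"updated"/"removed" labels here are hypothetical.

```python
class ContinuousQuery:
    """Keeps a query result up to date as changes arrive and notifies reactions.

    The 'query' is modeled as a simple predicate over a row (an assumption
    made for brevity; it stands in for a real declarative query).
    """

    def __init__(self, predicate):
        self._predicate = predicate
        self._results = {}     # key -> row currently in the result set
        self._reactions = []

    def register_reaction(self, reaction):
        self._reactions.append(reaction)

    def apply_change(self, key, row):
        """Apply one change; row is the new state, or None if it was deleted."""
        was_in = key in self._results
        is_in = row is not None and self._predicate(row)
        if is_in and not was_in:
            diff = ("added", key, row)
        elif is_in and was_in:
            if self._results[key] == row:
                return  # result unchanged, nothing to react to
            diff = ("updated", key, row)
        elif was_in:
            diff = ("removed", key, self._results[key])
        else:
            return  # change never touched the result set
        if is_in:
            self._results[key] = row
        else:
            del self._results[key]
        for reaction in self._reactions:
            reaction(diff)


if __name__ == "__main__":
    alerts = []
    hot_sensors = ContinuousQuery(lambda row: row["temp"] > 30)
    hot_sensors.register_reaction(alerts.append)
    hot_sensors.apply_change("s1", {"temp": 25})  # below threshold, no reaction
    hot_sensors.apply_change("s1", {"temp": 35})  # enters the result set
    hot_sensors.apply_change("s1", {"temp": 20})  # leaves the result set
    print(alerts)
```

The point of the design is that reactions fire only when the result set itself changes, so downstream systems are not notified about changes that the query does not care about.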

In essence, Drasi converts the logs into a graph that can be queried continuously using Cypher Query Language. It supports the Azure Cosmos DB Gremlin API and PostgreSQL in the initial release, with additional sources to be added later.

As with sources, Drasi also supports a variety of reaction providers, including Azure Event Grid, SignalR and Debezium, an open source distributed platform for capturing changes in data.

What’s intriguing is that Drasi runs on a Kubernetes cluster with dependencies on Dapr, Redis and MongoDB. The definitions of sources, continuous queries and reactions are applied to Kubernetes through a YAML specification. There is also a command-line interface for interacting with the Drasi controller running in Kubernetes.

Microsoft has already submitted Drasi to the Cloud Native Computing Foundation as a Sandbox project. The source code is available on GitHub and the documentation along with tutorials is available on the Drasi website.

It is worth noting that Microsoft has developed a platform for converting system logs and audit trails into a graph database and extending that to an event-driven environment. This opens up new opportunities for developers to create modern data processing systems based on large language models (LLMs). Drasi has the potential to play an important role in retrieval-augmented generation (RAG) by providing real-time data changes as context for LLMs, enabling users and AI agents to make better decisions.
