Architecture

Event-Driven Architecture explained

Let’s explore and analyse the pros and cons of event-driven architecture (EDA) and service-oriented architecture (SOA), so that we have a good foundation on which to build scalable and reliable distributed systems. Once we understand EDA, we will be better equipped to make informed decisions the next time we have to design a system.

Introduction

Event-driven architecture is a software architecture that promotes the production, detection and consumption of events, and the reaction to them. An event can be defined as a significant change in the application’s state; put simply, it is a piece of information about something that has occurred. In other words, an event is a past action that other components react to.

For example, we could have an event called UserEvent.SignedIn that indicates that a user has signed in to the system, allowing consumers of the event to do something with this information, such as increasing the number of users online.

Producers are responsible for creating events and are decoupled from consumers: producers know nothing about the consumers, and vice versa. Producers write events to a message bus, and consumers read from it.
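To make the producer/consumer decoupling concrete, here is a minimal sketch of an in-memory message bus. The MessageBus class, its subscribe and publish methods, and the payload shape are all hypothetical names chosen for illustration; a real system would most likely use a message broker instead.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class MessageBus:
    """Hypothetical in-memory bus; stands in for a real message broker."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        # Consumers register interest in an event type, not in a producer.
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        # The producer only talks to the bus; it never sees the handlers.
        for handler in self._subscribers.get(event_type, []):
            handler(payload)


online = {"count": 0}


def increase_users_online(payload: dict) -> None:
    """Consumer: reacts to a sign-in by updating the online counter."""
    online["count"] += 1


bus = MessageBus()
bus.subscribe("UserEvent.SignedIn", increase_users_online)

# Producer side: announce what happened, without knowing who listens.
bus.publish("UserEvent.SignedIn", {"user_id": "u-1"})
```

Note that the publishing code would stay exactly the same if we added or removed consumers, which is the point of the decoupling.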

An event-driven architecture can be implemented on top of a publish-subscribe messaging model or an event streaming model.
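As a rough illustration of the streaming flavour, the sketch below assumes that events are appended to a log and that each consumer tracks its own read position, so a consumer that arrives late can still read earlier events. In a simple publish-subscribe setup, by contrast, an event is typically delivered only to the subscribers registered at the time. EventLog, StreamConsumer and the UserEvent.SignedOut event are hypothetical names, not part of any particular product.

```python
class EventLog:
    """Hypothetical append-only log; events are kept for later reads."""

    def __init__(self) -> None:
        self._events: list[dict] = []

    def append(self, event: dict) -> int:
        self._events.append(event)
        return len(self._events) - 1  # offset of the event just written

    def read_from(self, offset: int) -> list[dict]:
        return self._events[offset:]


class StreamConsumer:
    """Reads the log at its own pace by remembering its own offset."""

    def __init__(self, log: EventLog) -> None:
        self._log = log
        self.offset = 0
        self.seen: list[str] = []

    def poll(self) -> None:
        for event in self._log.read_from(self.offset):
            self.seen.append(event["type"])
            self.offset += 1


log = EventLog()
log.append({"type": "UserEvent.SignedIn", "user_id": "u-1"})
log.append({"type": "UserEvent.SignedOut", "user_id": "u-1"})

# This consumer is created after both events were written,
# yet it can still read them from the beginning of the log.
late_consumer = StreamConsumer(log)
late_consumer.poll()
```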

What problem does it solve?

As a simple example, let’s take an application that does nothing other than authenticate a user. The application performs the following tasks:

  1. Receive HTTP request with user credentials
  2. Read user from the database to validate credentials
  3. Update user sign-in timestamp in the database
  4. Increase the number of users online
  5. Notify registered devices about login activity
  6. Return HTTP response

Let’s suppose that we have implemented each task as a separate microservice and that these microservices communicate with each other over HTTP. What response code should we return to the user when the sign-in succeeds but the notification to registered devices fails? Do we return 200 (OK) and schedule step 5 to retry later? Or do we return 500 (Internal Server Error) even though the sign-in action succeeded?

In an event-driven system, the events are the source of truth, as opposed to the outcome of synchronous HTTP communication.

We should strive to design services with clear responsibilities and always plan for scalability and fault-tolerance, which are two of the advantages of microservices.

To decouple these functionalities, the sign-in service could become a producer of a UserEvent.SignedIn event. Other features consume that event and react to it independently, which inherently enables parallel processing.
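The sketch below shows one possible shape of this, under simplifying assumptions: the queue is an in-memory stand-in for the message bus, the credential check is a placeholder, and the device notification is made to fail on purpose. All function names are hypothetical. The sign-in result no longer depends on the notification step, and the failed delivery is recorded so it could be retried later.

```python
from collections import deque

queue: deque = deque()  # in-memory stand-in for the message bus
state = {"users_online": 0}
failed: list[tuple[str, dict]] = []  # (consumer name, event) to retry later


def sign_in(username: str, password: str) -> int:
    """Producer: validates credentials, emits an event, returns an HTTP status."""
    if password != "secret":  # placeholder for a real credential check
        return 401
    queue.append({"type": "UserEvent.SignedIn", "user": username})
    return 200  # does not wait for any consumer


def count_online(event: dict) -> None:
    state["users_online"] += 1


def notify_devices(event: dict) -> None:
    # Simulated outage of the notification channel.
    raise ConnectionError("push service unavailable")


consumers = [count_online, notify_devices]


def dispatch() -> None:
    """Delivers queued events; one failing consumer does not block the others."""
    while queue:
        event = queue.popleft()
        for consumer in consumers:
            try:
                consumer(event)
            except Exception:
                failed.append((consumer.__name__, event))


status = sign_in("alice", "secret")
dispatch()
```

Here the consumers run one after another for simplicity; since they do not depend on each other, they could equally run in parallel in separate processes or services.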

In an event-driven application, it is essential to keep track of event IDs, which uniquely identify each event, and event timestamps, which help with observability. Moreover, we could add a CorrelationId, which helps us associate the different events of a particular transaction across service boundaries.
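One way to carry this metadata is an event envelope, sketched below. The Event class, its field names and the DeviceEvent.Notified event type are assumptions made for illustration; only the ideas of an event ID, a timestamp and a CorrelationId come from the text above.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    """Hypothetical envelope wrapping every event with tracking metadata."""

    type: str
    payload: dict
    correlation_id: str  # shared by all events of one transaction
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    occurred_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


# The first event of a transaction starts a new correlation ID.
correlation_id = str(uuid.uuid4())
signed_in = Event("UserEvent.SignedIn", {"user_id": "u-1"}, correlation_id)

# A follow-up event emitted by another service reuses the same correlation ID,
# but gets its own event ID and timestamp.
notified = Event(
    "DeviceEvent.Notified", {"user_id": "u-1"}, signed_in.correlation_id
)
```

Filtering logs or stored events by correlation_id would then show everything that happened as a consequence of a single sign-in.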

When to use it? When not?

Even though event-driven architecture is used and embraced by big tech companies such as Microsoft and AWS, we, as software engineers, must be aware of its pros and cons when designing a system and assess whether a given architecture actually helps us meet the business requirements.

For example, if we are designing a system for a client, we would probably be given a service-level agreement (SLA), which could specify the minimum required throughput (e.g. requests per second), the maximum downtime allowed (availability), etc.

When we talk about availability, reliability, scalability and so on, we are talking about a system’s quality attributes. Thus, if our sign-in application can fulfil the functional and non-functional requirements and meet the SLA using a monolithic approach, introducing an event-driven architecture will most likely be overkill. Moreover, the time and effort needed to build such a system could be better spent delivering business features.

We may consider EDA when the SLA is demanding. For instance, achieving high availability, such as 99.9% uptime, may require our system to be scalable and tolerant of failures. Decoupling functionalities increases our chances of being SLA-compliant. Auditability and observability can also be good reasons to adopt EDA.
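To get a feel for what such a target means in practice, the short calculation below converts an availability percentage into a downtime budget. It assumes a 365-day year and a 30-day month, and the function name is made up for this example.

```python
HOURS_PER_YEAR = 365 * 24
HOURS_PER_MONTH = 30 * 24


def allowed_downtime_hours(availability_percent: float, period_hours: float) -> float:
    """Hours of downtime permitted in a period at the given availability."""
    return period_hours * (1 - availability_percent / 100)


# 99.9% uptime leaves roughly 8.76 hours of downtime per year,
# or roughly 43 minutes per 30-day month.
yearly_hours = allowed_downtime_hours(99.9, HOURS_PER_YEAR)
monthly_minutes = allowed_downtime_hours(99.9, HOURS_PER_MONTH) * 60
```

A budget this small has to cover deployments, dependency outages and incidents alike, which is why a demanding SLA tends to push us towards decoupled, fault-tolerant designs.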