This comparison of how Apache Kafka and Solace PubSub+ each enable event streaming across multi-site systems is part of a series of posts comparing Kafka with PubSub+. Check out the rest of the posts in the series:
- Solace PubSub+ vs Kafka: The Basics
- Solace PubSub+ vs Kafka: Filtering
- Solace PubSub+ vs Kafka: Implementation of the Publish-Subscribe Messaging Pattern
- Solace PubSub+ vs Kafka: High Availability
Why is Multi-Site Architecture Important?
Years ago, it was common for small or medium-sized organizations to host all of their IT systems in a single datacenter. Large enterprises were the only ones who wanted, and could afford, to run systems in two or more geographically dispersed datacenters. Having a secondary datacenter allowed enterprises to direct load to the closest datacenter and to improve their ability to recover from system or network failures, natural disasters, etc., all with the goal of business continuity in mind.
Nowadays, with the boom of cloud computing, it has become a very common pattern for organizations to use cloud-native services that, among many other things, enable them to start with a small footprint and scale quickly as their needs increase. As a result, even small businesses deploy their applications and microservices across a mixture of datacenters and/or cloud regions of one or many cloud vendors.
Consider the example of an equities broker that trades on markets in New York, London, and Tokyo. Having just one datacenter in one of these three locations would introduce unacceptable latency and bandwidth costs for operations that span two or all three locations.
Having datacenters in each of the locations would greatly reduce latency between the algorithmic engines and the local exchanges, but would still pose a challenge for their next-generation retail trading platform, which needs to scale dynamically as demand for it grows.
Multi-site architecture can help them maintain low latency, keep bandwidth costs in check, and restrict/secure access for sensitive on-premises services, all while taking advantage of the scalability of the cloud for public-facing services like a retail trading platform.
Multi-site architecture offers several advantages over the legacy centralized-datacenter pattern, but it also introduces new challenges for IT architects, who must choose the right technologies to keep the overall system optimal in multiple aspects (like latency or bandwidth usage) without sacrificing ease of deployment and management.
How Does Kafka Enable Multi-Site Architecture?
Apache Kafka recommends the deployment of an external component called MirrorMaker for communications between two applications across two different locations. MirrorMaker allows a Kafka cluster to asynchronously and unidirectionally “replicate” selected topics to another Kafka cluster (“active-active” model). A “stretched cluster” model that doesn’t require the use of MirrorMaker is also available, but it’s not recommended due to the high latency intrinsic to synchronous replication between brokers that belong to the same Kafka cluster but are located in separate geographical regions.
Here are some highlights:
- MirrorMaker is an external component to the Kafka cluster.
- MirrorMaker 2 is based on the Kafka Connect framework.
- One Connect cluster is needed between a pair of datacenters (unidirectional communication).
- In order to achieve bi-directional communication (replicate remote topics to the local cluster), instances of MirrorMaker should also be deployed on the remote Kafka cluster (remote-consume and local-produce is the recommended approach).
- High availability is achieved by deploying at least 2 instances of MirrorMaker within the Connect cluster.
- Message offsets are maintained amongst Kafka clusters (only with MirrorMaker 2).
- Topic configurations, partitions, and ACLs are synced (only with MirrorMaker 2).
- High potential for cyclic repetition of data during bidirectional mirroring, on topics that are named the same on both clusters.
- Eventual consistency due to asynchronous mirroring between clusters.
- Possible data loss in case of a cluster failure due to asynchronous mirroring.
- MirrorMaker prevents message looping by renaming topics, prepending the name of the source site to them, which tightly couples producers and consumers to the topology.
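To make the renaming point concrete, here is a minimal Python sketch of prefix-based renaming and loop prevention. It is a simplified model, not MirrorMaker's actual code: the site aliases `ny` and `ldn`, the topic `orders`, and the helper names are all hypothetical, and the `.` separator is assumed to match the default naming of mirrored topics.

```python
# Simplified model of prefix-based topic renaming (not MirrorMaker's real code).
SEPARATOR = "."  # assumed separator between source alias and topic name


def remote_topic_name(source_alias: str, topic: str) -> str:
    """Name a mirrored topic receives on the target cluster: '<source>.<topic>'."""
    return f"{source_alias}{SEPARATOR}{topic}"


def should_replicate(topic: str, target_alias: str) -> bool:
    """Skip topics already prefixed with the target's alias: they originated there."""
    return not topic.startswith(f"{target_alias}{SEPARATOR}")


def mirror(source_alias: str, source_topics: set, target_alias: str) -> set:
    """Return the topic names created on the target by one mirroring pass."""
    return {
        remote_topic_name(source_alias, topic)
        for topic in source_topics
        if should_replicate(topic, target_alias)
    }


# Two hypothetical sites that both own a topic called "orders".
ny_topics = {"orders"}
ldn_topics = {"orders"}

# Mirror in both directions.
ldn_topics |= mirror("ny", ny_topics, "ldn")
ny_topics |= mirror("ldn", ldn_topics, "ny")

print(sorted(ldn_topics))  # ['ny.orders', 'orders']
print(sorted(ny_topics))   # ['ldn.orders', 'orders']
```

In this model, `ny.orders` is never mirrored back to `ny`, so the loop is broken. The cost is that a consumer in London that wants every order has to know about both `orders` and `ny.orders`, which is the topology coupling described above.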
How Does Solace Enable Multi-Site Architecture?
Solace PubSub+ uses an approach called dynamic message routing (DMR) to route messages between the various messaging nodes, whether they’re in the cloud or on premises. Data is only routed where and when it is requested, rather than being replicated to every location regardless of need. The routing behavior is coordinated by the brokers. No additional applications or components are necessary.
Here are some highlights:
- DMR is an internal feature of the event broker.
- A single DMR link allows bi-directional communication.
- Subscriptions to topics are automatically propagated through all the Solace brokers.
- DMR HA is built into the Solace HA Group.
- Data loss can be avoided with the use of persistent messaging, even in scenarios of an HA Group failure.
- Built-in message looping prevention logic: cyclic repetition is not possible even on topics that are named the same in both clusters, and topic renaming is not required at all.
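The idea of routing data only where a subscription exists can be sketched with a toy Python model. This is an illustration under stated assumptions, not how the broker is implemented: the `Mesh` class, the site names, and the topics are hypothetical, and the wildcard matcher is a simplification in which `*` stands for exactly one topic level and a trailing `>` stands for one or more remaining levels.

```python
from collections import defaultdict


def matches(subscription: str, topic: str) -> bool:
    """Simplified wildcard match: '*' is one level, trailing '>' is one or more levels."""
    sub_levels = subscription.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(sub_levels):
        if level == ">" and i == len(sub_levels) - 1:
            return len(topic_levels) > i  # at least one level must remain
        if i >= len(topic_levels):
            return False
        if level != "*" and level != topic_levels[i]:
            return False
    return len(sub_levels) == len(topic_levels)


class Mesh:
    """Toy model of brokers that know each other's subscriptions."""

    def __init__(self, sites):
        self.subscriptions = {site: set() for site in sites}
        self.delivered = defaultdict(list)

    def subscribe(self, site: str, subscription: str) -> None:
        # In this model a subscription is visible to every site immediately.
        self.subscriptions[site].add(subscription)

    def publish(self, origin: str, topic: str, payload: str) -> list:
        """Deliver only to sites with a matching subscription; return remote sites reached."""
        forwarded = []
        for site, subs in self.subscriptions.items():
            if any(matches(sub, topic) for sub in subs):
                self.delivered[site].append((topic, payload))
                if site != origin:
                    forwarded.append(site)
        return forwarded


mesh = Mesh(["ny", "ldn", "tokyo"])
mesh.subscribe("ldn", "trades/equities/>")

reached = mesh.publish("ny", "trades/equities/nyse/IBM", "fill")
not_reached = mesh.publish("ny", "quotes/fx/eurusd", "1.08")

print(reached)      # ['ldn']
print(not_reached)  # []
```

Only London receives the trade, because only London asked for it. Tokyo carries no traffic, and the topic keeps the same name at every site.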
Conclusion
Both Kafka and Solace provide mechanisms for implementing multi-site systems, but it’s important to understand how each does so, including the implications for performance, simplicity, reliability, and manageability.
Some of the Solace PubSub+ Event Broker features that enable multi-site architecture (like DMR and message looping detection) require add-ons or workaround configurations when using Apache Kafka. These can increase the overall complexity of the event streaming platform and, in the long run, widen the difference in both cost and risk.