This comparison of how Apache Kafka and Solace PubSub+ each enable event streaming across multi-site systems is part of a series of posts comparing Kafka with PubSub+.
Years ago, it was common for small or medium-sized organizations to host all of their IT systems in a single datacenter. Large enterprises were the only ones who wanted, and could afford, to run systems in two or more geographically dispersed datacenters. Having a secondary datacenter allowed enterprises to direct load to the closest datacenter and to improve their ability to recover from system or network failures, natural disasters, and similar events, all with the goal of business continuity in mind.
Nowadays, with the boom of cloud computing, it has become a very common pattern for organizations to use cloud-native services that, among many other things, enable them to start with a small footprint and scale quickly as their needs increase. As a result, even small businesses deploy their applications and microservices across a mixture of datacenters and/or cloud regions of one or many cloud vendors.
Consider the example of an equities broker that trades on markets in New York, London, and Tokyo. Having just one datacenter in one of these three locations would introduce unacceptable latency and bandwidth costs for operations that span two or all three locations.
Having datacenters in each of the locations would greatly reduce latency between the algorithmic engines and the local exchanges, but would still pose a challenge for the broker's next-generation retail trading platform, which needs to scale dynamically as demand for it grows.
Multi-site architecture can help them maintain low latency, keep bandwidth costs in check, and restrict and secure access to sensitive on-premises services, all while taking advantage of the scalability of the cloud for public-facing services like a retail trading platform.
Multi-site architecture offers several advantages over the legacy centralized-datacenter pattern, but it also introduces new challenges for IT architects, who must choose the right technologies to keep the overall system optimal in several respects (such as latency and bandwidth usage) without sacrificing ease of deployment and management.
Apache Kafka recommends deploying an external component called MirrorMaker for communication between applications in different locations. MirrorMaker lets a Kafka cluster asynchronously and unidirectionally “replicate” selected topics to another Kafka cluster (“active-active” model). A “stretched cluster” model that doesn’t require MirrorMaker is also available, but it’s not recommended because of the high latency intrinsic to synchronous replication between brokers that belong to the same Kafka cluster but sit in separate geographical regions.
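To make the replication model a little more concrete, here is a minimal toy sketch in Python of asynchronous, one-way mirroring of selected topics between two clusters. It is not MirrorMaker and does not use any Kafka API: the `Cluster` and `Mirror` classes, the `poll` method, and the topic names are all hypothetical, invented purely for illustration.

```python
from collections import defaultdict


class Cluster:
    """Toy stand-in for a Kafka cluster: topic name -> ordered list of records."""

    def __init__(self, name):
        self.name = name
        self.topics = defaultdict(list)

    def produce(self, topic, record):
        self.topics[topic].append(record)


class Mirror:
    """Hypothetical one-way mirror that copies selected topics from source to target."""

    def __init__(self, source, target, topics):
        self.source = source
        self.target = target
        self.selected = set(topics)
        self.offsets = defaultdict(int)  # next source offset to copy, per topic

    def poll(self):
        """Copy records produced since the last poll; return how many were copied."""
        copied = 0
        for topic in self.selected:
            log = self.source.topics.get(topic, [])
            for record in log[self.offsets[topic]:]:
                self.target.produce(topic, record)
                copied += 1
            self.offsets[topic] = len(log)
        return copied


new_york = Cluster("new-york")
london = Cluster("london")
mirror = Mirror(new_york, london, topics=["orders"])

new_york.produce("orders", "buy 100 ACME")
new_york.produce("audit", "login ok")   # not selected, so never mirrored

# Replication is asynchronous: London sees nothing until the mirror runs.
mirror.poll()
```

In this sketch every record on a selected topic is copied to the target whether or not anything in London consumes it, and nothing flows from London back to New York unless a second mirror is set up in the opposite direction.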
Here are some highlights:
Solace PubSub+ uses an approach called dynamic message routing (DMR) to route messages between the various messaging nodes, whether they’re in the cloud or on premises. Data is only routed where and when it is requested, rather than being replicated to every location regardless of need. The routing behavior is coordinated by the brokers themselves; no additional applications or components are necessary.
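The idea of routing data only where it is requested can be illustrated with a small Python sketch. This is a toy model under simplifying assumptions (a full mesh of nodes, exact-match topic names, subscriptions made after nodes are linked), not Solace's actual protocol or API; the `Node` class and its methods are hypothetical.

```python
class Node:
    """Toy broker node that forwards a message only to peers that asked for its topic."""

    def __init__(self, name):
        self.name = name
        self.peers = []
        self.local_topics = set()   # topics with at least one local subscriber
        self.remote_interest = {}   # peer name -> topics that peer has requested
        self.delivered = []         # (topic, payload) handed to local subscribers
        self.sent_over_wan = 0      # messages forwarded to other sites

    def link(self, other):
        self.peers.append(other)
        other.peers.append(self)

    def subscribe(self, topic):
        """Register a local subscriber and advertise that interest to every peer."""
        self.local_topics.add(topic)
        for peer in self.peers:
            peer.remote_interest.setdefault(self.name, set()).add(topic)

    def publish(self, topic, payload):
        if topic in self.local_topics:
            self.delivered.append((topic, payload))
        for peer in self.peers:
            if topic in self.remote_interest.get(peer.name, set()):
                self.sent_over_wan += 1
                peer.receive(topic, payload)

    def receive(self, topic, payload):
        # Deliver locally only and never re-forward, so a message cannot loop.
        if topic in self.local_topics:
            self.delivered.append((topic, payload))


new_york, london, tokyo = Node("new-york"), Node("london"), Node("tokyo")
new_york.link(london)
new_york.link(tokyo)
london.link(tokyo)

tokyo.subscribe("prices/nyse")              # only Tokyo asks for NYSE prices
new_york.publish("prices/nyse", 101.5)      # crosses the WAN once, to Tokyo
new_york.publish("prices/lse", 99.0)        # nobody asked, so it stays in New York
```

Here London never receives either message and only one message leaves New York, which is the contrast with copying whole topics to every site regardless of demand.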
Here are some highlights:
Both Kafka and Solace provide mechanisms for implementing multi-site systems, but it’s important to understand how each does so, including the implications for performance, simplicity, reliability, and manageability.
Some of the Solace PubSub+ Event Broker features that enable multi-site architecture (like DMR and message loop detection) require add-ons or workaround configurations when using Apache Kafka, which can increase the overall complexity of the event streaming platform and, in the long run, both its cost and its risk.