This comparison of how Apache Kafka and Solace PubSub+ each enable event streaming across multi-site systems is part of a series of posts comparing Kafka with PubSub+.

Why is Multi-Site Architecture Important?

Years ago, it was common for small or medium-sized organizations to host all of their IT systems in a single datacenter. Large enterprises were the only ones who wanted, and could afford, to run systems in two or more geographically dispersed datacenters. Having a secondary datacenter allowed enterprises to direct load to the closest datacenter and to improve their ability to recover from system or network failures, natural disasters, etc., all with the goal of business continuity in mind.

Nowadays, with the boom of cloud computing, it has become very common for organizations to use cloud-native services that, among many other things, enable them to start with a small footprint and scale quickly as their needs increase. As a result, even small businesses deploy their applications and microservices across a mixture of datacenters and/or cloud regions of one or many cloud vendors.

Consider the example of an equities broker that trades on markets in New York, London, and Tokyo. Having just one datacenter in one of these three locations would introduce unacceptable latency and bandwidth costs for operations that span two or all three locations.

Having a datacenter in each of the locations would greatly reduce latency between the algorithmic engines and the local exchanges, but would still pose a challenge for the broker's next-generation retail trading platform, which needs to scale dynamically as demand for it grows:

[Figure: Multi-site architecture with PubSub+]

Multi-site architecture can help them maintain low latency, keep bandwidth costs in check, and restrict/secure access for sensitive on-premises services, all while taking advantage of the scalability of the cloud for public-facing services like a retail trading platform.

Multi-site architecture offers several advantages over the legacy centralized-datacenter pattern, but it also introduces new challenges for IT architects, who must choose the right technologies to keep the overall system optimal in multiple respects (like latency or bandwidth usage) without sacrificing ease of deployment and management.

How Does Kafka Enable Multi-Site Architecture?

Apache Kafka recommends deploying an external component called MirrorMaker for communication between applications in two different locations. MirrorMaker allows a Kafka cluster to asynchronously and unidirectionally “replicate” selected topics to another Kafka cluster (“active-active” model). A “stretched cluster” model that doesn’t require MirrorMaker is also available, but it’s not recommended due to the high latency intrinsic to synchronous replication between brokers that belong to the same Kafka cluster but are located in separate geographical regions.

Here are some highlights:

  • MirrorMaker is an external component to the Kafka cluster.
  • MirrorMaker 2 is based on the Kafka Connect framework.
  • One Connect cluster is needed between a pair of datacenters (unidirectional communication).
  • In order to achieve bi-directional communication (replicate remote topics to the local cluster), instances of MirrorMaker should also be deployed on the remote Kafka cluster (remote-consume and local-produce is the recommended approach).
  • High availability is achieved by deploying at least 2 instances of MirrorMaker within the Connect Cluster.
  • Message offsets are maintained amongst Kafka clusters (MirrorMaker 2 only).
  • Topic configurations, partitions, and ACLs are synced (MirrorMaker 2 only).
  • High potential for cyclic repetition of data during bidirectional mirroring on topics that are named the same on both clusters.
  • Eventual consistency due to asynchronous mirroring between clusters.
  • Possible data loss in case of a cluster failure due to asynchronous mirroring.
  • MirrorMaker prevents message looping by renaming topics, prepending the name of the source site to each one, which tightly couples producers and consumers to the topology.
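
To make the topic-renaming point more concrete, here is a minimal Python sketch of how source-prefixed names can stop a mirrored topic from bouncing back to the cluster it came from. It is a simplified model, not MirrorMaker code: the function names, the cluster aliases `nyc` and `lon`, and the topic `orders` are all hypothetical, and the `.` separator is assumed to match MirrorMaker 2's default naming.

```python
from __future__ import annotations

SEPARATOR = "."  # assumed to match MirrorMaker 2's default; illustrative only


def remote_topic_name(source_alias: str, topic: str) -> str:
    """Name a mirrored topic receives on the target cluster: '<source>.<topic>'."""
    return f"{source_alias}{SEPARATOR}{topic}"


def topic_provenance(topic: str, known_aliases: set[str]) -> list[str]:
    """Cluster aliases already embedded as prefixes in a topic name, newest first."""
    hops = []
    while SEPARATOR in topic:
        prefix, rest = topic.split(SEPARATOR, 1)
        if prefix not in known_aliases:
            break  # an ordinary dotted topic name, not a mirroring prefix
        hops.append(prefix)
        topic = rest
    return hops


def should_mirror(topic: str, target_alias: str, known_aliases: set[str]) -> bool:
    """Skip topics that have already passed through the target cluster."""
    return target_alias not in topic_provenance(topic, known_aliases)


clusters = {"nyc", "lon"}

# 'orders' produced in nyc is mirrored to lon under a new name.
mirrored = remote_topic_name("nyc", "orders")
print(mirrored)                                   # nyc.orders

# The lon -> nyc flow sees 'nyc.orders' and skips it, so no loop forms.
print(should_mirror(mirrored, "nyc", clusters))   # False

# The price: a consumer in lon that wants every order must know the topology.
lon_consumer_topics = ["orders", remote_topic_name("nyc", "orders")]
print(lon_consumer_topics)                        # ['orders', 'nyc.orders']
```

The last line illustrates the coupling described above: adding a third site would mean every consumer's subscription list has to change as well.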

How Does Solace Enable Multi-Site Architecture?

Solace PubSub+ uses an approach called dynamic message routing (DMR) to route messages between the various messaging nodes, whether they’re in the cloud or on premises. Data is only routed where and when it is requested, rather than being replicated to every location regardless of need. The routing behavior is coordinated by the brokers themselves, so no additional applications or components are necessary.

Here are some highlights:

  • DMR is an internal feature of the event broker.
  • A single DMR link allows bi-directional communication.
  • Subscriptions to topics are automatically propagated through all the Solace brokers.
  • DMR HA is built into the Solace HA Group.
  • Data loss can be avoided with the use of persistent messaging, even in the event of an HA Group failure.
  • Built-in message looping prevention logic – Cyclic repetition is not possible even in topics that are named the same in both clusters, and topic renaming is not required at all.
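
The idea of routing data only where it is requested can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the Solace API or the actual DMR protocol: the `Broker` class, its methods, and the `trades/...` topics are hypothetical, brokers are assumed to be fully meshed, forwarded messages are assumed to travel a single hop, and wildcard matching is simplified to a whole-level `*` and a trailing `>`.

```python
from __future__ import annotations
from dataclasses import dataclass, field


def matches(subscription: str, topic: str) -> bool:
    """Simplified matching: '*' is one whole level, trailing '>' is one or more levels."""
    sub_levels = subscription.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(sub_levels):
        if level == ">":
            return i == len(sub_levels) - 1 and len(topic_levels) > i
        if i >= len(topic_levels):
            return False
        if level != "*" and level != topic_levels[i]:
            return False
    return len(sub_levels) == len(topic_levels)


@dataclass
class Broker:
    name: str
    peers: list[Broker] = field(default_factory=list)
    local_subs: set[str] = field(default_factory=set)
    remote_subs: dict[str, set[str]] = field(default_factory=dict)  # peer -> subs
    delivered: list[tuple[str, str]] = field(default_factory=list)

    def link(self, other: Broker) -> None:
        """One bidirectional link; existing subscriptions are exchanged."""
        self.peers.append(other)
        other.peers.append(self)
        other.remote_subs.setdefault(self.name, set()).update(self.local_subs)
        self.remote_subs.setdefault(other.name, set()).update(other.local_subs)

    def subscribe(self, subscription: str) -> None:
        """Register a local subscription and propagate it to every peer."""
        self.local_subs.add(subscription)
        for peer in self.peers:
            peer.remote_subs.setdefault(self.name, set()).add(subscription)

    def publish(self, topic: str, payload: str, from_peer: bool = False) -> list[str]:
        """Deliver locally, then forward only to peers that asked for the topic."""
        if any(matches(s, topic) for s in self.local_subs):
            self.delivered.append((topic, payload))
        if from_peer:
            return []  # single hop: a forwarded message is never forwarded again
        forwarded = []
        for peer in self.peers:
            wanted = self.remote_subs.get(peer.name, set())
            if any(matches(s, topic) for s in wanted):
                peer.publish(topic, payload, from_peer=True)
                forwarded.append(peer.name)
        return forwarded


ny, ldn, tok = Broker("ny"), Broker("ldn"), Broker("tok")
ny.link(ldn)
ny.link(tok)
ldn.link(tok)

ldn.subscribe("trades/nyse/>")                   # only London asks for NYSE trades
sent_to = ny.publish("trades/nyse/IBM", "fill")
print(sent_to)         # ['ldn']
print(ldn.delivered)   # [('trades/nyse/IBM', 'fill')]
print(tok.delivered)   # []
```

In this model the topic name is identical on every broker, Tokyo receives nothing because it never subscribed, and the single-hop rule means a message cannot circle back to its origin.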

Conclusion

Both Kafka and Solace provide mechanisms for implementing multi-site systems, but it’s important to understand how each does so, including the implications for performance, simplicity, reliability, and manageability.

Some of the Solace PubSub+ Event Broker features that enable multi-site architecture (like DMR and message looping prevention) require add-ons or workaround configurations when using Apache Kafka. These can increase the overall complexity of the event streaming platform, leading to greater cost and risk in the long run.

Manuel Moreno

As a solutions architect for Solace, Manuel helps customers analyze, design, and implement their digital transformation projects, specializing in event-driven and hybrid cloud architecture. Manuel has over 16 years of experience with real-time systems in the financial & equity market.

Prior to his current role at Solace, he spent 4 years on the Professional Service Team, and previously he spent 10 years working on the algo trading team of one of the 10 largest investment banks in the world.

Manuel holds a Bachelor of Computer Science Engineering degree from the Universidad Nacional Autonoma de México. When he's not designing next-generation real-time systems, he enjoys playing tennis and scuba diving.