If you can’t route messages where they need to go, can’t filter so your application receives only the messages it requires, or can’t maintain temporal order within a flow, then you should probably look for a different messaging system.
In many use cases, a messaging system that only supports exact topic match pub/sub is not sufficient, because different types of consumers want to receive events based on different criteria depending on the application – not only on a fixed topic. Solace PubSub+ message brokers handle this with hierarchical topics and wildcard subscriptions; Apache Kafka does not. Apache Kafka’s wildcard support is simply a periodic check for new topics matching a pattern: if there are 100 topics that match the pattern, Apache Kafka injects 100 subscriptions, whereas Solace would inject a single subscription.
Most distributed applications also need request/reply messaging – pub/sub alone is not sufficient. PubSub+ supports request/reply, whereas Apache Kafka does not.
A summary comparison of routing features is provided in this table and explained in more detail below.
You can find a few examples of applications and how they use message routing features below.
Feature | Apache Kafka | Solace |
---|---|---|
Message Routing | | |
Lightweight, stateless topics | ✗ | ✓ |
Hierarchical topics | ✗ | ✓ |
Filtering within a topic | ✗ | ✓ |
Queue-based routing | ✗ | ✓ |
Filtering within a queue | ✗ | ✓ |
Subscriptions | | |
Exact topic match subscriptions | ✓ | ✓ |
Wildcard subscriptions | ✗ (client-side topic discovery only) | ✓ |
Message Exchange Patterns | | |
Publish/Subscribe | ✓ | ✓ |
Point to Point | ✗ | ✓ |
Request/Reply | ✗ | ✓ |
Qualities of Service | | |
Persistent | ✓ | ✓ |
Non-persistent | ✗ | ✓ |
In publish-order delivery across topics | ✗ | ✓ |
Native support for topic partitioning | ✓ | ✗ (not required; see non-exclusive queues below) |
Apache Kafka Message Routing, Filtering, Ordering
Within an Apache Kafka cluster, each topic is stored as a partitioned log.
Apache Kafka topics are flat (no hierarchy); they are coarse constructs that are defined on the Apache Kafka broker and consume state there. Topics are expected to be coarse enough that Apache Kafka provides a mechanism to split a single topic into multiple partitions, where each partition of a topic can be served by a different Apache Kafka broker for scalability. Partitions do not provide filtering or routing; they provide load distribution. Order can only be maintained within a single topic partition.
Each topic partition uses a directory of files created by the broker to store its messages. These partition files are treated as log files: new messages are appended to the end, files are closed at a given size and a new one is started, and they are deleted over time by age or data-size policies – whether or not all consumers have received the messages. Apache Kafka provides no monitoring or alerting of a “disk full” state.
There is discussion of “wildcard subscriptions” in Apache Kafka, but they don’t do what most of us would expect of a typical messaging system. Apache Kafka wildcards allow a consuming application to discover newly created matching topics by periodically making metadata requests for the topic list, filtering for topics that match its wildcard pattern, detecting new matches, and then issuing new subscriptions for those topics. This is all the consumer’s responsibility, not the broker’s, and it is really a management discovery function, not a message routing function. Most developers would expect wildcard subscriptions to be matched by the broker in real time against newly published messages, causing messages with matching topics to be routed immediately to the consumer without the consumer doing anything – as supported by MQTT, AMQP and many JMS providers.
This is another example of the simple-broker, sophisticated-client architecture of Apache Kafka: literal topic matching requires much less processing than matching large quantities of per-consumer wildcard subscriptions, but it provides much less functionality to the consumer application.
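To make this concrete, here is a minimal sketch of Kafka’s pattern “subscription” using the standard Java consumer API. The broker address, group ID and topic pattern are assumptions for illustration. The pattern is evaluated in the client against periodically refreshed topic metadata (governed by `metadata.max.age.ms`); the broker never matches the pattern against individual published messages.

```java
import java.time.Duration;
import java.util.Properties;
import java.util.regex.Pattern;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PatternSubscribeExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "pattern-demo");            // assumed group ID
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        // How often the client refreshes metadata -- and therefore how long it
        // can take to notice a newly created topic that matches the pattern.
        props.put("metadata.max.age.ms", "30000");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Client-side matching: each topic that matches the pattern becomes
            // its own subscription (100 matching topics = 100 subscriptions).
            consumer.subscribe(Pattern.compile("orders-.*"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("%s: %s%n", record.topic(), record.value());
                }
            }
        }
    }
}
```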
As a result of the scalability challenges with topics and the lack of wildcard subscriptions, topics are not expected to contain dynamic data such as customerID, orderID or pokerTableNum, since that would explode the topic space (and the directories, files and file handles behind it); consequently, clients cannot filter on this type of information. You might not need this kind of filtering in big data use cases, but many other use cases do.
Since each topic partition results in a directory of files, more topics or partitions lead to other scalability issues, such as increased unavailability, increased end-to-end latency and fewer client connections, and so require significant planning. To overcome this limitation, Confluent recommends using KSQL, Kafka Streams or single message transforms to filter received messages on message metadata. This is not an efficient use of network bandwidth, as consumers receive messages only to discard them. It also puts extra load on the consumer applications, which must receive and filter messages they have no interest in.
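As an illustration of that client-side filtering approach, here is a hedged sketch using the Kafka Streams DSL; the topic names and the JSON field are hypothetical. Every event on the source topic still crosses the network to the application before the predicate drops it.

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class ClientSideFilterExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "fill-filter");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Every record on "trade-events" is delivered to this application;
        // the predicate then discards everything that is not a fill event.
        KStream<String, String> trades = builder.stream("trade-events");
        trades.filter((key, value) -> value.contains("\"eventType\":\"FIL\""))
              .to("fill-events-for-this-app");

        new KafkaStreams(builder.build(), props).start();
    }
}
```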
Partitions are used to horizontally scale consumer processing of an event stream. An application that needs to scale horizontally to, say, 5 instances requires that at least 5 partitions be defined in the broker for each topic the application consumes. If you have 5 partitions now and another (slower) application requires 10, that doesn’t work: you need to repartition each topic to 10 partitions, and the 5-instance application would then consume from the 10 partitions in a potentially uneven manner. Adding or rebalancing partitions is a complex operational activity with Apache Kafka.
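For reference, increasing the partition count is an administrative operation against the broker, sketched below with Kafka’s AdminClient (the topic name is hypothetical). Note that existing records are not redistributed, and records published after the change may hash the same key to a different partition, silently breaking per-key ordering across the transition.

```java
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewPartitions;

public class RepartitionExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // Grow the hypothetical "trade-events" topic to 10 partitions.
            // Existing records stay where they are; new records with the same
            // key may now land on a different partition than old ones.
            admin.createPartitions(Map.of("trade-events", NewPartitions.increaseTo(10)))
                 .all()
                 .get();
        }
    }
}
```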
Queue-based addressing is not supported by Apache Kafka – only topics.
Non-persistent messaging is not supported by Apache Kafka since all published messages are always written to disk. As a result, Apache Kafka is not appropriate for applications that require low, deterministic latency, such as market data and odds distribution or some status update use cases.
Filtering messages within a topic, as is provided by JMS Selectors, is not supported and is not possible without significant performance impact due to the datapath architecture.
Solace PubSub+ Message Routing, Filtering, Ordering
PubSub+ supports topics for pub/sub, queues (a la JMS) for point-to-point, a special topic for request/reply as well as filtering within a topic or queue using JMS SQL92 selectors on message properties.
PubSub+ topics are byte strings that can have a hierarchical structure; they are dynamically created by publishers and are not defined on the message router, so they consume no state in the system. This means they can be populated with completely dynamic information such as orderID, clientID, “pass/fail” or other metadata describing the event that consumers can filter on. If you like, you can literally have tens of millions of topics – like database keys in a real-time flow of information.
Subscriptions can have various forms of wildcards or can specify an exact topic. PubSub+ message brokers match messages against subscriptions in real-time as messages enter the broker and queue them to consumers without involvement from the consumer.
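As a sketch of what this looks like in code, here is a minimal wildcard subscriber using Solace’s JCSMP Java API; the host, VPN, credentials and topic string are assumptions for illustration. One `addSubscription` call is all it takes, and from then on the broker does the matching and delivery.

```java
import com.solacesystems.jcsmp.BytesXMLMessage;
import com.solacesystems.jcsmp.JCSMPException;
import com.solacesystems.jcsmp.JCSMPFactory;
import com.solacesystems.jcsmp.JCSMPProperties;
import com.solacesystems.jcsmp.JCSMPSession;
import com.solacesystems.jcsmp.XMLMessageConsumer;
import com.solacesystems.jcsmp.XMLMessageListener;

public class WildcardSubscriberExample {
    public static void main(String[] args) throws JCSMPException {
        JCSMPProperties props = new JCSMPProperties();
        props.setProperty(JCSMPProperties.HOST, "tcp://localhost:55555"); // assumed
        props.setProperty(JCSMPProperties.VPN_NAME, "default");           // assumed
        props.setProperty(JCSMPProperties.USERNAME, "default");           // assumed

        JCSMPSession session = JCSMPFactory.onlyInstance().createSession(props);
        session.connect();

        XMLMessageConsumer consumer = session.getMessageConsumer(new XMLMessageListener() {
            @Override
            public void onReceive(BytesXMLMessage msg) {
                // The broker has already done the matching; only messages whose
                // topics match a registered subscription arrive here.
                System.out.println("Received on topic: " + msg.getDestination());
            }

            @Override
            public void onException(JCSMPException e) {
                e.printStackTrace();
            }
        });

        // One subscription covers all fill events from NY regardless of how
        // many distinct order IDs (and therefore topics) publishers create.
        session.addSubscription(JCSMPFactory.onlyInstance().createTopic("NY/*/*/*/FIL/*"));
        consumer.start();
    }
}
```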
So, when you have a dynamic data field in your topic, such as OrderID, and you are monitoring events on a particular order, you include that OrderID in your subscription; if you don’t care about a particular order, you wildcard it with “*”.
Since publishers deliver all the messages they publish to a single message router (not to several message routers based on the topic+key of the message), and because consumers can have a single “queue” to which they add many subscriptions for the messages they want to receive, message delivery is always in publish order, even when messages are published with different topics – where these different topics allow for different types of consumer filtering. The value of this in EDAs is explained further below. The combination of hierarchical topics that consume no state, wildcard subscriptions, and in-publish-order delivery enables fine-grained message routing: publishing applications “annotate” the messages they send with rich metadata about each event, and subscriber applications filter on that metadata using a variety of criteria to receive only the messages they want. This is critical in event-driven and microservices architectures.
You can find more details on PubSub+ topics here.
You can horizontally scale a consumer application without creating partitions, and in a way that doesn’t affect any other consumers or publishers, by using non-exclusive queues. A non-exclusive queue acts like an HTTP load balancer: messages that enter the queue due to its configured topic subscriptions are distributed in a load-balanced manner to all consumers bound to the queue. Need more capacity for a given app? Bind more application instances to the queue, with no message router changes. Decreasing the number of consumers on the queue redistributes the load across the smaller set of consumers, still in a load-balanced manner. One queue could have 10 consumers, another 5, another 1: no changes to the message router, no rebalancing.
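A minimal sketch of the consumer side, assuming a connected JCSMP session and a pre-provisioned non-exclusive queue whose name (“trade.processing”) is hypothetical: each application instance simply binds another flow to the same queue to join the load-balanced group.

```java
import com.solacesystems.jcsmp.ConsumerFlowProperties;
import com.solacesystems.jcsmp.FlowReceiver;
import com.solacesystems.jcsmp.JCSMPException;
import com.solacesystems.jcsmp.JCSMPFactory;
import com.solacesystems.jcsmp.JCSMPSession;
import com.solacesystems.jcsmp.Queue;
import com.solacesystems.jcsmp.XMLMessageListener;

public class QueueConsumerExample {
    // Bind one more consumer instance to the shared non-exclusive queue.
    // Scaling out is just running this again in another process; the broker
    // load-balances the queue's messages across all bound flows.
    static FlowReceiver bind(JCSMPSession session, XMLMessageListener listener)
            throws JCSMPException {
        Queue queue = JCSMPFactory.onlyInstance().createQueue("trade.processing"); // assumed name
        ConsumerFlowProperties flowProps = new ConsumerFlowProperties();
        flowProps.setEndpoint(queue);

        FlowReceiver flow = session.createFlow(listener, flowProps);
        flow.start();
        return flow;
    }
}
```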
Need to parallelize delivery to consumers in a group but maintain order within a key? No problem: put a hash of the key into an element of the topic and use subscriptions to distribute those hash values among exclusive queues for the consumers, with complete control over the hashing and the grouping of keys to consumers, as sketched below.
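A hedged sketch of the publisher side in plain Java (the topic layout and bucket count are assumptions): the key’s hash becomes a topic level, and each consumer’s exclusive queue subscribes to its share of the buckets, e.g. `orders/h0/>` and `orders/h1/>` for one consumer, `orders/h2/>` and `orders/h3/>` for the next.

```java
public class KeyHashTopicExample {
    private static final int BUCKETS = 8; // assumed number of hash buckets

    // Build a topic whose second level carries the key's hash bucket, e.g.
    // "orders/h3/ORD-12345". An exclusive queue per consumer subscribes to a
    // subset of buckets, so order is preserved per key while work is spread
    // across consumers -- and you control the hash and the grouping.
    static String topicFor(String orderKey) {
        int bucket = Math.floorMod(orderKey.hashCode(), BUCKETS);
        return "orders/h" + bucket + "/" + orderKey;
    }

    public static void main(String[] args) {
        System.out.println(topicFor("ORD-12345"));
    }
}
```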
Application Examples of Message Routing
Let’s map a few application examples to these routing capabilities.
Capital Markets Post Trade Pub/Sub Event Bus
In Equities platforms in capital markets, trading applications in the Front Office generate order and trade events that need to be distributed to post trade applications. The consuming applications each want some subset, often a different subset, of the event flow, depending on their function.
A PubSub+ topic structure might be something like this:
region/OMS-ID/DeskID/symbol/event/Cust-OrderID
Where:
- Region is one of {NY, LDN, SING, TKY}
- OMS-ID is the ID of the Order Management System that generated the event
- DeskID is the ID of the trading desk that originated the original order
- Symbol is the stock symbol
- Event is one of {NO (New Order), MOD (Modify), CAN (Cancel), FIL (Fill)}
- Cust-OrderID is the customer’s orderID
Every event published onto the post trade bus contains a topic with this structure. Now let’s look at some typical consumer filtering needs:
- A head trader keeping a blotter of all trade activity for his trading desks, where each desk ID starts with DeskID = EQ-DSK-5, subscribes to:
*/*/EQ-DSK-5*/*/*/*
Note that in this case, even though the above subscription receives messages from many different topics due to the use of wildcards, they are received at the consumer in the order they were published. This is very important, since the sequence New Order, Modify, Cancel produces different results than New Order, Cancel, Modify!
Also, this subscription captures events for all orders from EQ-DSK-5* with a single subscription regardless of how many desks there are.
- We have a reporting function that only wants to process Fill events from NY and scales horizontally by striping the data by the first letter of the symbol. To receive Fill events for all equity symbols starting with A from the NY region, the subscription is:
NY/*/*/A*/FIL/*
- Monitor all new order activity for Facebook on the day of its IPO:
*/*/*/FB/NO/*
- To monitor fill events for a particular customer order:
*/*/*/*/FIL/my-order-ID
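To make the mechanics concrete, here is a hedged sketch (JCSMP, assuming a connected session as in the earlier example) of registering two of the subscriptions above. The only thing each consumer varies is its subscription string; the broker does all the matching against every published topic.

```java
import com.solacesystems.jcsmp.JCSMPException;
import com.solacesystems.jcsmp.JCSMPFactory;
import com.solacesystems.jcsmp.JCSMPSession;

public class PostTradeSubscriptions {
    // The reporting consumer: fill events from NY for symbols starting with "A".
    static void subscribeNyFillsStartingWithA(JCSMPSession session)
            throws JCSMPException {
        session.addSubscription(
                JCSMPFactory.onlyInstance().createTopic("NY/*/*/A*/FIL/*"));
    }

    // The head trader's blotter: all events for all desks whose ID starts
    // with EQ-DSK-5, with a single subscription.
    static void subscribeDeskBlotter(JCSMPSession session) throws JCSMPException {
        session.addSubscription(
                JCSMPFactory.onlyInstance().createTopic("*/*/EQ-DSK-5*/*/*/*"));
    }
}
```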
With Apache Kafka, how would you satisfy all these diverse filtering needs on a flat topic structure with no wildcard subscriptions?
- Topic = region; key = symbol does not allow filtering on the symbol – just partitioning of all events for a region across all symbols in uncontrolled groupings
- Topic = region+symbol would require consumers to subscribe to each and every symbol without missing any. With 3,100 symbols on Nasdaq and 2,800 on NYSE alone, plus IPOs, de-listings, etc., this is an operational challenge
And still, in all these cases, you cannot filter on the originating OMS, desk, event type, customer order ID, or any other aspect of the data.
Payment Platform EDA
A payment platform could have a simple topic structure like:
Event/Acct ID/merchant ID
Where:
- Event is one of {AO (account open), AC (account close), PA (payment authorization), PM (payment), PD (payment declined)}
- Acct ID is the customer account ID
- Merchant ID is the ID of the merchant where the event is taking place, or 000000 if none applies
- To receive all account open events to send to marketing, use:
AO/*/* or AO/>
- To monitor all activity of a client on a watch list:
*/my-client-account/*
Note that this filtering is different than partitioning all data by client-account.
- To monitor the activity of a particular merchant:
*/*/merchantID
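On the publishing side, a hedged sketch (JCSMP; the producer is assumed to be already created on a connected session, and the parameter names are hypothetical) of how an application annotates each payment event by building the topic dynamically, so no topics are pre-provisioned on the broker and subscribers can filter on any level:

```java
import com.solacesystems.jcsmp.JCSMPException;
import com.solacesystems.jcsmp.JCSMPFactory;
import com.solacesystems.jcsmp.TextMessage;
import com.solacesystems.jcsmp.Topic;
import com.solacesystems.jcsmp.XMLMessageProducer;

public class PaymentPublisher {
    // Publish one payment event; the topic itself carries the routable
    // metadata (event type, account, merchant).
    static void publish(XMLMessageProducer producer, String event,
                        String acctId, String merchantId, String payload)
            throws JCSMPException {
        Topic topic = JCSMPFactory.onlyInstance()
                .createTopic(event + "/" + acctId + "/" + merchantId);
        TextMessage msg = JCSMPFactory.onlyInstance().createMessage(TextMessage.class);
        msg.setText(payload);
        producer.send(msg, topic);
    }
}
```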
Again with Apache Kafka, such routing and filtering is not possible.
Conclusion
The hierarchical topics with wildcard subscriptions provided by PubSub+ let different consumers have different filtering criteria on a single event stream so they can receive only and exactly the messages they are interested in – which goes far beyond the ability to subscribe to a fixed topic string, as provided by Apache Kafka. Combined with queues and message selectors, PubSub+ provides much richer message routing and filtering capabilities than Apache Kafka.