Kafka was born as a stream processing project of LinkedIn’s engineering team, so a recent blog post by two current members of that team titled, How LinkedIn customizes Apache Kafka for 7 trillion messages per day, caught my attention. In it, the authors give a detailed look at what it takes to keep a large-scale Kafka system running, and they talk about some steps they’ve taken to address scalability and operability challenges.
In short:
The originators of Kafka have found it necessary to customize and create additional tooling to achieve the scalability they need in today’s technology landscape, which has changed a lot since its inception. The reality is that there are other ways to optimize your system for unlimited scalability and easy operability.
Before I get to that, let’s take a look at Cruise Control, which on the surface seems like a cool tool that niftily rebalances topics for you. There are two aspects to rebalancing topics:
To summarize:
A smarter broker maintains state (such as “last event read”), which simplifies the overall architecture by allowing for stateless clients (desirable for those building microservices). The trade-off is a more complex broker, which must maintain state.
How does this affect scaling? Well, the broker must replicate state. Kafka is already replicating state (the topic), so that isn’t a problem. Obviously, the clients need to connect to wherever this state is held.
This does, however, introduce the need to constantly re-distribute the load (hence Cruise Control), and makes monitoring and management tricky. How do I know how many extra brokers I will need? What is the granularity of my load? How do I ensure my replica is not in the same availability pool?
It’s better to take a more static approach: replicating to a known hot spare. Note that this does not preclude clustering; the hot spare is simply for data replication, not broker load balancing. This approach means that state-sharing isn’t a problem since the clients simply re-connect to the hot spare. Unfortunately, you need to provision twice as many brokers as you otherwise would.
One of the cited advantages of Kafka is its scalability. For instance, in this post on DZone, Matija Gobec describes Kafka’s scalability as “one of the key features” and notes that “this is where Kafka excels.” But as I discussed earlier, even the creators of Kafka found it necessary to create their own tools to address scalability.
Let’s take a step back and think about the title of their post from an architectural standpoint. 7 trillion events a day? That’s 81M events a second. Over Twitter’s 325M monthly active users, each would have to tweet every 4 seconds, every minute, every day. LinkedIn has 645M users, which means that during my 8-hour working day, I am responsible for nearly an event a second.
Does this correspond to 7T ingress events? In other words, are users generating all of these events directly? This doesn’t seem likely, so how can we account for this? Well, every interaction with LinkedIn probably generates a tidal wave of events – think of every mouseover, click, and update request.
Our first clue is the need to write events to remote clusters using Brooklin. In this case, a single ingress event may be counted multiple times. Check out the clue in their blog: LinkedIn uses 100,000 topics. Let’s explore that further.
100,000 topics sounds like a lot, but over 645 million users, that’s an extremely coarse-grained topic structure. I can only divide a stream over an average of 6,450 users. These topics are also static. Remember, I can’t easily use topics with dynamic data – such as a username, or a short-lived artifact like an update.
It might seem like this would work quite well if the topic acts as an aggregator – say a clickstream – since only one topic is needed no matter how many users there are. But what happens if you are looking for a specific, important event within that clickstream, such as delete my account? If I wish to create an application to monitor these special events, I have some choices to make:
It takes engineering work to balance loads when creating a new topic. You have to change publishers to duplicate publish to the new topic (which breaks decoupling), or have existing processors republish the specific events (which reduces less coupling but quickly multiplies the event stream). You could use an index key, but only have one key, which severely restricts the filtering this can help with, so you end up having to republish events multiple times. More republishing!
Monitoring the existing clickstream is clean architecturally, but it means your application is ignoring the majority of the events it receives.
Another problem with a static topic architecture is that publishers change over time, as do consumers. How do you define a static topic structure that accommodates the needs of your current applications and those of applications you haven’t thought of yet? The topics can be generic to allow wide re-use, but the expense of being generic is inefficiency.
What I’m describing here is topic filtering – a function of an implementation of the publish/subscribe pattern that allows subscribers to listen only to events of interest. The coarser a topic, the less likely they represent only events of interest, and static topics make filtering even more difficult.
Another function of publish/subscribe is topic routing – the ability to route events to specific remote destinations selectively. This is important for inter-cluster and inter-region communication because by selecting events you can ensure that only events that are needed get moved, which reduces load on brokers, networks, and clients. A static topic structure condemns us to duplication, republishing, or some other form of inefficiency.
You can also exchange subscription information among brokers, in much the same way that IP routing shares network segment information. With the right subscription propagation facility, the brokers can move events only to where they are needed. Load balancing is dynamic, so you can scale consumer counts and move consumers around brokers as needed.
A rich, hierarchical topic structure with wildcard filtering has proven to be easy to use, high performance, scalable, and robust. We know that a bank that tested Kafka’s static topic structure found that by moving to a dynamic topic structure their throughput and storage requirements were reduced by over 50x.
The conclusion I draw from this is that while the Kafka architecture addresses the need to scale throughput on a broker, it has ignored the need to scale the topic structure, which results in:
If the topic structure is to act as an aggregation channel, this trade-off works well. However, as derived events are generated in reaction to the aggregated stream, the trade-off breaks down. Scaling becomes non-linear as the number of events rises. For example, instead of having two brokers dealing with ten events, you have 10 Kafka brokers dealing with 100 events.
A dynamic, rich, hierarchical topic structure allows topic routing and filtering, which avoids the multiplication of events that you get with a flat topic structure. Such a hierarchical, dynamic structure does not, however, come without challenges.
Since topics are dynamic, it takes more up-front work to define a topic structure or schema that defines which fields go where in the topic hierarchy, what data is suitable for inclusion in the topic, what is not suitable, etc.
You also need to consider governance. You can’t point at a topic and decide who can and can’t access it; you need to use the topic structure to create rules and policies. This approach allows fine-grained control over who accesses what data so that effort yields more flexibility.
Lastly, the broker has to do more work, like perform wildcard and hierarchical topic lookups. Carefully constructed, this can be restricted to simple string matching, which can perform the majority of filtering required, including geo-fencing. And you must propagate subscription information around the brokers, a non-trivial task.
Put simply, dynamic, hierarchical topics add complexity and load to the broker and require careful architectural design (tooling for topic design is now emerging). Static, flat topic structures, on the other hand, simplify the broker but cost more in ancillary broker tooling and architectural complexity.
Kafka excels at high-rate data aggregation, but there are downsides to applying it to the wrong use cases. For example, microservices aren’t readily served by Kafka due to the client complexity, difficulties in scaling consumers across topic partitions, and state-keeping requirements.
It is tempting to use Kafka to distribute data derived from your aggregate stream, but that frequently amounts to moving dis-aggregated data, which Kafka isn’t good at. A smart broker takes derived data and routes it in a filtered manner to appropriate destinations, eliminating the multiplier effect that static, flat topic structures have on this kind of data. A hierarchical topic structure allows filters on multiple fields (instead of just a single key), which eliminates the event multiplier effect often seen with Kafka as developers struggle to make sure consumers get data of interest – which can lead to a hair-raising number of messages.
Perhaps the question architects and developers should be asking is not “How many messages can I move?” to determine scaling capabilities and requirements, but rather “How many messages should I be moving?”
Talk to Solace: we can help you understand this trade-off and show you how a smart event broker can optimize your architecture.