Kafka was born as a stream processing project of LinkedIn’s engineering team, so a recent blog post by two current members of that team titled “How LinkedIn customizes Apache Kafka for 7 trillion messages per day” caught my attention. In it, the authors give a detailed look at what it takes to keep a large-scale Kafka system running, and they talk about some steps they’ve taken to address scalability and operability challenges.
In short:
- They created a tool called Brooklin to improve upon MirrorMaker, Kafka’s default cross-site replication tool.
- They created a monitoring and allocation tool called Cruise Control to make it easier to deal with daily broker failures.
- Finally, and ironically, they maintain their own branch of Kafka.
The originators of Kafka have found it necessary to customize and create additional tooling to achieve the scalability they need in today’s technology landscape, which has changed a lot since Kafka’s inception. The reality is that there are other ways to optimize your system for unlimited scalability and easy operability.
Losing Balance – A Cascading Failure
Before I get to that, let’s take a look at Cruise Control, which on the surface seems like a cool tool that niftily rebalances topics for you. There are two aspects to rebalancing topics:
- Consumer Scaling: Adding consumers to a consumer group requires the topic’s partitions to be rebalanced across the group so that load is distributed to the new consumers. This part of the Kafka architecture keeps the broker very simple, but it puts the onus on the client to maintain state and negotiate with other consumers to determine which partitions to consume from (see the sketch after this list).
- High Availability: For high availability, should a broker fail, the broker load must be rebalanced to avoid “hot spots” of load throughout the cluster. If the rebalancing is done poorly, the failure of one broker causes an asymmetric distribution of load to the other brokers. This can cause one of the remaining brokers to become overloaded and fail, leading to a cascading failure. This is where Cruise Control comes in; I suspect LinkedIn found that with so many brokers, they were spending too much time rebalancing topics.
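To make the first point concrete, here is a minimal sketch of client-side consumer-group rebalancing using the confluent-kafka Python client; the broker address, group id, and topic name are made up. Every time a consumer joins or leaves the group, each member’s callbacks fire and the client has to cope with its partition assignment changing underneath it.

```python
# Minimal sketch: consumer-group rebalancing is visible to, and handled by, the client.
# Broker address, group id, and topic name are illustrative only.
from confluent_kafka import Consumer

def on_assign(consumer, partitions):
    # Fires on every rebalance: this client now owns these partitions.
    print("assigned partitions:", [p.partition for p in partitions])

def on_revoke(consumer, partitions):
    # Fires when partitions are taken away, e.g. because another consumer joined.
    print("revoked partitions:", [p.partition for p in partitions])

consumer = Consumer({
    "bootstrap.servers": "broker:9092",
    "group.id": "clickstream-processors",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["clickstream"], on_assign=on_assign, on_revoke=on_revoke)

try:
    while True:
        msg = consumer.poll(1.0)   # the client, not the broker, drives consumption
        if msg is None or msg.error():
            continue
        print(msg.topic(), msg.partition(), msg.value())   # application logic lives here
finally:
    consumer.close()
```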
To summarize:
- The Kafka architecture simplifies the broker to increase performance and eliminate state on the broker;
- This forces application load-balancing logic onto the client, and the client manages state; and,
- Replica clustering with a static topic structure requires constant monitoring and maintenance of the cluster to ensure load is evenly balanced.
Alternative Architectural Approaches to Consider
A smarter broker maintains state (such as “last event read”), which simplifies the overall architecture by allowing for stateless clients (desirable for those building microservices). The trade-off is a more complex broker, which must maintain state.
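As a toy illustration of that trade-off (this is not any particular broker’s API), imagine a broker that tracks each subscriber’s “last event read” itself; the client only needs its own name, so it stays stateless and can reconnect to any replica that shares the cursor state.

```python
# Toy "smart broker": the broker keeps per-subscriber cursors ("last event read"),
# so clients can stay stateless. Purely illustrative, not a real broker API.
class SmartBroker:
    def __init__(self):
        self.log = []        # the event stream
        self.cursors = {}    # subscriber name -> index of the next unread event

    def publish(self, event):
        self.log.append(event)

    def next_event(self, subscriber):
        pos = self.cursors.get(subscriber, 0)
        if pos >= len(self.log):
            return None                       # nothing new for this subscriber
        self.cursors[subscriber] = pos + 1    # the broker, not the client, tracks progress
        return self.log[pos]

broker = SmartBroker()
broker.publish({"type": "profile-view", "user": "alice"})
broker.publish({"type": "connection-request", "user": "bob"})

# A stateless client: it identifies itself and asks for the next event,
# even after restarting or reconnecting to wherever the cursor state lives.
print(broker.next_event("notifier"))
print(broker.next_event("notifier"))
```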
How does this affect scaling? Well, the broker must replicate state. Kafka is already replicating state (the topic), so that isn’t a problem. Obviously, the clients need to connect to wherever this state is held.
- A smarter broker deals with client load balancing, which enables automated, dynamic application load balancing without interruptions, so there’s no more waiting for consumers to quiesce.
- Replication clustering distributes data replication throughout brokers in a cluster, which improves flexibility because as long as the pool of brokers is available to absorb the load, replication can be distributed.
This does, however, introduce the need to constantly re-distribute the load (hence Cruise Control), and makes monitoring and management tricky. How do I know how many extra brokers I will need? What is the granularity of my load? How do I ensure my replica is not in the same availability pool?
It’s better to take a more static approach: replicating to a known hot spare. Note that this does not preclude clustering; the hot spare is simply for data replication, not broker load balancing. This approach means that state-sharing isn’t a problem since the clients simply re-connect to the hot spare. Unfortunately, you need to provision twice as many brokers as you otherwise would.
The Secret to Scaling: Number of Brokers or Number of Messages?
One of the cited advantages of Kafka is its scalability. For instance, in this post on DZone, Matija Gobec describes Kafka’s scalability as “one of the key features” and notes that “this is where Kafka excels.” But as I discussed earlier, even the creators of Kafka found it necessary to create their own tools to address scalability.
Let’s take a step back and think about the title of their post from an architectural standpoint. 7 trillion events a day? That’s 81M events a second. Over Twitter’s 325M monthly active users, each would have to tweet every 4 seconds, every minute, every day. Spread over LinkedIn’s 645M users, that’s roughly 11,000 events per user per day – an event every couple of seconds of my 8-hour working day, just for me.
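A quick back-of-envelope check of those figures (using the user counts quoted above):

```python
# Back-of-envelope arithmetic for the numbers quoted above.
EVENTS_PER_DAY = 7_000_000_000_000            # 7 trillion
SECONDS_PER_DAY = 24 * 60 * 60

events_per_second = EVENTS_PER_DAY / SECONDS_PER_DAY
print(f"{events_per_second:,.0f} events/second")                                     # ~81 million

TWITTER_MAU = 325_000_000
print(f"one tweet every {TWITTER_MAU / events_per_second:.1f} s per Twitter user")   # ~4 s

LINKEDIN_USERS = 645_000_000
per_user_per_day = EVENTS_PER_DAY / LINKEDIN_USERS
working_day_seconds = 8 * 60 * 60
print(f"{per_user_per_day:,.0f} events per LinkedIn user per day")                   # ~10,900
print(f"one event every {working_day_seconds / per_user_per_day:.1f} s of an 8-hour day")  # ~2.7 s
```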
Does this correspond to 7T ingress events? In other words, are users generating all of these events directly? This doesn’t seem likely, so how can we account for this? Well, every interaction with LinkedIn probably generates a tidal wave of events – think of every mouseover, click, and update request.
Our first clue is the need to write events to remote clusters using Brooklin: a single ingress event may be counted multiple times. A second clue is in their blog: LinkedIn uses 100,000 topics. Let’s explore that further.
The Thing About Static Topic Architecture…
100,000 topics sounds like a lot, but over 645 million users that’s an extremely coarse-grained topic structure: on average, each topic serves 6,450 users. These topics are also static. Remember, I can’t easily use topics containing dynamic data – such as a username, or a short-lived artifact like an update.
It might seem like this would work quite well if the topic acts as an aggregator – say a clickstream – since only one topic is needed no matter how many users there are. But what happens if you are looking for a specific, important event within that clickstream, such as “delete my account”? If I wish to create an application to monitor these special events, I have some choices to make:
- I can have my application monitor all events and filter out the few it cares about (sketched after this list);
- I could create a new topic to which my publishers must duplicate their publishing; or,
- I could have my existing clickstream applications republish the events of interest.
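Here’s what the first option looks like in practice – a sketch, with made-up broker, topic, and event names, of an application that has to receive the entire clickstream just to pick out the rare events it actually cares about:

```python
# Sketch of option 1: subscribe to the whole clickstream and discard almost everything.
# Broker address, topic name, and event schema are illustrative only.
import json
from confluent_kafka import Consumer

def handle_deletion(event):
    print("account deletion requested by", event.get("user"))

consumer = Consumer({
    "bootstrap.servers": "broker:9092",
    "group.id": "account-deletion-monitor",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["clickstream"])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    event = json.loads(msg.value())
    if event.get("action") != "delete_account":
        continue                  # the overwhelming majority of events are dropped here
    handle_deletion(event)        # the rare event we actually wanted
```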
Republishing, Republishing, Republishing
Creating a new topic takes engineering work (and adds more load to balance): you have to change publishers to duplicate their publishing to the new topic (which breaks decoupling), or have existing processors republish the specific events (which keeps publishers decoupled but quickly multiplies the event stream). You could filter on the record key, but you only have one key, which severely restricts the filtering it can help with, so you end up having to republish events multiple times. More republishing!
Monitoring the existing clickstream is clean architecturally, but it means your application is ignoring the majority of the events it receives.
Another problem with a static topic architecture is that publishers change over time, as do consumers. How do you define a static topic structure that accommodates the needs of your current applications and those of applications you haven’t thought of yet? The topics can be generic to allow wide re-use, but the expense of being generic is inefficiency.
What I’m describing here is topic filtering – a function of an implementation of the publish/subscribe pattern that allows subscribers to listen only to events of interest. The coarser the topics, the less likely they are to carry only events of interest, and static topics make filtering even more difficult.
Another function of publish/subscribe is topic routing – the ability to route events to specific remote destinations selectively. This is important for inter-cluster and inter-region communication because by selecting events you can ensure that only events that are needed get moved, which reduces load on brokers, networks, and clients. A static topic structure condemns us to duplication, republishing, or some other form of inefficiency.
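To make filtering and routing on a hierarchical topic concrete, here is a small sketch of the matching logic; the topic levels and the wildcard syntax (“*” for exactly one level, “>” for everything from that level down, loosely modelled on Solace-style topics) are illustrative assumptions:

```python
# Sketch of hierarchical topics with wildcard subscriptions:
# '*' matches exactly one level, '>' matches everything from that level down.
def matches(subscription: str, topic: str) -> bool:
    sub_levels = subscription.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(sub_levels):
        if level == ">":
            return True                          # the rest of the topic, however deep
        if i >= len(topic_levels):
            return False
        if level != "*" and level != topic_levels[i]:
            return False
    return len(sub_levels) == len(topic_levels)

# One published event, described by its topic, can be filtered or routed
# to interested consumers and remote sites without republishing.
event_topic = "web/click/account/delete/user1234"

print(matches("web/click/>", event_topic))                  # True: the whole clickstream
print(matches("web/click/account/delete/>", event_topic))   # True: just the deletions
print(matches("web/click/search/>", event_topic))           # False: not delivered or routed
```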
You can also exchange subscription information among brokers, in much the same way that IP routing shares network segment information. With the right subscription propagation facility, the brokers can move events only to where they are needed. Load balancing is dynamic, so you can scale consumer counts and move consumers around brokers as needed.
A rich, hierarchical topic structure with wildcard filtering has proven to be easy to use, high performance, scalable, and robust. We know of a bank that tested Kafka’s static topic structure and found that moving to a dynamic topic structure cut its throughput and storage requirements by over 50x.
Flat Topic Structures Often Do Not Scale
The conclusion I draw from this is that while the Kafka architecture addresses the need to scale throughput on a broker, it has ignored the need to scale the topic structure, which results in:
- Excess client state-keeping requirements, which make client scaling hard.
- A multiplier effect on events, as they must be republished or duplicated to avoid overwhelming clients with events they are not interested in receiving.
If the topic structure is to act as an aggregation channel, this trade-off works well. However, as derived events are generated in reaction to the aggregated stream, the trade-off breaks down. Scaling becomes non-linear as the number of events rises. For example, instead of having two brokers dealing with ten events, you have 10 Kafka brokers dealing with 100 events.
A dynamic, rich, hierarchical topic structure allows topic routing and filtering, which avoids the multiplication of events that you get with a flat topic structure. Such a hierarchical, dynamic structure does not, however, come without challenges.
The Challenges of Dynamic Topic Structure
Since topics are dynamic, it takes more up-front work to define a topic structure or schema that defines which fields go where in the topic hierarchy, what data is suitable for inclusion in the topic, what is not suitable, etc.
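For example, a team might agree on a topic template like the hypothetical one below (every level name here is an assumption, not a standard) and have publishers build and validate topics against it:

```python
# Hypothetical topic schema: agree the hierarchy levels up front,
# then build topics from event data and validate them against the template.
TOPIC_TEMPLATE = ["region", "service", "event_type", "user_id"]

def build_topic(**fields):
    missing = [level for level in TOPIC_TEMPLATE if level not in fields]
    if missing:
        raise ValueError(f"missing topic levels: {missing}")
    return "/".join(str(fields[level]) for level in TOPIC_TEMPLATE)

# Publishers derive topics from the event itself, so the structure is dynamic but governed.
topic = build_topic(region="emea", service="profile", event_type="update", user_id="u1234")
print(topic)   # emea/profile/update/u1234
```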
You also need to consider governance. You can’t point at a topic and decide who can and can’t access it; you need to use the topic structure to create rules and policies. This approach allows fine-grained control over who accesses what data, so the extra effort yields more flexibility.
Lastly, the broker has to do more work, like perform wildcard and hierarchical topic lookups. Carefully constructed, this can be restricted to simple string matching, which can perform the majority of filtering required, including geo-fencing. And you must propagate subscription information around the brokers, a non-trivial task.
Put simply, dynamic, hierarchical topics add complexity and load to the broker and require careful architectural design (tooling for topic design is now emerging). Static, flat topic structures, on the other hand, simplify the broker but cost more in ancillary broker tooling and architectural complexity.
Conclusion: Don’t Use a Hammer on a Screw
Kafka excels at high-rate data aggregation, but there are downsides to applying it to the wrong use cases. For example, microservices aren’t readily served by Kafka due to the client complexity, difficulties in scaling consumers across topic partitions, and state-keeping requirements.
It is tempting to use Kafka to distribute data derived from your aggregate stream, but that frequently amounts to moving dis-aggregated data, which Kafka isn’t good at. A smart broker takes derived data and routes it, filtered, to the appropriate destinations. Because a hierarchical topic structure allows filters on multiple fields (instead of just a single key), it eliminates the event multiplier effect often seen with Kafka as developers struggle to make sure consumers get the data they’re interested in – a struggle that can produce a hair-raising number of messages.
Perhaps the question architects and developers should be asking is not “How many messages can I move?” to determine scaling capabilities and requirements, but rather “How many messages should I be moving?”
Talk to Solace: we can help you understand this trade-off and show you how a smart event broker can optimize your architecture.