    Google’s Agent2Agent (A2A) protocol has emerged as a solution for enabling AI agents to interact and collaborate. We appreciate Google’s design of A2A; indeed, it closely resembles the internals of Solace Agent Mesh, our open, natively event-driven agentic AI framework. However, the true potential of enterprise-grade agentic collaboration can only be realized if the underlying communication architecture is robust, scalable, and flexible, as I discussed in this post.

    We believe event-driven architecture in general, and an event mesh in particular, can make A2A much more powerful and help those of us in the software industry avoid going through another round of discovering the drawbacks of point-to-point architectures. An event mesh is a network of event brokers that uses a decoupled architecture and the publish/subscribe exchange pattern to facilitate the real-time distribution of information between applications, cloud services, connected devices, and AI agents across environments and geographies. Remember, we learned about the drawbacks of point-to-point integration when we adopted microservices. Many of us have the scars to show for it: the heavy burden of production incidents, operational complexity, and brittle, spaghetti-like systems in which one failing service could easily topple another.

    The Limitations of Point-to-Point Communications

    At first glance, directly connecting AI agents through point-to-point communications (client <-> server as designed by A2A) might appear straightforward, and it works fine if you only have a couple of agents. However, this approach introduces several significant drawbacks as you scale the system:

    • Complexity Explosion: As more agents join, the number of direct connections required grows quadratically, turning the architecture into a tightly coupled “spaghetti” system that is difficult to manage, scale, and debug.
    • Brittle Integrations: Changing one agent often necessitates adjustments in the other agents it interacts with directly, slowing down deployments and increasing the risk of cascading failures.
    • Operational Overhead: Maintenance and scaling of point-to-point integrations create heavy operational burdens due to frequent manual interventions and reconfigurations.

    As discussed earlier, this is not new. We saw it when we went through the adoption of microservices, and we have discussed the challenges in depth. We’ve also talked about the benefits of event-driven microservices. I believe, however, that these problems could be far worse in agentic networks because we will have many more agents than microservices. Agents are by definition supposed to be far more specialized in their skillsets, more like nano-services. In other words, there will be many more of them and therefore many more point-to-point connections. Think about tens, hundreds, or thousands of agents in a collaborative network, where the number of point-to-point connections is N(N-1)/2 and N is the number of agents. By the time you get to 10 agents you are already managing 45 point-to-point connections.
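
    To make the growth concrete, here is a small sketch that compares the pairwise link count with the number of connections needed when every agent talks only to a broker. The function names are hypothetical, and `broker_links` assumes the simplest possible setup in which each agent holds a single connection to the mesh.

```python
def point_to_point_links(n: int) -> int:
    """Distinct agent pairs that would each need a direct link: N(N-1)/2."""
    return n * (n - 1) // 2


def broker_links(n: int) -> int:
    """Assumption: with a broker in the middle, each agent keeps one connection."""
    return n


for n in (10, 100, 1000):
    print(n, point_to_point_links(n), broker_links(n))
# prints:
# 10 45 10
# 100 4950 100
# 1000 499500 1000
```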

    So if this is the problem, how can it be addressed in production environments? Some believe a log-streaming architecture can handle the messaging, so of course Kafka emerges as an apparent solution. However, Kafka’s topic architecture and operational burden leave some of these challenges unaddressed, and can in fact compound them if Kafka topics are used as the transport between clients and servers in A2A.

    Event Mesh: A Better Solution

    An event mesh provides an event-driven, asynchronous, decoupled communication layer that addresses all these challenges, and Solace provides the best tooling for building one.

    1. Decoupling Through Smart Topic Routing

    Solace’s smart topics use a hierarchical topic structure so agents can publish and subscribe to precisely targeted events. Rather than being directly coupled to one another, agents simply subscribe to topics describing the types of events they are interested in. Unlike Kafka, these topics do not need configuration, and using them is as simple as using hashtags in social media posts. You tag your event with your topic, which is a simple string in the form of Noun/Verb/Properties, and you are off to the races.

    This model allows:

    • Loose Coupling: Agents can be independently developed, deployed, scaled, and modified without needing to understand the entire ecosystem. Because agents do not rely on direct knowledge of one another, adding or upgrading an agent is easy.
    • Simplified Maintenance: Since topics can be created by publishing applications as a way of describing whatever they’re sending, you don’t need to spend a lot of time setting up or configuring topics.
    • Registry & Discovery: If topic routing is done right, it can facilitate discovery. For example, Solace Agent Mesh uses the above to facilitate the registry and discovery of AI agents. Agents can send the orchestrator an event (similar to the Agent Card concept described by A2A) that lets the entire system know they exist and how to use them.
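
    The loose-coupling idea can be sketched with a toy in-memory bus. This is not the Solace API; `build_topic`, `InMemoryBus`, and the `document/uploaded/pdf` topic are all hypothetical names chosen for illustration, and the bus only does exact topic matching.

```python
from collections import defaultdict


def build_topic(noun, verb, *properties):
    """Join the parts into a Noun/Verb/Properties topic string."""
    return "/".join((noun, verb, *properties))


class InMemoryBus:
    """Toy stand-in for a broker: exact-match topics, no persistence."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._handlers[topic].append(handler)

    def publish(self, topic, payload):
        """Deliver to every subscriber and return how many received it."""
        handlers = self._handlers.get(topic, [])
        for handler in handlers:
            handler(topic, payload)
        return len(handlers)


bus = InMemoryBus()
received = []

# Two independent agents register interest in the same kind of event.
topic = build_topic("document", "uploaded", "pdf")
bus.subscribe(topic, lambda t, p: received.append(("summarizer", p["name"])))
bus.subscribe(topic, lambda t, p: received.append(("indexer", p["name"])))

# The publisher tags its event with a topic and knows nothing about consumers.
delivered = bus.publish(topic, {"name": "report.pdf"})
print(delivered, received)
```

    Adding a third consumer is one more `subscribe` call; the publisher does not change.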

    Why Solace Smart Topics are Better than Kafka Topics

    • Hierarchical Structure: Unlike Kafka’s flat topic structure, Solace smart topics allow hierarchical, wildcard-based subscriptions, enabling finer-grained control and precision in routing messages. Both agents and non-agent consumers can subscribe using wildcards.
    • No config & dynamic routing: Kafka requires pre-configured topics for new interactions, leading to higher operational complexity. In contrast, Solace dynamically routes messages based on their topic without pre-configuration, significantly simplifying system evolution.
    • Scaling with guaranteed ordering: Solace brokers efficiently manage and route messages at scale, providing guaranteed message delivery and in-order processing across distributed agents, whereas Kafka’s partitioned topics can introduce additional complexity when maintaining message order.
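
    As a rough illustration of hierarchical wildcard matching, the sketch below treats `*` as a stand-in for exactly one topic level and a trailing `>` as a stand-in for one or more remaining levels. It is a simplified approximation written for this post, not the broker's actual matching code (for instance, it ignores prefix forms such as `abc*`), and the topic names are hypothetical.

```python
def topic_matches(subscription: str, topic: str) -> bool:
    """Simplified matcher: '*' = one whole level, trailing '>' = one or more levels."""
    sub_levels = subscription.split("/")
    topic_levels = topic.split("/")
    for i, sub in enumerate(sub_levels):
        if sub == ">" and i == len(sub_levels) - 1:
            # '>' must consume at least one remaining level.
            return len(topic_levels) > i
        if i >= len(topic_levels):
            return False
        if sub != "*" and sub != topic_levels[i]:
            return False
    return len(sub_levels) == len(topic_levels)


print(topic_matches("agent/*/completed", "agent/summarizer/completed"))  # True
print(topic_matches("agent/>", "agent/summarizer/completed"))            # True
print(topic_matches("agent/*", "agent/summarizer/completed"))            # False
```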

    If you want to dive deeper into the difference between Solace and Kafka topics, this is a good starting point.

    2. Scalability and Resilience

    Solace’s platform supports horizontal scaling by allowing multiple instances of agents to subscribe to the same topic, effectively distributing the workload. Events buffer in queues, protecting the system from sudden load spikes, and enabling agents to process events asynchronously at their own pace. This approach naturally provides fault isolation, ensuring that if one agent is temporarily unavailable, others continue seamlessly without disruption. As an example, you can monitor event queues for agents and dynamically increase/decrease the number of serving agents if the queue gets too deep, optimizing cost and system performance.
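
    The queue-depth scaling idea might look something like the sketch below. How the depth is read from the broker and how replicas are actually started are out of scope here; `desired_replicas`, `per_agent_capacity`, and the default bounds are hypothetical names and values.

```python
import math


def desired_replicas(queue_depth, per_agent_capacity, min_replicas=1, max_replicas=10):
    """Pick an agent count so each instance handles at most per_agent_capacity
    queued events, clamped to the [min_replicas, max_replicas] range."""
    if per_agent_capacity <= 0:
        raise ValueError("per_agent_capacity must be positive")
    needed = math.ceil(queue_depth / per_agent_capacity)
    return max(min_replicas, min(max_replicas, needed))


print(desired_replicas(0, 50))       # 1  (never scale below the floor)
print(desired_replicas(120, 50))     # 3
print(desired_replicas(10_000, 50))  # 10 (capped at the ceiling)
```

    The floor keeps at least one agent draining the queue, and the ceiling bounds cost during a spike while the queue absorbs the backlog.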

    3. Real-Time and Asynchronous Communication

    Solace’s platform excels in handling real-time enterprise events and managing asynchronous interactions with variable response times, which is essential for AI-driven applications. Agents can respond immediately to events or, for long-running tasks, do the work asynchronously and emit events when complete. This continuous, non-blocking responsiveness significantly enhances the user experience and operational efficiency. We go deeper into this topic in this post.
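
    Here is a minimal asyncio sketch of that pattern: an agent acknowledges a request right away, works in the background, and emits a completion event when done. An `asyncio.Queue` stands in for the broker, `asyncio.sleep` stands in for the long-running work, and the `task/accepted` and `task/completed` topics are hypothetical.

```python
import asyncio


async def handle_request(request_id, events, work_seconds):
    """Acknowledge immediately, then emit a completion event when the work ends."""
    await events.put(("task/accepted", request_id))
    await asyncio.sleep(work_seconds)  # placeholder for a long-running task
    await events.put(("task/completed", request_id))


async def main():
    events = asyncio.Queue()  # stand-in for the event broker
    await asyncio.gather(
        handle_request("slow", events, 0.05),
        handle_request("fast", events, 0.01),
    )
    seen = []
    while not events.empty():
        seen.append(events.get_nowait())
    return seen


seen = asyncio.run(main())
for event in seen:
    print(event)
```

    Both requests are accepted before either finishes, and the fast task reports completion without waiting for the slow one.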

    4. Enhanced Observability and Monitoring

    Using an event-driven approach significantly simplifies observability compared to point-to-point architectures. Since interactions between agents are event-based, logging, tracing, and monitoring become straightforward. Each event can be easily traced, facilitating deep visibility into agent activities. This makes diagnosing issues faster, supports operational transparency, and enhances trust in the reliability of the AI system.

    Wildcard subscriptions allow for various activities, such as:

    • Capturing all or subsets of agent interactions into a data lake for analytics and governance
    • Visualizing inter-agent interactions in real-time using a communication visualizer dashboard

    These subscriptions can be integrated with numerous operational systems, which are essential for maintaining the performance of large enterprise deployments.
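
    As a sketch of what a downstream consumer could do with events captured this way, the snippet below groups them into per-request timelines. It assumes each event carries a `trace_id` field, which is a hypothetical convention for this example, and the sample events and topics are made up.

```python
from collections import defaultdict


def group_by_trace(events):
    """Return {trace_id: [topics in arrival order]} for the captured events."""
    traces = defaultdict(list)
    for event in events:
        traces[event["trace_id"]].append(event["topic"])
    return dict(traces)


# Hypothetical events, as a catch-all subscriber might have recorded them.
captured = [
    {"trace_id": "t1", "topic": "task/accepted/summarizer"},
    {"trace_id": "t2", "topic": "task/accepted/indexer"},
    {"trace_id": "t1", "topic": "task/completed/summarizer"},
]

timelines = group_by_trace(captured)
print(timelines["t1"])  # ['task/accepted/summarizer', 'task/completed/summarizer']
```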

    5. Distributed Network of Agents

    The distributed nature of an event mesh provides significant advantages when dealing with applications that agents interact with across different clouds or on-premises environments. This architecture allows agents to be co-located with the applications they work with, optimizing security, performance, and robustness. By ensuring agents are positioned close to their corresponding apps, this setup minimizes latency and enhances the overall efficiency of inter-agent communication.

    An event mesh handles networking and security, providing seamless connectivity among agents regardless of their deployment locations. This eliminates the need for complex configurations and security measures on an agent-by-agent basis, offering a simplified and streamlined approach to agent communication. The system remains efficient and secure no matter where agents are deployed, and it can scale globally.

    Future-Proof Your Agent-to-Agent Communications

    Google’s Agent2Agent protocol, complemented by an event mesh built with Solace’s platform, creates an enterprise-ready foundation for sophisticated multi-agent systems. The combination enables a highly scalable, reliable, and flexible ecosystem where agent-to-agent, agent-to-data-source, and agent-to-tool communication becomes effortless, secure, agile, and efficient.

    We love the concepts of A2A, and believe pairing it with an event mesh can transform it into a powerful enterprise-grade solution.

    Ali Pourshahid
    Chief Engineering Officer

    Ali Pourshahid is Solace's Chief Engineering Officer, leading the engineering teams at Solace. Ali is responsible for the delivery and operation of Software and Cloud services at Solace. He leads a team of incredibly talented engineers, architects, and User Experience designers in this endeavor. Since joining, he's been a significant force behind the PS+ Cloud Platform, Event Portal, and Insights products. He also played an essential role in evolving Solace's engineering methods, processes, and technology direction.
     
    Before Solace, Ali worked at IBM and Klipfolio, building engineering teams and bringing several enterprise and Cloud-native SaaS products to the market. He enjoys system design, building teams, refining processes, and focusing on great developer and product experiences. He has extensive experience in building agile product-led teams.
     
    Ali earned his Ph.D. in Computer Science from the University of Ottawa, where he researched and developed ways to improve processes automatically. He has several cited publications and patents and was recognized as a Master Inventor at IBM.