Solace’s Self-Driving Ticketing System: A Cognitive Mesh Success Story
At Solace, we’re fortunate to have a dedicated AI team that’s been at the forefront of researching and building cutting-edge AI platforms. Their work, which includes projects like the Solace AI Connector and explorations into Retrieval-Augmented Generation (RAG) and Large Language Models (LLMs), has inspired the rest of our organization.
Recently, our production engineering team, inspired by these advancements in AI and event-driven architecture, decided to tackle a complex challenge in our Jira ticket management and upgrade window workflow using AI-based solutions.
The Challenge of Managing Complex Workflows at Scale
Before delving into the technical aspects, it’s worth understanding the business problem we were addressing. At a high level, we had to handle a large volume of incoming tickets written as unstructured text.
The complexity of these tickets demanded meticulous coordination, and manual processing introduced the risk of human error, opening the door to miscommunication, missed deadlines, and incorrect upgrades. On top of that, keeping every stakeholder informed about upgrade statuses, issues, and schedule changes had become overwhelming, essentially a full-time job in itself.
Our ideal system would intelligently parse and understand Jira tickets without human intervention. It would create and manage upgrade windows and minimize errors in our processes. We also envisioned a communication wizard that could provide relevant updates to all parties involved.
Understanding Agentic AI
Agentic AI is a field gaining significant traction in the tech community. But what exactly is it? Imagine an AI system that doesn’t just follow pre-programmed instructions, but thinks on its feet, makes decisions, and adapts to new situations. That’s agentic AI in a nutshell.
These AI systems work independently without constant supervision. They quickly adapt to new challenges as they arise, make informed decisions to achieve specific goals, and learn from their experiences to improve over time.
In essence, agentic AI possesses its own sense of agency – the ability to act independently and make choices.
Our Initial Approach to Implementing Agentic AI
With a clear understanding of agentic AI’s potential benefits, we implemented a system for our Jira ticket management and upgrade processes.
Our initial architecture was based on a traditional request-response model. We set up a central server that managed communication between our AI agents and Jira. This server would periodically poll Jira for updates, process the information through our AI agents, and then push any changes back to Jira. The AI agents were designed as separate modules, each with its own specific function, but all relying on the central server for data and coordination.
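To make that starting point concrete, here is a minimal Python sketch of one cycle of such a central polling server. `fetch_updated` is a hypothetical stand-in for a Jira search call and the `Ticket` fields are illustrative; the point is that every agent is driven by, and coordinated through, a single loop.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Ticket:
    key: str
    updated: int  # last-modified time, epoch seconds
    summary: str


def poll_once(
    fetch_updated: Callable[[int], Iterable[Ticket]],
    agents: List[Callable[[Ticket], None]],
    cursor: int,
) -> int:
    """Run one polling cycle and return the new cursor.

    fetch_updated is a hypothetical stand-in for a Jira search that returns
    tickets modified after `cursor`. The server hands each ticket to every
    agent in turn, so all coordination funnels through this one loop.
    """
    newest = cursor
    for ticket in fetch_updated(cursor):
        for agent in agents:
            agent(ticket)
        newest = max(newest, ticket.updated)
    return newest  # the caller sleeps for the poll interval, then calls again


# Usage with an in-memory fake instead of Jira:
store = [Ticket("UPG-1", 100, "Upgrade broker"), Ticket("UPG-2", 200, "Patch broker")]
seen: List[str] = []
cursor = poll_once(lambda since: [t for t in store if t.updated > since],
                   [lambda t: seen.append(t.key)], cursor=0)
print(cursor, seen)  # 200 ['UPG-1', 'UPG-2']
```

Because the loop only sees tickets when it runs, how quickly a change is noticed depends entirely on how often it is called, which is where the trouble started.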
We deployed multiple AI agents, each specializing in tasks like ticket parsing, upgrade scheduling, risk assessment, and stakeholder communication. A key feature was the ability to transform unstructured text into properly formatted objects and REST API payloads, significantly improving data handling and integration.
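As an illustration of that transformation step, the sketch below assumes an LLM has been prompted to return JSON for a ticket, validates that output into a typed object, and builds a request body from it. The field names (`jira_key`, `service_id`, `target_version`, `preferred_start`) are hypothetical; the actual prompts and PubSub+ Cloud API schemas are not described in this post.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class UpgradeRequest:
    jira_key: str
    service_id: str       # hypothetical field names, for illustration only
    target_version: str
    preferred_start: str  # ISO 8601 timestamp, kept as text here


REQUIRED = ("service_id", "target_version", "preferred_start")


def parse_model_output(jira_key: str, raw: str) -> UpgradeRequest:
    """Validate the JSON an LLM was prompted to return for one ticket."""
    data = json.loads(raw)  # raises a ValueError subclass on malformed JSON
    if not isinstance(data, dict):
        raise ValueError(f"{jira_key}: expected a JSON object")
    missing = [name for name in REQUIRED if not data.get(name)]
    if missing:
        raise ValueError(f"{jira_key}: model output is missing {missing}")
    return UpgradeRequest(jira_key, *(str(data[name]) for name in REQUIRED))


def to_rest_payload(req: UpgradeRequest) -> dict:
    """Build a request body; the real API schema is not shown in this post."""
    return asdict(req)


raw = '{"service_id": "svc-42", "target_version": "10.8.1", "preferred_start": "2024-09-07T02:00:00Z"}'
print(json.dumps(to_rest_payload(parse_model_output("UPG-7", raw))))
```

Rejecting incomplete model output at this boundary keeps a malformed response from ever reaching a REST call.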
Goal-oriented decision-making algorithms empowered our AI agents to make autonomous choices based on predefined objectives.
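One simple way to express goal-oriented selection is a weighted cost over the objectives, choosing the option that scores best (a utility-based choice). The sketch applies this to candidate upgrade windows; the attributes and weights are assumptions for illustration, not the objectives our agents actually use.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    label: str
    traffic_load: float       # expected load during the window, 0.0 to 1.0
    hours_until_start: float  # sooner is better for the requester
    conflicts: int            # overlapping maintenance windows


# Hypothetical objective weights; a lower total cost is better.
W_TRAFFIC, W_DELAY, W_CONFLICT = 10.0, 0.1, 5.0


def cost(c: Candidate) -> float:
    return (W_TRAFFIC * c.traffic_load
            + W_DELAY * c.hours_until_start
            + W_CONFLICT * c.conflicts)


def choose(candidates: List[Candidate]) -> Candidate:
    """Return the candidate that best satisfies the weighted objectives."""
    if not candidates:
        raise ValueError("no candidate windows to choose from")
    return min(candidates, key=cost)


options = [
    Candidate("Sat 02:00", traffic_load=0.1, hours_until_start=48, conflicts=0),
    Candidate("Tue 14:00", traffic_load=0.8, hours_until_start=12, conflicts=0),
    Candidate("Sun 03:00", traffic_load=0.05, hours_until_start=72, conflicts=1),
]
print(choose(options).label)  # Sat 02:00
```

Changing the weights changes the agent’s priorities without touching the selection logic, which is the appeal of keeping objectives explicit.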
This AI-driven approach promised to automate many manual processes, reduce human error, and ensure more consistent, timely communication with stakeholders. As we progressed with the implementation, our enthusiasm grew, fueled by the transformative possibilities this system offered for our workflow.
The Roadblocks that Arose
As we began to scale our system, we encountered several significant challenges that tested the limits of our initial implementation:
- The Polling Problem: Our system was constantly querying Jira for updates, leading to inefficiency and rate-limiting issues. We found ourselves caught between polling too frequently and hitting rate limits, or polling too infrequently and missing critical updates.
- State Management Complexity: As the number of processes grew, tracking the state of multiple workflows became increasingly difficult. It was like trying to keep track of multiple complex conversations simultaneously – overwhelming and prone to errors.
- AI Agent Coordination: While our AI agents performed well individually, they struggled to work cohesively as a team. Each agent was making decisions in isolation, sometimes leading to conflicting actions.
- Scalability Bottlenecks: As the volume of tickets and the complexity of our workflows increased, our traditional architecture struggled to keep pace. We needed a more responsive and efficient approach to handle the dynamic nature of our agentic AI system.
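The polling dilemma comes down to simple arithmetic. This sketch assumes a hypothetical hourly rate limit and a fixed number of Jira queries per cycle: shortening the interval lowers the worst-case delay before a change is noticed but multiplies request volume, and every additional workflow that polls adds to `queries_per_poll`.

```python
from typing import NamedTuple


class PollingCost(NamedTuple):
    requests_per_hour: float
    worst_case_delay_s: float
    within_limit: bool


def polling_cost(interval_s: float, queries_per_poll: int,
                 rate_limit_per_hour: int) -> PollingCost:
    """Estimate request volume and staleness for a fixed polling interval."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    requests = queries_per_poll * 3600 / interval_s
    # A change that lands just after a poll waits a full interval to be seen.
    return PollingCost(requests, interval_s, requests <= rate_limit_per_hour)


for interval in (10, 60, 300):
    print(interval, polling_cost(interval, queries_per_poll=5, rate_limit_per_hour=1000))
```

For example, with 5 queries per cycle and a 1,000-request hourly budget, a 10-second interval would need 1,800 requests per hour, while a 60-second interval fits (300 per hour) but can leave a change unnoticed for a full minute.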
These challenges made it clear that while the core AI technologies were powerful, we needed a more sophisticated architecture to fully leverage their potential in our specific use case.
The Solution: Event-Driven Architecture
While Solace’s expertise lies in event-driven architecture (EDA), our initial approach to the agentic AI system focused primarily on the AI components themselves. This AI-first strategy allowed us to rapidly prototype and test the core functionalities of our system.
However, as we grappled with scaling our agentic AI system, we recognized the need for a more robust and flexible architecture. It was at this juncture that we turned to what we knew best: EDA.
EDA is a design paradigm in which the flow of the program is determined by events such as user actions, sensor outputs, or messages from other programs. In our context, it offered several key advantages:
- Real-time Responsiveness and Asynchronous Communication: Instead of constantly polling for updates, the system reacts to events as they occur. Components interact without waiting for immediate responses, which improves scalability and keeps the system resilient under varying loads or when individual components fail.
- Decoupling and Flexibility: Event-driven systems allow components to operate independently, reducing the complexity of state management and improving overall system flexibility. This makes it easier to add new features or modify existing ones, as components are loosely coupled.
- Scalability: This architecture can handle increasing loads more effectively by distributing event processing across multiple components. As our system grows, we can easily add new event handlers or scale existing ones to manage increased demand.
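To illustrate the decoupling, here is a minimal in-process event bus in Python. Publishers only name a topic; subscribers register topic patterns, and neither side references the other. The matcher imitates a simplified subset of Solace topic wildcards (`*` for one whole level, a trailing `>` for one or more levels). It is not the PubSub+ implementation, which supports more wildcard forms and routes across a distributed broker.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[str, dict], None]


def topic_matches(subscription: str, topic: str) -> bool:
    """Simplified Solace-style matching on '/'-separated levels.

    '*' matches exactly one level; '>' as the final level matches one or
    more remaining levels. Prefix wildcards such as 'abc*' are not handled.
    """
    sub_levels, topic_levels = subscription.split("/"), topic.split("/")
    for i, level in enumerate(sub_levels):
        if level == ">" and i == len(sub_levels) - 1:
            return len(topic_levels) > i
        if i >= len(topic_levels):
            return False
        if level != "*" and level != topic_levels[i]:
            return False
    return len(sub_levels) == len(topic_levels)


class EventBus:
    """Publishers never reference subscribers; the bus does the routing."""

    def __init__(self) -> None:
        self._subs: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, subscription: str, handler: Handler) -> None:
        self._subs[subscription].append(handler)

    def publish(self, topic: str, payload: dict) -> int:
        """Deliver to every matching handler; return the delivery count."""
        delivered = 0
        for subscription, handlers in list(self._subs.items()):
            if topic_matches(subscription, topic):
                for handler in handlers:
                    handler(topic, payload)
                    delivered += 1
        return delivered


bus = EventBus()
bus.subscribe("jira/issue/>", lambda t, p: print("parser got", t))
bus.subscribe("upgrade/window/*", lambda t, p: print("notifier got", t))
bus.publish("jira/issue/created/UPG/UPG-42", {"summary": "Upgrade broker"})
```

Adding a new consumer is one more `subscribe` call; no existing publisher or handler has to change.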
To implement EDA in our AI workflows, we leveraged the Solace AI Connector, our in-house tool designed to orchestrate event-driven AI systems. This innovation provided the perfect framework for building the event-driven AI workflows we envisioned, offering pre-built components for common AI tasks and the flexibility to integrate custom components.
Implementing the Event-Driven Agentic AI System
After recognizing the potential of combining our event-driven expertise with agentic AI, we set out to redesign our Jira ticket management and upgrade processes. This new approach allowed us to leverage the strengths of both technologies, creating a more efficient and scalable solution.
Implementation Details
Our implementation focused on five key areas:
- Event Streaming Platform: We used Solace PubSub+ Event Broker to build an event mesh that handles the real-time event distribution across our system. This allowed us to move away from constant polling and towards a more efficient, event-driven model.
- AI Workflow Orchestration: The Solace AI Connector became the backbone of our AI workflow. We defined our workflows using its YAML-based configuration, which made our AI processes easy to modify and maintain. (Here’s a simple example of a Solace AI Connector config file.)
- Event-Driven Agents: We redesigned our AI agents to be event-driven, subscribing to relevant event streams and publishing their outputs as new events. This significantly improved agent coordination and reduced the complexity of state management.
- Jira Integration: Instead of polling Jira, we set up webhooks to generate events for ticket creation, updates, and status changes. These events were published to our event streaming platform, triggering relevant AI workflows.
- Scalable Processing: We leveraged the distributed nature of our event-driven architecture to scale our processing capabilities, allowing us to handle increasing loads more effectively.
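With webhooks in place, the receiver only has to translate each Jira callback into a topic and publish it. The sketch below assumes Jira’s webhook JSON carries `webhookEvent` (for example `jira:issue_created`), an `issue` object, and, for updates, a `changelog` listing the changed fields; verify these against the Jira webhook documentation for your deployment. The `jira/issue/{action}/{project}/{key}` topic hierarchy is hypothetical.

```python
import json
from typing import Optional, Tuple


def webhook_to_event(body: str) -> Optional[Tuple[str, dict]]:
    """Map a Jira webhook body to (topic, payload), or None if not routed.

    Topic scheme (hypothetical): jira/issue/{action}/{project}/{issue key}
    """
    data = json.loads(body)
    event = data.get("webhookEvent", "")  # e.g. "jira:issue_updated"
    prefix = "jira:issue_"
    if not event.startswith(prefix):
        return None  # comments, worklogs, etc. are ignored in this sketch
    action = event[len(prefix):]  # "created", "updated", "deleted"
    issue = data.get("issue", {})
    fields = issue.get("fields", {})
    key = issue.get("key", "UNKNOWN")
    project = fields.get("project", {}).get("key", key.split("-")[0])

    # Surface status transitions as their own action so agents (for example,
    # the approval step) can subscribe to them directly.
    items = (data.get("changelog") or {}).get("items", [])
    status = next((i for i in items if i.get("field") == "status"), None)
    if status is not None:
        action = "status_changed"

    payload = {
        "key": key,
        "summary": fields.get("summary"),
        "new_status": status.get("toString") if status is not None else None,
    }
    return f"jira/issue/{action}/{project}/{key}", payload
```

Putting the project and issue key in the topic lets a subscriber narrow its interest (one project, one action) with wildcards instead of filtering every message itself.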
Typical Workflow
Now, let’s walk through how these agents work together in practice.
- The process begins when a new upgrade request is submitted through Jira, triggering a webhook event. This event is published to the PubSub+ Event Broker, which routes it via the event mesh to the appropriate subscribers within our system.
- Upon receiving the event, the Jira Ticket Parser Agent immediately begins its analysis, extracting key information from the request such as the desired broker version and other relevant details.
- Next, the Risk Assessor Agent evaluates the extracted information, focusing on the requested broker version and associated upgrade parameters. It analyzes potential risks, considering factors such as compatibility issues, known vulnerabilities, and historical performance data. If no significant issues are identified, the Risk Assessor passes the validated request along to the next stage.
- Following the risk assessment, the Upgrade Scheduler Agent comes into play. It uses the validated information to call the necessary PubSub+ Cloud REST APIs and create a proposed upgrade window. Importantly, this upgrade window is initially set to a “Pending Approval” status, ensuring human oversight before any action is taken.
- Throughout this process, the Knowledge Base Agent provides valuable input, offering relevant historical data to both the Risk Assessor and Upgrade Scheduler to enhance their decision-making capabilities.
- The Notification Agent handles stakeholder communication, alerting relevant parties about the proposed upgrade and the need for human approval.
- At this stage, a human operator reviews the AI-generated upgrade proposal. This step provides a final validation of the AI’s recommendations and mitigates the risk of AI errors. Once the human approves the Jira ticket, the Upgrade Scheduler Agent is notified and changes the upgrade window status from “Pending Approval” to “Approved,” making it executable.
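The chain above can be sketched as handlers wired together by topics. Everything here is a simplified, hypothetical stand-in: a dictionary-backed `Bus` replaces the event broker, a regular expression replaces the LLM-based parser, a small dictionary plays the Knowledge Base Agent, a blocked-version set plays the risk rules, and an in-memory record replaces the PubSub+ Cloud REST call that creates the upgrade window. The topic names are invented for illustration.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class Bus:
    """Exact-match topic dispatcher standing in for the event broker."""

    def __init__(self) -> None:
        self.handlers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, fn: Handler) -> None:
        self.handlers[topic].append(fn)

    def publish(self, topic: str, event: dict) -> None:
        for fn in list(self.handlers.get(topic, [])):
            fn(event)


KNOWLEDGE_BASE = {"10.4.0": "rolled back twice", "10.8.1": "clean"}  # hypothetical
BLOCKED_VERSIONS = {"10.4.0"}                                        # hypothetical rule
VERSION_RE = re.compile(r"\b(\d+\.\d+\.\d+)\b")

windows: Dict[str, dict] = {}  # stand-in for windows created via REST
notifications: List[str] = []


def build(bus: Bus) -> None:
    def parser(ev: dict) -> None:
        m = VERSION_RE.search(ev["description"])
        if m:
            bus.publish("upgrade/parsed", {"key": ev["key"], "version": m.group(1)})
        else:
            notifications.append(f"{ev['key']}: no version found, needs a human")

    def risk_assessor(ev: dict) -> None:
        history = KNOWLEDGE_BASE.get(ev["version"], "no history")
        if ev["version"] in BLOCKED_VERSIONS:
            notifications.append(f"{ev['key']}: rejected ({history})")
        else:
            bus.publish("upgrade/assessed", {**ev, "history": history})

    def scheduler(ev: dict) -> None:
        # Real system: a PubSub+ Cloud REST API call; here, an in-memory record.
        windows[ev["key"]] = {"version": ev["version"], "notes": ev["history"],
                              "status": "Pending Approval"}
        bus.publish("upgrade/window/proposed", ev)

    def notifier(ev: dict) -> None:
        notifications.append(f"{ev['key']}: window for {ev['version']} awaits approval")

    def on_approval(ev: dict) -> None:
        window = windows.get(ev["key"])
        if window and window["status"] == "Pending Approval":
            window["status"] = "Approved"

    bus.subscribe("jira/issue/created", parser)
    bus.subscribe("upgrade/parsed", risk_assessor)
    bus.subscribe("upgrade/assessed", scheduler)
    bus.subscribe("upgrade/window/proposed", notifier)
    bus.subscribe("jira/issue/approved", on_approval)


bus = Bus()
build(bus)
bus.publish("jira/issue/created", {"key": "UPG-1", "description": "Please upgrade the broker to 10.8.1"})
print(windows["UPG-1"]["status"])  # Pending Approval
bus.publish("jira/issue/approved", {"key": "UPG-1"})  # a human approved the ticket
print(windows["UPG-1"]["status"])  # Approved
```

Each agent knows only the topics it consumes and produces, so adding a step (for example, a second risk check) means subscribing a new handler rather than editing the others.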
This workflow exemplifies our commitment to leveraging AI for efficiency while maintaining human oversight for critical decisions.
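The human-approval gate can also be enforced in code rather than by convention. This sketch models the upgrade window status as a small state machine in which “Approved” is reachable only from “Pending Approval” and only when a human approver is recorded. The `Rejected` state and the `approver` parameter are assumptions added for illustration.

```python
from enum import Enum
from typing import Optional


class WindowStatus(Enum):
    PENDING_APPROVAL = "Pending Approval"
    APPROVED = "Approved"
    REJECTED = "Rejected"  # hypothetical; the post only names the first two


# Permitted moves; anything else is refused.
ALLOWED = {
    (WindowStatus.PENDING_APPROVAL, WindowStatus.APPROVED),
    (WindowStatus.PENDING_APPROVAL, WindowStatus.REJECTED),
}


def transition(current: WindowStatus, target: WindowStatus,
               approver: Optional[str] = None) -> WindowStatus:
    """Return the new status or raise if the move is not permitted.

    `approver` is a hypothetical human identity taken from the Jira approval
    event; without it, no agent can make a window executable.
    """
    if (current, target) not in ALLOWED:
        raise ValueError(f"cannot move {current.value!r} to {target.value!r}")
    if target is WindowStatus.APPROVED and not approver:
        raise PermissionError("approval requires a human approver")
    return target


status = WindowStatus.PENDING_APPROVAL
status = transition(status, WindowStatus.APPROVED, approver="ops-lead")
print(status.value)  # Approved
```

Keeping the rule in one function means an agent bug cannot skip the review step: the only path to “Approved” runs through the check.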
The Role of PubSub+
At the center of this intricate ballet is PubSub+ Event Broker, the stage upon which our agents perform. The event mesh we built with it allows our agents to work independently yet in perfect harmony.
In essence, PubSub+ turns what could be a chaotic mess of information into a streamlined, efficient process. It’s the difference between a congested network and a well-optimized data flow system.
Results and Benefits
Our new event-driven agentic AI system has transformed our operations in three key ways:
- Responsiveness: The real-time communication it enables has dramatically reduced the lag between ticket updates and actions. Combined with our increased capacity to handle high volumes of Jira tickets, this has streamlined our workflow considerably.
- Manageability: The shift to an event-driven approach has simplified our architecture, making it more maintainable and extensible. Our AI components now work in sync, communicating effectively through event streams. By eliminating constant polling, we’ve achieved a notable reduction in resource consumption, boosting our overall efficiency.
- Agility: Perhaps the most significant advantage is our newfound agility. We can now adapt quickly to changing requirements, adding features or modifying workflows with ease. This flexibility is crucial in our fast-paced tech environment. With the success we’ve seen in broker ticket upgrades, we’re now considering expanding to include cluster upgrades. Thanks to our event-driven architecture and modular design, this expansion is easier than ever.
Conclusion: Your Turn to Shine
We’ve come a long way from our AI workflow nightmares. Event-driven architecture, powered by Solace PubSub+ and the Solace AI Connector, turned our chaotic AI circus into a well-oiled machine.
Don’t just take our word for it. The tools are out there, waiting for you to grab them and revolutionize your own AI workflows. Ready to join the event-driven AI revolution?
Check out the Solace AI Connector on GitHub and start building the future of AI today. Trust us, your future self will thank you!
As the technical lead of Solace's infrastructure team, Luay is at the forefront of transforming how Solace approaches infrastructure development. His mission is to envision and implement scalable, self-serving infrastructure solutions that benefit Solace's internal teams and external clients.
He's passionate about pushing the boundaries of infrastructure as software development, moving beyond traditional “infrastructure as code” to create more dynamic and responsive systems. This involves setting new standards and best practices at an organizational level, and influencing teams to adopt innovative approaches to infrastructure management.
Luay says one of the most rewarding aspects of his role is mentoring and growing the team, both technically and personally. He believes in fostering an environment where creativity and continuous learning are encouraged, enabling Solace to stay ahead in the rapidly evolving tech landscape.