Most messaging middleware vendors claim their products offer high availability, but the ways they handle fault tolerance, failover and disaster recovery often don’t meet real-world requirements. Outages or delays in your messaging infrastructure can have serious repercussions on your business in terms of customer-visible downtime, delayed delivery of information and data loss, so it’s important to ask some hard questions about how your messaging system ensures high availability, isolates slow consumers and enables disaster recovery.

This whitepaper explains the requirements for robustness in your messaging infrastructure, describes several ways that messaging products’ HA behavior can bring your business to its knees, and introduces some ways Solace avoids these problems. Here are five prime examples:

1) Slow Failover to Backup Brokers

Software-based messaging products store all messages (and their state) on disk while they’re being delivered. That means if a broker fails, message delivery stalls for as long as it takes the backup broker to retrieve all of that data from disk. That process can take minutes or even hours, depending on how much data is persisted at the time.

Solace continuously synchronizes “hot standby” appliances so messages are flowing again in just seconds.

2) Head of Line Blocking in DR Replication

Many messaging products rely on third-party storage replication tools that allow only one publisher at a time to synchronize messages to the DR site. All other publishers are blocked until that publisher’s message has been received by the remote site. This serial transfer of information causes serious bottlenecks in replication for disaster recovery.

Solace does not rely on third-party tools to perform DR, and its built-in DR functionality lets all publishers simultaneously replicate as many messages as they want to the DR site.
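To make the head-of-line effect concrete, here is a minimal sketch of the arithmetic, not a model of any specific product. The function names and the millisecond figures are illustrative assumptions. It compares when each publisher's message is acknowledged if transfers are forced into a single file versus when each publisher replicates independently.

```python
def serial_completion_times(transfer_ms):
    """One transfer at a time: each publisher waits for every earlier one."""
    clock = 0
    finished = []
    for duration in transfer_ms:
        clock += duration          # this transfer starts only after the previous ends
        finished.append(clock)
    return finished


def concurrent_completion_times(transfer_ms):
    """Independent transfers: each publisher waits only for its own message."""
    return list(transfer_ms)


# Hypothetical transfer times in milliseconds; the third publisher is slow.
transfers = [50, 50, 500, 50]
serial = serial_completion_times(transfers)
concurrent = concurrent_completion_times(transfers)
```

In the serial case the fourth publisher, whose own transfer takes 50 ms, is not acknowledged until 650 ms because it sits behind the slow one. In the concurrent case it finishes in 50 ms.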

3) Message Loss During Replication

Storage replication systems move messages to DR sites in periodic batch transfers, which means they have large risk windows of thousands of messages at a time. If things go haywire while a block of messages is being replicated, the message spool at the DR site can be irreparably corrupted.

Solace immediately and individually forwards each message to the backup site, eliminating the batch risk window.
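The size of that risk window can be sketched with a deliberately simplified model. It assumes a batch ships the instant it fills and arrives intact, and it ignores in-flight messages; the function name and the numbers are hypothetical. Under those assumptions, the messages exposed at the moment of a failure are the ones published since the last complete batch.

```python
def unreplicated_at_failure(published, batch_size):
    """Messages not yet at the DR site when the primary fails.

    Simplified assumption: a batch is shipped as soon as it is full,
    so only the partially filled batch is exposed.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return published % batch_size


# Hypothetical figures: 2,500 messages published before the failure.
at_risk_batched = unreplicated_at_failure(2500, batch_size=1000)
at_risk_per_message = unreplicated_at_failure(2500, batch_size=1)
```

With batches of 1,000, the model leaves 500 messages exposed; forwarding each message individually (a batch size of 1) leaves none in this idealized view.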

4) Ripple Effect of Slow Consumers

When one subscriber is disconnected or can’t keep up with the flow of information being sent to it, the resulting backlog can drag down the entire broker’s performance and keep it from routing messages as quickly as high-speed publishers and other problem-free consumers require.

Solace isolates slow consumers so other subscribers and high-speed publishers are never affected.
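One common way to think about isolation is to give every subscriber its own backlog, so that a consumer that falls behind only grows its own queue. The sketch below illustrates that general idea; it is not a description of how Solace implements it, and the class and method names are hypothetical.

```python
from collections import deque


class IsolatingBroker:
    """Toy broker that keeps a separate backlog per subscriber."""

    def __init__(self):
        self.queues = {}

    def subscribe(self, name):
        self.queues[name] = deque()

    def publish(self, message):
        # Publishing never waits on a consumer: the message is simply
        # appended to each subscriber's own queue.
        for queue in self.queues.values():
            queue.append(message)

    def drain(self, name, limit):
        """Deliver up to `limit` messages from one subscriber's backlog."""
        queue = self.queues[name]
        delivered = []
        while queue and len(delivered) < limit:
            delivered.append(queue.popleft())
        return delivered

    def backlog(self, name):
        return len(self.queues[name])


broker = IsolatingBroker()
broker.subscribe("fast")
broker.subscribe("slow")
for n in range(5):
    broker.publish(n)

fast_got = broker.drain("fast", limit=10)  # keeps up with everything
slow_got = broker.drain("slow", limit=1)   # consumes one message per pass
```

The fast subscriber receives all five messages regardless of how far behind the slow one is; only the slow subscriber's backlog grows.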

5) Systemic Impact of Reconnecting Disconnected Consumers

Administrators often can’t let disconnected subscribers tap back into the message flow during the day because delivering that backlog of messages would kill the broker’s ability to keep routing real-time messages. As a result, they intentionally leave systems offline until after business hours.

Solace appliances have sufficient throughput to quickly bring reconnected subscribers up to date without affecting anybody else.

Larry Neumann

From 2005 to 2017, Mr. Neumann was responsible for all aspects of strategic, corporate, product and vertical marketing at Solace. Before Solace, he held executive marketing positions with TIBCO and Oracle, and co-founded an internet software company called inCommon, which was acquired by TIBCO. During his tenure at TIBCO, Mr. Neumann played a key role in planning the company’s strategic direction relating to target markets and candidate acquisitions.