An incident at one of our investment banking customers prompted me to write this blog post. Solace was recently called in to help when one of their applications silently died. Solace has made High Availability (HA) and Disaster Recovery (DR) simple by building them into the product itself. The objective of this post is to help you understand those capabilities and how to set them up, including the configuration of reconnection attempts and timeouts.
I have used Java as the programming language. If you’re using another programming language you’ll want to refer to the appropriate user manual.
The “HOST” property is used by the application to specify the IP address (or host name) of the Solace message router to connect to. A host entry has the following form:
[Protocol:]Host[:Port]
Protocol is the protocol used for the transport channel. The valid values are:
tcp
Use a TCP channel for communications between the application and its peers. If no protocol is set, tcp is used as a default.
tcps
Use a TLS channel over TCP for communications between the application and its peers. Encryption with compression is not supported.
JCSMPProperties properties = new JCSMPProperties();
properties.setProperty(JCSMPProperties.HOST, "tcp:10.20.30.1");
…
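To make the host entry format concrete, here is a minimal sketch of how a [Protocol:]Host[:Port] string breaks down into its parts. HostEntry is a hypothetical helper written for this post, not part of the Solace API, and it assumes host names or IPv4 addresses only (no IPv6 literals). A port of -1 simply means the entry did not specify one.

```java
/** Hypothetical helper (not part of the Solace API): splits "[Protocol:]Host[:Port]". */
final class HostEntry {
    final String protocol; // "tcp" or "tcps"
    final String host;     // host name or IPv4 address
    final int port;        // -1 when the entry does not specify a port

    private HostEntry(String protocol, String host, int port) {
        this.protocol = protocol;
        this.host = host;
        this.port = port;
    }

    static HostEntry parse(String entry) {
        String rest = entry.trim();
        String protocol = "tcp"; // tcp is the default when no protocol is set
        if (rest.startsWith("tcps:")) {
            protocol = "tcps";
            rest = rest.substring("tcps:".length());
        } else if (rest.startsWith("tcp:")) {
            rest = rest.substring("tcp:".length());
        }
        int port = -1;
        int colon = rest.lastIndexOf(':');
        if (colon >= 0) {
            // Throws NumberFormatException if the port is not a number.
            port = Integer.parseInt(rest.substring(colon + 1));
            rest = rest.substring(0, colon);
        }
        if (rest.isEmpty()) {
            throw new IllegalArgumentException("Missing host in entry: " + entry);
        }
        return new HostEntry(protocol, rest, port);
    }
}
```

For example, HostEntry.parse("tcp:10.20.30.1") yields protocol tcp, host 10.20.30.1 and no explicit port.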
With Solace Guaranteed Messaging, you deploy Solace message routers in HA pairs, which appear to client applications as a single host. Only one host entry, and hence only one IP address, is required in the session HOST property.
For DR scenarios, the host list feature of the Solace messaging APIs provides messaging clients with the IP addresses or host names of the Solace message routers in both of the Replication sites. This enables clients to fail over successfully to a disaster recovery site. By default, only a Solace message router with Message VPNs that have a Replication active state will allow clients to connect. So during a temporary loss of connectivity to the routers at one Replication site, client applications won’t inadvertently connect to the routers at the other site as they traverse the host list while attempting to reestablish a connection.
Multiple host entries (up to four) separated by commas are allowed. With multiple entries, each is tried in turn until one succeeds.
JCSMPProperties properties = new JCSMPProperties();
properties.setProperty(JCSMPProperties.HOST, "10.20.30.1:55555, 10.20.30.2:55555");
…
When a connection is attempted, the API first attempts to connect to 10.20.30.1. If that connection fails for any reason, it attempts to connect to 10.20.30.2. This process is repeated until all other entries in the host list are attempted.
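If you build the HOST value from configuration, it can help to check the list before handing it to the API. The sketch below splits a comma-separated host list and enforces the four-entry limit mentioned above. HostList is a hypothetical helper of my own, and trimming whitespace around entries is a choice made in this sketch, not documented API behavior.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of the Solace API): splits and checks a host list. */
final class HostList {
    static final int MAX_ENTRIES = 4; // limit stated in this post

    static List<String> split(String hostProperty) {
        List<String> entries = new ArrayList<>();
        for (String part : hostProperty.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                entries.add(trimmed);
            }
        }
        if (entries.isEmpty() || entries.size() > MAX_ENTRIES) {
            throw new IllegalArgumentException(
                    "Expected 1 to " + MAX_ENTRIES + " host entries, got " + entries.size());
        }
        return entries;
    }
}
```

Entries come back in their original order, which matters because the API tries them in that order.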
After each entry has been attempted, if all fail, the channel properties ConnectRetries, ReconnectRetries, and ReconnectRetryWaitInMillis determine the behavior of the API. If ConnectRetries is anything other than zero, the API waits for the amount of time set for ReconnectRetryWaitInMillis, then starts its connection attempts again from the beginning of the list. When traversing the list, each entry is attempted the number of times set for the ConnectRetriesPerHost property + 1.
If an established session to any host in the list fails, when ReconnectRetries is non-zero, the API automatically attempts to reconnect, starting at the beginning of the list.
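The traversal described above can be modeled in a few lines. This is a toy simulation based on my reading of the rules, not the actual JCSMP implementation. HostListWalk and its parameters are hypothetical names, it treats ConnectRetries as the number of extra passes over the list, and it does not model -1 ("retry forever") or the wait between attempts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Toy model of host list traversal; an illustration, not the real API logic. */
final class HostListWalk {
    /**
     * Returns the hosts in the order they would be attempted,
     * stopping at the first attempt that succeeds.
     */
    static List<String> attemptOrder(List<String> hosts,
                                     int connectRetries,
                                     int connectRetriesPerHost,
                                     Predicate<String> reachable) {
        if (connectRetries < 0 || connectRetriesPerHost < 0) {
            throw new IllegalArgumentException("-1 (retry forever) is not modeled here");
        }
        List<String> attempts = new ArrayList<>();
        // One initial pass over the list plus connectRetries further passes.
        for (int pass = 0; pass <= connectRetries; pass++) {
            for (String host : hosts) {
                // Each host is tried connectRetriesPerHost + 1 times per pass.
                for (int i = 0; i <= connectRetriesPerHost; i++) {
                    attempts.add(host);
                    if (reachable.test(host)) {
                        return attempts;
                    }
                    // The real API would wait ReconnectRetryWaitInMillis here.
                }
            }
        }
        return attempts; // every attempt failed
    }
}
```

With two hosts, one extra pass and one retry per host, a completely unreachable pair would see eight attempts in this model: two passes, two hosts, two attempts each.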
Notes:
Before configuring your reconnection and timeout settings, you should have a solid understanding of the JCSMPChannelProperties class, which includes the set of properties required to create a channel connection with Solace routers.
For the scope of this blog post, an application must have the following reconnection properties set correctly so that the Solace APIs can automatically reestablish the connection with the Solace message router. It is therefore important that you understand the correct usage of the reconnection properties described below.
The value of the ReconnectRetries property corresponds to the number of times the APIs should attempt to reconnect to the Solace message router (or the list of Solace message routers) after the initial connected session goes down.
The default value for this property is 3, which means the APIs will automatically attempt to reconnect 3 times before giving up. Valid values are >= -1. A value of -1 means “retry forever”, which is rarely a good setting because detecting a failure is better than trying to connect indefinitely. A value of 0 means no automatic reconnection retries (that is, try once and give up).
JCSMPProperties properties = new JCSMPProperties();
properties.setProperty(JCSMPProperties.HOST, conf.getHost());
…
// Channel properties
JCSMPChannelProperties cp = (JCSMPChannelProperties) properties
        .getProperty(JCSMPProperties.CLIENT_CHANNEL_PROPERTIES);
…
cp.setReconnectRetries(5);
…
The Connect Retries property sets the number of times to retry establishing an initial connection for a session to a host router. Because a value of zero already means one attempt, setting the connect retries value to 3 in the Java API results in a maximum of four connection attempts: the initial attempt and three retries.
Valid values are >= -1. Zero means no automatic connection retries (that is, try once and give up). -1 means “retry forever”.
JCSMPProperties properties = new JCSMPProperties();
properties.setProperty(JCSMPProperties.HOST, conf.getHost());
…
// Channel properties
JCSMPChannelProperties cp = (JCSMPChannelProperties) properties
        .getProperty(JCSMPProperties.CLIENT_CHANNEL_PROPERTIES);
…
cp.setConnectRetries(5);
…
The value of the reconnectRetryWaitInMillis property corresponds to the number of milliseconds to wait between each attempt to connect or reconnect to a host. If a connect or reconnect attempt to a host is not successful, the API waits for the amount of time set for reconnectRetryWaitInMillis, and then makes another connect or reconnect attempt.
The default value for this property is 3000, which means by default, the APIs will wait for 3 seconds between each attempt to connect/reconnect to a host. Valid values are 0 – 60000.
Note that connectRetriesPerHost sets how many connection or reconnection attempts can be made before moving on to the next host in the list.
JCSMPProperties properties = new JCSMPProperties();
properties.setProperty(JCSMPProperties.HOST, conf.getHost());
…
// Channel properties
JCSMPChannelProperties cp = (JCSMPChannelProperties) properties
        .getProperty(JCSMPProperties.CLIENT_CHANNEL_PROPERTIES);
…
cp.setReconnectRetryWaitInMillis(3000);
…
The value of the connectRetriesPerHost property corresponds to the number of times reconnection to a single host will be attempted before moving to the next host in the list.
The default value for this property is 0, which means the APIs will make a single connection attempt per host. Valid values are >= -1. A value of -1 means an infinite number of reconnect retries, so the API will only ever try to connect or reconnect to the first host listed. Note that this property works in conjunction with the connect and reconnect retries settings; it does not replace them.
JCSMPProperties properties = new JCSMPProperties();
properties.setProperty(JCSMPProperties.HOST, conf.getHost());
…
// Channel properties
JCSMPChannelProperties cp = (JCSMPChannelProperties) properties
        .getProperty(JCSMPProperties.CLIENT_CHANNEL_PROPERTIES);
…
cp.setConnectRetriesPerHost(20);
…
When using HA redundant Solace message router pairs, a failover from one Solace message router to its mate will typically occur in seconds, but applications should attempt to reconnect for at least five minutes. To allow for a reconnect duration of 5 minutes for HA redundant Solace message routers, set the following session property values:
JCSMPProperties properties = new JCSMPProperties();
properties.setProperty(JCSMPProperties.HOST, conf.getHost());
…
// Channel properties
JCSMPChannelProperties cp = (JCSMPChannelProperties) properties
        .getProperty(JCSMPProperties.CLIENT_CHANNEL_PROPERTIES);
cp.setConnectRetries(1);
cp.setReconnectRetries(5);
cp.setReconnectRetryWaitInMillis(3000);
cp.setConnectRetriesPerHost(20);
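As a rough sanity check on the five-minute figure, you can multiply the settings together: 5 reconnect retries × 20 retries per host × 3000 ms of wait comes to 300,000 ms. The sketch below does just that. ReconnectWindow is a hypothetical helper, and the formula is my own approximation: it counts only the waits between attempts for a single host entry and ignores both the time each attempt takes and the extra first attempt per host, so the real window will be somewhat longer.

```java
/** Hypothetical helper: rough estimate of how long the API keeps trying to reconnect. */
final class ReconnectWindow {
    /**
     * Approximate reconnect window in milliseconds for a single host entry.
     * Returns -1 when either retry setting is -1 (retry forever), because
     * the window is then unbounded.
     */
    static long approxMillis(int reconnectRetries, int connectRetriesPerHost, int waitMillis) {
        if (reconnectRetries < 0 || connectRetriesPerHost < 0) {
            return -1;
        }
        // Use long arithmetic so large settings cannot overflow an int.
        return (long) reconnectRetries * connectRetriesPerHost * waitMillis;
    }
}
```

Running the same calculation against your own settings is a quick way to spot a configuration that gives up after only a few seconds.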
In the case I mentioned above, the customer’s application had been configured with incorrect session reconnect properties so the application died silently after just a few reconnection attempts. Unfortunately the application had no logging and no one was monitoring its health, so it went unnoticed. (This highlights the importance of monitoring applications via logs or other mechanisms which you can learn about here.)
If you’re a developer, architect or QA person responsible for leveraging, setting up or testing HA and DR within a Solace environment, I recommend you go through product documentation to fully understand the relevant features and functions. Here are some links to get you started: