When architecting a system, it’s important to make sure it’s redundant, resilient, and fault-tolerant so it can carry on if a node fails, or even if all the nodes in one datacenter fail.

PubSub+ Event Broker can be deployed in a high availability (HA) group, and configured for disaster recovery (DR). In this post, I’ll focus on deploying for HA in AWS. There is an AWS CloudFormation template available for you to easily spin up your HA group, but I’d like to show you how to do it manually so you understand how it works.

How does HA work with PubSub+ Event Broker?

A typical HA deployment of Solace’s PubSub+ Event Broker consists of three nodes:

  1. Primary
  2. Backup
  3. Monitor

The primary and backup brokers are set up in active-standby configuration, while the third acts as a monitoring node. The two brokers have their own storage, so they share nothing and are completely independent. The monitoring node is just responsible for maintaining quorum between the primary and backup brokers.

>How does HA work with PubSub+ Event Broker?

When a message is published to the primary broker, it is persisted locally and synchronously pushed to the backup broker. Once the backup broker receives the message, it acknowledges receipt to the primary broker, at which point the primary broker sends an acknowledgement back to the publisher.

If there is a subscriber interested in this message, it will be forwarded to that subscriber. Otherwise, the message will be forwarded later. Once the subscriber receives the message, it will send a confirmation back to the primary broker. Upon receiving this receipt, the primary and standby brokers will both delete their copies of the message.

Note that at any given time, only the active broker will accept connections. For example, let’s say your primary broker is your active broker. If and when the primary broker fails, the backup broker will become the active broker and start accepting connections. When the primary broker becomes available again, you can either configure your system to automatically make it the active broker again, or leave the secondary broker active. It’s recommended that you run the primary and secondary brokers on different servers within the same datacenter.

This HA setup makes systems very resilient, because if the primary broker fails, the backup will quickly take over and minimize the impact to your system.

Now that we know how an HA group works, let’s spin up three instances of PubSub+ Event Broker on AWS and configure them to be part of an HA group.

Launching EC2 instances with PubSub+ Event Broker

Let’s see how we can configure three broker instances to be our primary broker, secondary broker, and monitoring node. We will be running three instances of PubSub+ Event Broker on three different EC2 instances on AWS. See my blog post on how to launch an EC2 instance with PubSub+Event Broker.

Once you have an EC2 instance up and running, you should use the security group that was created for this instance and attach it to the backup and monitoring EC2 instance. Do not create all 3 instances with 3 different security groups.

I also edited the security group that was created for the first EC2 instance to allow all incoming traffic from any instance that is attached to this security group. You can do this by going to the security group and under Inbound section, clicking Edit, then clicking Add Rule, selecting All Traffic and in the Source field, enter the security group Id. Here is what it should look like:

Launching EC3 instances with PubSub+ Event Broker

Again, this is just for demo purposes to make sure all our instances can speak to each other.

Configuring HA group

Now that we have all 3 EC2 instances with PubSub+ Event Broker up and running, we can begin to configure them as part of one HA group. The following steps have been thoroughly documented in the HA Group Configuration section of  Solace PubSub+ Technical Documentation.

Before we begin, we need to update our router name on each instance. For an HA group to be configured, the router names must match the node name. When we launched EC2 instance from Solace AMI, we were given a default router name (it’s usually the private IP address). We can use the given router name as our node name, but I would rather give our nodes more descriptive names such as ha-demo-primaryha-demo-backup, and ha-demo-monitor.

Primary node

Log in to your primary node and change the router name to ha-demo-primary:

/ Activate Solace CLI
[sysadmin@ip-172-31-26-146 ~]$ solacectl cli

Solace PubSub+ Standard Version 9.3.1.5

Operating Mode: Message Routing Node

ip-172-31-26-146> enable
ip-172-31-26-146# configure

/ Change router name to primary
ip-172-31-26-146(configure)# router-name ha-demo-primary
This command causes a reload of the system.
Do you want to continue (y/n)? y

Note that a router name change requires a reboot of PubSub+ Event Broker so you will have to wait a minute or so before it comes back up. Run the following command once the broker has been rebooted to confirm the router name was changed:

ip-172-31-26-146> show router-name

Router Name:          ha-demo-primary
Mirroring Hostname:   No

Deferred Router Name: ha-demo-primary
Mirroring Hostname:   No

Once you have updated the router name, you will need to run some commands to:

  1. Create and designate different instances as primarybackup,and monitor
  2. Provide the authentication key

From Solace docs: pre-shared-key is 44 to 344 characters (which translates into 32 to 256 bytes of binary data encoded in base 64). It’s used to provide authentication between nodes in a HA Group and must be the same on each node. I used Base64 Encode to create my key. Once you have your key, run the following commands and replace my key with the one you created.

ip-172-31-26-146(configure)# hardware message-spool shutdown
All message spooling will be stopped.
Do you want to continue (y/n)? y

ip-172-31-26-146(configure)# redundancy
ip-172-31-26-146(configure/redundancy)# switchover-mechanism hostlist
ip-172-31-26-146(configure/redundancy)# exit
ip-172-31-26-146(configure)# redundancy
ip-172-31-26-146(configure/redundancy)# group
ip-172-31-26-146(configure/redundancy/group)# create node ha-demo-primary
ip-172-31-26-146(configure/redundancy/group/node)# connect-via 172.31.26.146
ip-172-31-26-146(configure/redundancy/group/node)# node-type message-routing-node
ip-172-31-26-146(configure/redundancy/group/node)# exit
ip-172-31-26-146(configure/redundancy/group)# create node ha-demo-backup
ip-172-31-26-146(configure/redundancy/group/node)# connect-via 172.31.26.208
ip-172-31-26-146(configure/redundancy/group/node)# node-type message-routing-node
ip-172-31-26-146(configure/redundancy/group/node)# exit
ip-172-31-26-146(configure/redundancy/group)# create node ha-demo-monitor
ip-172-31-26-146(configure/redundancy/group/node)# connect-via 172.31.22.77
ip-172-31-26-146(configure/redundancy/group/node)# node-type monitor-node
ip-172-31-26-146(configure/redundancy/group/node)# exit
ip-172-31-26-146(configure/redundancy/group)# exit
ip-172-31-26-146(configure/redundancy)# authentication
ip-172-31-26-146(configure/redundancy/authentication)# pre-shared-key key c29sYWNlaXNhZ3JlYXRldmVudHBsYXRmb3Jtd2hpY2h5b3VzaG91bGR1c2Vmb3JhbGx5b3VyZXZlbnRpbmduZWVkcw==
ip-172-31-26-146(configure/redundancy/authentication)# exit
ip-172-31-26-146(configure/redundancy)# active-standby-role primary
ip-172-31-26-146(configure/redundancy)# no shutdown

You will mostly run the same commands on your backup node that you ran on your primary node. The only difference is that you will designate it as a backup node by running this command instead:

active-standby-role backup

Here are the commands you need to run on your backup node:

ip-172-31-26-208> enable
ip-172-31-26-208# configure
ip-172-31-26-208(configure)# router-name ha-demo-backup
This command causes a reload of the system.
Do you want to continue (y/n)? y

ip-172-31-26-208(configure)# [sysadmin@ip-172-31-26-208 ~]$ solacectl cli

Solace PubSub+ Standard Version 9.3.1.5

Operating Mode: Message Routing Node

ip-172-31-26-208> show router-name

Router Name:          ha-demo-backup
Mirroring Hostname:   No

Deferred Router Name: ha-demo-backup
Mirroring Hostname:   No

ip-172-31-26-208> enable
ip-172-31-26-208# configure
ip-172-31-26-208(configure)# hardware message-spool shutdown
All message spooling will be stopped.
Do you want to continue (y/n)? y

ip-172-31-26-208(configure)# redundancy
ip-172-31-26-208(configure/redundancy)# switchover-mechanism hostlist
ip-172-31-26-208(configure/redundancy)# group
ip-172-31-26-208(configure/redundancy/group)# create node ha-demo-primary
ip-172-31-26-208(configure/redundancy/group/node)# connect-via 172.31.26.146
ip-172-31-26-208(configure/redundancy/group/node)# node-type message-routing-node
ip-172-31-26-208(configure/redundancy/group/node)# exit
ip-172-31-26-208(configure/redundancy/group)# create node ha-demo-backup
ip-172-31-26-208(configure/redundancy/group/node)# connect-via 172.31.26.208
ip-172-31-26-208(configure/redundancy/group/node)# node-type message-routing-node
ip-172-31-26-208(configure/redundancy/group/node)# exit
ip-172-31-26-208(configure/redundancy/group)# create node ha-demo-monitor
ip-172-31-26-208(configure/redundancy/group/node)# connect-via 172.31.22.77
ip-172-31-26-208(configure/redundancy/group/node)# node-type monitor-node
ip-172-31-26-208(configure/redundancy/group/node)# exit
ip-172-31-26-208(configure/redundancy/group)# exit
ip-172-31-26-208(configure/redundancy)# authentication
ip-172-31-26-208(configure/redundancy/authentication)# pre-shared-key key c29sYWNlaXNhZ3JlYXRldmVudHBsYXRmb3Jtd2hpY2h5b3VzaG91bGR1c2Vmb3JhbGx5b3VyZXZlbnRpbmduZWVkcw==
ip-172-31-26-208(configure/redundancy/authentication)# exit
ip-172-31-26-208(configure/redundancy)# active-standby-role backup
ip-172-31-26-208(configure/redundancy)# no shutdown
ip-172-31-26-208(configure/redundancy)#

Monitor node

The commands you need to run on your monitor node are slightly different but for the most part, you are doing the same sort of configuration that you did on primary and backup nodes.

Before changing the router name on the monitor node, you need to run this command:

[sysadmin@ip-172-31-22-77 ~]$ solacectl cli

Solace PubSub+ Standard Version 9.3.1.5

Operating Mode: Message Routing Node

ip-172-31-22-77> enable
ip-172-31-22-77# reload default-config monitoring-node
This command causes a reload of the system which will discard all configuration
and messaging data stored on this system.
Do you want to continue (y/n)? y

This will restart your node with default configs and designate it as a monitoring node (instead of a message routing node).

Now, you can change the router name like we did with our primary and backup nodes.

ip-172-31-22-77(configure)# router-name ha-demo-monitor
This command causes a reload of the system.
Do you want to continue (y/n)? y

Confirm the name was changed by running this command:

ip-172-31-22-77# show router-name

Router Name:          ha-demo-monitor
Mirroring Hostname:   No

Deferred Router Name: ha-demo-monitor
Mirroring Hostname:   No

You can now go ahead and run the following commands (just like you did with primary and backup nodes).

ip-172-31-22-77(configure)# redundancy
ip-172-31-22-77(configure/redundancy)# switchover-mechanism hostlist
ip-172-31-22-77(configure/redundancy)# group
ip-172-31-22-77(configure/redundancy/group)# create node ha-demo-primary
ip-172-31-22-77(configure/redundancy/group/node)# connect-via 172.31.26.146
ip-172-31-22-77(configure/redundancy/group/node)# node-type message-routing-node
ip-172-31-22-77(configure/redundancy/group/node)# exit
ip-172-31-22-77(configure/redundancy/group)# create node ha-demo-backup
ip-172-31-22-77(configure/redundancy/group/node)# connect-via 172.31.26.208
ip-172-31-22-77(configure/redundancy/group/node)# node-type message-routing-node
ip-172-31-22-77(configure/redundancy/group/node)# exit
ip-172-31-22-77(configure/redundancy/group)# create node ha-demo-monitor
ip-172-31-22-77(configure/redundancy/group/node)# connect-via 172.31.22.77
ip-172-31-22-77(configure/redundancy/group/node)# node-type monitor-node
ip-172-31-22-77(configure/redundancy/group/node)# exit
ip-172-31-22-77(configure/redundancy/group)# exit
ip-172-31-22-77(configure/redundancy)# authentication
ip-172-31-22-77(configure/redundancy/authentication)# pre-shared-key key c29sYWNlaXNhZ3JlYXRldmVudHBsYXRmb3Jtd2hpY2h5b3VzaG91bGR1c2Vmb3JhbGx5b3VyZXZlbnRpbmduZWVkcw==
ip-172-31-22-77(configure/redundancy/authentication)# exit
ip-172-31-22-77(configure/redundancy)# no shutdown

Once you are done configuring, you can run the following command on any of the nodes to confirm that all three nodes are Online and part of your HA group:

ip-172-31-26-146> show redundancy group
Node Router-Name   Node Type       Address           Status
-----------------  --------------  ----------------  ---------
ha-demo-backup     Message-Router  172.31.26.208     Online
ha-demo-monitor    Monitor         172.31.22.77      Online
ha-demo-primary*   Message-Router  172.31.26.146     Online
* - indicates the current node

That’s it! You now have a functional HA group with your primary and backup brokers and monitoring node.

Note that by default, guaranteed messaging is disabled in HA group and can only be enabled once you have an HA group setup. It is recommended that you enable guaranteed messaging but for the purpose of this post, I will leave it as disabled.

More info on your HA group

You can get more information about your redundancy settings by running this command:

ip-172-31-26-146> show redundancy
Configuration Status     : Enabled
Redundancy Status        : Up
Operating Mode           : Message Routing Node
Switchover Mechanism     : Hostlist
Auto Revert              : No
Redundancy Mode          : Active/Standby
Active-Standby Role      : Primary
Mate Router Name         : ha-demo-backup
ADB Link To Mate         : Up
ADB Hello To Mate        : Down

Primary Virtual Router  Backup Virtual Router
----------------------  ----------------------
Activity Status                Local Active            Shutdown
Routing Interface              intf0:1                 intf0:1
Routing Interface Status       Up
VRRP Status                    Initialize
VRRP Priority                  250
Message Spool Status           AD-Disabled
Priority Reported By Mate      Standby

You can get even more information by running this command: show redundancy detail (I am not going to show the output here).

I hope you found this post useful. For more information, visit PubSub+ for Developers. If you have any questions, post them to the Solace Developer Community.

Himanshu Gupta

As one of Solace's solutions architects, Himanshu is an expert in many areas of event-driven architecture, and specializes in the design of systems that capture, store and analyze market data in the capital markets and financial services sectors. This expertise and specialization is based on years of experience working at both buy- and sell-side firms as a tick data developer where he worked with popular time series databases kdb+ and OneTick to store and analyze real-time and historical financial market data across asset classes.

In addition to writing blog posts for Solace, Himanshu publishes two blogs of his own: enlist[q] focused on time series data analysis, and a bit deployed which is about general technology and latest trends. He has also written a whitepaper about publish/subscribe messaging for KX, publishes code samples at GitHub and kdb+ tutorials on YouTube!

Himanshu holds a bachelors of science degree in electrical engineering from City College at City University of New York. When he's not designing real-time market data systems, he enjoys watching movies, writing, investing and tinkering with the latest technologies.