Summary of Results

Detailed descriptions of the test methodology, parameters, and results are provided below, but here is a summary:

Table 1 – Summary of Results

| Scenario | Solace PubSub+ Software (ESXi) | Solace PubSub+ Software (AWS m5.xlarge) | Solace PubSub+ Appliance (3560) |
|---|---|---|---|
| Non-persistent (Direct) point-to-point | 1,800,000 msgs/sec (2,600,000 w/o TLS) | 1,100,000 msgs/sec | 12,500,000 msgs/sec |
| Non-persistent (Direct) fan-out | 7,500,000 msgs/sec | 1,200,000 msgs/sec | 29,000,000 msgs/sec |
| Persistent message point-to-point | 55,000 msgs/sec | 45,000 msgs/sec | 620,000 msgs/sec |
| Persistent message fan-out | 450,000 msgs/sec | 350,000 msgs/sec | 5,500,000 msgs/sec |

These results show that Solace PubSub+ Event Broker: Software offers very fast, scalable performance. In general, it behaves a lot like the 3560 appliances, just at lower message rates. That isn’t surprising since PubSub+ Software is based on the same underlying code that runs our appliances. And while the PubSub+ performance numbers in AWS are lower than in ESXi, this is not so much a limitation of the software as of the network and storage bandwidth limits of the AWS EC2 instance and the IOPS limits of the EBS volume.

We chose to perform our testing in AWS using an m5.xlarge EC2 instance, which offers a good balance between performance and cost. An EC2 instance with higher network bandwidth capacity would yield higher performance numbers in high fan-out use cases, and in point-to-point non-persistent messaging use cases with larger message sizes. Similarly, we chose a gp2 EBS storage volume with 1000 GB capacity, which offers good performance at a reasonable cost for the persistent message store. A notable difference between the testing performed on the PubSub+ Event Broker: Appliance and the PubSub+ Event Broker: Software is that the tests against the software broker were performed using TLS encryption, which further decreases performance relative to the appliance but represents a more realistic scenario, as most users choose to encrypt their traffic when using software (especially when running in the public cloud).

PubSub+ Event Broker: Appliance offers the highest possible throughput and performance, so if you want to support massive messaging capacity or scale across many applications in a compact footprint, this purpose-built hardware is for you. If you need less performance, want to deploy messaging into small offices or IoT, or scale your system horizontally in a private or public cloud, then the software or managed service is for you. The beauty is you can mix and match, or switch from one to another, without application or management impact.

The following is a brief introduction to the performance scenarios to help you understand the results.

The first set of scenarios covers Non-Persistent messaging using the Solace Direct message delivery mode. The results cover point-to-point and fan-out message traffic.

The second set of scenarios covers Persistent messaging. Here again the results cover point-to-point and fan-out message traffic.

Methodology

These performance results look at message throughput for point-to-point and fan-out scenarios, measured in messages per second. These message exchange patterns are explained in more detail on our core concepts page, but this section briefly describes the two patterns as they apply to the performance scenarios.

Point-to-Point

A point-to-point flow consists of a single publishing client sending messages to the PubSub+ message broker, which are then received by a single subscribing client.

Point-to-Point flow
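
To make the pattern concrete, here is a minimal sketch of a point-to-point direct-messaging pair using the Solace JCSMP API (the Java API listed in Appendix A). The broker address, VPN, username, and topic are placeholder values, and both ends share one session for brevity; in the actual tests the publisher and subscriber are separate SDKPerf processes driving far higher rates.

```java
import com.solacesystems.jcsmp.*;

public class PointToPointSketch {
    public static void main(String[] args) throws JCSMPException, InterruptedException {
        // Placeholder connection details -- substitute your own broker, VPN and credentials.
        JCSMPProperties props = new JCSMPProperties();
        props.setProperty(JCSMPProperties.HOST, "tcp://broker.example.com:55555");
        props.setProperty(JCSMPProperties.VPN_NAME, "default");
        props.setProperty(JCSMPProperties.USERNAME, "perf-client");

        JCSMPSession session = JCSMPFactory.onlyInstance().createSession(props);
        session.connect();

        Topic topic = JCSMPFactory.onlyInstance().createTopic("perf/p2p/demo");

        // The single subscribing client: receives every message published to the topic.
        XMLMessageConsumer consumer = session.getMessageConsumer(new XMLMessageListener() {
            @Override public void onReceive(BytesXMLMessage msg) {
                System.out.println("Received a message on " + msg.getDestination());
            }
            @Override public void onException(JCSMPException e) { e.printStackTrace(); }
        });
        session.addSubscription(topic);
        consumer.start();

        // The single publishing client: sends direct (non-persistent) messages to the same topic.
        XMLMessageProducer producer = session.getMessageProducer(new JCSMPStreamingPublishEventHandler() {
            @Override public void responseReceived(String messageId) { /* not used for direct messages */ }
            @Override public void handleError(String messageId, JCSMPException e, long ts) { e.printStackTrace(); }
        });
        TextMessage msg = JCSMPFactory.onlyInstance().createMessage(TextMessage.class);
        msg.setDeliveryMode(DeliveryMode.DIRECT);
        msg.setText("hello");
        producer.send(msg, topic);

        Thread.sleep(1000);  // give the consumer a moment to receive before shutting down
        session.closeSession();
    }
}
```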

Fan-Out

Fan-out is the typical publish/subscribe scenario, where a message is published once to the PubSub+ message broker and forwarded to every client that has expressed interest in messages of that kind through a topic subscription. The message is fanned out to those clients by the PubSub+ message broker, and each client receives its own copy.

Fan-out flow
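
The only difference from the point-to-point case is on the receiving side: several consumers add the same topic subscription, and the broker delivers a copy of every published message to each of them. A minimal sketch under the same assumptions as above (placeholder topic name, same connection properties):

```java
import com.solacesystems.jcsmp.*;

public class FanOutSketch {
    // Starts 'fanOut' independent subscriber sessions that all watch the same topic.
    // Each session receives its own copy of every message published to that topic.
    static void startSubscribers(JCSMPProperties props, int fanOut) throws JCSMPException {
        Topic topic = JCSMPFactory.onlyInstance().createTopic("perf/fanout/demo");
        for (int i = 0; i < fanOut; i++) {
            final int id = i;
            JCSMPSession session = JCSMPFactory.onlyInstance().createSession(props);
            session.connect();
            XMLMessageConsumer consumer = session.getMessageConsumer(new XMLMessageListener() {
                @Override public void onReceive(BytesXMLMessage msg) {
                    System.out.println("Subscriber " + id + " got a copy");
                }
                @Override public void onException(JCSMPException e) { e.printStackTrace(); }
            });
            session.addSubscription(topic);  // the same subscription in every session
            consumer.start();
        }
    }
}
```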

SDKPerf Test Tool

For all scenarios, the Solace test tool SDKPerf was used as the traffic source and sink. Because we take performance very seriously, over many years we have built SDKPerf into a very capable tool that can test many different messaging APIs. You can find out more on our SDKPerf overview page, and you can download the tool to try it yourself. For these tests, sdkperf_c was used.

For each scenario, SDKPerf measures the following:

  • The received message rate
  • Message loss using message ordering information that publisher instances of SDKPerf embed in the messages they send
  • Message payload integrity by confirming the received message payloads are unmodified by the messaging system

This ensures that the messaging system is running fast and correctly delivering all messages to the correct consumers without corruption.
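
SDKPerf’s actual wire format is not described here, but the ordering and integrity checks can be illustrated with a simple sketch: the publisher embeds a per-publisher sequence number and a checksum in each payload, and the receiver flags any gap or mismatch. This is an illustration of the technique only, not SDKPerf’s implementation; the payload layout and names are made up.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Illustrative only: a publisher-side payload builder and a receiver-side checker that
// detect gaps (lost messages) and corrupted payloads via a sequence number and checksum.
public class OrderAndIntegrityCheck {

    // Build a payload: [8-byte sequence number][8-byte CRC32 of the body][body bytes].
    static byte[] buildPayload(long sequence, byte[] body) {
        CRC32 crc = new CRC32();
        crc.update(body);
        return ByteBuffer.allocate(16 + body.length)
                .putLong(sequence)
                .putLong(crc.getValue())
                .put(body)
                .array();
    }

    private long expectedSequence = 0;

    // Returns true if the payload arrived in order and unmodified.
    boolean check(byte[] payload) {
        ByteBuffer buf = ByteBuffer.wrap(payload);
        long sequence = buf.getLong();
        long expectedCrc = buf.getLong();
        byte[] body = new byte[buf.remaining()];
        buf.get(body);

        boolean inOrder = (sequence == expectedSequence);  // a gap here means message loss
        expectedSequence = sequence + 1;

        CRC32 crc = new CRC32();
        crc.update(body);
        boolean intact = (crc.getValue() == expectedCrc);  // mismatch means payload corruption

        return inOrder && intact;
    }
}
```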

Message Delivery Modes

The scenarios in this article cover two types of messaging delivery modes:

  • Non-Persistent – This is “at most once” messaging. PubSub+ message brokers provide two options for “at most once” messaging: direct messaging and non-persistent messaging. The results in this article use direct messaging, which is the higher-performance option.
  • Persistent – This is “at least once” messaging. To achieve full persistence, the messaging system must issue an acknowledgement to the message producer only once both PubSub+ message brokers in the HA pair have fully persisted the message to non-volatile storage, and messages must not be removed from the message broker until an acknowledgement is received from the consumer. Fully persisted means the message is flushed to disk; simply caching the message in the filesystem is not equivalent and can lead to message loss in failure scenarios. Other message brokers often call this option failsafe messaging. Therefore, when results are listed for Persistent messages in this article, they are for fully failsafe messaging with no risk of message loss. A sketch of how each delivery mode looks in client code follows this list.
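
The two delivery modes map directly onto client code. The following sketch (JCSMP, placeholder names) shows a direct publish versus a persistent publish to a queue, with client acknowledgement on the consumer side; it is a sketch of the API usage, not the SDKPerf configuration used in these tests.

```java
import com.solacesystems.jcsmp.*;

public class DeliveryModeSketch {

    // Non-persistent ("at most once"): direct messages are never spooled to disk.
    static void publishDirect(XMLMessageProducer producer, Topic topic) throws JCSMPException {
        TextMessage msg = JCSMPFactory.onlyInstance().createMessage(TextMessage.class);
        msg.setDeliveryMode(DeliveryMode.DIRECT);
        msg.setText("fire and forget");
        producer.send(msg, topic);
    }

    // Persistent ("at least once"): the broker acknowledges the publisher only after
    // the message has been safely stored.
    static void publishPersistent(XMLMessageProducer producer, Queue queue) throws JCSMPException {
        TextMessage msg = JCSMPFactory.onlyInstance().createMessage(TextMessage.class);
        msg.setDeliveryMode(DeliveryMode.PERSISTENT);
        msg.setText("must not be lost");
        producer.send(msg, queue);
    }

    // Consumer side for persistent messages: bind to the queue and acknowledge each
    // message only after it has been handled, so the broker can safely remove it.
    static FlowReceiver consumePersistent(JCSMPSession session, Queue queue) throws JCSMPException {
        ConsumerFlowProperties flowProps = new ConsumerFlowProperties();
        flowProps.setEndpoint(queue);
        flowProps.setAckMode(JCSMPProperties.SUPPORTED_MESSAGE_ACK_CLIENT);
        FlowReceiver flow = session.createFlow(new XMLMessageListener() {
            @Override public void onReceive(BytesXMLMessage msg) {
                // ... process the message ...
                msg.ackMessage();  // tells the broker the message can be removed
            }
            @Override public void onException(JCSMPException e) { e.printStackTrace(); }
        }, flowProps);
        flow.start();
        return flow;
    }
}
```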

Performance Scenarios

In the performance scenarios below, the goal is to measure the maximum performance of the PubSub+ message broker in each scenario. To achieve this, we use groups of point-to-point or fan-out clients all running in parallel. Each group of clients is monitored to make sure it is running at a rate it can handle, so that no individual client becomes the bottleneck and artificially skews the performance results.

For point-to-point flows, this means you end up with the scenario illustrated in the following figure.

Solace SDKPerf - Advanced Event Broker

In this figure, client groups are depicted using colors. The figure shows a green, grey and blue group. Each group has a single publishing client sending to a single receiving client. The test uses enough groups of clients such that the broker becomes the bottleneck, not the clients. When you monitor the PubSub+ broker and observe the aggregate messages per second across all clients, you determine the maximum message rate a broker can sustain for the point-to-point scenario.

Similarly for fan-out scenarios, you will end up with the scenario illustrated in the following figure.

Throughput in fan-out scenarios -- advanced event broker

Again, in this figure, client groups are depicted using colors: the figure shows a green, grey, and blue group. And again, the number of client groups is scaled as required until the maximum fan-out rate of the Solace message broker is observed. The maximum fan-out rate is the total output message rate of the broker. Within each group, the number of subscribing clients is dictated by the fan-out factor being measured, so a fan-out factor of 10 requires 10 subscribing clients per group. To figure out how many messages per second are being sent by clients to the Solace message broker, simply take the output message rate and divide it by the fan-out factor; this is the input rate.

At Solace we believe in full transparency for all aspects of performance results, so comprehensive details on the exact hardware, setup, and methodology are described below in the section Appendix A: Test Setup.

Point-to-Point Non-Persistent (Direct) Messaging

The following table shows the aggregate output message rate (messages/sec) for Solace direct messages (non-persistent) as the message payload size is varied. In this test, groups of clients send messages in a point-to-point fashion, with twenty parallel sets of clients used to generate the required traffic. Note that the AWS measurements were taken over TLS while the ESXi (and appliance) measurements were not. Most users prefer to use TLS in public cloud environments, and the ratio of CPU cores to network bandwidth on the instance is such that TLS does not affect the message rate in most cases.

Table 2 – Point-to-Point Non-Persistent Message Throughput (Output Msgs/Sec)

| Message Size | 100B | 1KB | 2KB | 10KB | 20KB | 50KB |
|---|---|---|---|---|---|---|
| PubSub+ Software (ESXi) | 2.6M | 850K | 450K | 100K* | 50K* | 20K* |
| PubSub+ Software (AWS m5.xlarge) | 1.1M | 140K** | 75K** | 15K** | 8K** | 3K** |
| PubSub+ Appliance | 12.5M | 6.8M | 3.8M | 875K* | 438K* | 160K* |

* The results measured reached the bandwidth limit of the NAB on the 3560 Appliance, or the (v)NIC on the software broker.

** The AWS m5.xlarge instance type has a baseline network bandwidth of 1.25Gbps. It is possible for the m5.xlarge to burst up to 10Gbps using a burst credit system. However, these measurements were taken under non-burst conditions.

PubSub+ Software (ESXi) Point-to-Point Non-Persistent

PubSub+ Software (AWS) Point-to-Point Non-Persistent

3560 Appliance Point-to-Point Non-Persistent

Broadly speaking, two resource limitations are associated with non-persistent messaging: CPU and bandwidth. CPU cycles are generally consumed per message, while bandwidth limits come into play as the message size increases; larger messages consume more bandwidth. From these results you can see that, as expected, the highest message rates are achieved at the smallest message sizes.

It is also interesting to graph these results and add in calculated bandwidth in Gbps. From this you can see that bandwidth utilization is increasing as the message size increases, until the bandwidth limit of the network is reached. The ESXi server has a 10Gbps NIC and the bandwidth limit is reached at 10K byte message payload size. In AWS, the broker quickly hits the network bandwidth limit of the m5.xlarge EC2 instance. Larger EC2 instances will perform better. For example, an m5.8xlarge instance, with its 10Gbps of network bandwidth, has a performance profile very similar to the ESXi results.
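
The bandwidth figures plotted alongside the message rates are simple to compute from the table values. A small sketch of the calculation (payload bytes only; protocol framing and TLS add some overhead on top):

```java
public class BandwidthCalc {
    // Approximate payload bandwidth in Gbps for a given message rate and payload size.
    static double payloadGbps(double msgsPerSec, int payloadBytes) {
        return msgsPerSec * payloadBytes * 8 / 1_000_000_000.0;
    }

    public static void main(String[] args) {
        // Example from Table 2: ESXi at 10KB payloads reached ~100K msgs/sec, which is
        // ~8 Gbps of payload -- right at the limit of a 10Gbps NIC once overhead is included.
        System.out.printf("%.1f Gbps%n", payloadGbps(100_000, 10_000));
    }
}
```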

For PubSub+ Event Broker: Appliance, the results for the bandwidth are similar. The hardware I/O card, called the Network Acceleration Blade (NAB), has a messaging capacity of around 80 Gbps. So as the message size increases, the NAB is able to saturate the network bandwidth and reaches the maximum bandwidth at around the 10K byte message payload size. From there the output message rate is governed more by the bandwidth capabilities of the NAB than other factors.

Fan-out Non-Persistent (Direct) Messaging

The following table shows the aggregate output message rate (messages/sec) for Solace direct messages (non-persistent) as the message fan-out is varied. At a fan-out of 1, this scenario is equivalent to the point-to-point scenario above. More clients are then added to consume the published messages, increasing the fan-out for each message. The broker’s maximum output is recorded for each combination of fan-out and message size, as shown below. Again, twenty parallel groups of fan-out clients are used to avoid any individual client becoming the bottleneck in the test. Note that these measurements were taken without TLS encryption.

Table 3 – Fan-out Non-Persistent Message Throughput (Output Msgs/Sec)

| Platform | Msg Size (Bytes) | Endpoints/Subscribers: 1 | 2 | 5 | 10 | 50 |
|---|---|---|---|---|---|---|
| PubSub+ Software (ESXi) | 100 | 2.6M | 3.8M | 5.8M | 6.5M | 7.5M |
| PubSub+ Software (ESXi) | 1K | 850K | 1.0M* | 1.0M* | 1.0M* | 1.0M* |
| PubSub+ Software (ESXi) | 2K | 450K | 540K* | 540K* | 540K* | 540K* |
| PubSub+ Appliance | 100 | 12M | 22M | 27M | 29M | 30M |
| PubSub+ Appliance | 1K | 6.8M | 7.4M | 8.2M | 8.3M* | 8.3M* |
| PubSub+ Appliance | 2K | 3.8M | 4.0M | 4.2M* | 4.2M* | 4.2M* |

* The results measured reached the bandwidth limit of the NAB on the 3560 Appliance, or the (v)NIC on the software broker.

Graphing these results lets you analyze the overall impact of message delivery fan-out across message payload sizes. At small message sizes, the per-message processing overhead dominates. As message sizes increase, the limiting factor shifts to handling the bandwidth of the messages, which shows up on the graph as the curves flattening. The AWS measurements were excluded because they are capped by the 1.25 Gbps network bandwidth of the m5.xlarge instance in almost all cases (a bunch of flat curves is not interesting to look at). In the case of the 3560 appliances, output increases to the point of eventually saturating the 80 Gbps bandwidth capacity of the NAB.

PubSub+ Software (ESXi) Fan-out Non-Persistent

3560 Appliance Fan-out Non-Persistent

Point-to-Point Persistent Messaging

The following table shows aggregate output message rate (messages/sec) for fully persistent messages as the message payload size is varied. The PubSub+ brokers are deployed in high availability (HA) groups. When running in HA mode, the publisher does not receive an ACK for a message until it is synchronously stored by two brokers. In this test, groups of clients are sending messages in a point-to-point fashion. Parallel sets of clients are used to generate the required traffic.
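
On the publisher side, that acknowledgement arrives as an asynchronous callback. Below is a minimal sketch of a persistent publisher that counts acknowledgements (JCSMP, placeholder names); the synchronous HA replication itself is handled entirely by the brokers and is invisible to the client, and this is not the SDKPerf code used in the tests.

```java
import com.solacesystems.jcsmp.*;
import java.util.concurrent.atomic.AtomicLong;

public class PersistentPublisherSketch {
    static final AtomicLong acked = new AtomicLong();
    static final AtomicLong failed = new AtomicLong();

    // The broker invokes responseReceived() only after the message has been persisted
    // (in an HA pair, on both brokers) -- that is the "at least once" guarantee.
    static XMLMessageProducer createProducer(JCSMPSession session) throws JCSMPException {
        return session.getMessageProducer(new JCSMPStreamingPublishEventHandler() {
            @Override public void responseReceived(String messageId) { acked.incrementAndGet(); }
            @Override public void handleError(String messageId, JCSMPException e, long ts) {
                failed.incrementAndGet();
            }
        });
    }

    static void publish(XMLMessageProducer producer, Queue queue, long count) throws JCSMPException {
        for (long i = 0; i < count; i++) {
            BytesMessage msg = JCSMPFactory.onlyInstance().createMessage(BytesMessage.class);
            msg.setDeliveryMode(DeliveryMode.PERSISTENT);
            msg.setData(new byte[1024]);  // 1KB payload, one of the sizes in Table 4
            producer.send(msg, queue);    // send() returns immediately; acks arrive via the callback
        }
    }
}
```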

Table 4 – Point-to-Point Persistent Message Throughput (Output Msgs/Sec)

| Message Size | 512B | 1KB | 2KB | 4KB | 20KB |
|---|---|---|---|---|---|
| PubSub+ Software (ESXi) | 65K | 55K | 45K | 30K | 10K |
| PubSub+ Software (AWS m5.xlarge) | 50K | 45K | 38K | 28K | 5.5K |
| PubSub+ Appliance | 620K | 620K | 460K | 265K | 59K |

Note: The PubSub+ Event Broker: Software measurements were taken while clients were connecting over TLS.

As with the non-persistent scenario, from these results you can see that the performance is highest for small message sizes as these are easier to process efficiently. Each of these messages must be saved to non-volatile storage prior to acknowledging the publisher so the overhead is much higher compared to non-persistent messages. Software performance for persistent messaging becomes a function not only of the compute power and network bandwidth, but also the IOPS, storage bandwidth, and latency of the platform on which the software broker is running.

It is also interesting to graph these results and add in calculated bandwidth in Gbps. From this you can see that bandwidth utilization is always increasing as the message size increases (network bandwidth saturation is not hit as it is with non-persistent messaging).

PubSub+ Software (ESXi) Point-to-Point Persistent

PubSub+ Software (AWS m5.xlarge) Point-to-Point Persistent

3560 Appliance Point-to-Point Persistent

For the PubSub+ Event Broker: Appliance, the results for the bandwidth are a little different. The hardware-guaranteed messaging card, called the Assured Delivery Blade or ADB, has a messaging capacity of around 9.5 Gbps. So as the message size increases, the ADB becomes the bottleneck, and the system reaches the maximum bandwidth at approximately the 4K Byte message payload size. From there the output message rate is governed more by the bandwidth capabilities of the ADB card than other factors.

Fan-out Persistent Messaging

The following table shows aggregate output message rate (messages/sec) for fully persistent messages as the message fan-out is varied. At a fan-out of 1, this test is equivalent to the point-to-point test above. Then more clients are added to consume the published messages. This increases the fan-out for each message. Maximum output is recorded for each fan-out and message size as shown below. Again, twenty parallel groups of fan-out clients are used to avoid any individual client becoming the bottleneck in the test.

Table 5 – Fan-out Persistent Message Throughput (Output Msgs/Sec)

| Platform | Msg Size (Bytes) | Endpoints/Subscribers: 1 | 2 | 5 | 10 | 50 |
|---|---|---|---|---|---|---|
| PubSub+ Software (ESXi) | 1K | 55K | 95K | 170K | 230K | 450K |
| PubSub+ Software (ESXi) | 2K | 45K | 80K | 140K | 200K | 435K |
| PubSub+ Software (ESXi) | 20K | 10K | 18K | 35K | 50K | 54K* |
| PubSub+ Appliance | 1K | 620K | 1.2M | 2.6M | 3.7M | 5.5M |
| PubSub+ Appliance | 2K | 465K | 880K | 2.0M | 3.4M | 4.1M* |
| PubSub+ Appliance | 20K | 60K | 115K | 290K | 430K* | 430K* |

* The results measured reached the bandwidth limit of the NAB on the 3560 Appliance, or the (v)NIC on the software broker.

Note: The PubSub+ Event Broker: Software measurements were taken while clients were connecting over TLS.

Graphing these results lets you analyze the overall impact of message delivery fan-out across message payload sizes. In this scenario, the work of fanning out the messages is handled by the Network Acceleration Blade (NAB) in the 3560 appliance, or by the I/O software in the software broker. This offloads the processing from the guaranteed messaging components and enables the message rate to continue to increase as fan-out increases. In the case of the appliance, output increases to the point of eventually saturating the 80 Gbps bandwidth capacity of the NAB, while still remaining fully persistent.

PubSub+ Software (ESXi) Fan-out Persistent

3560 Appliance Fan-out Persistent

Appendix A: Test Setup

Performance metrics are only useful if you understand how they were measured and what test equipment was used. The following two figures outline the lab setup used to run the tests. Instances of the SDKPerf tool were run on Linux test host servers; depending on the test requirements, different numbers of servers were used so that the clients never became the bottleneck. For the appliance scenarios, we set up a pair of 3560 appliances in the typical fault-tolerant configuration with attached external storage. Up to 16 test hosts were required to generate the appropriate test load.

PubSub+ appliance -- advanced event broker

For the software results, a VMware ESXi server was used to host the PubSub+ VM image. For these scenarios, fewer test hosts were required for the traffic generation due to the difference in the performance capabilities.

PubSub+ Software on VMware ESXi with SDKPerf test hosts

In both cases, a Mellanox SX1016 10GigE Ethernet switch was used to provide the LAN connectivity.

The following tables give the full details of the hardware used in testing.

Solace PubSub+ Software (4 core, 12 GB, ESXi)

| Attribute | Value |
|---|---|
| Platform | Solace PubSub+ Enterprise Edition |
| Hypervisor | VMware, Inc. VMware Virtual Platform |
| CPUs | Intel(R) Xeon(R) CPU E3-1270 v5 @ 3.60GHz with Hyperthreading |
| Memory | 32 GB |
| Datastore | 40 GB on an Intel SSD 900P Series |
| Network Interface | Intel Corporation 82599EB 10-Gigabit SFI/SFP+ |
| VM Settings | 4 cores, 12 GB RAM |
| API | All tests use Java clients (Solace JCSMP API) |

Solace PubSub+ Software (Amazon EC2 Instance)

| Attribute | Value |
|---|---|
| Platform | Solace PubSub+ Enterprise Edition |
| Amazon EC2 Instance Type | m5.xlarge |
| Datastore | gp2 EBS Volume – 1000 GB |
| Networking | Enhanced Networking enabled |

Solace PubSub+ Appliances

Solace Message Router Appliances
| Attribute | Value |
|---|---|
| Platform | Solace 3560 |
| NAB (I/O Card) | NAB-0810EM-01-A (8x10GE) |
| ADB (Guaranteed Messaging Card) | ADB-000004-01-A (ADB-4) |
| HBA Connectivity | HBA-0208FC-02-A (8Gb) |
| Product Keys | GM650K Performance Key |

Storage Area Network (SAN) Characteristics
| Attribute | Value |
|---|---|
| Controller | Redundant controllers and write-back cache |
| RAID | RAID 10, 4 + 4 stripe size |
| Drives | Serial Attached SCSI (SAS), 10K RPM |
| Connectivity | 8 Gbps Fibre Channel |

Required Test Equipment

Up to 16 performance test hosts were used to generate and sink the required message rate for the various scenarios. They all had characteristics that matched the following specifications:

Performance Test Hosts (Load Generation Hosts)
| Attribute | Value |
|---|---|
| CPU | Intel Core i7-3930K CPU (6 cores) @ 3.20GHz (HT Disabled) |
| Memory | 16 GB |
| Network Interface | Ethernet Converged Network Adapter X520-DA2 (10 GigE) |
| Host OS | CentOS 7 |

All of the equipment was connected together with a Mellanox 10GigE Ethernet switch:

Ethernet Switch
| Attribute | Value |
|---|---|
| Model | Mellanox SX1016 |
| Description | 64-port non-blocking 10GigE Ethernet Switch |