Summary of Results
Detailed descriptions of test methodology, parameters and results are provided below, but here is a summary of the results:
Table 1 – Summary of Results
| Messages Per Second | Solace PubSub+ Software (ESXi) | Solace PubSub+ Software (AWS m5.xlarge) | Solace PubSub+ Appliance (3560) |
|---|---|---|---|
| Non-persistent (Direct) point-to-point | 1,800,000 (2,600,000 w/o TLS) | 1,100,000 | 12,500,000 |
| Non-persistent (Direct) fan-out | 7,500,000 | 1,200,000 | 29,000,000 |
| Persistent message point-to-point | 55,000 | 45,000 | 620,000 |
| Persistent message fan-out | 450,000 | 350,000 | 5,500,000 |
These results show that Solace PubSub+ Event Broker: Software offers very fast, scalable performance. In general, it behaves a lot like the 3560 appliance, just at lower message rates. That isn’t surprising, since PubSub+ Software is based on the same underlying code that runs our appliances. And while the PubSub+ performance numbers in AWS are lower than in ESXi, this is not so much a limitation of the software as of the network and storage bandwidth limits of the AWS EC2 instance and the IOPS limits of the EBS volume.
We chose to perform our testing in AWS using an m5.xlarge EC2 instance, which offers a good balance between performance and cost. An EC2 instance with higher network bandwidth would yield higher performance numbers in high fan-out use cases, and in point-to-point non-persistent messaging use cases with larger message sizes. Similarly, we chose a gp2 EBS storage volume with 1,000 GB of capacity, which offers good performance at a reasonable cost for the persistent message store. A notable difference between the testing performed on PubSub+ Event Broker: Appliance and PubSub+ Event Broker: Software is that the tests against the software broker were performed using TLS encryption. This further decreases performance relative to the appliance, but represents a more realistic scenario, as most users choose to encrypt their traffic when using the software broker (especially when running in the public cloud).
PubSub+ Event Broker: Appliance offers the highest possible throughput and performance, so if you want to support massive messaging capacity or scale across many applications in a compact footprint, this purpose-built hardware is for you. If you need less performance, want to deploy messaging into small offices or IoT, or scale your system horizontally in a private or public cloud, then the software or managed service is for you. The beauty is you can mix and match, or switch from one to another, without application or management impact.
The following is a brief introduction to the performance scenarios to help you understand the results.
The first set of scenarios covers Non-Persistent messaging using the Solace Direct message delivery mode. The results cover point-to-point and fan-out message traffic.
The second set of scenarios covers Persistent messaging. Here again the results cover point-to-point and fan-out message traffic.
Methodology
These performance results look at message throughput for point-to-point and fan-out scenarios. The results measure PubSub+ in terms of messages per second. These message exchange patterns are explained in more detail in our core concepts page but this section briefly describes these two patterns as they apply to the performance scenarios.
Point-to-Point
A point-to-point flow is a single publishing client sending messages to the PubSub+ message broker which are received by a single subscribing client.
Fan-Out
Fan-out is the typical publish/subscribe scenario where a message is published once to the PubSub+ message broker and forwarded to multiple clients who have expressed interest in receiving messages of that kind through topic subscriptions. So the message is then fanned out to the clients by the PubSub+ message broker, and each client receives a copy of the message.
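To make the two patterns concrete, here is a minimal subscriber sketch using the Solace JCSMP API (the Java API used by the test clients). The host, message VPN, username, and topic string are placeholders rather than the values used in these tests. Every client that adds a matching topic subscription receives its own copy of each published message, which is what produces the fan-out; with a single publisher and a single subscriber on the topic, the same flow is point-to-point.

```java
import com.solacesystems.jcsmp.*;

public class FanOutSubscriber {
    public static void main(String[] args) throws JCSMPException {
        // Placeholder connection details; substitute your own broker, VPN, and credentials.
        JCSMPProperties props = new JCSMPProperties();
        props.setProperty(JCSMPProperties.HOST, "tcps://broker.example.com:55443");
        props.setProperty(JCSMPProperties.VPN_NAME, "default");
        props.setProperty(JCSMPProperties.USERNAME, "perf-client");

        JCSMPSession session = JCSMPFactory.onlyInstance().createSession(props);
        session.connect();

        // Each client that adds a matching subscription gets its own copy of every
        // message published to this topic; the number of such clients is the fan-out factor.
        Topic topic = JCSMPFactory.onlyInstance().createTopic("perf/test/topic");
        XMLMessageConsumer consumer = session.getMessageConsumer(new XMLMessageListener() {
            @Override
            public void onReceive(BytesXMLMessage msg) {
                // Count the message toward the received rate.
            }
            @Override
            public void onException(JCSMPException e) {
                e.printStackTrace();
            }
        });
        session.addSubscription(topic);
        consumer.start();
    }
}
```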
SDKPerf Test Tool
For all scenarios, the Solace test tool SDKPerf was used as the traffic source and sink. Because we take performance very seriously, over many years we have built SDKPerf into a very capable tool that can be used to test many different messaging APIs. You can find out more about the tool on our SDKPerf overview and you can download the tool and try it yourself as well. For these tests, sdkperf_c was used.
For each scenario, SDKPerf measures the following:
- The received message rate
- Message loss using message ordering information that publisher instances of SDKPerf embed in the messages they send
- Message payload integrity by confirming the received message payloads are unmodified by the messaging system
This ensures that the messaging system is running fast and correctly delivering all messages to the correct consumers without corruption.
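SDKPerf's actual message format is not documented here, but the idea behind the ordering and loss check can be sketched in plain Java. In this illustrative (hypothetical) version, each publisher stream stamps its messages with an increasing sequence number, and the consumer flags gaps (loss) and backward steps (reordering or duplication):

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Illustrative order/loss checker: tracks the last sequence number seen per
 * publisher stream and counts gaps and out-of-order arrivals.
 */
public class SequenceChecker {
    private final Map<String, Long> lastSeen = new HashMap<>();
    private long lostCount = 0;
    private long outOfOrderCount = 0;

    public void onMessage(String streamId, long sequenceNumber) {
        Long previous = lastSeen.get(streamId);
        if (previous != null) {
            if (sequenceNumber <= previous) {
                outOfOrderCount++;                          // duplicate or reordered delivery
            } else if (sequenceNumber > previous + 1) {
                lostCount += sequenceNumber - previous - 1; // a gap implies lost messages
            }
        }
        if (previous == null || sequenceNumber > previous) {
            lastSeen.put(streamId, sequenceNumber);
        }
    }

    public long lost() { return lostCount; }
    public long outOfOrder() { return outOfOrderCount; }
}
```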
Message Delivery Modes
The scenarios in this article cover two types of messaging delivery modes:
- Non-Persistent – “at most once” messaging. PubSub+ message brokers provide two options for this level of service: direct messaging and non-persistent messaging. The results in this article use direct messaging, which is the higher-performance option.
- Persistent – “at least once” messaging. To achieve full persistence, the messaging system must issue an acknowledgement to the message producer only once both PubSub+ message brokers have fully persisted the message to non-volatile storage, and messages for consumers must not be removed from the message broker until an acknowledgement is received. Fully persisted means the message is flushed to disk; simply caching the message in the filesystem is not equivalent and can lead to message loss in failure scenarios. Other message brokers often call this failsafe messaging. Therefore, when results are listed for Persistent messages in this article, they are for fully failsafe messaging with no risk of message loss. (A brief publisher-side sketch of both modes follows below.)
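As a rough illustration of the difference between the two modes, the following JCSMP publisher sketch selects the delivery mode per message. The topic and payload size are placeholders; for persistent publishing, the broker's acknowledgement arrives asynchronously through the publish event handler, and a message should only be counted as safely delivered once that callback fires.

```java
import com.solacesystems.jcsmp.*;

public class DeliveryModePublisher {
    public static void publish(JCSMPSession session, boolean persistent) throws JCSMPException {
        Topic topic = JCSMPFactory.onlyInstance().createTopic("perf/test/topic"); // placeholder topic

        // For persistent publishing, the broker's acknowledgement arrives asynchronously
        // via this handler; a message is only "safe" once responseReceived fires.
        XMLMessageProducer producer = session.getMessageProducer(
            new JCSMPStreamingPublishEventHandler() {
                @Override
                public void responseReceived(String messageID) {
                    // The broker has persisted the message to non-volatile storage.
                }
                @Override
                public void handleError(String messageID, JCSMPException cause, long timestamp) {
                    cause.printStackTrace();
                }
            });

        BytesMessage msg = JCSMPFactory.onlyInstance().createMessage(BytesMessage.class);
        msg.setData(new byte[1024]); // 1 KB payload, one of the sizes used in the tests
        // DIRECT = "at most once" (non-persistent); PERSISTENT = "at least once".
        msg.setDeliveryMode(persistent ? DeliveryMode.PERSISTENT : DeliveryMode.DIRECT);
        producer.send(msg, topic);
    }
}
```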
Performance Scenarios
In the performance scenarios below, the goal is to measure the maximum performance of the PubSub+ message broker in each scenario. To achieve this, we use groups of point-to-point or fan-out clients all running in parallel. Each group of clients is monitored to make sure it is running at a rate which it can handle to avoid an individual client becoming the bottleneck and skewing the performance results artificially.
For point-to-point flows, this means you end up with the scenario illustrated in the following figure.
In this figure, client groups are depicted using colors. The figure shows a green, grey and blue group. Each group has a single publishing client sending to a single receiving client. The test uses enough groups of clients such that the broker becomes the bottleneck, not the clients. When you monitor the PubSub+ broker and observe the aggregate messages per second across all clients, you determine the maximum message rate a broker can sustain for the point-to-point scenario.
Similarly for fan-out scenarios, you will end up with the scenario illustrated in the following figure.
Again, in this figure, client groups are depicted using colors: the figure shows a green, grey and blue group. And again, the number of client groups is scaled as required until the maximum fan-out rate of the Solace message broker is observed. The maximum fan-out rate is the total output message rate of the broker. Within each group, the number of subscribing clients is dictated by the fan-out factor being measured, so a fan-out factor of 10 requires 10 subscribing clients per group. To figure out how many messages per second the publishing clients are sending to the Solace message broker, simply take the output message rate and divide by the fan-out factor; this is the input rate.
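As a quick worked example, using illustrative numbers drawn from the result tables later in this article:

```java
public class FanOutRates {
    public static void main(String[] args) {
        // Illustrative numbers: an observed aggregate broker output rate and a fan-out factor.
        double outputMsgsPerSec = 450_000;  // e.g. software broker, persistent fan-out of 50
        int fanOutFactor = 50;              // subscribing clients per publishing group

        // Input (publish) rate = output rate / fan-out factor,
        // because each published message is delivered fanOutFactor times.
        double inputMsgsPerSec = outputMsgsPerSec / fanOutFactor;
        System.out.printf("Publishers are sending %.0f msgs/sec in aggregate%n", inputMsgsPerSec);
    }
}
```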
At Solace we believe in full transparency for all aspects of performance results, so comprehensive details on the exact hardware, setup, and methodology are described below in the section Appendix A: Test Setup.
Point-to-Point Non-Persistent (Direct) Messaging
The following table shows the aggregate output message rate (messages/sec) for Solace Direct (non-persistent) messages as the message payload size is varied. In this test, groups of clients send messages in a point-to-point fashion; twenty parallel sets of clients are used to generate the required traffic. Note that the AWS measurements were taken over TLS while the ESXi (and appliance) measurements were not. Most users prefer TLS in public cloud environments, and on the m5.xlarge the ratio of CPU cores to network bandwidth is such that TLS does not affect the message rate in most cases.
Table 2 – Point-to-Point Non-Persistent Message Throughput (Output Msgs/Sec)
| Message Size | 100B | 1KB | 2KB | 10KB | 20KB | 50KB |
|---|---|---|---|---|---|---|
| PubSub+ Software (ESXi) | 2.6M | 850K | 450K | 100K* | 50K* | 20K* |
| PubSub+ Software (AWS m5.xlarge) | 1.1M | 140K** | 75K** | 15K** | 8K** | 3K** |
| PubSub+ Appliance | 12.5M | 6.8M | 3.8M | 875K* | 438K* | 160K* |
* The results measured reached the bandwidth limit of the NAB on the 3560 Appliance, or the (v)NIC on the software broker.
** The AWS m5.xlarge instance type has a baseline network bandwidth of 1.25Gbps. It is possible for the m5.xlarge to burst up to 10Gbps using a burst credit system. However, these measurements were taken under non-burst conditions.
Broadly speaking, two resource limitations are associated with non-persistent messaging: CPU and bandwidth. CPU cycles are generally consumed per message. Bandwidth limits come into play as the message size increases; larger messages consume more bandwidth. From these results you can see that as expected the highest message rate is for small message sizes.
It is also interesting to graph these results and add in calculated bandwidth in Gbps. From this you can see that bandwidth utilization is increasing as the message size increases, until the bandwidth limit of the network is reached. The ESXi server has a 10Gbps NIC and the bandwidth limit is reached at 10K byte message payload size. In AWS, the broker quickly hits the network bandwidth limit of the m5.xlarge EC2 instance. Larger EC2 instances will perform better. For example, an m5.8xlarge instance, with its 10Gbps of network bandwidth, has a performance profile very similar to the ESXi results.
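The bandwidth overlay on these graphs is a straightforward calculation. As a sketch (ignoring protocol framing and TLS overhead, which push the real wire figure somewhat higher):

```java
public class BandwidthCalc {
    public static void main(String[] args) {
        // Example from the table above: ESXi software broker, 10 KB payloads at 100K msgs/sec.
        double msgsPerSec = 100_000;
        double payloadBytes = 10_000;

        // Approximate payload bandwidth only; framing and TLS overhead are not included.
        double gbps = msgsPerSec * payloadBytes * 8 / 1_000_000_000.0;
        System.out.printf("~%.1f Gbps of payload bandwidth%n", gbps); // ~8.0 Gbps, near the 10 Gbps NIC limit
    }
}
```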
For PubSub+ Event Broker: Appliance, the results for the bandwidth are similar. The hardware I/O card, called the Network Acceleration Blade (NAB), has a messaging capacity of around 80 Gbps. So as the message size increases, the NAB is able to saturate the network bandwidth and reaches the maximum bandwidth at around the 10K byte message payload size. From there the output message rate is governed more by the bandwidth capabilities of the NAB than other factors.
Fan-out Non-Persistent (Direct) Messaging
The following table shows the aggregate output message rate (messages/sec) for Solace Direct (non-persistent) messages as the message fan-out is varied. At a fan-out of 1, this scenario is equivalent to the point-to-point scenario above. Then more clients are added to consume the published messages, increasing the fan-out for each message. The broker’s maximum output is recorded for each fan-out and message size as shown below. Again, twenty parallel groups of fan-out clients are used to avoid any individual client becoming the bottleneck in the test. Note that these measurements were taken without TLS encryption.
Table 3 – Fan-out Non-Persistent Message Throughput (Output Msgs/Sec)
| | Msg Size (Bytes) | 1 Endpoint/Subscriber | 2 Endpoints/Subscribers | 5 Endpoints/Subscribers | 10 Endpoints/Subscribers | 50 Endpoints/Subscribers |
|---|---|---|---|---|---|---|
| PubSub+ Software (ESXi) | 100 | 2.6M | 3.8M | 5.8M | 6.5M | 7.5M |
| PubSub+ Software (ESXi) | 1K | 850K | 1.0M* | 1.0M* | 1.0M* | 1.0M* |
| PubSub+ Software (ESXi) | 2K | 450K | 540K* | 540K* | 540K* | 540K* |
| PubSub+ Appliance | 100 | 12M | 22M | 27M | 29M | 30M |
| PubSub+ Appliance | 1K | 6.8M | 7.4M | 8.2M | 8.3M* | 8.3M* |
| PubSub+ Appliance | 2K | 3.8M | 4.0M | 4.2M* | 4.2M* | 4.2M* |
* The results measured reached the bandwidth limit of the NAB on the 3560 Appliance, or the (v)NIC on the software broker.
Graphing these results lets you analyze the overall impact of message delivery fan-out across message payload sizes. At low message sizes, the overhead of processing the small messages dominates. However, as message sizes increase, the limiting factor shifts to handling the bandwidth of the messages. This is seen on the graph when the curves flatten. The measurements for AWS were excluded because they are limited by the 1.25Gbps network bandwidth limit of the m5.xlarge instance in almost all cases (a bunch of flat curves are not interesting to look at). In the case of the 3560 appliances, output increases to the point of eventually being able to saturate the 80 Gbps bandwidth capacity of the NAB.
Point-to-Point Persistent Messaging
The following table shows aggregate output message rate (messages/sec) for fully persistent messages as the message payload size is varied. The PubSub+ brokers are deployed in high availability (HA) groups. When running in HA mode, the publisher does not receive an ACK for a message until it is synchronously stored by two brokers. In this test, groups of clients are sending messages in a point-to-point fashion. Parallel sets of clients are used to generate the required traffic.
Table 4 – Point-to-Point Persistent Message Throughput (Output Msgs/Sec)
| Message Size | 512B | 1KB | 2KB | 4KB | 20KB |
|---|---|---|---|---|---|
| PubSub+ Software (ESXi) | 65K | 55K | 45K | 30K | 10K |
| PubSub+ Software (AWS m5.xlarge) | 50K | 45K | 38K | 28K | 5.5K |
| PubSub+ Appliance | 620K | 620K | 460K | 265K | 59K |
Note: The PubSub+ Event Broker: Software measurements were taken while clients were connecting over TLS.
As with the non-persistent scenario, from these results you can see that the performance is highest for small message sizes as these are easier to process efficiently. Each of these messages must be saved to non-volatile storage prior to acknowledging the publisher so the overhead is much higher compared to non-persistent messages. Software performance for persistent messaging becomes a function not only of the compute power and network bandwidth, but also the IOPS, storage bandwidth, and latency of the platform on which the software broker is running.
It is also interesting to graph these results and add in calculated bandwidth in Gbps. From this you can see that bandwidth utilization is always increasing as the message size increases (network bandwidth saturation is not hit as it is with non-persistent messaging).
For the PubSub+ Event Broker: Appliance, the results for the bandwidth are a little different. The hardware-guaranteed messaging card, called the Assured Delivery Blade or ADB, has a messaging capacity of around 9.5 Gbps. So as the message size increases, the ADB becomes the bottleneck, and the system reaches the maximum bandwidth at approximately the 4K Byte message payload size. From there the output message rate is governed more by the bandwidth capabilities of the ADB card than other factors.
Fan-out Persistent Messaging
The following table shows aggregate output message rate (messages/sec) for fully persistent messages as the message fan-out is varied. At a fan-out of 1, this test is equivalent to the point-to-point test above. Then more clients are added to consume the published messages. This increases the fan-out for each message. Maximum output is recorded for each fan-out and message size as shown below. Again, twenty parallel groups of fan-out clients are used to avoid any individual client becoming the bottleneck in the test.
Table 5 – Fan-out Persistent Message Throughput (Output Msgs/Sec)
| | Msg Size (Bytes) | 1 Endpoint/Subscriber | 2 Endpoints/Subscribers | 5 Endpoints/Subscribers | 10 Endpoints/Subscribers | 50 Endpoints/Subscribers |
|---|---|---|---|---|---|---|
| PubSub+ Software (ESXi) | 1K | 55K | 95K | 170K | 230K | 450K |
| PubSub+ Software (ESXi) | 2K | 45K | 80K | 140K | 200K | 435K |
| PubSub+ Software (ESXi) | 20K | 10K | 18K | 35K | 50K | 54K* |
| PubSub+ Appliance | 1K | 620K | 1.2M | 2.6M | 3.7M | 5.5M |
| PubSub+ Appliance | 2K | 465K | 880K | 2.0M | 3.4M | 4.1M* |
| PubSub+ Appliance | 20K | 60K | 115K | 290K | 430K* | 430K* |
* The results measured reached the bandwidth limit of the NAB on the 3560 Appliance, or the (v)NIC on the software broker.
Note: The PubSub+ Event Broker: Software measurements were taken while clients were connecting over TLS.
Graphing these results lets you analyze the overall impact of message delivery fan-out across message payload sizes. In this scenario, the work of fanning out the messages is handled by the Network Acceleration Blade (NAB) in the 3560 appliance, or by the I/O software in the software broker. This offloads the processing from the guaranteed messaging components and enables the message rate to continue to increase as fan-out increases. In the case of the appliance, output increases to the point of eventually being able to saturate the 80 Gbps bandwidth capacity of the NAB, all while remaining fully persistent.
Appendix A: Test Setup
Performance metrics are only useful if you understand how they were measured and what test equipment was used. The following two figures outline the lab setup when running the tests. Instances of the SDKPerf tool were run on test host servers which were running Linux. Depending on the test requirements, different numbers of servers were used such that the clients never became the bottleneck. For the appliance scenarios, we set up a pair of 3560 appliances in the typical fault tolerant configuration with attached external storage. Up to 16 test hosts were required to generate the appropriate test load.
For the software results, a VMware ESXi server was used to host the PubSub+ VM image. For these scenarios, fewer test hosts were required for the traffic generation due to the difference in the performance capabilities.
In both cases, a Mellanox SX1016 10GigE Ethernet switch was used to provide the LAN connectivity.
The following tables give full details of the hardware used in testing.
Solace PubSub+ Software (4 core, 12 GB, ESXi)
Platform | Solace PubSub+ Enterprise Edition |
---|---|
Hypervisor | VMWare, Inc. VMware Virtual Platform |
CPUs | Intel(R) Xeon(R) CPU E3-1270 v5 @ 3.60GHz with Hyperthreading |
Memory | 32 GB |
Datastore | 40 GB on an Intel SSD 900P Series |
Network Interface | Intel Corporation 82599EB 10-Gigabit SFI/SFP+ |
VM Settings | 4 cores, 12 GB RAM |
API | All tests use Java clients (Solace JCSMP API) |
Solace PubSub+ Software (Amazon EC2 Instance)
Platform | Solace PubSub+ Enterprise Edition |
---|---|
Amazon EC2 Instance Type | m5.xlarge |
Datastore | gp2 EBS Volume – 1000GB |
Networking | Enhanced Networking enabled |
Solace PubSub+ Appliances
Solace Message Router Appliances | |
---|---|
Platform | Solace 3560 |
NAB (I/O Card) | NAB-0810EM-01-A (8x10GE) |
ADB (Guaranteed Messaging Card) | ADB-000004-01-A (ADB-4) |
HBA Connectivity | HBA-0208FC-02-A (8Gb) |
Product Keys | GM650K Performance Key |
Storage Area Network (SAN) Characteristics | |
---|---|
Controller | Redundant controllers and write-back cache |
RAID | RAID 10, 4 + 4 Stripe Size |
Drives | Serial Attached SCSI (SAS), 10K RPM |
Connectivity | 8 Gbps Fibre Channel |
Required Test Equipment
Up to 16 performance test hosts were used to generate and sink the required message rate for the various scenarios. They all had characteristics that matched the following specifications:
Performance Test Hosts (Load Generation Hosts) | |
---|---|
CPU | Intel Core i7-3930K CPU (6 cores) @ 3.20GHz (HT Disabled) |
Memory | 16 GB |
Network Interface | Ethernet Converged Network Adapter X520-DA2 (10 GigE) |
Host OS | CentOS 7 |
All of the equipment was connected using a Mellanox 10GigE Ethernet switch:
Ethernet Switch | |
---|---|
Model | Mellanox SX1016 |
Description | 64-port non-blocking 10GigE Ethernet Switch |