This article provides performance numbers for PubSub+ software and appliances. It focuses on throughput and aims to help you understand the point-to-point and fan-out performance of PubSub+ message brokers.
Detailed descriptions of test methodology, parameters and results are provided below, but here is a summary of results:
Table 1 – Summary of Results
| Messages Per Second | Solace PubSub+ (ESXi) | Solace PubSub+ (AWS m4.4xlarge) | Solace PubSub+ Appliance (3560) |
|---|---|---|---|
| Non-persistent (Direct) point-to-point | 1,540,000 | 1,328,000 | 9,290,000 |
| Non-persistent (Direct) fan-out | 5,200,000 | 1,897,000 | 28,400,000 |
| Persistent message point-to-point | 145,000 | 68,000 | 646,000 |
| Persistent message fan-out | 858,000 | 288,000 | 5,530,000 |
These results show that Solace PubSub+ software offers very fast, scalable performance. In general, it behaves a lot like the 3560 appliances, just at lower message rates. That isn’t surprising since PubSub+ is based on the same underlying software that runs our appliances. And while PubSub+ performance numbers in AWS are lower than in ESXi, this is not so much a limitation of the software, but the network and storage bandwidth limits of the AWS EC2 instance, and the IOPS limits of the EBS volume.
We chose to perform our test using an m4.4xlarge EC2 instance, which offers a good balance between performance and cost. An EC2 instance with higher network bandwidth capacity would yield higher performance numbers in high fan-out use-cases, and in point-to-point non-persistent messaging use-cases with larger message sizes. Similarly, we chose a gp2 EBS storage volume for the persistent message store. An EBS volume with higher IOPS would increase persistent messaging performance.
PubSub+ appliances offer the highest possible throughput and performance, so if you want to support massive messaging capacity or scale across many applications in a compact footprint, this purpose-built hardware is for you. If you need less performance, want to deploy messaging into small offices or IoT, or scale your system horizontally in a private or public cloud, then the software or managed service is for you. The beauty is you can mix and match, or switch from one to another, without application or management impact.
The following is a brief introduction to the performance scenarios to help you understand the results.
The first set of scenarios covers Non-Persistent messaging using the Solace Direct message delivery mode. The results cover point-to-point and fan-out message traffic.
The second set of scenarios covers Persistent messaging. Here again the results cover point-to-point and fan-out message traffic. Additionally, for Persistent messaging, people are often interested in how fast messages can be saved to disk and, separately, how fast messages can be retrieved from disk and delivered to clients. For the PubSub+ software broker, the persistent store is simply a disk, but for our hardware appliance it is a combination of custom hardware backed by a disk. Regardless of the technology, we call the persistent store the Message Spool, and the results below include these two scenarios for various numbers of Persistent Queues in the system.
These performance results look at message throughput for point-to-point and fan-out scenarios. The results measure PubSub+ in terms of messages per second. These message exchange patterns are explained in more detail on our core concepts page, but this section briefly describes them as they apply to the performance scenarios.
A point-to-point flow consists of a single publishing client sending messages to the PubSub+ message broker, which are then received by a single subscribing client.
Fan-out is the typical publish/subscribe scenario where a message is published once to the PubSub+ message broker and forwarded to multiple clients that have expressed interest in receiving messages of that kind through topic subscriptions. The message is fanned out to those clients by the PubSub+ message broker, and each client receives its own copy of the message.
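To make this pattern concrete, here is a minimal sketch using the Solace JCSMP Java API (the client API listed in the test setup below). The host, credentials, and topic name are placeholders, error handling is omitted, and in the actual tests the publishers and subscribers were separate SDKPerf clients rather than a single process.

```java
import com.solacesystems.jcsmp.*;

public class FanOutSketch {
    public static void main(String[] args) throws JCSMPException {
        JCSMPProperties props = new JCSMPProperties();
        props.setProperty(JCSMPProperties.HOST, "tcp://broker.example.com:55555"); // placeholder host
        props.setProperty(JCSMPProperties.USERNAME, "default");                    // placeholder credentials
        props.setProperty(JCSMPProperties.VPN_NAME, "default");
        JCSMPSession session = JCSMPFactory.onlyInstance().createSession(props);
        session.connect();

        Topic topic = JCSMPFactory.onlyInstance().createTopic("perf/test/topic");  // placeholder topic

        // Subscriber side: every client that adds a matching topic subscription
        // receives its own copy of each published message (the fan-out).
        session.addSubscription(topic);
        XMLMessageConsumer consumer = session.getMessageConsumer(new XMLMessageListener() {
            @Override public void onReceive(BytesXMLMessage msg) {
                // count and verify received messages here
            }
            @Override public void onException(JCSMPException e) { }
        });
        consumer.start();

        // Publisher side: the message is sent once; the broker fans it out
        // to all subscribers whose subscriptions match the topic.
        XMLMessageProducer producer = session.getMessageProducer(new JCSMPStreamingPublishEventHandler() {
            @Override public void responseReceived(String messageID) { }
            @Override public void handleError(String messageID, JCSMPException e, long ts) { }
        });
        BytesMessage msg = JCSMPFactory.onlyInstance().createMessage(BytesMessage.class);
        msg.setData(new byte[100]);                   // 100-byte payload, the smallest size used in the tests below
        msg.setDeliveryMode(DeliveryMode.DIRECT);     // non-persistent (Direct) delivery
        producer.send(msg, topic);

        session.closeSession();
    }
}
```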
For all scenarios, the Solace test tool SDKPerf was used as the traffic source and sink. Because we take performance very seriously, over many years we have built SDKPerf into a very capable tool that can test many different messaging APIs. You can find out more on our SDKPerf overview page, and you can download the tool to try it yourself. For these tests, sdkperf_c was used.
For each scenario, SDKPerf measures the following:

- The aggregate message rates achieved by the publishing and subscribing clients
- That every message is delivered to the correct consumers, without loss or corruption

Together, these checks ensure that the messaging system is running fast and correctly delivering all messages to the correct consumers without corruption.
The scenarios in this article cover two types of messaging delivery modes:

- Non-persistent messaging, using the Solace Direct delivery mode
- Persistent (Guaranteed) messaging
In the performance scenarios below, the goal is to measure the maximum performance of the PubSub+ message broker in each scenario. To achieve this, we use groups of point-to-point or fan-out clients all running in parallel. Each group of clients is monitored to make sure it is running at a rate it can handle, so that no individual client becomes the bottleneck and artificially skews the results.
For point-to-point flows, this means you end up with the scenario illustrated in the following figure.
In this figure, client groups are depicted using colors. The figure shows an orange, blue, and grey group. Each group has a single publishing client sending to a single receiving client. The test uses enough groups of clients that the broker, not the clients, becomes the bottleneck. By monitoring the PubSub+ broker and observing the aggregate messages per second across all clients, you can determine the maximum message rate a broker can sustain for the point-to-point scenario.
Similarly for fan-out scenarios, you will end up with the scenario illustrated in the following figure.
Again, in this figure, client groups are depicted using colors. The figure shows an orange, blue, and grey group. And again, the number of client groups is scaled as required until the maximum fan-out rate of the Solace message router is observed. The maximum fan-out rate is the total output message rate of the Solace message routers. Within each group, the number of subscribing clients is dictated by the fan-out factor being measured, so a fan-out factor of 10 requires 10 subscribing clients per group. To determine how many messages per second the clients are sending to the Solace message router, simply take the output message rate and divide by the fan-out factor; this is the input rate. For example, an aggregate output rate of 28.4 million messages per second at a fan-out factor of 100 corresponds to an input rate of 284,000 messages per second.
At Solace we believe in full transparency for all aspects of performance results, so comprehensive details on the exact hardware, setup and methodology are described below in the section Appendix A: Test Setup.
The following table shows the aggregate output message rate (messages/sec) for Solace direct messages (non-persistent) as the message payload size is varied. In this test groups of clients are sending messages in a point-to-point fashion. Twenty parallel sets of clients are used to generate the required traffic.
Table 2 – Point-to-Point Non-Persistent Message Throughput (Output Msgs/Sec)
| Message Size | 100B | 1KB | 2KB | 10KB | 20KB | 50KB |
|---|---|---|---|---|---|---|
| Software (ESXi) | 1.54M | 784K | 474K | 104K* | 53K* | 21K* |
| Software (AWS m4.4xlarge) | 1.33M | 234K* | 119K* | 24K* | 12K* | 4,900* |
| Appliance | 9.29M | 4.05M | 2.44M | 885K* | 443K* | 170K* |
* The results measured reached the bandwidth limit of the NAB on the 3560 Appliance, or the (v)NIC on the software broker.
From these results you can see that, as expected, throughput is highest for small message sizes, which are easier for the Solace message routers to process efficiently.
It is also interesting to graph these results and add in calculated bandwidth in Gbps. From this you can see that bandwidth utilization in ESXi is always increasing as the message size increases, until it reaches the 10Gbps bandwidth limit of the NIC at the 10K byte message payload size. In AWS, the broker quickly hits the 2Gbps network bandwidth limit of the m4.4xlarge EC2 instance. Larger EC2 instances will perform better. For example, an m4.10xlarge instance, with its 10Gbps of network bandwidth, has a performance profile very similar to the ESXi results.
For the PubSub+ appliance, the bandwidth results are similar. The hardware I/O card, called the Network Acceleration Blade (NAB), has a messaging capacity of around 80 Gbps. As the message size increases, the NAB saturates this capacity at around the 10K byte message payload size; as a rough check, the 10 KB point-to-point result of 885,000 messages per second corresponds to roughly 885,000 × 10,000 bytes × 8 ≈ 71 Gbps of payload throughput, which approaches the NAB's limit once protocol and framing overhead are included. From there the output message rate is governed more by the bandwidth capabilities of the NAB than by other factors.
The following table shows aggregate output message rate (messages/sec) for Solace Direct messages (non-persistent) as the message fan-out is varied. At a fan-out of 1, this scenario is equivalent to the point-to-point scenario above. Then more clients are added to consume the published messages. This increases the fan-out for each message. The broker’s maximum output is recorded for each fan-out of each message size as shown below. Again twenty parallel groups of fan-out clients are used to avoid any individual client becoming the bottleneck in the test.
Table 3 – Fan-out Non-Persistent Message Throughput (Output Msgs/Sec)
Columns show the number of endpoints/subscribers receiving each message (the fan-out factor).

| Platform | Msg Size (Bytes) | 1 | 2 | 5 | 10 | 50 | 100 |
|---|---|---|---|---|---|---|---|
| Software (ESXi) | 100 | 1.54M | 2.62M | 3.20M | 4.27M | 4.85M | 5.2M |
| Software (ESXi) | 1K | 784K | 1.04M | 1.09M* | 1.10M* | 1.10M* | 1.08M* |
| Software (ESXi) | 2K | 474K | 542K* | 553K* | 546K* | 552K* | 545K* |
| Appliance | 100 | 9.29M | 15.90M | 24.20M | 27.10M | 28.00M | 28.40M |
| Appliance | 1K | 4.05M | 5.51M | 7.93M | 7.82M | 8.00M | 8.66M* |
| Appliance | 2K | 2.44M | 3.80M | 4.00M | 4.02M | 4.06M | 4.39M* |
* The results measured reached the bandwidth limit of the NAB on the 3560 Appliance, or the (v)NIC on the software broker.
Graphing these results lets you analyze the overall impact of message delivery fan-out across message payload sizes. At low message sizes, the overhead of processing the small messages dominates. However, as message sizes increase the limiting factor shifts to handling the bandwidth of the messages. This is seen on the graph when the curves flatten.
In the case of the 3560 appliances, output increases to the point of eventually being able to saturate the 80 Gbps bandwidth capacity of the NAB.
The following table shows aggregate output message rate (messages/sec) for fully persistent messages as the message payload size is varied. In this test, groups of clients are sending messages in a point-to-point fashion. Twenty parallel sets of clients are used to generate the required traffic.
Table 4 – Point-to-Point Persistent Message Throughput (Output Msgs/Sec)
| Message Size | 512B | 1KB | 2KB | 4KB | 20KB |
|---|---|---|---|---|---|
| Software (ESXi w/o HA) | 145K | 130K | 101K | 76K | 26K |
| Software (ESXi with HA) | 97K | 84K | 62K | 43K | 10K |
| Software (AWS m4.4xlarge w/o HA) | 68K | 61K | 53K | 31K | 7K |
| Software (AWS m4.4xlarge with HA) | 43K | 43K | 31K | 19K | 4K |
| Appliance (with or w/o HA) | 646K | 645K | 502K | 283K | 60K |
As with the non-persistent scenario, these results show that performance is highest for small message sizes, which are easier to process efficiently. Each message must be saved to non-volatile storage before the publisher is acknowledged, so the overhead is much higher than for non-persistent messages. Software broker performance therefore becomes a function not only of the available compute power and network bandwidth, but also of the IOPS and storage bandwidth of the platform on which the software broker is running.
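To illustrate what acknowledging the publisher means at the API level, here is a minimal JCSMP publisher sketch, with placeholder host, credentials, and queue name; the broker invokes the responseReceived callback only after it has accepted the persistent message, that is, once it is safely stored in the Message Spool.

```java
import com.solacesystems.jcsmp.*;

public class PersistentPublishSketch {
    public static void main(String[] args) throws JCSMPException {
        JCSMPProperties props = new JCSMPProperties();
        props.setProperty(JCSMPProperties.HOST, "tcp://broker.example.com:55555"); // placeholder host
        props.setProperty(JCSMPProperties.USERNAME, "default");                    // placeholder credentials
        props.setProperty(JCSMPProperties.VPN_NAME, "default");
        JCSMPSession session = JCSMPFactory.onlyInstance().createSession(props);
        session.connect();

        // The callback fires only after the broker has accepted the message,
        // i.e. after it has been written to non-volatile storage (the Message Spool).
        XMLMessageProducer producer = session.getMessageProducer(new JCSMPStreamingPublishEventHandler() {
            @Override public void responseReceived(String messageID) {
                // broker acknowledgment: the message is safely persisted
            }
            @Override public void handleError(String messageID, JCSMPException e, long ts) {
                // the publish was not accepted; the application can retry or raise an alarm
            }
        });

        Queue queue = JCSMPFactory.onlyInstance().createQueue("Q/perf/1");   // placeholder queue name
        BytesMessage msg = JCSMPFactory.onlyInstance().createMessage(BytesMessage.class);
        msg.setData(new byte[512]);                      // 512-byte payload, the smallest persistent size tested
        msg.setDeliveryMode(DeliveryMode.PERSISTENT);    // fully persistent delivery
        producer.send(msg, queue);

        session.closeSession();
    }
}
```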
It is also interesting to graph these results and add in calculated bandwidth in Gbps. From this you can see that bandwidth utilization is always increasing as the message size increases.
For the PubSub+ appliance, the bandwidth results are a little different. The hardware guaranteed messaging card, called the Assured Delivery Blade (ADB), has a messaging capacity of around 9.5 Gbps. As the message size increases, the ADB becomes the bottleneck and the system reaches this maximum bandwidth at approximately the 4K byte message payload size; as a rough check, the 4 KB point-to-point result of 283,000 messages per second works out to roughly 283,000 × 4,096 bytes × 8 ≈ 9.3 Gbps. From there the output message rate is governed more by the bandwidth capabilities of the ADB card than by other factors.
The following table shows aggregate output message rate (messages/sec) for fully persistent messages as the message fan-out is varied. At a fan-out of 1, this test is equivalent to the point-to-point test above. Then more clients are added to consume the published messages. This increases the fan-out for each message. Maximum output is recorded for each fan-out and message size as shown below. Again twenty parallel groups of fan-out clients are used to avoid any individual client becoming the bottleneck in the test.
Table 5 – Fan-out Persistent Message Throughput (Output Msgs/Sec)
Columns show the number of endpoints/subscribers receiving each message (the fan-out factor).

| Platform | Msg Size (Bytes) | 1 | 2 | 5 | 10 | 50 |
|---|---|---|---|---|---|---|
| PubSub+ (ESXi w/o HA) | 1K | 130K | 227K | 453K | 652K | 858K |
| PubSub+ (ESXi w/o HA) | 2K | 101K | 181K | 365K | 483K | 541K* |
| PubSub+ (ESXi w/o HA) | 20K | 26K | 46K | 55K* | 56K* | 56K* |
| PubSub+ (ESXi with HA) | 1K | 84K | 174K | 361K | 493K | 695K |
| PubSub+ (ESXi with HA) | 2K | 62K | 132K | 284K | 376K | 462K |
| PubSub+ (ESXi with HA) | 20K | 10K | 21K | 44K | 50K | 53K* |
| PubSub+ Appliance (with or w/o HA) | 1K | 645K | 1.25M | 2.69M | 3.87M | 5.53M |
| PubSub+ Appliance (with or w/o HA) | 2K | 502K | 938K | 1.85M | 3.10M | 4.18M* |
| PubSub+ Appliance (with or w/o HA) | 20K | 60K | 120K | 301K | 316K | 434K* |
* The results measured reached the bandwidth limit of the NAB on the 3560 Appliance, or the (v)NIC on the software broker.
Graphing these results lets you analyze the overall impact of message delivery fan-out across message payload sizes. In this scenario the work of fanning out the messages is handled by the Network Acceleration Blade (NAB) in the 3560 appliance, or by the I/O software in the software broker. This offloads the processing from the guaranteed messaging components and enables the message rate to continue to increase as fan-out increases. In the case of the appliance, output increases to the point of eventually saturating the 80 Gbps bandwidth capacity of the NAB, while remaining fully persistent.
These tests focus specifically on how fast a PubSub+ message broker can save messages to its non-volatile storage, called the Message Spool. For the Spool scenario, publishing clients send persistent messages to the broker, destined for durable queues. Since no consumers are connected to the queues, the message broker must save these messages, and the state of each queue, in its Message Spool. Then, for the Unspool scenario, all publishers are stopped and consuming clients are connected to the queues that contain the saved messages. These consuming clients try to receive the messages as fast as possible, which highlights the maximum rate at which PubSub+ message brokers can retrieve messages from the Message Spool and deliver them to consumers. This is, in essence, the rate at which clients can recover following an application outage.
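The following sketch, again using the JCSMP API with placeholder host, credentials, and queue name, shows the two halves of this scenario: publishing persistent messages to a durable queue with no consumer bound (the Spool phase), and later binding a consumer flow to drain the saved messages (the Unspool phase). In the real tests these phases were run by separate SDKPerf clients at much higher message counts and across many queues.

```java
import com.solacesystems.jcsmp.*;

public class SpoolUnspoolSketch {
    public static void main(String[] args) throws JCSMPException {
        JCSMPProperties props = new JCSMPProperties();
        props.setProperty(JCSMPProperties.HOST, "tcp://broker.example.com:55555"); // placeholder host
        props.setProperty(JCSMPProperties.USERNAME, "default");                    // placeholder credentials
        props.setProperty(JCSMPProperties.VPN_NAME, "default");
        JCSMPSession session = JCSMPFactory.onlyInstance().createSession(props);
        session.connect();

        // Provision a durable queue; messages sent to it are held in the Message Spool.
        Queue queue = JCSMPFactory.onlyInstance().createQueue("Q/spool/1");        // placeholder queue name
        EndpointProperties epProps = new EndpointProperties();
        epProps.setPermission(EndpointProperties.PERMISSION_CONSUME);
        epProps.setAccessType(EndpointProperties.ACCESSTYPE_EXCLUSIVE);
        session.provision(queue, epProps, JCSMPSession.FLAG_IGNORE_ALREADY_EXISTS);

        // --- Spool phase: publish persistent messages while no consumer is bound ---
        XMLMessageProducer producer = session.getMessageProducer(new JCSMPStreamingPublishEventHandler() {
            @Override public void responseReceived(String messageID) { }
            @Override public void handleError(String messageID, JCSMPException e, long ts) { }
        });
        BytesMessage msg = JCSMPFactory.onlyInstance().createMessage(BytesMessage.class);
        msg.setData(new byte[2048]);                   // 2 KB payload, as used in the Message Spool tests
        msg.setDeliveryMode(DeliveryMode.PERSISTENT);
        producer.send(msg, queue);

        // --- Unspool phase: bind to the queue and drain the saved messages ---
        ConsumerFlowProperties flowProps = new ConsumerFlowProperties();
        flowProps.setEndpoint(queue);
        flowProps.setAckMode(JCSMPProperties.SUPPORTED_MESSAGE_ACK_CLIENT);
        FlowReceiver flow = session.createFlow(new XMLMessageListener() {
            @Override public void onReceive(BytesXMLMessage received) {
                received.ackMessage();                 // acknowledge so the spooled message is removed
            }
            @Override public void onException(JCSMPException e) { }
        }, flowProps);
        flow.start();

        session.closeSession();
    }
}
```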
The following table shows the messages/sec observed in aggregate across the Solace message router. The messages all have a payload size of 2K bytes. The results are captured for a single queue, 10 queues, and 50 queues to give you an idea of how these rates vary by number of queues. In all cases messages are sent directly to individual queues in a point-to-point fashion.
Table 6 – Message Spool Read & Write Performance (Msgs/Sec)
| Platform | Scenario | 1 Queue | 10 Queues | 50 Queues |
|---|---|---|---|---|
| Solace PubSub+ (ESXi without HA) | Spool | 123K | 122K | 123K |
| Solace PubSub+ (ESXi without HA) | Unspool | 221K | 215K | 167K |
| Solace PubSub+ (ESXi with HA) | Spool | 82K | 79K | 80K |
| Solace PubSub+ (ESXi with HA) | Unspool | 218K | 208K | 147K |
| Solace PubSub+ Appliance (with or w/o HA) | Spool | 519K | 515K | 509K |
| Solace PubSub+ Appliance (with or w/o HA) | Unspool | 305K | 232K | 229K |
One of the key differentiators of PubSub+ is its handling of consumers that are slow or offline: PubSub+ never impacts a publisher's ability to send messages or an online consumer's ability to receive them. That is why you will see flat lines in the following graphs for message spool performance. When unspooling messages from the message spool, the message broker recovers clients as fast as possible without impacting online clients, so unspool performance varies more but is still impressive.
Performance metrics are only useful if you understand how they were measured and what test equipment was used. The following two figures outline the lab setup used to run the tests. Instances of the SDKPerf tool were run on test host servers running Linux. Depending on the test requirements, different numbers of servers were used so that the clients were never the bottleneck. For the appliance scenarios, a pair of 3560 appliances was set up in the typical fault-tolerant configuration with attached external storage. Up to 12 test hosts were required to generate the appropriate test load.
For the software results, a VMware ESXi server was used to host the PubSub+ VM image. For these scenarios, fewer test hosts were required for traffic generation due to the difference in performance capabilities.
In both cases, a Mellanox SX1016 10GigE Ethernet switch was used to provide the LAN connectivity.
The following tables give full details of the hardware used in testing.
| Platform | Solace PubSub+ Enterprise Edition |
|---|---|
| Hypervisor | VMware, Inc. VMware Virtual Platform |
| BIOS Version | 6.0.0 |
| CPUs | Intel(R) Xeon(R) CPU E3-1270 v5 @ 3.60GHz with Hyperthreading |
| Memory | 32 GB |
| Datastore | 40 GB Intel SSD 750 Series |
| Network Interface | Intel Corporation 82599EB 10-Gigabit SFI/SFP+ |
| VM Settings | 4 cores, 8 GB RAM |
| API | All tests use Java clients (Solace JCSMP API) |
| Platform | Solace PubSub+ Enterprise Edition |
|---|---|
| Amazon EC2 Instance Type | m4.4xlarge |
| Datastore | gp2 EBS Volume – 200GB, 10K IOPS, 2 Gbps bandwidth |
| Networking | Enhanced Networking enabled |
| Solace Message Router Appliances | |
|---|---|
| Platform | Solace 3560 |
| NAB (I/O Card) | NAB-0810EM-01-A (8x10GE) |
| ADB (Guaranteed Messaging Card) | ADB-000004-01-A (ADB-4) |
| HBA Connectivity | HBA-0208FC-02-A (8Gb) |
| Product Keys | GM650K Performance Key |
| Storage Area Network (SAN) Characteristics | |
|---|---|
| Controller | Redundant controllers and write-back cache |
| RAID | RAID 10, 4 + 4 Stripe Size |
| Drives | Serial Attached SCSI (SAS), 10K RPM |
| Connectivity | 8 Gbps Fibre Channel |
Up to 12 performance test hosts were used to generate and sink the required message rate for the various scenarios. They all had characteristics that matched the following specifications:
| Performance Test Hosts (Load Generation Hosts) | |
|---|---|
| CPU | Intel Core i7-3930K CPU (6 cores) @ 3.20GHz (HT Disabled) |
| Memory | 16 GB |
| Hard Disk | Western Digital WD2002FAEX (2 TB) |
| Network Interface | Ethernet Converged Network Adapter X520-DA2 (10 GigE) |
| Host OS (Kernel) | CentOS release 6.4 (2.6.32-358.el6.x86_64) |
Connecting all the equipment together was a Mellanox 10GigE Ethernet Switch.
| Ethernet Switch | |
|---|---|
| Model | Mellanox SX1016 |
| Description | 64-port non-blocking 10GigE Ethernet Switch |