Solace Enterprise Stats is a new solution from the Solace Professional Services Group that makes it easy to monitor your Solace message routers. In this blog article, I will walk you through the architecture and some of the details underlying the Enterprise Stats solution.
An Enterprise Stats deployment always consists of at least one instance of the StatsPump and at least one Stats Receiver. In the diagram below, we have a simple deployment model for the Enterprise Stats solution.
In this diagram, the StatsPump is polling a Solace message router using the SEMP protocol. It then re-formats the SEMP responses it receives from the message router, and publishes the stats back out as messages using the very same message router. Also in this picture, a Stats Receiver is subscribing to some statistical data and writing it into a time series database. From there, an operations GUI is pulling the data for visualization.
In this blog post, we will examine the problems this tool solves, take a closer look at how it works, and review the options available for customization. If you haven’t done so already, make sure you are familiar with Solace core concepts and check out our Introduction to Enterprise Stats.
We will look at:
If you have Solace message routers running in your enterprise and would like to begin using the Enterprise Stats solution, the following steps will get you up and running as quickly as possible:
The Enterprise Stats solution architecture makes collecting stats about your Solace message routers as easy and powerful as possible. In the Introduction to Enterprise Stats we outlined who needs this solution. The following points examine why and how.
In the receiver framework, stats are made available to the plug-in. Let’s take a look at what these statistics look like. Each stat has the following members:
    String TopicName
    Map<String, String> Tags
    Map<String, Object> Values
    Long Timestamp
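To make the four members listed above concrete, here is a minimal sketch of a container that holds them. The class name `StatSketch` and its constructor are hypothetical and chosen for illustration; the actual type handed to a plug-in by the receiver framework may be named and shaped differently.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical container mirroring the four members of a stat.
// This is an illustration, not the class shipped with Enterprise Stats.
final class StatSketch {
    final String topicName;            // abbreviated topic, e.g. SYSTEM_MEMORY
    final Map<String, String> tags;    // identifying attributes
    final Map<String, Object> values;  // flattened key-value data
    final long timestamp;              // milliseconds since the Unix epoch

    StatSketch(String topicName, Map<String, String> tags,
               Map<String, Object> values, long timestamp) {
        this.topicName = topicName;
        // Defensive copies that preserve insertion order and cannot be mutated.
        this.tags = Collections.unmodifiableMap(new LinkedHashMap<>(tags));
        this.values = Collections.unmodifiableMap(new LinkedHashMap<>(values));
        this.timestamp = timestamp;
    }

    String describe() {
        return topicName + " tags=" + tags + " valueCount=" + values.size()
                + " timestamp=" + timestamp;
    }

    public static void main(String[] args) {
        Map<String, String> tags = new LinkedHashMap<>();
        tags.put("EXAMPLE_TAG", "example-value"); // made-up tag for illustration
        Map<String, Object> values = new LinkedHashMap<>();
        values.put("datapath-up", true);
        values.put("max-disk-usage", 1500L);
        System.out.println(
                new StatSketch("SYSTEM_MSG-SPOOL", tags, values, 0L).describe());
    }
}
```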
The following subsection examines each of these members.
The topic name is an abbreviation of the topic on which the Pump published the message. Each unique type of stat has a unique name. Examples include: SYSTEM_CSPF_NEIGHBOR_STATS, SYSTEM_MSG-SPOOL, SYSTEM_MEMORY. See the notes below under Available Stats regarding the complete list. If you are writing these stats to a relational database, you would likely have a separate table for each unique topic name.
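If you do go the one-table-per-topic route, note that names such as SYSTEM_MSG-SPOOL contain a hyphen, which most SQL dialects do not accept in an unquoted identifier. The sketch below shows one possible way to derive a table name from a topic name. The class `TableNameSketch` and its naming rule are my own assumptions, not part of Enterprise Stats, and two topic names that differ only in punctuation would map to the same table under this rule.

```java
import java.util.Locale;

// Hypothetical helper: derives a SQL-friendly table name from a topic name.
final class TableNameSketch {

    static String tableFor(String topicName) {
        if (topicName == null || topicName.isEmpty()) {
            throw new IllegalArgumentException("topic name must not be empty");
        }
        // Lower-case, then replace anything outside [a-z0-9_] with an underscore.
        String name = topicName.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9_]", "_");
        // Unquoted identifiers usually may not start with a digit.
        return Character.isDigit(name.charAt(0)) ? "t_" + name : name;
    }

    public static void main(String[] args) {
        System.out.println(tableFor("SYSTEM_MSG-SPOOL"));
        System.out.println(tableFor("SYSTEM_CSPF_NEIGHBOR_STATS"));
    }
}
```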
The tags are the unique identifiers which together unambiguously identify the entity to which the stat pertains. In relational database terminology, these are the composite primary key values. The list of possible tags varies from message to message. Let’s look at a few examples of live data:
If you are writing to a database, these tags will most likely be your identifying attributes or primary keys. Time series databases such as InfluxDB allow you to explicitly identify the TAGS for each row of data. In contrast, relational databases require creating a schema with these tag fields used as (potentially composite) primary keys, or at least indexed.
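As an illustration of how tags and values separate naturally in a time series database, the sketch below renders one stat as a single line of InfluxDB line protocol (`measurement,tags fields timestamp`). The class `LineProtocolSketch`, the tag name `ROUTER_NAME`, and the sample values are assumptions made up for this example. It is a simplified sketch: it does not handle empty value maps or non-finite floating point numbers, and since InfluxDB assumes nanosecond timestamps by default, the write would need to declare millisecond precision.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical formatter: turns one stat into an InfluxDB line protocol string.
final class LineProtocolSketch {

    // Measurement names: escape commas and spaces.
    static String escapeMeasurement(String s) {
        return s.replace(",", "\\,").replace(" ", "\\ ");
    }

    // Tag keys, tag values and field keys: escape commas, equals signs, spaces.
    static String escapeKey(String s) {
        return s.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ");
    }

    static String formatField(Object v) {
        if (v instanceof Boolean) return v.toString();
        if (v instanceof Integer || v instanceof Long) return v + "i"; // integer
        if (v instanceof Number) return v.toString();                  // float
        // Everything else is written as a quoted string field.
        String s = String.valueOf(v).replace("\\", "\\\\").replace("\"", "\\\"");
        return "\"" + s + "\"";
    }

    static String toLine(String topicName, Map<String, String> tags,
                         Map<String, Object> values, long timestampMillis) {
        StringBuilder sb = new StringBuilder(escapeMeasurement(topicName));
        for (Map.Entry<String, String> t : tags.entrySet()) {
            sb.append(',').append(escapeKey(t.getKey()))
              .append('=').append(escapeKey(t.getValue()));
        }
        StringJoiner fields = new StringJoiner(",");
        for (Map.Entry<String, Object> f : values.entrySet()) {
            fields.add(escapeKey(f.getKey()) + "=" + formatField(f.getValue()));
        }
        return sb.append(' ').append(fields.toString())
                 .append(' ').append(timestampMillis).toString();
    }

    public static void main(String[] args) {
        Map<String, String> tags = new LinkedHashMap<>();
        tags.put("ROUTER_NAME", "router1"); // made-up tag for illustration
        Map<String, Object> values = new LinkedHashMap<>();
        values.put("datapath-up", true);
        values.put("max-disk-usage", 1500L);
        values.put("operational-status", "AD-Active");
        System.out.println(toLine("SYSTEM_MSG-SPOOL", tags, values, 1500000000000L));
    }
}
```

For a relational database the same split applies: the tag entries become the key columns and the value entries become the remaining columns.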
The values are passed in as a flattened map of key-value pairs. Nice and simple. Here is an example:
SYSTEM_MSG-SPOOL:
    spool-sync-last-failure-reason = N/A
    total-messages-currently-spooled = 0
    transaction-resource-utilization-percentage = 0.00
    spool-sync-status = Synced
    mate-disk-partition-usage = -
    disk-messages-currently-spooled = 0
    disk-array-wwn = N/A
    operational-status = AD-Active
    datapath-up = true
    spool-while-charging = false
    current-disk-usage = 0.0
    synchronization-status = Synced
    message-count-utilization-percentage = 0.00
    spool-sync-last-failure-time = N/A
    max-disk-usage = 1500
    current-persist-usage = 0.0
    delivered-unacked-msgs-utilization-percentage = 0.00
    current-rfad-usage = 0.0
    active-disk-partition-usage = 39.00
    num-delete-in-progress = 0
    spool-without-flash = false
    next-message-id = 1
    spool-files-utilization-percentage = 0.00
    defrag-status = Idle
    rfad-messages-currently-spooled = 0
    transacted-session-count-utilization-percentage = 0.00
    max-message-count = 240M
    config-status = Enabled (Primary)
    using-internal-disk = true
    transacted-session-resource-utilization-percentage = 0.00
Note that some of the Stats contain hierarchical data. The data is simply flattened, with slashes delineating the levels of the hierarchy in the name. Here is an example of flattened data:
SYSTEM_MEMORY:
    physical-memory/memory-info|1/free-in-kb = 1599708
    event-thresholds|0/name = physical-memory
    ipc-buffers/buffer-info|1/free-buffers = 16526
    ipc-buffers/buffer-info|3/free-buffers = 450
    ipc-buffers/buffer-info|4/total-buffers = 10
    ipc-buffers/buffer-info|5/free-buffers = 1
    ipc-buffers/buffer-info|0/free-memory-in-kb = 2048
    subscriptions-memory/memory-info|0/type = Memory
    subscriptions-memory/memory-info|0/free-in-kb = 131053
    [snipped for brevity]
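If your data repository prefers nested documents over flat names, the slash-delimited keys can be expanded back into a hierarchy. The following is a minimal sketch of that idea. The class `UnflattenSketch` is hypothetical, and it deliberately treats segments such as `memory-info|1` as opaque keys rather than guessing at what the suffix after the pipe means.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: rebuilds a nested map from slash-delimited keys.
final class UnflattenSketch {

    @SuppressWarnings("unchecked")
    static Map<String, Object> unflatten(Map<String, Object> flat) {
        Map<String, Object> root = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : flat.entrySet()) {
            String[] path = e.getKey().split("/");
            Map<String, Object> node = root;
            // Walk (and create) the intermediate levels.
            for (int i = 0; i < path.length - 1; i++) {
                Object child = node.get(path[i]);
                if (!(child instanceof Map)) {
                    // A leaf previously stored under this name is replaced.
                    child = new LinkedHashMap<String, Object>();
                    node.put(path[i], child);
                }
                node = (Map<String, Object>) child;
            }
            // The last segment holds the actual value.
            node.put(path[path.length - 1], e.getValue());
        }
        return root;
    }

    public static void main(String[] args) {
        Map<String, Object> flat = new LinkedHashMap<>();
        flat.put("physical-memory/memory-info|1/free-in-kb", 1599708L);
        flat.put("event-thresholds|0/name", "physical-memory");
        flat.put("ipc-buffers/buffer-info|1/free-buffers", 16526L);
        flat.put("ipc-buffers/buffer-info|3/free-buffers", 450L);
        System.out.println(unflatten(flat));
    }
}
```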
The timestamp is a simple long integer value: the number of milliseconds since midnight, January 1, 1970 UTC. It records when the stat was obtained from the router. This is critical for any visualization or planning surrounding your data.
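Because the timestamp uses the standard Unix epoch in milliseconds, it maps directly onto `java.time.Instant`. Here is a small sketch of converting it and of computing how old a stat is. The class `TimestampSketch` and its method names are hypothetical helpers for illustration.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helpers for working with a stat's millisecond timestamp.
final class TimestampSketch {

    // Converts epoch milliseconds into an Instant (always UTC-based).
    static Instant toInstant(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis);
    }

    // How long ago the stat was collected, relative to the supplied "now".
    static Duration age(long epochMillis, Instant now) {
        return Duration.between(toInstant(epochMillis), now);
    }

    public static void main(String[] args) {
        long collectedAt = System.currentTimeMillis() - 5_000L; // sample value
        System.out.println("Collected at: " + toInstant(collectedAt));
        System.out.println("Age in ms:    " + age(collectedAt, Instant.now()).toMillis());
    }
}
```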
All stats which are available in the Solace message router can be consumed in a Stats Receiver. Browse around in SolAdmin or the CLI and take note of the verbose information regarding the message router’s health and usage, as well as the details about each and every connected client and endpoint. The StatsPump comes out of the box with a fairly comprehensive set of Stats collected as its default configuration. From there, you can easily streamline what you want the StatsPump to provide on the bus, adding more in some areas and reducing it in others as you need.
For a complete guide to all of the statistical information available to you, please refer to the Solace Command Line Interface reference documentation.
In this blog we have:
In an upcoming blog post, I will show you how to build your own Stats Receiver plug-in for moving Stats into your favorite data repository.
Please feel free to leave me any comments regarding this blog, or leave comments to get the entire community involved.