I am halfway through my new blog series that takes an in-depth look at PubSub+ use cases in capital markets. So far I have covered market data/reference data distribution and frontend user interfaces (UI) for displaying information and collecting orders, and now I will dive into pre-trade order processing.
As I discussed a bit in part 2, order flow is a key component of systems in capital markets. Applications or humans decide which orders need to be executed, and those orders are then submitted to a brokerage to be executed at venues where the client can get the best price. Once the orders are executed, confirmation messages are sent back and submitted to another system or counterparty for settlement and clearing.
In this post, I will focus on pre-trade, which includes determining what changes are required and then translating them into concrete actions. Typically, in a hedge fund, you will have two systems responsible for handling trade flows: an Order Management System (OMS) and an Execution Management System (EMS). OMSs are generally used by portfolio managers (PMs) to get an overview of the portfolios they manage and to make modifications to these portfolios without worrying about the actual execution(s). For example, a PM might want to reduce exposure to a certain security, e.g., IBM, in their portfolio. They can make that change via an OMS, but that change will eventually result in orders being placed via an EMS.
An EMS is responsible for providing connectivity to different exchanges and trading venues. It is used by traders to execute trades as efficiently as possible with the goal of getting the best execution price. It is not uncommon to find an application that serves both as an OMS and an EMS.
Both OMS and EMS require market data and trade order data. Because I have already discussed market data and Direct Messaging extensively in previous sections, I will focus on trade order data in the next section.
Trade orders are extremely important. Each message representing a trade order carries some monetary value, so it needs to be delivered both reliably and in a timely fashion. Failure to do either can result in a high opportunity cost of not being able to trade or, in the case of a cancellation order, of not being able to cancel a trade. The key features required here are zero message loss and message ordering: not a single trade order can be lost, and message order must be maintained.
With Solace’s Guaranteed Messaging, messages are locally persisted in the broker. If you have an OMS application publishing trade orders that need to be executed by the EMS, the OMS will publish the messages to a well-defined Solace topic such as trading/OMS/<version>/<order-id>/<action>. Once the broker receives the message, it will persist it to local disk and send an acknowledgement back to the publisher. At this point, the publisher knows that the broker has received the message and it is free to continue publishing more messages. However, the message still needs to be delivered to the relevant subscribers. If the subscriber is online, it will immediately get the message. If it is not, Solace will keep the message in the underlying spool until the subscriber comes online. Once the subscriber receives the message, it will send an acknowledgement back to the broker. Once the broker has received acknowledgements from all downstream subscribers interested in the message, it will remove the message from the underlying spool. And that’s how Solace guarantees end-to-end zero message loss.
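To make the acknowledgement flow above concrete, here is a minimal in-memory sketch in Python. It is not the Solace API: `ToyBroker`, its methods, the subscriber names, and the `v1` topic value are all hypothetical names chosen for illustration, and a real broker persists to disk rather than to a dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBroker:
    """Hypothetical in-memory stand-in for a broker's message spool."""
    spool: dict = field(default_factory=dict)        # msg_id -> (topic, payload)
    pending: dict = field(default_factory=dict)      # msg_id -> subscribers yet to ack
    subscribers: set = field(default_factory=set)    # assumed non-empty in this sketch
    next_id: int = 1

    def publish(self, topic, payload):
        """Persist the message first, then acknowledge the publisher via the return value."""
        msg_id = self.next_id
        self.next_id += 1
        self.spool[msg_id] = (topic, payload)
        self.pending[msg_id] = set(self.subscribers)
        return msg_id

    def deliver(self, subscriber):
        """Return the messages this subscriber has not acknowledged, in publish order."""
        return [(m, *self.spool[m]) for m in sorted(self.pending)
                if subscriber in self.pending[m]]

    def ack(self, subscriber, msg_id):
        """Record a subscriber ack; remove the message once every subscriber has acked."""
        self.pending[msg_id].discard(subscriber)
        if not self.pending[msg_id]:
            del self.pending[msg_id]
            del self.spool[msg_id]


broker = ToyBroker(subscribers={"EMS", "audit"})
order_id = broker.publish("trading/OMS/v1/1001/newOrder", "BUY 100 AAPL")
broker.ack("EMS", order_id)     # still spooled: "audit" has not acked yet
broker.ack("audit", order_id)   # last ack received, so the message leaves the spool
```

The point of the sketch is the ordering of events: the publisher is acknowledged only after the message is stored, and the message is removed only after the last interested subscriber acknowledges it.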
Strong message ordering is of high importance as well. The sequence diagrams below show the intended order of trades and out-of-order trades. The intended order is to buy shares of AAPL at $95, sell at $100, and then buy again at $97. Instead, due to a lack of ordering, the actual trades that are executed are quite different. As shown in Figure 1, Order #3 gets executed first, followed by Order #1, and finally, Order #2. This leads to us buying AAPL at $100 instead of $95, and selling it at $97 instead of $100.
In this scenario, out-of-order messages have led directly to monetary loss for our trading desk.
When working with Guaranteed Messaging, Solace provides queues for enqueuing messages. Publishers continue to publish messages to rich hierarchical topics without knowing anything about how the consumer will be subscribing to these messages. If the consumer wishes to leverage persistence, it will use a queue and map the relevant topic subscriptions to it. For example, if an instance of my EMS application is interested in all types of orders from the OMS, then it will subscribe to the topic trading/OMS/<version>/> and map it to its queue. Any message published to a topic matching this subscription will be enqueued in this queue. If a lot of messages are being published and the EMS is unable to process them all quickly, the queue will act as a shock absorber and spool the messages so they can be processed later. Similarly, if the EMS application crashes for some reason, the messages will be spooled and will not be lost.
There can also be additional instances of EMS interested in different topic subscriptions. For example, there might be an instance which is simply processing new orders, for auditing purposes. To do so, it can simply have a queue with topic subscription of: trading/OMS/<version>/*/newOrder.
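The two subscriptions above rely on wildcards: `*` stands for a single topic level and a trailing `>` stands for one or more levels. Here is a simplified Python sketch of that matching logic. The function name `topic_matches` is my own, the `v1` version and `cancel` action values are made up, and the sketch only treats `*` as a whole level, so a real broker may support more wildcard forms than shown here.

```python
def topic_matches(subscription: str, topic: str) -> bool:
    """Simplified wildcard match: '*' is one whole level, trailing '>' is one or more levels."""
    sub_levels = subscription.split("/")
    topic_levels = topic.split("/")
    for i, sub_level in enumerate(sub_levels):
        if sub_level == ">" and i == len(sub_levels) - 1:
            # A trailing '>' needs at least one remaining level in the topic.
            return len(topic_levels) > i
        if i >= len(topic_levels):
            return False  # subscription is longer than the topic
        if sub_level != "*" and sub_level != topic_levels[i]:
            return False  # literal level does not match
    # Without a trailing '>', both must have the same number of levels.
    return len(sub_levels) == len(topic_levels)


# The EMS queue sees every order action; the audit queue sees only new orders.
all_orders = "trading/OMS/v1/>"
new_orders = "trading/OMS/v1/*/newOrder"
topic_matches(all_orders, "trading/OMS/v1/1001/cancel")    # matches
topic_matches(new_orders, "trading/OMS/v1/1001/cancel")    # does not match
```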
Solace queues have several built-in features such as access types, message expiry and DMQ, and last value queue (LVQ) which can be leveraged differently for different use cases.
When processing messages, there are two common design patterns for distributing the load. One way is to have an active-standby configuration where more than one subscriber binds to the same queue but only one of those subscribers is active. If the active subscriber goes offline, the backup subscriber will become active and continue processing the remaining messages. This configuration is known as the Exclusive access type. Alternatively, Solace queues support the Non-Exclusive access type, which provides a load-balancing configuration. With this access type, you can still have multiple consumers binding to a queue, but the messages are distributed round-robin across all consumers to spread the load. Regardless of which access type is selected, any given message enqueued in a queue will only be delivered to one consumer. If you want multiple consumers to receive the same messages, you will need to create separate queues (with the same topic subscriptions) for them. This may seem like a lot of work or duplication of messages, but Solace handles this scenario very well. Solace only stores the message once in the underlying spool. If there are two queues that need to enqueue the same message, they will simply hold references to the message instead of storing it twice. Thus, creating another queue is a relatively lightweight operation. It also allows users to maintain separate queues for separate clients to avoid interdependency.
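The difference between the two access types can be sketched in a few lines of Python. This is only a model of the delivery pattern described above, not broker code: the `dispatch` function and the consumer names are hypothetical, and the first consumer in the list is assumed to be the active one.

```python
from itertools import cycle


def dispatch(messages, consumers, access_type):
    """Assign each message to exactly one consumer bound to the queue."""
    if not consumers:
        raise ValueError("at least one consumer must be bound to the queue")
    if access_type == "exclusive":
        # Active-standby: only the first bound consumer receives messages.
        active = consumers[0]
        return [(msg, active) for msg in messages]
    if access_type == "non-exclusive":
        # Load balancing: messages are handed out round-robin.
        return list(zip(messages, cycle(consumers)))
    raise ValueError(f"unknown access type: {access_type}")


orders = ["order-1", "order-2", "order-3"]
bound = ["ems-a", "ems-b"]
dispatch(orders, bound, "exclusive")       # every order goes to ems-a
dispatch(orders, bound, "non-exclusive")   # orders alternate between ems-a and ems-b
dispatch(orders, bound[1:], "exclusive")   # ems-a is gone, so ems-b takes over
```

In both cases each message appears exactly once in the result, which is why a second queue is needed when two applications must each see every message.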
Different messages have different levels of importance. Some messages retain their importance over time whereas others become less and less important as time passes. For example, many trade orders are only ‘valid’ for a short period of time while the client thinks they can get the same price. If a trade order’s time window expires, the client is at risk of getting a ‘worse’ price. For this reason, clients would rather have stale messages expire after a set period than have them consumed.
For such scenarios, Solace PubSub+ broker has the concept of message expiry. There are different ways a message can expire. Once it expires, it can either be discarded or moved to a Dead Message Queue (DMQ) for further processing.
A message can expire in the following scenarios:
A common pattern, especially at large companies with separate support/operations teams, is to let the problematic messages expire and move to a DMQ so that someone from the support or operations team can manually investigate those messages later.
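As a rough illustration of expiry and the DMQ pattern, the Python sketch below sweeps a queue, keeps the orders that are still within their time-to-live, and either moves expired orders to a DMQ or discards them. The `Order` fields, the `sweep_expired` function, and the `dmq_eligible` flag are hypothetical names for this example. On a real broker, expiry is handled by the broker itself rather than by application code like this.

```python
from dataclasses import dataclass


@dataclass
class Order:
    order_id: str
    published_at: float   # publish time in seconds
    ttl: float            # how long the order stays valid, in seconds


def sweep_expired(queue, now, dmq_eligible=True):
    """Split a queue into live orders and expired orders destined for the DMQ."""
    live, dmq = [], []
    for order in queue:
        if now - order.published_at > order.ttl:
            if dmq_eligible:
                dmq.append(order)   # kept for later manual investigation
            # otherwise the expired order is simply discarded
        else:
            live.append(order)
    return live, dmq


queue = [Order("A", published_at=0.0, ttl=5.0), Order("B", published_at=8.0, ttl=5.0)]
live, dmq = sweep_expired(queue, now=10.0)   # A is 10s old and expires; B is 2s old and stays
```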
When you are provisioning a queue, you must provide a storage depth. If you set it to 0, the queue is considered a Last Value Queue (LVQ), which means that it will only store the latest message. This feature can be used in several different use cases. For example, if an application crashes, it can quickly check the LVQ to see what the latest delivered message was and compare it against its internal record of the last message that it processed before it crashed.
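The crash-recovery check described above could look something like the following Python sketch. It assumes each message carries a sequence number under a `seq` key, which is my assumption and not something the broker requires. `LastValueQueue` and `needs_recovery` are hypothetical names.

```python
class LastValueQueue:
    """Toy queue that keeps only the most recent message."""

    def __init__(self):
        self._latest = None

    def enqueue(self, message):
        # Each new message replaces whatever was stored before.
        self._latest = message

    def peek(self):
        return self._latest


def needs_recovery(lvq, last_processed_seq):
    """True if the LVQ holds a message newer than the last one the app processed."""
    latest = lvq.peek()
    return latest is not None and latest["seq"] > last_processed_seq


lvq = LastValueQueue()
lvq.enqueue({"seq": 41, "order": "BUY 100 AAPL"})
lvq.enqueue({"seq": 42, "order": "SELL 100 AAPL"})
needs_recovery(lvq, last_processed_seq=41)   # the app missed message 42
```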
As trades are executed, they need to be forwarded to the relevant organizations for clearing and settlement. Traditionally, settlement and clearing has taken approximately 3 days, known as T+3 settlement. However, in recent times, with advancements in technology and pressure to become more real-time, organizations have moved to T+2 settlement for equities and T+1 for government bonds.
Three key steps of any trade are execution, clearing, and settlement. I have already discussed execution, where firms use an OMS and EMS to execute trades. At this point the buyer and seller have entered into an agreement to trade. The next step is clearing, which involves steps such as recording the transaction and posting sufficient margin so that the trade can be settled. The final step is settlement, which involves the transfer of ownership or title of the security to the buyer and of money to the seller. All of these steps involve messages that must not be lost under any circumstances, or the firm would incur a monetary and/or reputational loss. There are also strict regulations around execution, clearing and settlement which all participants need to follow to avoid regulatory fines.
Usually clearing takes place via a third-party clearinghouse in which the trading firms hold a membership. Only members are allowed to use the services of this clearinghouse. Some examples of clearinghouses in the US are NYSE, DTCC, and CME.
To facilitate transfer of ownership of securities, central depositories emerged over the years where securities could be held centrally, and ownership of the securities could easily be transferred without having to move the securities themselves. The largest depository in the world is the Depository Trust and Clearing Corporation (DTCC), which processed $2.15 quadrillion in securities in 2019. In Europe, Euroclear and Clearstream are the two popular alternatives.
Any firm’s processes participating in execution, clearing and/or settlement are extremely critical and must be highly available and redundant. These processes must be able to withstand any disruptions such as loss of server, loss of datacenter, and loss of region without significant downtime and without any loss of messages. Again, these are not just requirements based on monetary or reputational loss, but regulatory authorities require these as this is considered critical infrastructure to a country’s economy. Additionally, given the sensitivity of the data, security is of extreme importance.
Given the importance of clearing and settlement, firms choose to rely on Solace PubSub+ brokers due to their native built-in support for high availability (HA) and disaster recovery (DR), and enterprise security. Over the years, many firms have primarily chosen PubSub+ brokers due to their high reliability and security.
How do you protect your critical applications from server failures within a datacenter? What if there was an issue with your broker and it went down? How do you ensure zero message loss and no business impact? To avoid such issues, PubSub+ brokers come equipped with a high availability feature. The brokers are deployed in pairs (appliances) or in triplets (software) to achieve high availability. In both scenarios, there is a primary broker and a backup broker with a link connecting the two and synchronously replicating both messages and configurations. Any message that is published to the primary broker is synchronously replicated to the backup broker as well, so the two brokers are always in sync. Additionally, as messages are acknowledged and removed from the primary broker’s spool, they are also removed from the backup broker’s spool, so message state is also kept in sync. Finally, if you create objects such as queues on the primary broker, they are also created on the backup broker. This ensures that if something happens to the primary broker and you are forced to fail over to the backup broker, there will be no difference and your applications will continue to function without any interruption.
From an application’s perspective, it will simply provide a host list containing the connection details of both the primary broker and the backup broker. If the primary broker is available, the application will connect to it; if it’s not, after a few retries, it will connect to the backup broker instead.
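The host-list behaviour can be sketched generically in Python as follows. This is not the Solace client API, which handles reconnection internally. `connect_with_failover`, the `try_connect` callable, the retry count, and the host names are all hypothetical and only illustrate the order in which hosts are tried.

```python
def connect_with_failover(host_list, try_connect, retries_per_host=3):
    """Try each host in order, retrying a few times before moving to the next one."""
    for host in host_list:
        for _attempt in range(retries_per_host):
            if try_connect(host):
                return host   # connected: stop at the first host that answers
    raise ConnectionError("no broker in the host list is reachable")


# Hypothetical host list: primary first, backup second.
hosts = ["primary.example.com:55555", "backup.example.com:55555"]


def primary_is_down(host):
    """Stand-in for a real connection attempt; only the backup answers."""
    return host.startswith("backup")


connected_to = connect_with_failover(hosts, primary_is_down)   # ends up on the backup
```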
While not very common, what if an entire datacenter goes offline? For example, when Hurricane Sandy hit the US East Coast in 2012, a lot of datacenters lost power and were inaccessible. How do you ensure your applications and businesses are protected from such incidents? Typically, large enterprises will host their servers in multiple datacenters so that if their primary datacenter is unavailable, they can easily fail over to the disaster recovery datacenter.
Solace PubSub+ brokers support disaster recovery out of the box. You can pair two stand-alone brokers together, two high availability pairs together, or a high availability pair and a stand-alone broker in disaster recovery mode, depending on your requirements. An ideal scenario is to have two high availability pairs set up in disaster recovery mode so that if the primary high availability pair is unavailable, your applications can easily fail over to the disaster recovery high availability pair and still have high availability in the backup datacenter. However, due to cost constraints, many firms choose to deploy only a single broker in the backup datacenter. The right setup depends on regulatory requirements, cost, and risk tolerance.
Solace’s disaster recovery feature allows clients to replicate both messages and configurations to the disaster recovery site. While high availability only supports synchronous replication, disaster recovery allows clients to configure on a topic-by-topic basis whether they want messages to be replicated synchronously or asynchronously. This allows clients to replicate critical messages synchronously and non-critical messages asynchronously for optimum performance. Just like HA, Solace PubSub+ brokers also replicate message state, so once messages are acknowledged at the primary site, they are also removed from the disaster recovery site. This allows applications to continue from where they left off in case of DR, instead of having to process all messages and rebuild state. Furthermore, in the case of PubSub+ appliances, there is an additional level of optimization: if messages are acknowledged quickly enough, they are never replicated to the disaster recovery appliance. This improves performance because those messages do not have to be sent to the disaster recovery site.
Additionally, for clearing and settlement use cases, it is extremely important to ensure zero message loss and PubSub+’s guaranteed messaging ensures no messages are lost. See previous section for more detail.
In financial services, there are different types of data with different levels of sensitivity, which require different levels of security. For example, vendor data such as market data is publicly available. Such data is not sensitive and doesn’t have to be secured heavily. However, any data proprietary to the firm, such as data pertaining to trading activity at a prop trading firm, or a client’s data, such as trade orders at a liquidity provider, is highly sensitive and must be secured. Many firms in financial services that engage in trading are also regulated by various organizations (e.g., SEC, FINRA, CFTC) and are required to secure sensitive data.
There are different ways to secure a system and the underlying data:
Solace PubSub+ brokers are capable of all the above and they do so by:
With such strong security features built into the PubSub+ brokers, companies in a highly regulated industry such as financial services have had no issues relying on Solace.
One of Solace’s clients, a global liquidity provider, uses Solace PubSub+ event brokers as the backbone for trade order management and market data/price distribution in support of their multiple lines of business services including FX, FI, equities, energy, commodities, and market data distribution.
Solace PS+ Cache is used as a last image cache to cache the state of the market and customer static data for the browser clients, i.e., internal and external traders.
The browser client expects to find all its state by subscribing to topics that are backed by PS+ Cache, which provides late joiners with their state. The cache contains data describing the browser’s view of the market and validation of any inputs.
The front-end layer consists of Solace PubSub+ event brokers deployed in an active-active high availability pair, each broker backed by a PS+ Cache instance.
This approach gives them the advantage of having cache instances always maintaining a connection to a broker. The high availability mate broker and its associated cache instance will already be online and able to immediately provide caching services to the clients that were previously connected to the failed broker.
Solace’s client has a publisher that publishes time-sensitive trade orders, which are consumed and processed by downstream consumers, and asked how to keep bad messages from delaying those orders.
Sometimes, due to a bug in the publishing application, it is possible that the queue might receive several ‘spam’ messages which might clog up the queue causing a delay in the processing of the ‘real’ trade orders. And given the time sensitivity of the trade orders, the client would rather ignore stale messages and process new trade orders instead of processing each message in the queue.
The proposed solution for this problem was to use TTL on the messages (or on the queue) so that they will expire after a certain time and not clog up the queue.
A global hedge fund has publishers that publish trade orders as well as heartbeats. The trade orders are received by subscribers responsible for executing them. Each subscriber also receives heartbeats from the publisher and compares the timestamp on each message to the time it was received to determine whether it is receiving messages in time or if there is a backlog. The client doesn’t want to execute stale orders.
If there is a backlog, the subscriber will send cancel orders to cancel those trades. The publisher will see these cancel orders, and separate logic reroutes new orders to another subscriber for execution. Basically, the publisher and subscriber need to be tightly coupled here so they know that they haven’t fallen behind.
Proposed Solution: Leverage priorities for queue ingress and set “Reject Low Priority Message Limit” on queues. Here is how that would work:
The bank’s Fixed Income trading desk is distributed around the world. Prior to introducing Solace, each desk generated its own version of the yield curve, a fundamental instrument that drives both pricing and risk for the entire desk. Inconsistencies between these yield curves meant the bank was offering sub-optimal prices to the market. By deploying a global Solace event mesh, they were able to generate pricing in one location and offer a consistent view of the yield curves around the world, which improved the desk’s profitability and contributed to it being the #1 Fixed Income Desk on Wall Street for a number of years.
In this post, I discussed a key use case in capital markets: pre-trade processing. Unlike market data distribution, clients cannot afford to lose any messages when dealing with order flow data. Every message is crucial and must be acknowledged and processed by downstream subscribers. Solace’s support for multiple qualities of service (QoS), with direct and guaranteed messaging, allows clients to use PubSub+ brokers for a variety of use cases such as market data distribution and order flow data.
Additionally, I covered a variety of features that PubSub+ brokers offer via queues, such as exclusive/non-exclusive queues, message expiry and dead message queues, and last value queues. Finally, I touched upon the importance of redundancy, reliability, and security and how PubSub+ provides them with high availability, disaster recovery and enterprise security built into the PubSub+ brokers.
Next up, I will be discussing reporting and P&L calculation and the PubSub+ features that make it an ideal candidate for the use case.
Be sure to check out these next posts in the series as they are published: