Things were simpler a few years ago when all of your applications were running on-premises. Sure, you had to manage all that yourself, but it was easy to deploy your applications on your finite number of servers. Now cloud computing has taken off and you don’t have to worry about managing your own datacenter anymore, at least not to the extent you once did. Many companies, especially startups, have decided to embrace the cloud, but enterprises still keep an on-prem datacenter for critical applications and sensitive data, with newer and less sensitive applications moved or moving to the cloud.

Similarly, your kdb+ stack used to be on-premises, running on powerful servers spread across the world to capture market data from different markets. But slowly, you are realizing that there may be a better way to manage your kdb+ stack. Maybe not all components of your kdb+ stack need to be on-prem, and maybe other applications across your organization might benefit from having access to the data in your kdb+ database.

This shift brings some challenges, however. For example, how do you manage data transfer between your q applications running locally on-prem and those running in the public cloud? How do you then make this data available to other applications in hybrid/multi-cloud systems?

I told you life was much simpler before! But don’t worry, because in this post I am going to pull together different modules I have been working on in the last few weeks and months and show you how you can easily stitch your applications together in a robust, uniform, and secure manner.

First I’ll introduce a market data flow that consists of three different components. I have already written individual posts about each of these components that go into detail on how they work and how to set them up. Please have a look at them to get a better understanding of each component. These components are:

  • Feed handler – Written in Java and designed to be deployed locally, the feed handler connects to market data feeds and publishes that data to internal apps.
  • Stats/analytics process – Built with q/kdb+ and deployed on AWS, the stats process ingests raw market data updates and generates per-minute stats.
  • Data warehouse – Built with BigQuery and deployed on GCP, the data warehouse collects and stores all the stats updates in real-time.

Note that this setup doesn’t just use different languages and databases, but the three components are deployed in three very different environments. To stitch these different applications and environments together, we will be using Solace PubSub+ Event Broker. This is what our architecture will look like:

Distributing Applications and Brokers Across an Event Mesh

While I had the option to just have a single PubSub+ Event Broker deployed on any of the major cloud providers via Solace PubSub+ Event Broker: Cloud and have all three processes use the same broker, that’s not how you’d implement such a system in production. In a production environment, you’d typically have multiple deployments of the broker in different environments and regions. Hence, I decided to have three deployments of PubSub+ Event Broker:

  • PubSub+ software broker locally deployed via Docker
  • PubSub+ broker deployed via PubSub+ Cloud in AWS
  • PubSub+ broker deployed via PubSub+ Cloud in GCP

That’s great, but how do you connect all these brokers? By building an event mesh that takes advantage of a PubSub+ feature called dynamic message routing (DMR). With PubSub+ brokers linked together and DMR enabled, applications can continue to publish to their local instance, but can subscribe to messages being published to topics on other brokers. This enables the stats process to consume raw market data in AWS that is being published to a local broker.

Putting it all together

I have gone ahead and connected the three brokers and started each of the three processes.

Feed Handler (market data simulator)

My feed handler is publishing simulated market data for a handful of securities from different exchanges:

Publishing to topic: EQ/marketData/v1/UK/LSE/BARC
Data: {"date":"2020-06-09","symbol":"BARC","askPrice":90.35925,"bidSize":510,"tradeSize":160,"exchange":"LSE","currency":"GBP","time":"12:17:29.291393-04:00","tradePrice":90.021675,"askSize":340,"bidPrice":89.6841}
Publishing to topic: EQ/marketData/v1/UK/LSE/TED
Data: {"date":"2020-06-09","symbol":"TED","askPrice":136.32913,"bidSize":640,"tradeSize":360,"exchange":"LSE","currency":"GBP","time":"12:17:29.292771-04:00","tradePrice":135.48236,"askSize":320,"bidPrice":134.63559}
Publishing to topic: EQ/marketData/v1/US/NASDAQ/AAPL
Data: {"date":"2020-06-09","symbol":"AAPL","askPrice":273.38898,"bidSize":400,"tradeSize":200,"exchange":"NASDAQ","currency":"USD","time":"12:17:30.295731-04:00","tradePrice":272.02884,"askSize":480,"bidPrice":270.6687}
Publishing to topic: EQ/marketData/v1/US/NASDAQ/FB
Data: {"date":"2020-06-09","symbol":"FB","askPrice":198.85513,"bidSize":500,"tradeSize":30,"exchange":"NASDAQ","currency":"USD","time":"12:17:30.301617-04:00","tradePrice":196.88628,"askSize":650,"bidPrice":194.91742}
Publishing to topic: EQ/marketData/v1/US/NASDAQ/INTC
Data: {"date":"2020-06-09","symbol":"INTC","askPrice":66.58829,"bidSize":0,"tradeSize":490,"exchange":"NASDAQ","currency":"USD","time":"12:17:30.306857-04:00","tradePrice":65.929,"askSize":650,"bidPrice":65.269714}
Publishing to topic: EQ/marketData/v1/US/NYSE/IBM
Data: {"date":"2020-06-09","symbol":"IBM","askPrice":98.59332,"bidSize":60,"tradeSize":10,"exchange":"NYSE","currency":"USD","time":"12:17:30.31108-04:00","tradePrice":97.859375,"askSize":460,"bidPrice":97.12543}
Publishing to topic: EQ/marketData/v1/US/NYSE/BAC
Data: {"date":"2020-06-09","symbol":"BAC","askPrice":22.801601,"bidSize":130,"tradeSize":470,"exchange":"NYSE","currency":"USD","time":"12:17:30.315562-04:00","tradePrice":22.603819,"askSize":400,"bidPrice":22.406036}
Publishing to topic: EQ/marketData/v1/US/NYSE/XOM
Data: {"date":"2020-06-09","symbol":"XOM","askPrice":46.533016,"bidSize":80,"tradeSize":230,"exchange":"NYSE","currency":"USD","time":"12:17:30.31798-04:00","tradePrice":46.072292,"askSize":410,"bidPrice":45.61157}
Publishing to topic: EQ/marketData/v1/UK/LSE/VOD
Data: {"date":"2020-06-09","symbol":"VOD","askPrice":86.92494,"bidSize":40,"tradeSize":410,"exchange":"LSE","currency":"GBP","time":"12:17:30.320502-04:00","tradePrice":85.95792,"askSize":350,"bidPrice":84.99089}
Publishing to topic: EQ/marketData/v1/UK/LSE/BARC
Data: {"date":"2020-06-09","symbol":"BARC","askPrice":91.83111,"bidSize":530,"tradeSize":140,"exchange":"LSE","currency":"GBP","time":"12:17:30.32268-04:00","tradePrice":90.92189,"askSize":280,"bidPrice":90.01267}

As you can see, currently, our feed handler is publishing data for securities from the US and UK exchanges (since I am running the feed handler during their market hours). However, I am only interested in generating stats for securities traded in the US. So, in my market_data queue, I have used PubSub+’s wildcard filtering capability to subscribe to this topic: EQ/marketData/v1/US/>. This only enqueues US market data into my queue and saves me the trouble of having to filter these records myself in my q stats process.
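To make the effect of that subscription concrete, here is a minimal Python sketch of how a trailing > wildcard selects topics. It is a simplified illustration only, not the broker's actual matching code, and the function name topic_matches is hypothetical. It assumes that > as the last level matches one or more remaining levels and that * matches within a single level.

```python
def topic_matches(subscription: str, topic: str) -> bool:
    """Simplified, assumed model of PubSub+ topic wildcard matching."""
    sub_levels = subscription.split("/")
    topic_levels = topic.split("/")
    for i, sub in enumerate(sub_levels):
        if sub == ">" and i == len(sub_levels) - 1:
            # A trailing '>' must consume at least one remaining level.
            return len(topic_levels) > i
        if i >= len(topic_levels):
            return False
        if sub.endswith("*"):
            # 'abc*' matches any level starting with 'abc'; '*' matches any level.
            if not topic_levels[i].startswith(sub[:-1]):
                return False
        elif sub != topic_levels[i]:
            return False
    return len(sub_levels) == len(topic_levels)


# Topics taken from the feed handler output above.
topics = [
    "EQ/marketData/v1/UK/LSE/BARC",
    "EQ/marketData/v1/US/NASDAQ/AAPL",
    "EQ/marketData/v1/US/NYSE/IBM",
]
us_only = [t for t in topics if topic_matches("EQ/marketData/v1/US/>", t)]
print(us_only)
```

Only the two US topics survive the filter, which is the same selection the broker performs before anything reaches the queue.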

Stats Generation

In parallel, I have my q stats process running on AWS and connected to a different broker deployed in AWS. Here is the output of my stats process for each symbol:

AAPL| "[{\"date\":\"2020-06-09\",\"sym\":\"AAPL\",\"time\":\"13:02\",\"lowAskSize\":0,\"highAskSize\":790,\"lowBidPrice\":316.5098,\"highBidPrice\":330.9588,\"lowBidSize\":0,\"highBidSize\":780,\"lowTradePrice\":318.2236,\"highTradePrice\":333.0169,\"lowTradeSize\":0,\"highTradeSize\":490,\"lowAskPrice\":319.8147,\"highAskPrice\":335.9308,\"vwap\":235.2322}]"
BAC | "[{\"date\":\"2020-06-09\",\"sym\":\"BAC\",\"time\":\"13:02\",\"lowAskSize\":0,\"highAskSize\":790,\"lowBidPrice\":55.14443,\"highBidPrice\":63.4184,\"lowBidSize\":20,\"highBidSize\":780,\"lowTradePrice\":55.70145,\"highTradePrice\":64.21822,\"lowTradeSize\":0,\"highTradeSize\":500,\"lowAskPrice\":56.25846,\"highAskPrice\":65.02095,\"vwap\":238.9565}]"
FB | "[{\"date\":\"2020-06-09\",\"sym\":\"FB\",\"time\":\"13:02\",\"lowAskSize\":10,\"highAskSize\":790,\"lowBidPrice\":139.3889,\"highBidPrice\":146.585,\"lowBidSize\":0,\"highBidSize\":720,\"lowTradePrice\":140.2678,\"highTradePrice\":148.2529,\"lowTradeSize\":10,\"highTradeSize\":500,\"lowAskPrice\":140.6184,\"highAskPrice\":149.9207,\"vwap\":225.4108}]"
IBM | "[{\"date\":\"2020-06-09\",\"sym\":\"IBM\",\"time\":\"13:02\",\"lowAskSize\":0,\"highAskSize\":730,\"lowBidPrice\":72.99904,\"highBidPrice\":79.32771,\"lowBidSize\":10,\"highBidSize\":800,\"lowTradePrice\":73.54964,\"highTradePrice\":79.32771,\"lowTradeSize\":0,\"highTradeSize\":500,\"lowAskPrice\":73.73595,\"highAskPrice\":79.93908,\"vwap\":227.7111}]"
INTC| "[{\"date\":\"2020-06-09\",\"sym\":\"INTC\",\"time\":\"13:02\",\"lowAskSize\":0,\"highAskSize\":790,\"lowBidPrice\":81.64793,\"highBidPrice\":87.0667,\"lowBidSize\":0,\"highBidSize\":780,\"lowTradePrice\":82.36865,\"highTradePrice\":87.17567,\"lowTradeSize\":10,\"highTradeSize\":500,\"lowAskPrice\":83.08938,\"highAskPrice\":87.61224,\"vwap\":228.8886}]"
XOM | "[{\"date\":\"2020-06-09\",\"sym\":\"XOM\",\"time\":\"13:02\",\"lowAskSize\":20,\"highAskSize\":800,\"lowBidPrice\":18.55276,\"highBidPrice\":19.46045,\"lowBidSize\":0,\"highBidSize\":790,\"lowTradePrice\":18.73785,\"highTradePrice\":19.55809,\"lowTradeSize\":0,\"highTradeSize\":500,\"lowAskPrice\":18.83445,\"highAskPrice\":19.71465,\"vwap\":273.4346}]"

These stats are computed every minute and are published to dynamic topics that follow this hierarchy: EQ/stats/v1/ followed by the symbol. For example, INTC’s stats are published on EQ/stats/v1/INTC.

A separate queue called stats is subscribed to EQ/stats/> so it is capturing all stats messages that our stats process is publishing.
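For readers who don't use q, the shape of this per-minute aggregation can be sketched in Python. This is not the q code behind the stats process: the function names are hypothetical, the ticks are made-up values in the feed handler's message format, only a few of the stats fields are computed, and vwap is assumed to be the conventional sum(price × size) / sum(size), which may differ from the formula the stats process actually uses.

```python
from collections import defaultdict


def minute_stats(ticks):
    """Group ticks by (symbol, minute) and compute a few per-minute stats."""
    buckets = defaultdict(list)
    for tick in ticks:
        minute = tick["time"][:5]  # "12:17:30.295731-04:00" -> "12:17"
        buckets[(tick["symbol"], minute)].append(tick)

    stats = []
    for (sym, minute), rows in sorted(buckets.items()):
        volume = sum(r["tradeSize"] for r in rows)
        notional = sum(r["tradePrice"] * r["tradeSize"] for r in rows)
        stats.append({
            "sym": sym,
            "time": minute,
            "lowTradePrice": min(r["tradePrice"] for r in rows),
            "highTradePrice": max(r["tradePrice"] for r in rows),
            # Assumed VWAP definition; undefined when nothing traded.
            "vwap": notional / volume if volume else None,
        })
    return stats


def stats_topic(stat):
    """Build the per-symbol topic, e.g. EQ/stats/v1/INTC."""
    return "EQ/stats/v1/" + stat["sym"]


# Illustrative ticks only, not real market data.
ticks = [
    {"symbol": "AAPL", "time": "12:17:30.295731-04:00", "tradePrice": 100.0, "tradeSize": 100},
    {"symbol": "AAPL", "time": "12:17:45.100000-04:00", "tradePrice": 102.0, "tradeSize": 300},
    {"symbol": "INTC", "time": "12:17:30.306857-04:00", "tradePrice": 65.0, "tradeSize": 0},
]
stats = minute_stats(ticks)
for stat in stats:
    print(stats_topic(stat), stat)
```

Because each symbol gets its own topic, a downstream consumer can subscribe to a single symbol or, as the stats queue does, use a wildcard to take everything.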

Data warehousing in BigQuery

My third process is a Beam/Dataflow pipeline running in GCP which is consuming all stats messages from our stats queue, parsing them, and then writing them to BigQuery. Here is what that pipeline looks like in Dataflow:

And here you can see that stats are being written into BigQuery:

Voila! Stats data is being inserted into BigQuery, and you can also see that it only includes stats for US stocks because I used PubSub+’s wildcard filtering earlier on the market_data queue that feeds the q stats process.
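The parsing step of such a pipeline can be sketched in plain Python. This is not the Beam/Dataflow code used here: the function name parse_stats_message is hypothetical, the column names are assumed to mirror the JSON keys, and the write to BigQuery is left out. The payload is an abbreviated copy of the AAPL stats message shown earlier, which arrives as a JSON array holding one object.

```python
import json


def parse_stats_message(payload):
    """Turn one stats message (a JSON array of objects) into row dicts."""
    records = json.loads(payload)
    if isinstance(records, dict):
        # Tolerate a bare object as well as an array of objects.
        records = [records]

    rows = []
    for record in records:
        row = {}
        for key, value in record.items():
            # Store every numeric field as a float so column types stay consistent.
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                row[key] = float(value)
            else:
                row[key] = value
        rows.append(row)
    return rows


# Abbreviated from the AAPL stats message above.
payload = (
    '[{"date":"2020-06-09","sym":"AAPL","time":"13:02",'
    '"lowTradePrice":318.2236,"highTradePrice":333.0169,"vwap":235.2322}]'
)
rows = parse_stats_message(payload)
print(rows)
```

Each resulting dict corresponds to one row, so a pipeline stage could hand the list straight to whatever sink performs the insert.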

Wrap Up

As other application architectures have evolved in the last few years, kdb+ architecture has also evolved. Applications running in hybrid/multi-cloud environments and with different runtime environments can communicate with each other using PubSub+ Event Broker. In this post, I showed how a Java feed handler running on-prem can send raw market data prices to a q stats process in AWS, which in turn publishes per-minute stats that a pipeline in GCP writes to BigQuery, all through PubSub+.

This architecture can evolve further to add multiple different components such as a machine learning algorithm in AWS or a visualization application running in Azure on top of data streaming through PubSub+.

I hope you enjoyed this post. Feel free to leave a comment in our developer community if you have any questions.

Himanshu Gupta

As one of Solace's solutions architects, Himanshu is an expert in many areas of event-driven architecture, and specializes in the design of systems that capture, store and analyze market data in the capital markets and financial services sectors. This expertise and specialization is based on years of experience working at both buy- and sell-side firms as a tick data developer where he worked with popular time series databases kdb+ and OneTick to store and analyze real-time and historical financial market data across asset classes.

In addition to writing blog posts for Solace, Himanshu publishes two blogs of his own: enlist[q] focused on time series data analysis, and a bit deployed which is about general technology and latest trends. He has also written a whitepaper about publish/subscribe messaging for KX, publishes code samples at GitHub and kdb+ tutorials on YouTube!

Himanshu holds a bachelor of science degree in electrical engineering from City College at City University of New York. When he's not designing real-time market data systems, he enjoys watching movies, writing, investing and tinkering with the latest technologies.
