Controlling Information Flow with Topics

First things first:  What is a Topic?

Publish-Subscribe (PubSub) architectures are designed to separate data publishers from consumers.  Rather than publishers sending data to specific consumers, as you would when sending an email, data is sent with tags that let consumers find the information they are interested in.  The tag used by many middleware providers is called a Topic.  To receive messages, consumers register their interest in one or more topics, much like choosing to follow someone on Twitter.  With Solace Messaging Appliances, the appliance acts as the broker: it keeps track of everyone’s interests and does the high-performance matching that makes sure each consumer gets every message it is interested in.
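To make the broker's role concrete, here is a minimal in-memory sketch in Python.  It is not Solace's API: the ToyBroker class, its method names, and the msft topic are hypothetical, and it only handles exact topic matches.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# A handler receives (topic, payload); both are plain strings in this sketch.
Handler = Callable[[str, str], None]


class ToyBroker:
    """Hypothetical in-memory broker: exact topic matching only."""

    def __init__(self) -> None:
        # Maps each topic string to the handlers registered for it.
        self._subscriptions: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        # A consumer registers its interest in a topic.
        self._subscriptions[topic].append(handler)

    def publish(self, topic: str, payload: str) -> int:
        # The publisher never names a consumer; the broker finds them.
        handlers = self._subscriptions.get(topic, [])
        for handler in handlers:
            handler(topic, payload)
        return len(handlers)  # number of consumers that received the message


broker = ToyBroker()
received = []
broker.subscribe("equities/us/nasdaq/aapl", lambda t, p: received.append((t, p)))

delivered = broker.publish("equities/us/nasdaq/aapl", "price update")
ignored = broker.publish("equities/us/nasdaq/msft", "price update")  # no subscribers
```

The publisher only ever names a topic, so consumers can be added or removed without touching the publishing code.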

Topics & Message Routing

A topic is a string or a sequence of strings separated by a delimiter.  Different messaging systems use different delimiters; the most common are dots (.) and slashes (/).  In their simplest form, topics can be thought of as something akin to the name of a TV channel: broadcasters and subscribers find the same topic simply by using the same name.

Some examples of topic names are:

  • equities/us/nasdaq/aapl
  • food/waterloo/plantfloor/<machine id>/<sub system id>

In more complex situations, wildcards allow receiving applications to subscribe to large swaths of a topic namespace with a single subscription.  Much like delimiters, the wildcards available to you depend on which messaging middleware you’re using.  Solace Messaging Middleware currently supports the following wildcards:

  • “*” replaces part or all of a single level identifier
  • “>” replaces any number of level identifiers

As an example, imagine a food processing plant that has hundreds of machines, each identified by a <machine id>, and each machine has several sub systems, each identified by a <sub system id>.  The overall dashboard for the plant’s KPIs would want to subscribe to something like:

  • food/waterloo/plantfloor/>

This would allow it to receive every update from every machine on the plant floor.  A dashboard responsible for just a subset of the plant floor, such as packaging, might instead use the following (assuming packaging machine ids all start with 115):

  • food/waterloo/plantfloor/115*/>

This would allow the packaging dashboard to receive every published message from every machine whose id starts with 115, which should map to our packaging machine ids.
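The matching behind these two subscriptions can be sketched in a few lines of Python.  This is a simplified illustration, not Solace's implementation: the topic_matches function is hypothetical, it assumes “*” only appears at the end of a level, and it assumes “>” only appears as the last level and stands for at least one level.  The machine ids 1151 and 2040 and the sub system id motor are made up for the example.

```python
def topic_matches(subscription: str, topic: str) -> bool:
    """Return True if a published topic matches a wildcard subscription.

    Simplified rules assumed for this sketch:
      - "*" at the end of a level matches the rest of that single level
      - ">" as the final level matches one or more remaining levels
    """
    sub_levels = subscription.split("/")
    topic_levels = topic.split("/")

    for i, sub_level in enumerate(sub_levels):
        if sub_level == ">" and i == len(sub_levels) - 1:
            # ">" needs at least one topic level left to cover.
            return len(topic_levels) > i
        if i >= len(topic_levels):
            # The subscription has more levels than the topic.
            return False
        level = topic_levels[i]
        if sub_level.endswith("*"):
            # Prefix match within a single level, e.g. "115*" vs "1151".
            if not level.startswith(sub_level[:-1]):
                return False
        elif sub_level != level:
            return False

    # Without a trailing ">", both must have the same number of levels.
    return len(sub_levels) == len(topic_levels)


plant_wide = "food/waterloo/plantfloor/>"
packaging_only = "food/waterloo/plantfloor/115*/>"

packaging_msg = "food/waterloo/plantfloor/1151/motor"  # hypothetical packaging machine
other_msg = "food/waterloo/plantfloor/2040/motor"      # hypothetical non-packaging machine

plant_sees_both = (topic_matches(plant_wide, packaging_msg)
                   and topic_matches(plant_wide, other_msg))
packaging_sees_own = topic_matches(packaging_only, packaging_msg)
packaging_sees_other = topic_matches(packaging_only, other_msg)
```

Here the plant-wide dashboard matches both messages, while the packaging dashboard matches only the one from machine 1151.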

As you can see, applying wildcards efficiently depends on having a logical topic hierarchy that takes into account the possible uses of the information being published.  Had my mythical food company just numbered its machines sequentially, there would be no way to use wildcards to subscribe to just a portion of the data flow.  That leads to long lists of topics that have to be subscribed to individually, which slows down product development, opens the door to human error, and increases the time required for testing and maintenance.

So What’s a Topic Namespace?

Topic namespaces are the naming conventions and policies that define how topic names are assigned and used.  Topic namespaces are important because they dictate how users interact with the messaging layer, and how the organization manages and governs topics.  Namespace governance is a complex and important concept that I’ll address in a future article. (update: that post is now available)