Before you can break into a cold sweat about tackling the design of a system that analyzes big data volumes, you first need to be able to capture the data. More often than not, the design parameters feel like a traffic engineering problem — there are simply too many cars and not enough road.
Certainly, LAN and WAN network technology introduces many limits, and even the largest commercial databases (e.g. Netezza, Teradata) and other big data stores (e.g. Hadoop, Splunk) can only store data so fast. Even in-memory data grids are limited by how many in-memory writes they can perform per second. The distributed information is usually managed by some kind of middleware, which again is usually either a commercial product (e.g. JMS or MQ) or open source code (e.g. Kafka or Qpid).
Even at full speed, a single instance of the middleware layer handles far less traffic than the network, in-memory grid, or data store can process, making it the weakest link. To keep up, the software middleware traffic has to be scaled horizontally across many middleware brokers or servers. Each application becomes a fragile, layered mess of servers, and any disruption can lead to significant cascading problems of volume and backlog.
An increasing number of our customers with big data projects (e.g. in capital markets, internet infrastructure and transportation) have thrown in the towel on attempting to use traditional JMS, MQ, or open source for this scale of data capture. Instead, they’re opting for Solace’s hardware messaging to feed their big data stores. Where software messaging peaks at a few thousand messages per second, Solace’s failsafe queuing solution exceeds 150,000 messages per second per appliance. That means you would need to horizontally scale a typical JMS, MQ or open source alternative to 30 or more servers (assuming each could sustain 5,000 messages per second) to match the throughput of one Solace appliance. It just makes everything easier if the layers and moving parts in your scaling architecture stay light and lean. Fewer servers, less datacenter space and fewer outages mean lower cost and fewer headaches.
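The server count above is simple ceiling division, and it can be handy to redo the sum with your own numbers. The following is a minimal sketch, not a sizing tool: the function name `brokers_needed` is made up for illustration, and the inputs are just the figures quoted above (150,000 messages per second for one appliance, 5,000 per software broker), which you should replace with measured rates from your own environment.

```python
def brokers_needed(target_msgs_per_sec: int, per_broker_msgs_per_sec: int) -> int:
    """Return how many brokers are needed to sustain a target message rate.

    Hypothetical helper for illustration only. It assumes throughput scales
    linearly with broker count, which real deployments rarely achieve.
    """
    if target_msgs_per_sec < 0:
        raise ValueError("target_msgs_per_sec must not be negative")
    if per_broker_msgs_per_sec <= 0:
        raise ValueError("per_broker_msgs_per_sec must be positive")
    # Integer ceiling division: round up, since a fraction of a broker
    # still means provisioning a whole server.
    return -(-target_msgs_per_sec // per_broker_msgs_per_sec)


if __name__ == "__main__":
    # Figures quoted in the text; treat both as assumptions.
    appliance_rate = 150_000
    software_broker_rate = 5_000
    print(brokers_needed(appliance_rate, software_broker_rate))
```

Because the sketch assumes perfectly linear scaling, the result is a lower bound. Any coordination overhead between brokers pushes the real server count higher.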
Many customers initially think a commercial solution like Solace’s has to be more expensive than open source. After all, open source is free and Solace costs money. But it is easy to show that when you factor in server costs, rack space, power, and management, it’s far cheaper to pay for an appliance that replaces 30 or more servers.
Big data sits right in the sweet spot of the use cases Solace was built to address, and it is one of many. If you are struggling with these problems, we’d like the opportunity to talk to you about solving them.