Blogger Kirk Wylie recently commented on our Barclays Capital announcement with some provocative points, which led to a dialog in the comments that I will continue here. Kirk makes the point that the cost comparison between messaging appliances and commercial messaging software plus servers should be appealing, but questions:
I’m not sure that they’ll look at the commercials for Solace vs. RabbitMQ and see the types of benefits out of the hardware approach that more traditional latency-sensitive applications might.
The last part of that comment raises an interesting point about why people traditionally turn to hardware. There are three common reasons: performance, cost/complexity/consolidation, and compute-intensive problems.
1) Performance: As Kirk points out, the primary reason people turn to hardware for messaging is to improve the performance of latency-sensitive applications. Kirk makes the case well: if the appliance has a 100% hardware data path and fully eliminates the operating system, network stacks, etc. from the performance equation, it will run performance circles around software running on servers, or around appliances that have a software data path inside the box. This is how IP routers and deep packet inspection technology get the performance they do. Ditto for Solace content routers and the Tibco Messaging Appliance.
2) Cost/Complexity/Consolidation: In financial services and other industries, large data volume applications scale by dividing the traffic flow across pools of servers — sometimes by segmenting the name space, sometimes in a round robin approach as is more common in cloud computing, and sometimes to isolate connected clients. Most big financial services (and logistics, government, telecom, internet, …) firms end up with literally thousands of servers running messaging. So the second reason people move to hardware is to consolidate to fewer devices and realize cost savings. In his post, Kirk used a word formula that can be summarized as:
if appliance cost <= license + server hardware costs then consider choosing appliance
…because it will be faster as measured by every metric. This is true, but for high data volume companies, I think it’s more accurately represented with the formula:
if 2 * (appliance cost + power/cooling/space + appliance operations costs) <= 10 * (license + server hardware + power/cooling/space + server/software operations costs) then consider choosing appliance
Notice this second formula assumes that a pair of routers (for redundancy) would replace 10 low-latency or queue-based software servers. Different scenarios would of course change the multiplier from 10 to be lower, or more often, higher. We actually have a customer that is replacing 50 servers with a pair of Solace content routers, although this is admittedly an exceptional case. The power/cooling/space and operations costs, though, will always be higher in the software server-sprawl scenario.
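To make the comparison concrete, here is a small sketch of that second formula in Python. Every dollar figure below is a hypothetical placeholder chosen only to illustrate the arithmetic, not real pricing for any product.

```python
# Sketch of the consolidation cost comparison above.
# All figures are hypothetical placeholders, not real pricing.

def total_cost(unit_cost, power_cooling_space, operations, count):
    """Total cost of `count` devices, including facilities and operations."""
    return count * (unit_cost + power_cooling_space + operations)

# Appliance side: a redundant pair of content routers.
appliance_side = total_cost(
    unit_cost=150_000, power_cooling_space=5_000, operations=10_000, count=2
)

# Software side: the 10 messaging servers the pair would replace.
server_side = total_cost(
    unit_cost=20_000 + 8_000,  # license + server hardware
    power_cooling_space=2_500,
    operations=7_500,
    count=10,
)

print(appliance_side)                 # 330000
print(server_side)                    # 380000
print(appliance_side <= server_side)  # True -> consider the appliance
```

The interesting thing about the formula is that the per-server facilities and operations costs are multiplied by the server count, which is why even a $0 license (open source) often doesn't change the outcome at high volumes.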
In the current economy, this cost reduction/consolidation motivation is the one most companies are interested in. Even when the software license cost is near $0, as with open source, the math in the formula above still supports the appliance model for most high-volume scenarios. Especially if the open source upgrade cycle runs more than a couple of times a year, which results in high OpEx when upgrading tens of machines.
The math gets even more attractive when the same platform can do both low latency and guaranteed (MQ-style) messaging in one box. That’s the kind of front, middle, back office consolidation that BarCap is benefiting from.
We’ve seen a very similar scenario before: companies stopped building separate parallel networks (Novell, DECnet) about 15 years ago when IP became the shared backbone, and realized massive cost savings. Today, they still build separate middleware networks for applications, replicating the cost of hardware, software, manual effort, etc. Hardware appliances have the throughput properties to support consolidating many applications onto shared middleware, with similar cost savings.
3) Compute-Intensive Problems: the third case for hardware is when you are trying to do something very compute intensive that is hard to do with general purpose CPUs. Graphics cards for 3D rendering of games, or hardware encryption/compression of WAN data streams are good examples. In the middleware world, that’s what’s compelling about content routing and the announcement we are making tomorrow.
In software, if you want to route on content, you need a message broker or integration server to pull a message off the message bus, parse the content, apply a set of rules, find the matches, and re-address new messages back onto the bus (with new subjects or topics) to be carried. Software struggles here along two dimensions: message volume and the number of rules.
In software, you could expect to parse the payload of maybe 100 messages a second and apply 100 to 1000 complex rules against each message and still get 1-2 second performance on a powerful multi-CPU server. If you need more of either dimension, message volume or rules, you need to parallelize the workload. This means more servers, more message broker and messaging licenses, more application development costs etc.
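The software routing loop described above can be sketched in a few lines of Python. The rule format, topic names, and in-memory "bus" here are hypothetical simplifications for illustration; a real broker adds persistence, acknowledgements, and wire protocols on top of this same basic parse-match-readdress cycle.

```python
import json

# Minimal sketch of the software content-routing loop: parse each
# message's payload, evaluate every rule against it, and re-address
# matches with a new topic. Rule format and topics are hypothetical.

def route(message_body, rules):
    """Return (new_topic, payload) pairs for every rule the message matches."""
    payload = json.loads(message_body)      # parse the content
    out = []
    for rule in rules:                      # apply every rule to every message
        if all(payload.get(field) == value  # a rule is a set of field=value tests
               for field, value in rule["match"].items()):
            out.append((rule["route_to"], payload))
    return out

rules = [
    {"match": {"symbol": "IBM", "side": "BUY"}, "route_to": "orders/ibm/buy"},
    {"match": {"symbol": "IBM"}, "route_to": "orders/ibm/all"},
]

routed = route('{"symbol": "IBM", "side": "BUY", "qty": 100}', rules)
for topic, _payload in routed:
    print(topic)  # orders/ibm/buy, then orders/ibm/all
```

Note that the nested loop makes the cost proportional to messages × rules (and real rules are far more complex than these equality tests), which is exactly why software tops out quickly as either dimension grows and forces the parallelization described above.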
In hardware, you can process an event stream of hundreds of thousands of messages per second, while parsing for millions of complex rules within each message — all in a couple hundred microseconds. This could literally take 100 or more servers with open source or commercial software and a ton of complexity splitting and reassembling the stream.
At the end of the day, it’s true that there’s nothing hardware can do that some amount of software and servers can’t do, but most large-scale content routing solutions are impractical in software because of the cost of the infrastructure that would be required. As a result, most of these apps have never been built. Cloud computing may help address some of the cost, but with a further reduction in performance. In the final analysis, the justification for reason #3 (compute-intensive) boils down once again to the combination of performance and cost.
I agree with Kirk that compute intensive scenarios are not the majority of use cases by a long shot, nor (as he points out) can you drop content routing into an existing JMS application with no changes, but for the firms that have content routing problems, this stuff is very valuable. We’ll talk about just such a compute-intensive use case tomorrow in association with a new technology announcement we’ll be making.