WAN application infrastructure? Off with its head!

I was watching the HBO series Game of Thrones the other night, and as the various kings and lords were forming alliances and drawing battle lines, it struck me how similar their world is to today’s capital markets. Charismatic leaders, high risks, huge rewards, complicated politics — the metaphor easily extends in a lot of directions.

The parallel I want to focus on in this article is the very different infrastructure within the castles and across the territories. Inside the castles, life is pretty efficient. Many things happen in parallel, the king demands and gets what he wants immediately, and different kinds of loud bells can instantaneously put everyone on alert. But once you get into the vast expanse between settlements, that efficiency breaks down. Riders on horses take days to accomplish what can be done in mere minutes within the castle walls.

This reminds me of the mismatch between LANs and WANs today. Many capital markets applications like risk management or order routing are designed with the LAN in mind, and when they’re extended over the WAN the colossal differences between the environments are ignored. Over the LAN the application enjoys up to 10 gigabits per second of throughput and a few microseconds of latency, while the WAN offers far less bandwidth and frequently exhibits hundreds of milliseconds of latency. In LAN environments, it’s very common for an application to be chatty and send a bunch of serial messages to get a piece of work done — a pattern that would take many full seconds over the WAN.

In the Game of Thrones analogy, imagine the king sitting in the dining hall for a four-course meal, with his servers retrieving each course from the adjacent kitchen. Now suppose the king decides he wants his dinner prepared in another village, a half day’s ride from his castle. If his servers used the same protocol, fetching each course after he’s done with the last, it would take four full days to get him his meal. I’m sure they’d quickly figure out that it is more efficient for the courier to retrieve all four courses in one trip. With that one optimization — batching the courses — the castle’s operations manager just cut latency by three days!
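The courier’s arithmetic maps directly onto chatty protocols. Here’s a toy sketch in Python (the latency numbers are illustrative assumptions, not measurements from any particular network or messaging API) showing why serial round trips are so punishing over the WAN:

```python
# Toy model of request/response latency, in milliseconds.
# Assumption: ~100 ms round trip for a long-haul WAN link,
# versus microseconds for the same exchange on a LAN.

WAN_RTT_MS = 100  # assumed WAN round-trip time

def serial_latency_ms(num_messages, rtt_ms=WAN_RTT_MS):
    """Chatty protocol: each message waits for the previous reply."""
    return num_messages * rtt_ms

def batched_latency_ms(num_messages, rtt_ms=WAN_RTT_MS):
    """Batched protocol: all messages travel in one round trip."""
    return rtt_ms

# A 50-message exchange that finishes in a few milliseconds on
# a LAN spends 5 full seconds in pure round trips over the WAN...
print(serial_latency_ms(50))   # 5000
# ...versus a single round trip when the messages are batched.
print(batched_latency_ms(50))  # 100
```

Nothing about the work itself changed, just like the four courses on the cart; only the number of trips did.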

How else might they satisfy the king’s need to “order out”?

  • Maybe the courier knows the king beheaded the last cook that served apple pie, so he refuses to even bring that dessert back with the meal. That’s content filtering — don’t waste WAN bandwidth if you know the payload isn’t needed at the remote location.
  • Suppose they had the technology to dehydrate the food on one end and reconstitute it on the other with no negative effects. Each delivery could feed far more people with the same cargo space. That’s like using compression to get more data through the WAN’s bandwidth.
  • Since the cart is empty on the way to pick up the dinner, and probably not completely full on the way back, why not load it up with other items that need to get between those villages, each in a locked box that can only be opened by the recipient? Maybe horseshoes for the stables, arrowheads for the soldiers, or love letters to and from the king’s mistresses. That’s like virtualizing a single WAN connection so many applications can share it, making better use of that expensive bandwidth.
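The dehydration trick is easy to demonstrate with Python’s standard-library zlib. A real WAN optimizer would use its own codec, and the sample payload below is an invented stand-in for market data, but the effect is the same for any repetitive traffic:

```python
import zlib

# A hypothetical market-data-ish payload: highly repetitive
# text, which is exactly what compresses well.
payload = ("symbol=ACME,bid=101.25,ask=101.27,size=500;" * 200).encode()

# "Dehydrate" before sending over the WAN.
compressed = zlib.compress(payload, level=9)

print(len(payload))      # bytes on the wire without compression
print(len(compressed))   # far fewer bytes with compression

# The receiving side "reconstitutes" the original exactly,
# with no negative effects.
assert zlib.decompress(compressed) == payload
```

The same cargo space carries far more meals; the same WAN link carries far more messages.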

There are many fairly obvious things that can be done to dramatically improve the efficiency and performance of communications over the WAN, but way too often developers need to code these optimizations into their apps because if they just leave it to the messaging infrastructure, let’s say a typical JMS message bus, it will likely serve the four course meal in four days.

That’s not good enough anymore — when the king decides what he wants, it should automatically happen as quickly and efficiently as possible. We’ve spent a decade tuning the LAN for low latency; now it’s time to expect more over the WAN. Review how efficient your infrastructure is over the WAN, and if it isn’t cutting the mustard, then OFF WITH ITS HEAD!