Apples, Oranges, and WAN Optimization Appliances

WAN optimization is a crowded and confusing space, full of solutions that all sound alike and promise magic 10-20x performance increases. But when you look at what each solution actually does, they often turn out to be completely different, and your mileage may vary. So it’s important to understand the various approaches and apply the right tool to each challenge.

I recently presented at QCon in San Francisco on the topic of “Distributed Data Fabrics and Hardware WAN Optimization”. If you have never been to a QCon event, note that it is a geekfest and the attendees have zero tolerance for marketing hype. Developers and architects come to QCon to learn, share, and socialize with their peers, not to listen to sales pitches. I like the vibe at QCon and think it’s very much in line with the kind of content that works on a tech blog rather than a corporate blog, so I’d like to share my thoughts with a wider audience here.

My talk was about the Long Fat Network (LFN, or sometimes “elephant”) problem and how to distinguish between different ways of solving this problem at the network/transport level (layers 3 and 4) versus the application/messaging level (layer 7).

The basic problem of WAN optimization is that a WAN is not just a slow LAN. TCP/IP-based apps that run great on a LAN often suck wind on the WAN, and the blame often goes, incorrectly, to a lack of bandwidth. The expensive way to find out this is wrong is to pay for a 10 Gigabit WAN link and discover that throughput is actually a function of latency and loss as well as bandwidth.

Check out one of the many throughput calculators and you will quickly see that a 10 Gigabit WAN link with 100 ms round-trip latency and 0.5% packet loss yields just 1.65 Mbps of maximum effective throughput per TCP flow. This is an artifact of the TCP sliding window: a sender can only have one window of unacknowledged data in flight per round trip, and over long distances the speed of light puts a hard floor under that round-trip time. The further complication is that, unlike on a LAN, packets get dropped and need to be retransmitted. Even small loss rates stall the advancement of the TCP window and kill your throughput.
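The 1.65 Mbps figure can be reproduced with the widely used Mathis approximation for loss-limited TCP throughput, rate ≤ MSS / (RTT × √p). The sketch below is only a back-of-the-envelope model, not necessarily what any particular calculator implements. The 1460-byte MSS and the 64 KB unscaled window are assumptions added here for illustration, and the function names are made up.

```python
import math


def loss_limited_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Mathis approximation: rate <= MSS / (RTT * sqrt(p)), in bits per second."""
    return (mss_bytes * 8) / (rtt_s * math.sqrt(loss_rate))


def window_limited_bps(window_bytes: int, rtt_s: float) -> float:
    """At most one window of unacknowledged data can be in flight per round trip."""
    return (window_bytes * 8) / rtt_s


def bandwidth_delay_product_bytes(link_bps: float, rtt_s: float) -> float:
    """Bytes that must be in flight to keep the link full."""
    return link_bps * rtt_s / 8


LINK_BPS = 10e9   # 10 Gigabit WAN link (from the example above)
RTT_S = 0.100     # 100 ms round trip (from the example above)
LOSS = 0.005      # 0.5% packet loss (from the example above)
MSS = 1460        # assumed: typical Ethernet MSS in bytes
WINDOW = 65535    # assumed: classic TCP window without window scaling

print(f"loss-limited:   {loss_limited_bps(MSS, RTT_S, LOSS) / 1e6:.2f} Mbps")
print(f"window-limited: {window_limited_bps(WINDOW, RTT_S) / 1e6:.2f} Mbps")
print(f"BDP to fill the link: {bandwidth_delay_product_bytes(LINK_BPS, RTT_S) / 1e6:.0f} MB")
```

Under these assumptions the loss-limited rate comes out at about 1.65 Mbps, the window-limited rate at roughly 5.2 Mbps, and the link would need about 125 MB in flight to stay full. Note that the link bandwidth does not appear in either throughput formula, which is the whole point.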

Why do you care? Databases and data grids are increasingly replicated and distributed across multiple datacenters for redundancy or for active/active partitioning and localization. The trend toward “big data” is also expanding to sources that span remote datacenters, and even mobile apps and sensors, all of which adds more WAN into the mix.

Enter Riverbed and Silver Peak, and a host of other WANop hardware and software vendors that apply compression, forward error correction, and other techniques to speed up the flow of data across the WAN. These solutions are typically not application aware. They operate on TCP packets without understanding what they contain. That can still yield impressive results, but it really depends on the use case. Replicating encrypted data or pre-compressed binary data will reduce or eliminate the performance gains that a layer 3/4 WAN optimization tool can provide.
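To see why the payload matters, here is a small sketch that compresses two payloads of the same size. It uses Python's `zlib` as a stand-in for whatever compression an appliance applies, and `os.urandom` as a stand-in for encrypted data (well-encrypted data looks statistically random). Both are assumptions for illustration; real appliances also use techniques such as deduplication that this does not model.

```python
import os
import zlib


def compression_ratio(payload: bytes) -> float:
    """Original size divided by compressed size; about 1.0 means no gain."""
    return len(payload) / len(zlib.compress(payload, 6))


# Repetitive, text-like replication traffic (a made-up SQL statement).
text_like = b"UPDATE accounts SET balance = balance - 100 WHERE id = 42;\n" * 2000

# Same number of bytes, but random: a stand-in for encrypted traffic.
random_like = os.urandom(len(text_like))

print(f"text-like payload:   {compression_ratio(text_like):.1f}x")
print(f"random-like payload: {compression_ratio(random_like):.2f}x")
```

The repetitive payload shrinks dramatically, while the random one does not shrink at all and actually grows by a few bytes of framing overhead.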

Some WANop solutions try to go further up the stack and optimize for popular application protocols like NFS or Microsoft’s CIFS file sharing protocol. However, the vast majority of application-specific WAN traffic still goes unoptimized. This is the target for distributed messaging solutions such as those provided by Solace.

Consider the following example of distributed database replication using a CDC tool like GoldenGate or DataMirror. In the simplest case you might have just two datacenters and want to replicate all updates from one site to the other for disaster recovery. This is a great use case for traditional WANop solutions because the data is point-to-point and all of the updates at one site go to the other. There is no value in looking at the contents of the updates or filtering based on the values of the data, because everything needs to flow across a single WAN link between two sites.

Now consider a more distributed data grid, where some data (reference data) needs to flow between three or more global sites, and other local data needs to be selectively replicated or, in some cases, not replicated at all. This is a great fit for “layer 7” or application-aware WAN optimization because it is necessary to look into the contents of the data and selectively filter and route the updates.
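As a rough illustration of what “selectively filter and route” means, the sketch below decides which remote sites should receive each update based on a topic string attached to it. Everything here is hypothetical: the topic naming scheme, the site names, the subscription table, and the `destinations` function are made up for this example and are not any vendor's API.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Update:
    topic: str      # hypothetical naming scheme, e.g. "reference/fx/EURUSD"
    payload: bytes


# Hypothetical interests per site: reference data goes everywhere,
# regional order data only goes to the region that needs it.
SUBSCRIPTIONS = {
    "london": ["reference/*", "orders/emea/*"],
    "new_york": ["reference/*", "orders/amer/*"],
    "tokyo": ["reference/*", "orders/apac/*"],
}


def destinations(update: Update, origin: str, subscriptions=SUBSCRIPTIONS) -> list:
    """Return the remote sites whose subscriptions match the update's topic."""
    return sorted(
        site
        for site, patterns in subscriptions.items()
        if site != origin and any(fnmatchcase(update.topic, p) for p in patterns)
    )


for topic in ("reference/fx/EURUSD", "orders/apac/1001", "local/london/session"):
    sites = destinations(Update(topic, b"..."), origin="london")
    print(f"{topic} -> {sites or 'stays local'}")
```

With this table, a reference data update from London crosses the WAN to both other sites, an APAC order goes only to Tokyo, and purely local data never touches the WAN at all. A packet-level optimizer cannot make that decision because it never looks inside the update.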

Network-level WAN optimization and application-aware WAN optimization both add value and can even be used together, and you’ll be most successful when you use the right tool for each job. Network optimization is a broad brush that can speed up all your network traffic, while application-specific WAN optimization accelerates the traffic that you care most about to an even higher level.