Baseball, the cloud, and big data

As the release of the upcoming movie Moneyball approaches, it’s inevitable that we’ll be hearing much more about baseball’s sabermetrics. ReadWriteWeb is out in front of the topic with a nice article today, “How Big Data and the iPad have Fundamentally Changed Baseball.”

This article ties together three of my favorite topics – big data analytics, cloud computing infrastructure and baseball. Take a look at the iPad dashboard at right and think about a starting pitcher and catcher sitting together on a flight, using this kind of highly visual tool to decide how to pitch each hitter. Now think about the rudimentary paper-based systems from five or ten years ago. Which pitcher has the edge? I’m sure that comparable data exists for batters on pitchers’ tendencies and release points, but the overwhelming advantage of this technology appears to favor the pitcher/catcher game plan.

But do the results back up the theory? Take a look at the aggregate statistics across Major League Baseball over the last five years. Since 2006, both league batting average and runs per game have fallen every single year. Batting averages are down from .269 to .255 and runs per game are down from 4.86 to 4.28 (per team). Only one offensive stat is up in each of those years – strikeouts, from 6.52 per team per game to 7.06. MLB is marketing this as a pitching renaissance. Maybe it’s really a data/knowledge renaissance? Pitchers simply have better tools than hitters, and it’s showing up as a reduction in offense.
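To put those shifts on a common scale, here is a minimal sketch that converts the figures quoted above into percent changes. The numbers come straight from this post; the variable and function names are my own, chosen purely for illustration.

```python
# League-wide per-team figures quoted in this post:
# (value at the start of the span, value at the end of the span).
stats = {
    "batting average": (0.269, 0.255),
    "runs per game": (4.86, 4.28),
    "strikeouts per game": (6.52, 7.06),
}


def percent_change(start, end):
    """Relative change from start to end, expressed in percent."""
    return (end - start) / start * 100.0


# Percent change for each stat (names are illustrative, not from any data source).
changes = {name: percent_change(start, end) for name, (start, end) in stats.items()}

for name, change in changes.items():
    print(f"{name}: {change:+.1f}%")
```

That works out to roughly a 5% drop in batting average, a 12% drop in scoring and an 8% rise in strikeouts. It says nothing about cause, of course, only about the size of the swing.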

This phenomenon is not at all unique to baseball. What Moneyball and sabermetrics did for baseball happened a decade earlier in financial services. Human traders who specialized in arbitrage and technical chart reading have been overrun by automated systems that perform split-second algorithmic trading, resulting in billions in profits. Online giants like Google and Amazon have applied similar big data analytics techniques to know what you want to buy before you do. I would be willing to bet that a similar explanation is behind recent record big-company profits, despite the lukewarm global economy. The past decade has seen an explosion in the number and accuracy of tracked company metrics, along the lines of the baseball dashboard above. It’s the same story – interactive dashboards and process automation are buoying decision-making accuracy, resulting in fewer mistakes and greater corporate efficiency. This is a classic example of what Greenspan would have called “worker productivity improvements enabled by technology”.

We are marching inexorably towards an interconnected world of huge volumes of ever-changing data and anywhere, anytime access, which will make today’s incredible improvements look quaint when we look back ten years from now. Whether you work in a baseball front office, at a leading internet company, in corporate IT, or at a company that supplies organizations like these with the world’s most scalable messaging middleware, it’s an exciting time to be building applications that connect the web and mobile worlds with the processing power of the cloud in real time.