Posts

Big Data Rivers Webinar

Ever wonder about the impact of data in motion? Or how lambda architecture is used to drive real-world big data solutions?

Sumeet Puri, Solace’s global head of systems engineering, shared his experiences and expertise in a webinar covering those topics and more. In particular, Sumeet discussed event-driven architectures and how data movement patterns vary across such architectures, not being limited to request/reply exchanges. He also focused on the importance of open data movement, i.e. the distribution of data using standard protocols and open APIs across diverse cloud and on-premise environments.

The technologies used to implement big data vary across use cases and continue to evolve, but the common thread, as Sumeet put it, is that “Data lakes need data rivers to feed them”.

Sumeet used analogies and anecdotes to illustrate some ways to deal with sync and async data, and how data rivers apply to financial services use cases such as the processing of orders and online payments.… Read the rest

FPGAs and Cyclical Fashion Trends

I saw a headline in the Wall Street Journal speculating that the several-year trend toward men wearing beards was coming to an end. Like all shifts in fashion, nobody can say for sure what causes a trend to turn, but I hypothesize it might have been this photo.

In the world of computing, the rise of the cloud has left little oxygen in the press for anything other than the storyline that Amazon, Microsoft and Google (the tech fashionistas) have decided cheap servers running Linux are the answer to every computing problem. When one unit of compute isn’t enough, you write your software to split the workload two ways, or four, or eight, or a thousand. If you find that hard, too bad: hire smarter people, like Google and Amazon do.

Horizontal scale on cheap CPUs became a religion, and anything that didn’t look like AWS EC2 was like telling the girlfriend who bought you a flannel shirt for your birthday, “sorry, I don’t want to grow a beard”.… Read the rest

Does Your Data Have a “Best Before” Date?

Today I presented as part of a webinar about real-time streaming analytics alongside Forrester analyst Mike Gualtieri. He described how streaming analytics differs from traditional analytics, in which data is collected, loaded into an analytics engine and analyzed after some period of time.

I was intrigued by Mike’s characterization of the “perishability” of information, that is, how much opportunity for insight is lost as different kinds of events age. Equating this observation to everyday life, I bet you don’t usually check the expiration date on a bag of chips. But if you’re like me you do glance at the stamp on milk, and if the date is coming up you’ll give that gallon a sniff before pouring it all over your Lucky Charms.

Which events in your business have long shelf lives, and which ones become useless before you can identify what happened or have a chance to act?

Everything depends on context, of course, and streaming analytics introduces many new opportunities for insight and action.… Read the rest

Technical Look at Using Solace as a Channel for Apache Flume

In a previous blog I explained what the Solace channel for Apache Flume is and how it can be used to streamline data workflow from the enterprise message bus to big data infrastructure and vice versa. In this blog I will describe how the Solace channel works and what its technical advantages are compared to memory or file channels.

Solace Flume channel receiving Flume Events from Source.

The Flume Source interacts with the Flume channel through four method calls: createTransaction(), doPut(Event) and finally doCommit() or doRollback().

The Flume Channel uses the Solace message router's session-based transactions to implement the channel transactions. For more information on Solace session-based transactions, see Using Local Transactions.
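The call sequence described above can be sketched as follows. This is a minimal illustration only: `InMemoryTransactionalChannel` and `deliver` are hypothetical stand-ins written for this example, not the Flume or Solace API, and a plain in-memory list plays the role of the message router's transacted session.

```python
class InMemoryTransactionalChannel:
    """Hypothetical stand-in for a transactional channel (not the Flume or Solace API)."""

    def __init__(self):
        self.committed = []   # events that have been committed to the channel
        self._pending = None  # events staged in the currently open transaction

    def create_transaction(self):
        if self._pending is not None:
            raise RuntimeError("a transaction is already open")
        self._pending = []

    def do_put(self, event):
        if self._pending is None:
            raise RuntimeError("no open transaction")
        if event is None:
            raise ValueError("event must not be None")
        self._pending.append(event)

    def do_commit(self):
        if self._pending is None:
            raise RuntimeError("no open transaction")
        # Staged events become visible only at commit time.
        self.committed.extend(self._pending)
        self._pending = None

    def do_rollback(self):
        # Discard everything staged since create_transaction().
        self._pending = None


def deliver(channel, events):
    """Put a batch inside one transaction: commit on success, roll back on any failure."""
    channel.create_transaction()
    try:
        for event in events:
            channel.do_put(event)
    except Exception:
        channel.do_rollback()
        return False
    channel.do_commit()
    return True
```

In this sketch a batch is all or nothing: if any put fails, none of the batch's events reach the committed list.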

If the channel receives doPut(Flume.Event), it takes the Event and transposes it into a Solace message. This is done by taking the Event header name-value pairs and placing them into a Solace message header map, then taking the Event's body and writing it as a byte buffer into the body of the Solace message. … Read the rest
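The transposition step can be sketched roughly as below. The `FlumeEvent` and `SolaceMessage` classes and the `event_to_message` function are hypothetical placeholders invented for this example; the real Flume and Solace types have their own APIs, which are not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class FlumeEvent:
    """Hypothetical stand-in for a Flume Event: string headers plus a byte body."""
    headers: dict
    body: bytes


@dataclass
class SolaceMessage:
    """Hypothetical stand-in for a Solace message with a header map and a binary body."""
    header_map: dict = field(default_factory=dict)
    body: bytes = b""


def event_to_message(event):
    """Copy header name-value pairs into the header map, then the body as raw bytes."""
    message = SolaceMessage()
    for name, value in event.headers.items():
        message.header_map[name] = value
    message.body = bytes(event.body)
    return message
```

For instance, an event with headers `{"host": "web01"}` and body `b"log line"` would yield a message whose header map holds the same pair and whose body holds the same bytes.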

Merging the Megatrends – Big Cloudy Things

 

For a couple of years now, the press and industry analysts have been focused on the three megatrends that will shape the future of computing. You know the script – everything is either about big data, cloud computing or the Internet of Things.

Lately I’ve seen these three trends coming together to create interesting new use cases. For example, this Forbes article makes the case that IoT is the Killer App for Big Data. The author covers two in the title, and he hits the trifecta when he points out that most of that IoT data flows through the cloud on its way to the big data store.

That’s three for three — the megatrends collapsed into one use case.

In our business, we see the same pattern emerging. We see all kinds of use cases tapping into the IoT at some point, everybody’s moving some (or lots) of their infrastructure to some variation of the cloud, and data volumes are climbing as both a cause and effect of all this change.… Read the rest

Flying High: How Big Data will Reduce Delays in Your Air Travel

There was a fascinating article this week in ReCode about efforts to squeeze more efficiency out of the world’s aviation networks using smarter algorithms to isolate the impact of weather events. From the article:

“Bad weather is the cause of 70 percent of all traffic delays within the U.S. National Airspace System, according to the Federal Aviation Administration (FAA), adding about $6.7 billion a year in passenger costs. An estimate from the Congressional Joint Economic Committee puts the annual cost of delayed flights at about $40 billion. That spending also adds up to lost time. In fact, in 2013 alone, the airline industry experienced more than 12 million minutes worth of weather-related delays.”

Air traffic controllers do their best with the data available, but will err on the side of caution on any judgement calls in the name of public safety. According to the article, up to 66% of those decisions to delay should be preventable with input from algorithms that use the very latest weather and flight data, and study rates of meteorological change as it applies to the specific flight situation.… Read the rest

Solace and DataTorrent Partner to Enable Real-Time Ingestion and Analysis of Streaming Big Data

The Combination of Solace Messaging Technology and DataTorrent RTS Provides Complete Real-Time Analytics Solution

OTTAWA, Ont., and SANTA CLARA, Calif., December 14, 2015 – Solace Systems, a leading provider of data movement technology, today announced a strategic alliance with DataTorrent, a leader in real-time big data analytics and creator of DataTorrent RTS, the world’s first open-source, enterprise-grade unified platform for both stream and batch processing on Hadoop.

With the introduction of Hadoop 2.0, an increasing number of companies are looking to move from batch-based to real-time analytics, and investing in new technologies that support the collection, filtration and analysis of real-time information.

Solace’s world-class messaging hardware and software enable streaming of very large volumes of information from diverse sources into DataTorrent RTS, giving companies the power to take action in real time and drive greater business value from their data. Solace and DataTorrent both offer elastic capacity to keep up with rising data volumes, and DataTorrent is one of the few applications that can scale to capture as much data as Solace can deliver.

Read the rest

Solace as a Channel for Apache Flume

As per flume.apache.org: “Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple extensible data model that allows for online analytic application.”

So why would you need anything other than the already available components of Flume to transfer data into your Big Data infrastructure?  Why not just use existing Flume Source and Flume Sinks to connect Flume to your enterprise messaging solutions?

Well, the next picture starts to show the problem. To scale Flume you need to stitch together several Flume Agents in varying hierarchical ways with point-to-point TCP links. If your data has high value, you will next need to add fault tolerance and high availability, meaning you will need to add disk access and redundancy at each level. … Read the rest

Is Big Data Recreating the Messaging Proliferation Problem?

The previous blog post in this series explained how many capital markets firms have ended up with a multitude of messaging technologies to handle different data movement requirements, the problems associated with that, and how Solace technology helps simplify such environments. This post drills into how big data is leading companies in other industries down this same path.

According to Gartner, as of June 2014, 73 percent of organizations have invested or plan to invest in big data within the next two years. While big data is intended to support both structured and unstructured data, many organizations start by analyzing their structured transactional data because it’s easier for them to understand, and can be more readily applied to optimizing operational efficiency.

During this phase, enterprises focus on how the big data technology will extract value from their data. They load all kinds of information into their big data lake using whatever tools are at hand or easiest to add to the mix.… Read the rest

Infographic: Real-Time Information and The Data Deluge

In our industry, the meaning of terms often shifts over time. Case in point: the term “real time” has been around for decades, but its meaning has changed several times.

Initially, the term real time was used in high-performance computing circles to describe problems that required correlation between events within a latency budget. For example, a high-tech manufacturing process step might need to be altered within 3 milliseconds of a given condition being identified. Those real-time computing systems were optimized for timing predictability.

Later, the term real time came to mean up-to-the-minute or up-to-the-second information, more or less the opposite of batch oriented. At the time, batch was still the dominant architecture for sharing information across systems and geographies due to the high cost of adequate WAN bandwidth and computing power. Making any system real time required serious cost justification; otherwise, batch prevailed.… Read the rest