Technical Look at Using Solace as a Channel for Apache Flume

In a previous blog post I explained what the Solace channel for Apache Flume is and how it can streamline data workflows between an enterprise message bus and big data infrastructure, in either direction. In this post I describe how the Solace channel works and what technical advantages it has over memory or file channels.

Solace Flume channel receiving Flume Events from Source.

The Flume Source interacts with the Flume channel through four method calls: createTransaction(), doPut(Event), and finally doCommit() or doRollback().
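The call sequence above can be sketched in plain Java. This is only an illustration of the put-then-commit-or-rollback pattern: SketchChannel, SketchTransaction and deliver() are hypothetical stand-ins that keep events in memory, not the Flume or Solace classes, and the real channel delegates the transaction to the Solace message router rather than to a list.

```java
import java.util.ArrayList;
import java.util.List;

public class ChannelTxSketch {

    /** Hypothetical stand-in for a channel transaction; not the Flume or Solace API. */
    static final class SketchTransaction {
        private final List<String> staged = new ArrayList<>();
        private final List<String> committed;

        SketchTransaction(List<String> committed) {
            this.committed = committed;
        }

        /** Stage one event body; nothing is visible to consumers yet. */
        void doPut(String eventBody) {
            staged.add(eventBody);
        }

        /** Make every staged event visible at once. */
        void doCommit() {
            committed.addAll(staged);
            staged.clear();
        }

        /** Discard every staged event. */
        void doRollback() {
            staged.clear();
        }
    }

    /** Hypothetical in-memory channel that hands out transactions. */
    static final class SketchChannel {
        final List<String> committed = new ArrayList<>();

        SketchTransaction createTransaction() {
            return new SketchTransaction(committed);
        }
    }

    /** Source side: put a whole batch, commit on success, roll back on any failure. */
    static boolean deliver(SketchChannel channel, List<String> batch) {
        SketchTransaction tx = channel.createTransaction();
        try {
            for (String body : batch) {
                if (body == null) {
                    throw new IllegalArgumentException("null event body");
                }
                tx.doPut(body);
            }
            tx.doCommit();
            return true;
        } catch (RuntimeException e) {
            tx.doRollback();
            return false;
        }
    }

    public static void main(String[] args) {
        SketchChannel channel = new SketchChannel();
        System.out.println(deliver(channel, List.of("a", "b")));
        System.out.println(channel.committed);
    }
}
```

The point of the pattern is that a batch is all-or-nothing: a failure part-way through leaves nothing from that batch in the channel.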

The Flume channel uses the Solace message router's session-based transactions to implement the channel transactions. For more information on Solace session-based transactions, see Using Local Transactions.

When the channel receives doPut(Flume.Event), it takes the Event and transposes it into a Solace message. It does this by placing the Event's header name-value pairs into a Solace message header map, then writing the Event's body as a byte buffer into the body of the Solace message. … Read the rest
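As a rough illustration of that transposition, the sketch below copies header name-value pairs into a map and wraps the body in a byte buffer. SketchEvent and SketchMessage are hypothetical stand-ins defined here for the example; they are not the Flume Event or Solace message classes, and the real channel's field names and types may differ.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class EventTransposeSketch {

    /** Hypothetical stand-in for a Flume Event: string headers plus an opaque body. */
    static final class SketchEvent {
        final Map<String, String> headers;
        final byte[] body;

        SketchEvent(Map<String, String> headers, byte[] body) {
            this.headers = headers;
            this.body = body;
        }
    }

    /** Hypothetical stand-in for a Solace message: a header map plus a binary body. */
    static final class SketchMessage {
        final Map<String, String> headerMap;
        final ByteBuffer body;

        SketchMessage(Map<String, String> headerMap, ByteBuffer body) {
            this.headerMap = headerMap;
            this.body = body;
        }
    }

    /** Copy each header pair into the message header map and the body into a byte buffer. */
    static SketchMessage toMessage(SketchEvent event) {
        Map<String, String> headerMap = new HashMap<>(event.headers);
        // Clone so later changes to the event's array do not alter the message.
        ByteBuffer body = ByteBuffer.wrap(event.body.clone()).asReadOnlyBuffer();
        return new SketchMessage(headerMap, body);
    }

    public static void main(String[] args) {
        SketchEvent event = new SketchEvent(
                Map.of("host", "web01"),
                "hello".getBytes(StandardCharsets.UTF_8));
        SketchMessage message = toMessage(event);
        System.out.println(message.headerMap);
        System.out.println(message.body.remaining());
    }
}
```

Keeping the body as opaque bytes means the channel does not need to understand the payload format; only the headers are mapped field by field.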

Solace as a Channel for Apache Flume

As the Apache Flume project describes it, “Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple extensible data model that allows for online analytic application.”

So why would you need anything other than the already available components of Flume to transfer data into your Big Data infrastructure? Why not just use existing Flume Sources and Flume Sinks to connect Flume to your enterprise messaging solutions?

Well, the next picture starts to show the problem. To scale Flume you need to stitch together several Flume Agents in varying hierarchical ways with point-to-point TCP links. If your data has high value, you will next need to add fault tolerance and high availability, which means adding disk access and redundancy at each level. … Read the rest

Is the real-time web streaming data in the wrong direction?

Examples of web streaming have become rather predictable and yawn-worthy. It’s always some variation of streaming real-time stock market data, news and status updates from the cloud to your browser, tablet or phone – classic filtered fan-out data distribution. Sure, there are a few upstream bits like the character inputs used for real-time keyword search completion, or chat applications, but the upstream data is a trickle compared to the fire hose coming downstream. However, this model is beginning to flip direction, and applications are more frequently streaming large volumes of upstream data with a downstream trickle.

Consider how most Big Data is being collected at the server side today. Click streams, log data, activity streams, search queries – they are all pouring into Kafka, Scribe, or Flume and ending up in a variety of big data repositories. As users increasingly run thicker smartphone, tablet or desktop apps, the view from the web server becomes less and less complete. … Read the rest