Cassandra London Meetup - on Hadoop Integration


Details
This month we are focussing on Hadoop Integration. We have one confirmed speaker and one more in the pipeline.
Schedule:
7.00pm Meet and chat with other Cassandra users
7.30pm Talk: Jairam Chandar explains how to integrate Cassandra and Hadoop
8.00pm Talk: Richard Low from Acunu (http://www.acunu.com/) talks about what we've learned from Cassandra performance testing
8.30pm Finish up; more discussions then off to the pub
Please come along!
Jairam Chandar explains how to integrate Cassandra and Hadoop
Summary
Jairam will talk about Hadoop-Cassandra integration and how VisualDNA (http://www.visualdna.com/) uses Hadoop to analyse data stored in a Cassandra cluster, including a real-world example and some statistics.
Synopsis
VisualDNA is a behavior-based audience discovery and targeting network. We use a patented visual quiz system to profile audiences at scale and anonymously aggregate this information to help publishers better understand their audience. VisualDNA also runs high-performance ad campaigns optimized to maximize revenue for e-commerce sites, and optimizes branding campaigns.
We use Cassandra as our primary data store. As data volumes grew, the simple serial PHP scripts we used for analysis started to take far too long to run. Enter Hadoop! One process that took over 48 hours as a PHP script finished in just over 4 hours!
Cassandra has supported Hadoop since version 0.6, with more features added in each newer release. We will discuss some of these features using a real-world example (and it's not a word-count example!) of how to use Hadoop to analyse data stored in Cassandra.
Richard Low from Acunu talks about what we've learned from Cassandra performance testing
Summary
We'll show the effect of heavy write loads on Cassandra, in particular on range queries, and explain how Acunu improves on vanilla Cassandra performance and predictability.
Synopsis
Acunu gives “Big Data” applications high and predictable performance, robustness and simple management. By using the Acunu Storage Platform (http://www.acunu.com/solutions/) to power NoSQL stores such as Cassandra, we enable developers (1) to take full advantage of low-cost, high-performance commodity hardware, (2) to speed up the dev/test cycle with Acunu’s unique instant thin clones, which let each developer work with the whole dataset, and (3) to simplify the management and monitoring of their deployment so they can focus on what matters: their applications.
In the first release of the Acunu Storage Platform, we are focussing on Cassandra and have gone through a significant performance benchmarking exercise. In this session, we will present some of the findings and lessons learned.
