
Details

At this second meetup of June, we will talk about NoSQL databases and Big Data with Apache Flink.

We would like to thank Datastax, who are fully supporting the Cassandra in Action talk. They will also provide some goodies to thank you for attending.

The meetup will take place at Amadeus, so make sure you reserve your seat!

Agenda

6:15pm - 6:25pm Opening
6:25pm - 6:30pm Introduction
6:30pm - 6:40pm Lightning Talk - Fast Introduction to C*, with Victor Coustenoble from Datastax
6:40pm - 7:30pm C* Experience, with Jean Armel Luce from Orange
7:30pm - 7:40pm Break
7:40pm - 8:30pm Doing even Bigger Data with Apache Flink, with Aljoscha Krettek from TU Berlin

Cassandra in Action

The first talk of the evening is about Cassandra: Jean Armel Luce will tell the story of introducing a NoSQL database like Cassandra at Orange, from both the technical and the development points of view. Some points of interest in his presentation:

• The choice of Cassandra over other databases, and its performance in production.

• Analytics using Hadoop/Hive with Cassandra.

• How the online queries were isolated from the map/reduce processes.

• Feedback from developers.

• Many other nice things.

Bigger Data with Apache Flink (aka Stratosphere)

Riviera Scala Clojure aims to be avant-garde, and this talk on Apache Flink is in that spirit.

We will welcome Apache Flink (aka Stratosphere, http://www.stratosphere.eu), a big data system that adds a set of optimizations missing in current systems like Hadoop or Spark. Aljoscha Krettek, one of the main contributors to the project (which recently entered incubation at Apache), will present this very promising tool.

Hadoop and, more recently, Spark have started to gain adoption in Big Data projects around the globe. While both are excellent, the Big Data community keeps learning more about the use cases. Machine learning, real-time queries, large graph processing, and stream processing are the best-known uses of the two systems. These use cases show that some optimizations are needed: in-memory processing, optimized joins, optimized iterative processing, optimized shuffling, and many others have proven important. Apache Flink addresses exactly those optimization challenges. Who knows, Stratosphere might be the next big thing in Big Data?

Aljoscha will introduce the architecture of Apache Flink and a very nifty Scala API to work with.

SPEAKERS

Jean Armel Luce is a Senior Software Engineer at Orange, with more than 20 years of software development experience in various environments. In recent years, he has dealt with applications using large databases that require scalability and high availability. A few years ago, Jean Armel learned about NoSQL and carried out a large study of several NoSQL databases (Cassandra, MongoDB, HBase, Hypertable, Riak, …). This study focused on performance, robustness, scalability and operability. At the end of his study, Orange decided to replace the relational databases (Postgres and MySQL cluster) with Apache Cassandra for the application PnS (a highly available and critical service for collecting and serving live data about Orange customers), in order to sustain growth in requests and volume of data. The migration to Cassandra was done in 2013, and all the PnS data is now stored in the C* cluster.

Aljoscha Krettek is a main contributor to Apache Flink. He did his Bachelor's in a dual studies program together with IBM in Stuttgart, then moved to Berlin to get a Master's degree at TU Berlin, where he also works as a student researcher with Professor Markl of the Database Systems and Information Management Group. Aljoscha is interested in systems programming, language/API design, and functional programming. As a core developer on the Stratosphere system (soon to be Apache Flink), his main responsibility is programming interface design; among other things, he developed the current Scala front end.
