DataKRK #10 - Mysteries of the universe, Spark and DataFrames

124 people went

This just cannot be missed! Have you ever dreamed about working with the Large Hadron Collider? If you have (like us), then hold on tight... Piotr Turek will tell us about, lo and behold:

Unraveling mysteries of the Universe at CERN, with OpenStack and Hadoop

He will talk about the challenges he faced, the lessons he learned and the fun he had while reinventing the way offline data analysis is done at one of the LHC (Large Hadron Collider) experiments. It was a journey that led to another land, that of the contemporary Big Data stack, and that finally married the two. Did it make any sense in the end? Come and find out.

Among other things you will learn:

• the why, what and how of data analysis at CERN

• why latency variability in large distributed systems matters (literally ;))

• why using C++ as a scripting language is both the best and the worst idea ever

• how to implement a reliable Hadoop cluster provisioning mechanism on OpenStack

• how to marry a huge data analysis framework written in C++ with Hadoop 2

• what the moral of this story is

#CERN #OpenStack #Hadoop2 #Sahara #MindBogglingFacts

If that is not enough, we will have two more talks in which:

• Szymon Sobczak will quickly explain how Spark works and why it is a revolution, not just an evolution, compared to MapReduce.

• Mateusz Buśkiewicz will present DataFrames, a great tool for data science that can be used on a single machine and becomes a very powerful big data tool when run on a Spark cluster.

The event will take place at Tech Space (Wyczółkowskiego 7) on Thursday, June 18th at 6:30 PM! Hope to see you there!