
14. Introducing Apache Flink (+) Hadoop Operations Powered By ... Hadoop

Hosted By
Adam K.

Details

We are excited to invite you to the next meeting of Stockholm Hadoop User Group! This time, we will have two technical presentations! Please find the details below:

--------------------------------------------------

Presentation 1:

--------------------------------------------------

Topic: Introducing Apache Flink - A new approach to distributed data processing

Speaker: Stephan Ewen

Abstract:

The talk introduces the Apache Flink (incubating) project (http://flink.incubator.apache.org), a new project at the Apache Software Foundation that is compatible with the Hadoop ecosystem and runs on top of HDFS and YARN. Flink pushes the technology forward in many ways. The system is built on the principle "write like a programming language, execute like a database", using a unique style of execution engine that aggressively uses in-memory execution but degrades gracefully to disk-based execution when memory is not enough, which makes execution behavior very robust. Flink introduces native closed-loop iteration operators, making graph analysis and machine learning applications very fast on the platform. Flink programs are not executed directly but are first optimized by Flink's cost-based optimizer. This means that Flink applications require little (re-)configuration and little maintenance when the cluster characteristics change and the data evolves over time. Finally, Flink's runtime is a true data streaming engine, and ongoing work in the community is unifying batch and true stream processing (rather than mini batches) in a single system. Flink is an active open source project with more than 50 contributors from industry and academia.

Bio:

Stephan Ewen is one of the originators and committers of the Apache Flink project, and co-founder of the Berlin-based startup “Data Artisans”. He was a Ph.D. student at University of Technology, Berlin, where he co-initiated the Stratosphere project (out of which Flink originated) and published several papers on data analytics technologies. Stephan has worked for Microsoft Research and IBM Research on their database products.

--------------------------------------------------

Presentation 2:

--------------------------------------------------

Topic: Hadoop operations powered by ... Hadoop

Speaker: Adam Kawa

Abstract:

At Spotify we collect huge volumes of data for many purposes. Reporting to labels, powering our product features, and analyzing user growth are some of our most common ones. Additionally, we collect many operational metrics related to the responsiveness, utilization and capacity of our servers.

To store and process this data, we use a scalable and fault-tolerant multi-system infrastructure, and Apache Hadoop is a key part of it. Surprisingly or not, Apache Hadoop itself generates large amounts of data in the form of logs and metrics that describe its behaviour and performance. To process this data in a scalable and performant manner we use … also Hadoop!

During this presentation, I will talk about how we analyze various logs generated by Apache Hadoop using custom scripts (written in Pig or Java/Python MapReduce) and available open-source tools to get data-driven answers to many questions related to the behaviour of our 860+ node Hadoop cluster. At Spotify we frequently leverage these tools to learn how fast we are growing, when to buy new nodes, how to calculate the empirical retention policy for each dataset, how to optimize the scheduler and benchmark the cluster, who and what its biggest offenders are (both people and datasets), and more.

Bio:

Adam Kawa works as a Data Engineer at Spotify, where his main responsibility is to build and maintain one of the largest Hadoop-YARN clusters in Europe. Every so often, he implements and troubleshoots MapReduce, Hive, Pig and Tez applications. Adam has also been working as a Hadoop instructor for more than 2 years. He regularly blogs about the Hadoop ecosystem at HakunaMapData.com.

Stockholm Hadoop User Group
Spotify Office
Birger Jarlsgatan 61 (11tr) · Stockholm