29th meetup - Pre-Hadoop-Summit gathering

This is a past event

77 people went

Details

Maybe you didn't know it yet, but this year's European Hadoop Summit (http://2015.hadoopsummit.org/brussels/) is taking place in Brussels on 15-16 April 2015. The night before, the organizers have kindly arranged exceptional speakers for us, following a free reception with food and drinks!

Agenda

17:30 – 18:15 Open reception at Panoramic Hall, Level 5

18:30 – 21:00 Presentations at Studio 214, Level 2

Adding Insert, Update, and Delete to Hive (Alan Gates)

Apache Hive provides a convenient SQL query engine and table abstraction for data stored in Hadoop. Hive uses Hadoop to provide highly scalable bandwidth to the data, but until recently did not support updates, deletes, or transaction isolation. This has prevented many desirable use cases, such as updating dimension tables or doing data cleanup. We have implemented the standard SQL commands insert, update, and delete, allowing users to insert new records as they become available, update changing dimension tables, repair incorrect data, and remove individual records. This also allows very low-latency ingestion of streaming data from tools like Storm and Flume. Additionally, we have added ACID-compliant snapshot isolation between queries, so that each query sees a consistent view of the transactions committed at the time it is launched. This talk will cover the intended use cases, the architectural challenges of implementing updates and deletes in a write-once file system, the performance of the solution, and details of the changes to the file storage formats and transaction management system.

Beyond the Tweeting Toaster: (I)IoT Streaming Analytics With Apache Storm, Kafka, and Arduino (Taylor Goetz)

Sensors have been all around us for a long time: in our homes, buildings, cars, even in our pockets and around our wrists. The difference today is that these devices are becoming increasingly network-enabled and low-cost. Gartner predicts there will be 26 billion devices on the Internet of Things by 2020. Capturing and analyzing the data from these devices provides a wealth of opportunity, but where do you start? In this session we will look at how streaming sensor data fits into a variety of (I)IoT analytics use cases, and how Apache Storm and Kafka fit into an overall architecture for large-scale streaming analytics. You will also learn how to leverage the highly accessible Arduino microcontroller platform to create low-cost sensor networks and stream data to Apache Storm for analysis in real time. Finally, we will give a live demonstration of sensor analysis using Kafka, Storm, and an out-of-the-box Arduino board (no soldering required!).

Alan Gates, Co-founder of Hortonworks.

Alan, one of the founders of Hortonworks, is an original member of the engineering team that took Pig from a Yahoo! Labs research project to a successful Apache open source project. Alan also designed HCatalog and guided its adoption as an Apache Incubator project. Alan has a BS in Mathematics from Oregon State University and an MA in Theology from Fuller Theological Seminary. He is also the author of Programming Pig, a book from O'Reilly Press.

Taylor Goetz, PMC and Apache Storm committer.

P. Taylor Goetz is the Apache Storm PMC Chair and has been involved with the usage and development of Storm since it was first released as open source in September of 2011. As an active contributor to the Storm community, Taylor has led a number of open-source projects related to Storm. Specifically, he has authored several open-source Storm projects that enable enterprises to integrate Storm into heterogeneous infrastructure.