Dear HUG UK members,
I am pleased to announce our November Meetup at London's Hilton Metropole (Blenheim Room) on 11th November.
This event is scheduled to be one of the biggest Meetups this year and we have some really exciting content lined up.
This Meetup will include three great presentations from:
Session 1 - Cloudera
Title: Impala: A Modern, Open-Source SQL Engine for Hadoop
Abstract: The Cloudera Impala project is pioneering the next generation of Hadoop capabilities: the convergence of fast SQL queries with the capacity, scalability, and flexibility of a Hadoop cluster. With Impala, the Hadoop community now has an open-sourced codebase that helps users query data stored in HDFS and Apache HBase in real time, using familiar SQL syntax. In contrast with other SQL-on-Hadoop initiatives, Impala's operations are fast enough to do interactively on native Hadoop data rather than in long-running batch jobs. Now you have the freedom to discover relationships and explore what-if scenarios on Big Data datasets. By taking advantage of Hadoop's infrastructure, Impala lets you avoid traditional data warehouse obstacles like rigid schema design and the cost of expensive ETL jobs.
This talk starts out with an overview of Impala from the user's perspective, followed by a presentation of Impala's architecture and implementation. It concludes with a summary of Impala's benefits when compared with Apache Hive, commercial MapReduce alternatives, and traditional data warehouse infrastructure.
Speaker: Marcel Kornacker is a tech lead at Cloudera for new product development and creator of the Cloudera Impala project. Following his graduation in 2000 with a PhD in databases from UC Berkeley, he held engineering positions at several database-related start-up companies. Marcel joined Google in 2003, where he worked on several ad-serving and storage infrastructure projects, then became tech lead for the distributed query engine component of Google's F1 project.
Session 2 - Syncsort
Title: Smarter Big Data Integration for Hadoop
Abstract: Hadoop has become a de facto standard in supporting Big Data analytics. A very common use case for Hadoop is data transformation, offering a new way to deliver ETL and SQL migration. With this in mind, Syncsort has made a contribution to Apache Hadoop that not only makes sort pluggable, but also facilitates new and difficult real-world ETL use cases and database off-load, working natively within the MapReduce framework. This session will show (including a short demo) how the Syncsort contribution optimises ETL processes, enabling vertical scalability and a smarter integration tool set for Hadoop.
Speaker: Steven Haddad, Senior Big Data Solution Consultant, Syncsort
Session 3 - Spotify
Title: From 1 to 100 Hadoop developers: Scaling for developer productivity at Spotify
Abstract: The demand for processed data is increasing exponentially at Spotify, and we've found our developer infrastructure to be at least as much of a barrier to that scaling as having the hardware available. This talk will tell the story of the problems we've faced in transitioning from the Hadoop team being a small annexe of analytics to having dozens of developers throughout the organisation writing code to run on the cluster, and how we're working to solve these by building better developer infrastructure: from how data processing jobs are developed, tested and scheduled, to how the resulting datasets are catalogued to be discoverable and used by other developers.
Speaker: David Whiting, Data Infrastructure Engineer, Spotify. David spent 18 months in the data team at Last.fm and since February has been developing data infrastructure at Spotify, making him something of an expert in working with music data sets. He mostly works with Hadoop, but can occasionally be found dabbling in data warehousing, SQL query optimisation and front-end web apps, as well as telling everybody else they're not doing enough testing and that everything is better with static typing. As well as generating music data, he also generates music under the guise of Demoscene Time Machine (http://music.demoscenetimemachine.com/), takes part in the occasional triathlon and has some very unusual dance moves.
The Meetup will also include a panel discussion (compered by Matt Aslett from 451 Research) with the following panelists:
• Doug Cutting from Cloudera
• Steven Totman from Syncsort
• David Whiting from Spotify
• Brett Sheppard from Splunk
The panel will cover topics and questions on the evolution and future of Hadoop, and what the collaboration between open source communities & technologies and traditional technologies & vendors will mean for the next generation of data management solutions.
Look forward to seeing you there.