May 2015 Meetup

Details
Join us on the evening of this year's Strata+Hadoop World London on the 5th of May.
We have a jam-packed session lined up for one of this year's biggest HUGUK events, with speakers from Cloudera, Attunity, BT, Dato, and BigBoards covering a great range of topics:
- Introducing Hive's New Execution Engine
Presenter: Xuefu Zhang, Software Engineer, Cloudera
The Hive on Spark initiative (HIVE-7292) is probably the most-watched project in Hive, with 120+ watchers. The effort has attracted developers both from communities around the world and from vendors such as Intel, IBM, Cloudera, and MapR.
Apache Hive has become the de facto standard for batch-oriented SQL workloads in the Hadoop ecosystem. With its open architecture and backend neutrality, Hive queries can currently run on MapReduce and Tez. However, Apache Spark, an open-source cluster computing framework for data analytics, has gained significant momentum recently, and marrying the two (that is, providing a new execution engine to Hive) has many benefits for Spark users as well as Hive users.
This presentation will cover the motivation, design principles, architecture, challenges, and current status of the project, followed by a live demo.
- From Trunk to Tail: How to Incorporate Hadoop into Your Whole Enterprise IT Environment
Presenter: Ted Orme, Business Alliances and Technical Director, Attunity
Making better and faster decisions is a common goal for organisations striving to achieve competitive advantage. The more expensive and difficult it is to manage growing data volumes and enable real-time data delivery, the more challenging it is to ensure that your data can provide value to the enterprise.
In this session, Ted Orme will share the Trunk-to-Tail (end-to-end) methods that you can use to solve these challenges by incorporating Hadoop and Attunity technology into your existing IT environment. Specifically, he’ll focus on how companies can streamline the process of understanding data usage, as well as moving and analysing data and files.
Once you can better understand which data and files can or should be moved, where, and how, then you can create an integrated model to help the business improve its processes, reduce risk, and add value to its bottom line.
Attend this session and learn how a major US bank offloaded a 600TB DB2 Data Warehouse to a Hadoop cluster. Hear the story and see a live demo.
- Building Machine Learning Applications with Dato: From Inspiration to Production on Hadoop
Presenter: Danny Bickson, co-founder, Dato
Machine learning brings a huge amount of value to a variety of business applications. However, building a machine-learning-enabled app is often an onerous process that requires multiple months and numerous teams. Dato (formerly known as GraphLab) has built a machine learning platform that enables data scientists to build and deploy predictive applications at scale. In this talk, we will give an overview and a quick demo (if there’s time) of the key steps: data cleaning, feature engineering, model building, and deployment. With this suite of tools, the same code works for prototyping and production, whether on your personal laptop, in the cloud, or on a Hadoop cluster.
Danny Bickson is a co-founder of Dato, formerly known as GraphLab. Danny is a notable expert in large-scale machine learning and the writer of the popular blog Large Scale Machine Learning and Other Animals. Prior to co-founding Dato/GraphLab, he was a project scientist at Carnegie Mellon University, working on the GraphLab PowerGraph project from its early stages. Danny’s applied research is focused on the intersection of distributed algorithms, machine learning, and big data.
- BigBoards Hex, from 0 to Big Data in less than 1 hour
Presenter: Daan Gerits, co-founder and CTO, BigBoards
BigBoards' Hex is a micro-datacenter that sits on your desk. With 6 computer nodes and 6 TB of storage, it is the ideal laboratory to learn and experiment with big data technologies and data science tools. Your Hex comes fully pre-installed. Just feed it network and power to boot it, browse to the dashboard to operate it, and use our library of big data tints. The library is like the app store on your smartphone: just click to install various data processing technologies, datasets, and tutorials. We'll give a short presentation on the background of our product and complement it with a demo.
Daan Gerits is an entrepreneurial geek with 12 years' experience as an architect at scale, leading technology teams in building future-proof platforms. Linchpin. Relentlessly seeking to improve software development through better tooling. Co-organizer of BigData.be since mid-2011. Co-founder of BigBoards.io since 2014.
- A Game of Elephants (How an old elephant (like BT) adopts a new elephant (like Hadoop))
Presenter: Phillip Radley, Chief Data Architect, BT
In this session, Phill will share experiences of integrating Hadoop into a large and complex enterprise IT environment. In particular, he will look at getting the foundations right and adopting an incremental approach that allows early wins using simple features while gradually increasing functionality. He will explain BT’s approach to integrating Hadoop with enterprise security and how this is being evolved to handle front-end BI and ETL tools. The session will also cover the challenges of running a multi-tenant Hadoop-as-a-Service platform and how to on-board application teams that know nothing about Hadoop.
------------------------------------------------------------------------------------
Doors Open at 6:30pm.
Networking, Food and Drinks before and after the presentations.
Looking forward to seeing you all next week!
HUGUK Team.
