
November HUGUK Meetup


167 people went


Hi Folks,

We are pleased to announce our November Meetup, to be held in Seminar Room 2 at the London Olympia Conference Centre on the evening of November 3rd.

We can confirm our first speaker and our sponsor for the event, Cloudera; please see the details below.

In the meantime, please register for the Big Data London Conference, taking place on the 3rd and 4th of November at the Olympia Conference Centre. Conference registration is required for security and for entry to the Meetup.

Registration is free; please see the following link for full details of the event:

Presentation 1

Kudu: New Apache Hadoop Storage for Fast Analytics on Fast Data

Apache Kudu is a fast new columnar data store for the Hadoop ecosystem, designed to enable high-performance, flexible analytic pipelines. Because it is optimized for lightning-fast scans, Kudu is particularly well suited to hosting time-series data such as metrics, as well as to machine learning model-building workloads and data warehousing applications. Despite its impressive scan speed, Kudu also supports operations offered by many traditional data stores, including real-time insert, update, and delete. Kudu follows a "bring your own SQL" model and can be queried by multiple SQL engines, including Apache Spark SQL, Apache Impala (incubating), and Apache Drill. This talk will discuss what Kudu is, why we decided to build it, and what makes it fast, and will walk through an example of using it for a time-series use case.


Mike Percy is a Software Engineer at Cloudera and a committer on Apache Kudu (incubating). Prior to joining Cloudera, Mike worked on big data infrastructure for machine learning at Yahoo! Mike holds a BSCS from UC Santa Cruz and an MSCS from Stanford.

Presentation 2

Skool: a new open-source data integration tool for Hadoop

“Skool is a data integration tool which handles the following:

a) data transfer from Hadoop into a relational database (Oracle / SQL Server / MySQL / Netezza or any JDBC-compliant database)

b) data transfer from a relational database into Hadoop (includes automated creation of Oozie workflows and Hive tables)

c) file transfer and Hive table creation for file-based transfers into Hadoop

d) automatic generation and deployment of file creation scripts and jobs from Hadoop or Hive tables

It is suitable for use by data scientists for ad hoc data loads, and also for productionised regular data loads.

The main benefit of Skool is that it simplifies the process for the end user and provides a default configuration that removes the need for detailed knowledge of the underlying technologies, while remaining customisable for advanced users.”

Presenter: Gareth Watkins, Big Data Architect in BT’s Data Analytics team

We look forward to seeing you all there.