
Meeting on Streamsets, Datameer and Kudu: Final Agenda

Hosted by Matthias V.

Details

Agenda:

18:30: Welcome with sandwiches

19:00: Apache Kudu (https://kudu.apache.org/): Fast Analytics on Fast Data - Mike Percy, Cloudera

Apache Kudu is a fast new columnar data store for the Hadoop ecosystem designed to enable high-performance, flexible analytic pipelines.

19:45: Datameer (http://www.datameer.com): Make Big Data Analytics easy for everyone - Eelco Jan Boonstra & Erik Stalpers, Datameer

A joint Cloudera/Datameer use case on customer segmentation, followed by a demonstration.

20:30: Rapid data ingestion pipelines with StreamSets (https://streamsets.com/) - Robert Gibbon, Big Industries

In this talk, Rob Gibbon will turn the microscope on StreamSets, a new open-source streaming data ingestion system for the Hadoop ecosystem and friends.

Rob will give us an overview of this useful tool, guide us through the process of developing a data ingestion pipeline, and look at options for extending the base functionality.

21:00: Close

More Info on Apache Kudu: Fast Analytics on Fast Data

Apache Kudu is a fast new columnar data store for the Hadoop ecosystem designed to enable high-performance, flexible analytic pipelines. Because it is optimized for very fast scans, Kudu is particularly well suited to hosting time-series data such as metrics, to machine learning model-building workloads, and to data warehousing applications. Despite its scan speed, Kudu also supports many operations found in traditional data stores, including real-time inserts, updates, and deletes. Kudu follows a "bring your own SQL" model and can be queried by multiple SQL engines, including Apache Spark SQL, Apache Impala (incubating), and Apache Drill. This talk will discuss what Kudu is, why we decided to build it, what makes it fast, and how it can be used for an example time-series use case.

Bio:

Mike Percy (https://www.linkedin.com/in/mpercy) is a Software Engineer at Cloudera and a PMC member / committer on Apache Kudu and Apache Flume. Prior to joining Cloudera, Mike worked at Yahoo! on big data infrastructure for machine learning at scale. Mike holds a BSCS from UC Santa Cruz and an MSCS from Stanford.

Belgium Cloudera User Group
Cronos - meeting room Atlantis
Veldkant 39 · Kontich