Details

We have three great talks from visiting contributors/committers who are in town for ApacheCon.

Chris Fregly (IBM)

Real-time, Advanced Analytics and Recommendations using ML, Graph Processing, NLP, and Approximations (featuring Apache Spark, Stanford CoreNLP, and Twitter Algebird)

Starting with a live, interactive demo generating audience-specific recommendations, we'll dive deep into each of the key components, including NiFi, Kafka, Stanford CoreNLP, Docker, Word2Vec, LDA, Twitter Algebird, Spark Streaming, SQL, ML, and GraphX. As a bonus, we'll discuss the latest Netflix Recommendations Pipeline and related open source projects.

Mike Percy (Cloudera) & Dan Burkert (Cloudera)

Kudu and Spark for Fast Analytics on Streaming Data

Apache Kudu (incubating) is a new storage engine for the Apache Hadoop ecosystem that enables extremely high-speed analytics without imposing data-visibility latencies. Using Apache Spark and Kudu, we show that it is now easy to create applications that query and analyze mutable, constantly changing datasets while getting the impressive query performance that you would normally expect from immutable columnar data formats like Apache Parquet and ORCFile. Kudu delivers this with a fault-tolerant, Spanner-like distributed architecture and a columnar on-disk storage format. This talk provides an introduction to Kudu and demonstrates using Spark and Kudu together to achieve impressive results in a system that is friendly to both app developers and operations engineers.

Xuefu Zhang (Uber)

Hive on Spark, an Uber Use Case

Now that Hive on Spark is mature and production-ready, the Hive community has seen exciting user adoption. Uber recently built its Hadoop-based data lake, where Hive is used extensively to support ETL, BI, and analytics workloads. As both the data size and the user base grow, a faster Hive for the same set of workloads is desired. Hive on Spark has demonstrated great potential to meet that need. This talk shares Uber's experience with Hive and Hive on Spark.

Big thanks to Cloudera for sponsoring the evening.
