Focusing on Ingest into Hive/Impala and Streaming with Kafka

Details
Continuing our series of ingest-focused meetups, we'll look at StreamSets' (https://streamsets.com/) new feature for ingesting drifting data into Apache Hive, and streaming architecture for Apache Kafka.
This event is kindly hosted by Cloudera (http://www.cloudera.com/).
6 - 6:30 pm - Food and networking.
6:30 - 7:15 pm - Jarcec Cecho (https://www.linkedin.com/in/jarcec), Santhosh Kumar (https://www.linkedin.com/in/santhosh-kumar-manavasi-lakshminarayanan-5aa0b123) & Junko Urata (https://www.linkedin.com/in/junkourata), Software Engineers at StreamSets, "Drifting With Hive & Impala - From Earth to Mars Without Losing a Single Atom"
Importing data into Hive is one of the most common use cases in big data ingest, but it gets tricky when data sources 'drift', changing the schema of incoming data. StreamSets' Hive Drift Solution detects drift in incoming data and updates the corresponding Hive tables. The solution can create and update Hive tables based on record requirements and write data to HDFS based on record header attributes.
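To make the idea concrete, here is a minimal sketch of what drift handling conceptually involves: comparing an incoming record's fields against the known table schema and generating the DDL needed to evolve the table. This is an illustrative assumption, not StreamSets' actual implementation; the table name ("events") and field names are hypothetical.

# Minimal sketch of schema-drift detection (hypothetical, not StreamSets' code):
# compare an incoming record's fields to the known Hive table schema and
# emit the ALTER TABLE statement needed to absorb new columns.

known_schema = {"id": "BIGINT", "name": "STRING"}  # columns Hive already has

def detect_drift(record: dict, schema: dict) -> list[str]:
    """Return DDL statements for fields present in the record but not the table."""
    ddl = []
    new_cols = [f for f in record if f not in schema]
    if new_cols:
        cols = ", ".join(f"`{c}` STRING" for c in new_cols)  # default new fields to STRING
        ddl.append(f"ALTER TABLE events ADD COLUMNS ({cols})")
        schema.update({c: "STRING" for c in new_cols})
    return ddl

# A record arrives with an unexpected 'country' field -> the table evolves.
incoming = {"id": 42, "name": "Ada", "country": "US"}
for stmt in detect_drift(incoming, known_schema):
    print(stmt)  # ALTER TABLE events ADD COLUMNS (`country` STRING)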
Jarcec, Santhosh and Junko will discuss data drift, the challenges it poses in ingesting data into Hive, and how they implemented a solution.
7:15 - 8 pm - Maheedhar Gunturu (https://www.linkedin.com/in/maheedhargunturu), Software and Solutions Architect at VoltDB, "How to Simplify Your Streaming Data Architecture with Kafka and VoltDB"
The story of Fast Data and how to get what you want
Writing mission-critical applications on top of streaming data requires high throughput, scalability, and event processing without compromising "non-negotiables" such as transactional consistency and resiliency in a distributed computing environment. Kafka is becoming the default mechanism for moving data between layers, as seen in its adoption by Teradata, HPE, and MapR in their respective ingestion layers.
A common challenge is managing data ingestion and processing while ensuring transactional consistency and meeting stringent latency SLAs at demanding throughput levels. There are several disparate approaches to this, combining open source and proprietary technologies. In this talk, we'll help you understand how a simplified architecture can deliver performance and reliability without the guesswork (a minimal consumer sketch follows the list below).
You will learn:
• How to make Kafka imports more actionable
• How to ensure scalable, fully consistent data with synchronous command logging
• How to meet low-latency SLA requirements
• How to guarantee the aforementioned non-negotiables
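VoltDB ships its own Kafka importer; as a rough stand-in for the pattern the talk addresses, here is a minimal sketch using the kafka-python package: consume events, process each one, and commit offsets only after the work succeeds, so an event is never acknowledged before it is durably handled. The topic name, broker address, and process function are assumptions for illustration.

# Generic sketch of at-least-once Kafka ingestion (hypothetical stand-in for
# VoltDB's built-in Kafka importer), using the kafka-python package.
from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "events",                          # hypothetical topic name
    bootstrap_servers="localhost:9092",
    group_id="ingest-demo",
    enable_auto_commit=False,          # commit manually, only after processing succeeds
)

def process(value: bytes) -> None:
    """Stand-in for the transactional write into the fast-data store."""
    print("ingested:", value)

for message in consumer:
    process(message.value)
    consumer.commit()  # offsets advance only once the event is processed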
