Ingesting Data from Kafka to HDFS with Dedupper & Enrichment using JDBC


Details
Presenter: Dr. Sandeep Deshmukh, Apache Apex committer and DataTorrent engineer
Schedule:
5.45 - 6.00 : Registration & Networking
6.00 - 6.30 : Introduction to Apex
6.30 - 6.45 : Networking / Snacks Break
6.45 - 7.30 : Ingesting Data from Kafka to HDFS with Transform & Enrichment using JDBC
Abstract:
Ingesting data into Hadoop and extracting it again can be a frustrating, time-consuming activity for many enterprises. DataTorrent Data Ingestion is a standalone big data application that simplifies the collection, aggregation, and movement of large amounts of data to and from Hadoop for a more efficient data processing pipeline. It makes configuring and running Hadoop data ingestion and extraction a point-and-click process, giving a smooth, easy path to your Hadoop-based big data project.
In this series of talks, we cover how Apache Apex makes Hadoop ingestion easy. The second talk in the series focuses on ingesting unbounded data from Kafka into HDFS with a couple of processing operators: transform and enrichment.
Apex
Apache Apex is a next-generation, Hadoop-native data-in-motion platform that customers use for both streaming and batch processing. Common use cases include ingestion into Hadoop, streaming analytics, ETL, database offloads, alerts and monitoring, and machine model scoring. Apache Apex completely separates operational logic from business logic and handles all operational aspects. This lets developers concentrate on business logic, reducing both time to market and total cost of ownership.
For deeper engagement with Apache Apex (http://apex.apache.org/), follow ApacheApex on Twitter (https://twitter.com/apacheapex) and see the presentations (http://www.slideshare.net/ApacheApex), recordings (https://www.youtube.com/user/datatorrent), downloads (community edition: https://www.datatorrent.com/download/datatorrent-community-edition-download-meetups/, sandbox edition: https://www.datatorrent.com/download/datatorrent-rts-sandbox-edition-download-meetups/), Apache Apex releases (http://apex.apache.org/downloads.html), and docs (http://apex.apache.org/docs.html).
