Ingesting Data from Kafka to HDFS with Dedupper & Enrichment using JDBC

Hosted By
Apex Users Group O.
Details

Presenter: Dr. Sandeep Deshmukh, Apache Apex committer and DataTorrent engineer

Schedule:

5.45 - 6.00 : Registration & Networking

6.00 - 6.30 : Introduction to Apex

6.30 - 6.45 : Networking / Snacks Break

6.45 - 7.30 : Ingesting Data from Kafka to HDFS with Transform & Enrichment using JDBC

Abstract:
Ingesting data into Hadoop and extracting it again can be a frustrating, time-consuming activity for many enterprises. DataTorrent Data Ingestion is a standalone big data application that simplifies the collection, aggregation, and movement of large amounts of data to and from Hadoop for a more efficient data processing pipeline. It makes configuring and running Hadoop data ingestion and extraction a point-and-click process, giving your Hadoop-based big data project a smooth, easy start.

In this series of talks, we will cover how Apache Apex makes Hadoop ingestion easy. The second talk in the series will focus on ingesting unbounded data from Kafka to HDFS with a couple of processing operators: transform and enrichment.

Apex

Apache Apex is a next-generation, Hadoop-native data-in-motion platform that customers use for both streaming and batch processing. Common use cases include ingestion into Hadoop, streaming analytics, ETL, database off-loads, alerts and monitoring, machine model scoring, etc. Apache Apex completely separates operational logic from business logic and handles all operational aspects. This lets developers concentrate on business logic, reducing time to market as well as total cost of ownership.

For deeper engagement with Apache Apex (http://apex.apache.org/), follow ApacheApex on Twitter (https://twitter.com/apacheapex), or see the presentations (http://www.slideshare.net/ApacheApex), recordings (https://www.youtube.com/user/datatorrent), downloads (community edition: https://www.datatorrent.com/download/datatorrent-community-edition-download-meetups/, sandbox: https://www.datatorrent.com/download/datatorrent-rts-sandbox-edition-download-meetups/), Apache Apex releases (http://apex.apache.org/downloads.html), and docs (http://apex.apache.org/docs.html).

Data Science & Data Engineering
Pune IT Park, Building C, 9th Floor
Bhau Patil Rd, Pragati Nagar, Bopodi · Pune