
Malhar & Geode Integration; Ingest: Kafka to Hadoop/Apex & Results into Ampool

Hosted By
Amol K.

Details

Do come by to pick up Apache Apex T-shirts!

This event will cover three talks. The agenda is as follows:

6:00pm - Food & Drinks, Socialize

6:15pm - Apache Geode, and Pivotal's leadership role in open-sourcing Geode, by Nitin Lamba

6:45pm - Q&A

6:50pm - Details of the integration of Apache Apex with Apache Geode, by Ashish Tadose

7:35pm - Q&A

7:45pm - A demo of a big data AdTech pipeline that shows ingestion from Kafka into Hadoop using Apex, compute and transformations using Apex, and finally a load (egress) into Geode, by Vitthal Gogate

8:20pm - Q&A, and Socialize

Sponsored by Ampool

Talk #1: Title - Pivotal's Efforts on Apache Geode by Nitin Lamba

Abstract: Nitin will discuss the rationale behind Apache Geode and walk through the leadership role Pivotal has played in the open-sourcing of Apache Geode.

Bio: Nitin Lamba leads product management at Ampool, a company he co-founded last year. Prior to Ampool, he worked at a robotics company, which builds ocean drones using a real-time Java platform. Before that industrial IoT start-up, he had been with Pivotal for over a year, leading in-memory data grid and monitoring/management of Data Fabric products. Nitin moved to the SF Bay Area a few years ago after a stint of over 10 years at Honeywell, where he held several progressive engineering and business roles across various divisions, including strategic marketing, business development, and product management. An engineer by education, Nitin still likes to tinker and code in his free time.

Talk #2: Title - Apex & Geode: In-memory Computation, Storage & Analysis

Abstract: Apache Apex and Apache Geode are two very promising incubating open source projects; combined, they promise to fill gaps in existing big data analytics platforms.

Apache Geode provides a database-like consistency model, reliable transaction processing, and a shared-nothing architecture that maintains very low latency under high concurrency.

In this session we will talk about use cases and on-going efforts to integrate Apex and Geode to build scalable and fault-tolerant real-time streaming applications that ingest from various sources and egress to Geode.

Use case 1 - Geode as a data store for streaming results computed by Apex, powering user applications or dashboards.

Use case 2 - An Apex application reading data from the Geode cache and using it in data processing.

Use case 3 - Checkpointing Apex operator state in Geode to improve the performance of Apex batch operations.
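To make the use cases above concrete, use case 3 essentially amounts to persisting operator state keyed by checkpoint window into a key-value region. The following is a minimal, framework-free Python sketch of that idea; the class and method names are illustrative stand-ins, not the actual Apex or Geode APIs:

```python
import pickle

class InMemoryRegion:
    """Stand-in for a Geode region: a simple key-value map."""
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

class CheckpointStore:
    """Saves and restores operator state per (operator_id, window_id)."""
    def __init__(self, region):
        self.region = region

    def save(self, operator_id, window_id, state):
        # Serialize the operator's state and store it under its window key.
        self.region.put((operator_id, window_id), pickle.dumps(state))

    def restore(self, operator_id, window_id):
        # Return the deserialized state, or None if no checkpoint exists.
        blob = self.region.get((operator_id, window_id))
        return pickle.loads(blob) if blob is not None else None

store = CheckpointStore(InMemoryRegion())
store.save("wordCount", 42, {"apex": 3, "geode": 5})
print(store.restore("wordCount", 42))  # → {'apex': 3, 'geode': 5}
```

In the real integration, the in-memory map would be replaced by a replicated Geode region, so a restarted Apex operator could recover its last checkpointed state from any surviving node.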

Bio: Ashish Tadose is a technical lead at Ampool. He previously worked at PubMatic as a Lead Engineer, Big Data & Analytics, where he led a team driving large-scale data ingestion and real-time streaming analytics solutions. Ashish is experienced in the design and implementation of scalable streaming analytics technologies such as Apache Storm, Kafka, Kinesis, Flink, Spark Streaming, and Apex. He also delivered data infrastructure to facilitate large-scale data ingestion from 6 geographic regions, both in the AWS cloud and on-premise, using Kafka and Apex. Prior to PubMatic, Ashish worked at Verisign as a Senior Software Engineer on the Big Data team, on projects requiring large-scale data processing using Hadoop and MapReduce. Ashish holds Bachelor's and Master's degrees in Computer Science and is passionate about building products that leverage distributed computing platforms.

Talk #3: Title - AdTech Pipeline: Kafka->Apex->Geode

Abstract: A demonstration of a common big data AdTech pipeline using Kafka, Apex, and Geode. The data source is Kafka, and Geode is used to store the results of computations. Data will be ingested into Hadoop using the Kafka input connector from Apache Malhar. Computations and transformations will then be performed on this data by an Apache Apex application running natively in Hadoop. The results are then loaded (egressed) into Geode for UI queries.
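As a rough, framework-free illustration of the three stages described above (ingest, compute, egress), here is a Python sketch; the record format, field names, and functions are made up for the example and are not the actual Malhar Kafka connector or Apex operator APIs:

```python
from collections import Counter

def ingest(raw_records):
    """Ingest stage: parse Kafka-style 'advertiser,clicks' records."""
    for record in raw_records:
        advertiser, clicks = record.split(",")
        yield advertiser, int(clicks)

def transform(events):
    """Compute stage: aggregate clicks per advertiser,
    standing in for the Apex computation/transformation step."""
    totals = Counter()
    for advertiser, clicks in events:
        totals[advertiser] += clicks
    return dict(totals)

def egress(results, store):
    """Egress stage: load the aggregated results into a
    key-value store (Geode, in the demo)."""
    store.update(results)

store = {}
egress(transform(ingest(["acme,3", "globex,1", "acme,2"])), store)
print(store)  # → {'acme': 5, 'globex': 1}
```

In the demo itself, each stage would be an operator in an Apex DAG running on Hadoop, with the final store being a Geode region queried by the UI.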

Bio: Vitthal Gogate is a Hadoop veteran who has worked on various Hadoop components. His work experience includes Senior Research Staff Engineer at IBM; a Solutions Architect role on Yahoo!'s Hadoop team; Chief Architect and Product Manager of the Pivotal Hadoop Distribution; and Architect for the Hadoop installation, management, and monitoring product at Hortonworks. Vitthal founded the open source project "Apache Ambari" and serves as a PMC member and committer for it at the Apache Software Foundation. Vitthal is also the main contributor to Vaidya, a performance advisor framework for MapReduce.

Scalable Architecture for IoT, AI and Blockchain
DataTorrent Inc
2833 Junction Ave, Suite 200 · San Jose, CA