Big Data Apps Meetup - EDW Optimization, Apache Beam and Apache NiFi!

Name: Big Data Apps Meetup - EDW Optimization, Apache Beam and Apache NiFi!
Start: 2017-05-10T18:00:00-07:00
End: 2017-05-10T21:00:00-07:00
Location: Cask HQ

Hosted by Priyanka N.

Distributed Data Bay Area

Details

Shoutout to Cask (http://cask.co/) for kindly sponsoring and hosting this meetup!

Cask will also be giving away an Amazon Tap (https://www.amazon.com/dp/B01BH83OOM)! Enter the raffle on the day of the event for a chance to win.

AGENDA

6:00 - 6:30 - Socialize over food and beer(s)
6:30 - 8:00 - Talks

TALKS

Talk #1: EDW Optimization with Hadoop and CDAP, by Sagar Kapare from Cask

Talk #2: Future-proof, portable batch and streaming pipelines using Apache Beam, by Malo Deniélou from Google

Talk #3: Turning a data pond into a data lake with Apache NiFi, by Gene Peters from Telligent Data

ABSTRACTS

Talk #1: EDW Optimization with Hadoop and CDAP, by Sagar Kapare from Cask

The cost of maintaining a traditional Enterprise Data Warehouse (EDW) is skyrocketing as legacy systems buckle under the weight of exponentially growing data and increasingly complex processing needs. Hadoop, with its massive horizontal scalability, and CDAP, which offers pre-built pipelines for EDW offload in a drag-and-drop studio environment, can help.

Sagar will demonstrate Cask’s solution for building code-free, scalable, enterprise-grade pipelines that deliver an easy-to-use and efficient EDW offload. He will also show how interactive data preparation, data pipeline automation, and fast querying over large volumes of data can help unlock new use cases.

Talk #2: Future-proof, portable batch and streaming pipelines using Apache Beam, by Malo Deniélou from Google

Apache Beam is a top-level Apache project that aims to provide a unified API for efficient and portable data processing pipelines. Beam handles both batch and streaming use cases and neatly separates properties of the data from runtime characteristics, allowing pipelines to be portable across multiple runtimes, both open-source (e.g., Apache Flink, Apache Spark, Apache Apex) and proprietary (e.g., Google Cloud Dataflow). This talk will cover the basics of Apache Beam, describe the main concepts of the programming model, and discuss the current state of the project (new Python support, first stable version). We'll illustrate the concepts with a use case running on several runners.

Talk #3: Turning a data pond into a data lake with Apache NiFi, by Gene Peters from Telligent Data

In recent years, there has been a drive for organizations to consolidate their analytic data, both internal and external, into a central source of truth: the data lake. But how do you actually go about populating this lake in a scalable, low-latency fashion? Enter Apache NiFi. From piping third-party vendor data accessed through RESTful APIs into Apache Kafka clusters, to syncing on-premises HDFS with a cloud-based object store, NiFi provides the glue to bring together the many varied components of a big data ecosystem. At Telligent Data, we use Apache NiFi as the backbone of the software and services we provide. This talk will cover how to take advantage of NiFi's real-time streaming capabilities to replicate siloed data sources into a unified data lake.

SPEAKER BIOS

• Sagar Kapare is a Software Engineer at Cask, where he is building software to simplify data application development. He is also a regular contributor to Apache Tephra. Prior to Cask, he worked on a high-performance digital messaging platform at StrongView Systems.

• Malo Deniélou is a Software Engineer on the Google Cloud Dataflow team, where he works on the Cloud Dataflow managed service and on the Apache Beam SDKs. His main efforts go toward reducing the number of "knobs" that big data system users have to set in order to get the best performance and cost. Previously, he was a lecturer at Royal Holloway, University of London, where he worked on the theory of distributed systems.

• Gene Peters is the co-founder and CTO of Telligent Data, where he leverages his favorite open source software to bring on-premises big data to the cloud. He is a contributor to the Apache NiFi project, and his life mission is to put everything into Docker containers. Before starting Telligent Data, he worked on building out the data stack at KIXEYE, a multi-studio gaming company based in San Francisco.

ARRIVAL AND PARKING

Cask HQ is a few minutes' walk from the California Avenue Caltrain Station.

Cask HQ also has its own parking lot, but it will certainly not accommodate all guests. Please use the parking lots available nearby:

https://secure.meetupstatic.com/photos/event/5/b/2/f/600_438983343.jpeg
