BDAM 02/13: Maintaining full data lineage; Migration & Change Data Capture: CDAP


Details
Big thanks to Ascend.io for hosting and sponsoring this meetup event!
AGENDA
6:00 - 6:30 - Socialize over food and beverages
6:30 - 8:00 - Tech Talks
8:00 - 8:30 - Networking
TALKS
#1: Maintaining full data lineage and governance across billions of data partitions by Steven Parkes, Ascend.io
#2: Moving to the Cloud: Data Migration and Change Data Capture (CDC) with CDAP by Tony Hajdari, Google
#3: The Linux Foundation ONAP project enables 5G & Edge Computing using CDAP by Amar Kapadia, Aarna Networks
ABSTRACTS
#1: Maintaining full data lineage and governance across billions of data partitions
As organizations design and build data pipelines, much of the focus tends to be on the end-to-end operations and orchestration that happen during each run. However, as the data and the number of pipelines scale, tracking and optimizing their dependencies becomes brittle and manual to maintain. This can be alleviated by building context-awareness into the pipelines themselves.
We’ll also walk through a few use cases on how Ascend has architected a cloud service for building these context-aware autonomous pipelines, leveraging open source technologies such as Spark and Kubernetes to support billions of partitions.
#2: Moving to the Cloud: Data Migration and Change Data Capture (CDC) with CDAP
Moving enterprise data to the cloud can be a daunting process. Beyond the initial data offloading from an on-premises Enterprise Data Warehouse (EDW), enterprises require efficient and scalable mechanisms for keeping data in sync. Until recently, the open source community had limited options for CDC. CDAP enables Change Data Capture for relational databases: it consumes change data events and applies them to the corresponding cloud instance, continually keeping an on-premises warehouse and a cloud warehouse in sync. In this talk, we will discuss use cases for migrating an EDW to the cloud and keeping both on-premises and cloud instances in sync with CDAP pipelines and plugins.
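For readers new to CDC, the idea in the abstract above (consume change events from a source database and apply them to a target copy) can be sketched in a few lines of Python. This is a generic illustration only, not CDAP's API: the ChangeEvent type, the apply_changes function, and the dictionary standing in for a cloud table are all hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical change record; not a CDAP type."""
    op: str                               # "insert", "update", or "delete"
    key: Any                              # primary key of the affected row
    row: Optional[Dict[str, Any]] = None  # new row contents (None for delete)


def apply_changes(target: Dict[Any, Dict[str, Any]],
                  events: Iterable[ChangeEvent]) -> Dict[Any, Dict[str, Any]]:
    """Apply change events, in order, to a dict standing in for the target table."""
    for event in events:
        if event.op in ("insert", "update"):
            if event.row is None:
                raise ValueError(f"{event.op} event for key {event.key!r} has no row")
            # Upsert: inserts and updates both overwrite the row for this key.
            target[event.key] = dict(event.row)
        elif event.op == "delete":
            # Ignore deletes for keys the target never saw.
            target.pop(event.key, None)
        else:
            raise ValueError(f"unknown operation: {event.op!r}")
    return target


# Assumed sample data: the target starts with one row, then three changes arrive.
cloud_table = {1: {"name": "Ada"}}
changes = [
    ChangeEvent("insert", 2, {"name": "Grace"}),
    ChangeEvent("update", 1, {"name": "Ada L."}),
    ChangeEvent("delete", 2),
]
apply_changes(cloud_table, changes)
```

Applying events strictly in source order is what keeps the target consistent here; a real deployment would also need to handle ordering across tables, retries, and schema changes, none of which this sketch attempts.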
#3: The Linux Foundation ONAP project enables 5G & Edge Computing using CDAP
5G and edge computing are a once-in-a-generation disruption that will transform every facet of the telecommunications industry. 5G/edge will be software-driven, using technologies such as NFV, SDN, and cloud, with open source software playing a significant role. The Linux Foundation ONAP project is an orchestration, management, and automation platform for NFV, SDN, and edge computing services. One of the core concepts of ONAP is real-time closed-loop automation.
SPEAKER BIOS
-
Steven Parkes is CTO at Ascend.io, where he guides architecture development and has implemented many of the core abstractions for Ascend’s semantic scheduler. Prior to Ascend, he built big data infrastructure and applications at both Twitter and Square. He also worked with these big data systems in their early days during his time at IBM Research.
-
Tony Hajdari is a Customer Engineer on the Big Data specialists team at Google, where he works on the open source Big Data Application Platform CDAP (cdap.io). Prior to Google, he worked at Cask Data, where he was responsible for field technical services and customer enablement, helping customers build the next generation of Big Data applications with less code and greater agility.
-
Amar Kapadia is an NFV specialist and co-founder at Aarna Networks, an open source NFV company providing products and services around the Linux Foundation ONAP project. He is also the author of the book "ONAP Demystified". Previously, Amar held senior management positions at Mirantis, Seagate, Emulex, Philips, and HP. He has an MS in EE from the University of California, Berkeley.
PARKING INFORMATION
There is a parking garage directly behind the office, which has free parking after 5pm. There is also street parking in front of the office that's open after 6pm.
