Evening with Google Cloud, Distributed DataFrame, and Apache Flink


Details
Please join us for another exciting evening of sharing knowledge and experience with the Apache Flink community.
Tentative schedule:
6pm-6:30pm - Doors open and socializing
6:30pm-8pm - Talks
Abstracts
• Apache Flink Community Updates
We will share current community updates for Apache Flink, including releases, community growth, new features, adoption, and upcoming meetups.
• Google Cloud Platform and Apache Flink
Using Apache Flink with Google Cloud Dataproc (Google's managed Hadoop MapReduce, Spark, Pig, and Hive service) and Cloud Bigtable (Google's high-performance NoSQL database).
• Building Interactive Big Apps on Flink & Spark using DDF (Distributed DataFrame) - http://ddf.io
Enterprise users today demand the ability to glean insights from their disparate data spread across varied transactional and analytics sources; hence, analytics application developers need the ability to connect to varied data & compute engines such as Spark, Flink, Cassandra, etc.
A key pain point for developers is the lack of a uniform API across data & compute engines, a limitation which adversely impacts developer productivity, while also restricting dataflow across different engines. DDF (Distributed DataFrame) is a simple but powerful API above and across multiple engines. Using DDF, developers reap significant benefits including (1) a unified and highly productive API for data/compute access, (2) the ability to process data at-source, bypassing the absolute requirement for a Hadoop data lake, and (3) future-proofing against rapidly shifting economics of specific data engines.
To date, DDF has been implemented on Spark, Flink, and other engines. In this talk we demonstrate, for the first time, a business-analyst-friendly real-time data exploration and visualization system working directly with Flink. We will show how a business user can enter natural-language questions about their data and get real-time answers from Flink in the form of visual charts and tables. We’ll also show interaction with the DDF-on-Flink API at the developer level, share the challenges and lessons learned in realizing this vision on Flink, and compare and contrast that with the same experience on Spark.
Speaker Bios
• Henry Saputra
Henry is a PMC member for Apache Flink and a member of the Apache Software Foundation. He is also a member of the Apache Incubator PMC and was a mentor of Apache Flink while it was in incubation.
Henry currently works on distributed systems and big data application platforms.
• Christopher Nguyen, Founder and CEO, Adatao
Christopher is the CEO & co-founder of Adatao. Previously, he served as engineering director of Google Apps and co-founded two other successful startups.
As a professor, he co-founded the Computer Engineering program at HKUST.
He earned his BS degree summa cum laude from the University of California, Berkeley, and a Ph.D. from Stanford, where he created the first standard-encoding Vietnamese software suite, authored RFC 1456, and contributed to Unicode 1.1.
He is a co-creator of the open-source Distributed DataFrame project http://ddf.io.
• Rohit Rai, Founder and CEO of Tuplejump
Rohit is the founder and CEO of Tuplejump, Inc. and oversees the research and product development operations of the company.
He is the author of the book Real-Time Web Application Development. He is the creator of Calliope, the first connector for Cassandra & Spark, and of the play-yeoman sbt plugin, and has contributed to many open source projects.
He is an expert in Scala, Akka, Spark, Cassandra, and distributed systems in general. Over the past 10 years, he has helped several companies, including a few in the Fortune 100, establish their (big) data analytics strategy, infrastructure, and the required solutions.
• Les Vogel
Les has been a software engineer for over 40 years and worked for Google in developer relations for four years. He has worked with Apple, TVA, Motorola, Boeing, Ashton-Tate, and many others.
He is best known for AirPort wireless networking, but has also worked on flood management, solar housing, handwriting recognition, and a spreadsheet, and has written a couple of operating systems.
