Past Meetup

Fluentd & Docker, Turbocharging CDAP, and Building a Data Science Platform


151 people went


For the next Big Data Application Meetup, we have a great set of speakers lined up for an evening of talks and learning.

The talks will focus on building data pipeline platforms, including an introduction to Fluentd and Docker integration.

Please join us for the evening to share and learn.

Schedule of events

6:00pm-6:30pm Doors open and socializing

6:30pm-7:00pm Talk #1: Fluentd and Docker Integration by John Hammink, Treasure Data

7:00pm-7:30pm Talk #2: Turbocharging CDAP Applications With Ampool by Milind Bhandarkar, Ampool

7:30pm-8:00pm Talk #3: 5 Tips for Building a Data Science Platform by David Chaiken, Altiscale


Talk #1: Fluentd and Docker Integration by John Hammink, Treasure Data

New to Docker? New to application logging? As with any production application framework, logging is an essential piece. But traditionally, logging in complex architectures has been a mess.

Fluentd is an open source data collector that lets you unify data collection and consumption for better use and understanding of data. Based on open source software and built by the team that went on to found Treasure Data, Fluentd acts as a common interface between logging sources and log destinations, using community-developed input and output plugins around a common core.
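Fluentd's common core routes each event by its tag: the pattern on a `<match>` directive decides which output plugin receives the event, where `*` matches exactly one dot-separated tag part, `**` matches zero or more parts, and directives are tried in order with the first match winning. The following is a minimal Python model of that routing idea, not Fluentd's implementation; `tag_matches` and `Router` are hypothetical names, the pattern syntax is a simplified subset (no `{a,b}` alternatives), and plain lists stand in for output plugins.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Output = Callable[[str, Record], None]


def tag_matches(pattern: str, tag: str) -> bool:
    """Simplified Fluentd-style tag matching (hypothetical helper).

    '*' matches exactly one dot-separated tag part, '**' matches zero or
    more parts, and any other pattern part must equal the tag part.
    """
    return _match(pattern.split("."), tag.split("."))


def _match(pat: List[str], parts: List[str]) -> bool:
    if not pat:
        return not parts  # pattern exhausted: the tag must be exhausted too
    head, rest = pat[0], pat[1:]
    if head == "**":
        # Let '**' swallow 0, 1, 2, ... tag parts and try the rest of the pattern.
        return any(_match(rest, parts[i:]) for i in range(len(parts) + 1))
    if not parts:
        return False
    if head == "*" or head == parts[0]:
        return _match(rest, parts[1:])
    return False


class Router:
    """Send each (tag, record) event to the first output whose pattern matches."""

    def __init__(self) -> None:
        self.routes: List[Tuple[str, Output]] = []

    def match(self, pattern: str, output: Output) -> None:
        self.routes.append((pattern, output))  # order matters: first match wins

    def emit(self, tag: str, record: Record) -> bool:
        for pattern, output in self.routes:
            if tag_matches(pattern, tag):
                output(tag, record)
                return True
        return False  # no pattern matched; this sketch simply drops the event


if __name__ == "__main__":
    # Lists stand in for output plugins (e.g. a Treasure Data output vs. stdout).
    to_treasure_data: List[Record] = []
    to_stdout: List[Record] = []
    router = Router()
    router.match("docker.**", lambda tag, rec: to_treasure_data.append(rec))
    router.match("**", lambda tag, rec: to_stdout.append(rec))
    router.emit("docker.web", {"log": "GET / 200"})
    router.emit("app.audit", {"log": "user login"})
    print(to_treasure_data, to_stdout)
```

Putting the catch-all `**` route last matters: because the first matching directive wins, placing it first would send every event to stdout and the `docker.**` route would never fire.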

Learn and see firsthand how Fluentd and Docker work together to simplify the logging story for your container-based apps.

This presentation includes a live demo. We'll:
1.) Log into a Docker instance and build a container;
2.) Configure Fluentd as the logging driver for our container;
3.) Send events through Fluentd and route them to Treasure Data;
4.) Query the Treasure Data interface to view our events;
5.) Discuss where we can go from here.
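Steps 2 and 3 of the demo rely on Docker's built-in `fluentd` logging driver, which is selected per container with `docker run --log-driver=fluentd` and pointed at a Fluentd instance with `--log-opt fluentd-address=...` (by default it connects to `localhost:24224`). Per Docker's documentation, each log line is sent as a record with `container_id`, `container_name`, `source` (`stdout` or `stderr`), and `log` fields, tagged by default with the first 12 characters of the container ID. The Python sketch below only builds the command line and models that record shape; it does not run Docker or speak Fluentd's wire protocol, the helper names `docker_run_args` and `docker_log_event` are hypothetical, and the demo's actual configuration may differ.

```python
import shlex
from typing import Dict, List, Optional, Tuple


def docker_run_args(image: str,
                    fluentd_address: str = "localhost:24224",
                    tag: Optional[str] = None) -> List[str]:
    """Build (but do not execute) a `docker run` command line that sends the
    container's stdout/stderr through Docker's fluentd logging driver."""
    args = ["docker", "run",
            "--log-driver=fluentd",
            "--log-opt", f"fluentd-address={fluentd_address}"]
    if tag is not None:
        args += ["--log-opt", f"tag={tag}"]
    args.append(image)
    return args


def docker_log_event(container_id: str, container_name: str, source: str,
                     line: str, tag: Optional[str] = None
                     ) -> Tuple[str, Dict[str, str]]:
    """Model one container log line as a (tag, record) pair using the field
    names documented for the fluentd driver. Without a tag option, the tag
    defaults to the first 12 characters of the container ID."""
    if source not in ("stdout", "stderr"):
        raise ValueError("source must be 'stdout' or 'stderr'")
    record = {
        "container_id": container_id,
        "container_name": container_name,
        "source": source,
        "log": line,
    }
    return (tag if tag is not None else container_id[:12]), record


if __name__ == "__main__":
    print(shlex.join(docker_run_args("nginx", tag="docker.web")))
    container_id = "f" * 64  # placeholder 64-character container ID
    events = [
        docker_log_event(container_id, "/web", "stdout", "GET / 200"),
        docker_log_event(container_id, "/web", "stderr", "upstream timed out"),
    ]
    # A downstream route might, for example, forward only stderr lines.
    errors = [rec for _, rec in events if rec["source"] == "stderr"]
    print(errors)
```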

Additionally, we'll look at ways to route our queried data to other systems, including Amazon Redshift, Postgres, S3, Riak, Tableau and more.

Talk #2: Turbocharging CDAP Applications With Ampool by Milind Bhandarkar, Ampool

In this talk, we will describe how Ampool takes advantage of CDAP's extensibility and enables fast analytical data pipelines. CDAP provides consistent developer interfaces to both processing frameworks and storage abstractions. This allows developers to build Hadoop solutions using multiple processing paradigms, with uniform capabilities to build, test, deploy & manage across multiple environments. More importantly, CDAP provides core data abstractions that allow applications to be decoupled from storage engines such as HDFS & HBase. We will outline Ampool's vision of the modern data architecture, and how CDAP & Ampool are working together to realize it. We will provide details about extending CDAP's data abstractions to allow unprecedented speeds for Hadoop solutions built with CDAP.
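The decoupling of applications from storage engines can be illustrated with a small, hypothetical Python sketch: application code is written once against an abstract table interface, and the storage engine behind it can be swapped without touching that code. `KeyValueTable`, `InMemoryTable`, `SqliteTable`, and `count_words` are illustrative names, not CDAP's Dataset API, and SQLite merely stands in for a different storage engine such as HBase.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Optional


class KeyValueTable(ABC):
    """Hypothetical storage abstraction; application code depends only on this."""

    @abstractmethod
    def get(self, key: str) -> Optional[int]: ...

    @abstractmethod
    def put(self, key: str, value: int) -> None: ...


class InMemoryTable(KeyValueTable):
    """Backend 1: a plain Python dict."""

    def __init__(self) -> None:
        self._data: Dict[str, int] = {}

    def get(self, key: str) -> Optional[int]:
        return self._data.get(key)

    def put(self, key: str, value: int) -> None:
        self._data[key] = value


class SqliteTable(KeyValueTable):
    """Backend 2: the same interface backed by SQLite (a stand-in engine)."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v INTEGER)")

    def get(self, key: str) -> Optional[int]:
        row = self._conn.execute(
            "SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return None if row is None else row[0]

    def put(self, key: str, value: int) -> None:
        self._conn.execute(
            "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))
        self._conn.commit()


def count_words(table: KeyValueTable, lines: Iterable[str]) -> None:
    """Application logic written once against the abstraction."""
    for line in lines:
        for word in line.split():
            table.put(word, (table.get(word) or 0) + 1)


if __name__ == "__main__":
    for table in (InMemoryTable(), SqliteTable()):
        count_words(table, ["hadoop cdap hadoop", "cdap ampool"])
        print(type(table).__name__, table.get("hadoop"), table.get("cdap"))
```

Because `count_words` only calls `get` and `put`, swapping `InMemoryTable()` for `SqliteTable()` changes where the counts live but not the application logic, which is the kind of decoupling the abstract attributes to CDAP's data abstractions.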

Talk #3: 5 Tips for Building a Data Science Platform by David Chaiken, Altiscale

Data scientists need to be advocates for a self-serve, Hadoop-based environment that is productive, reliable and a joy to use. This talk presents five tips to make your Big Data environment successful, and shows how best-of-breed tools like Spark fit together with the components of the Hadoop ecosystem.

These tips are valid whether the environment is built on premises, on top of infrastructure as a service, or deployed as a service. That said, the talk concludes by pointing out that buying the underlying platform as a service is the fastest path to deriving business value from big data.

Speaker Bios

• John Hammink became chief evangelist for Treasure Data while looking for an easy data analytics solution to support his work as a digital artist. Previously an engineer at F-Secure, Nokia, Mozilla, and early Skype, he's travelling the world, blogging, demoing and teaching what he's learning as he learns it.

• Milind Bhandarkar is the founder and CEO of Ampool, Inc. Milind was a founding member of the team at Yahoo! that took Apache Hadoop from a 20-node prototype to a datacenter-scale production system, and he has been contributing to and working with Hadoop since version 0.1.0. He started the Yahoo! Grid solutions team, focused on training, consulting, and supporting hundreds of new migrants to Hadoop. Parallel programming languages and paradigms have been his area of focus for over 20 years. He has worked at the Center for Development of Advanced Computing (C-DAC), the National Center for Supercomputing Applications (NCSA), the Center for Simulation of Advanced Rockets, Siebel Systems, Pathscale Inc. (acquired by QLogic), Yahoo! and LinkedIn. Milind was the Chief Architect at Greenplum Labs, a division of EMC. Most recently, Milind was the Chief Scientist at Pivotal, a spin-off of EMC & VMware.

Milind holds a Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign.

• David Chaiken is the CTO of Altiscale, which sells a Big Data platform that just works. He comes to Altiscale from Yahoo, where he served as Chief Architect. At Yahoo, he led teams building consumer advertising and media systems with Apache Hadoop at their core. Over his career, David has also built voice-search products for consumers, mobile applications for enterprises, network management systems, project management software, large-scale multiprocessor architectures, a tablet computer, and several information appliances. David earned an ScB in Chemistry and Mathematics from Brown University and a PhD in Electrical Engineering and Computer Science from MIT.

Arrival and Parking

Cask HQ is a few minutes' walk from the California Avenue Caltrain Station.

Cask HQ also has its own parking lot, but it will not accommodate all guests. Please use the parking lots available nearby.