Past Meetup

Big Data App Meetup 2/8 - Cask Market, Airbnb Dataportal, & Agile Data Science!


168 people went


Shoutout to Ampool (http://www.ampool.io/) and Cask (http://cask.co/) for kindly sponsoring and hosting this meetup!

Cask will also be giving away a Raspberry Pi 3 Starter Kit. Enter the raffle on the day of the event for a chance to win.

AGENDA

6:00 - 6:30 - Socialize over food and beer(s)

6:30 - 8:00 - Talks

TALKS

Talk #1: Cask Market - Big Data's App Store - by Albert Shau, Cask

Talk #2: Scaling tribal knowledge - by Chris Williams, John Bodley, Airbnb

Talk #3: Agile Data Science: Full-Stack Analytics App Dev OR Building an Aviation Data Explorer - by Russell Jurney

ABSTRACTS

Talk #1: Cask Market - Big Data's App Store - by Albert Shau, Cask

Cask Market is Cask's "Big Data App Store", which supports push-button deployment of pre-built applications across various Hadoop distributions. Cask hosts a public Market, but enterprises can also create their own to host both internal and external apps that their users can easily discover and install. The talk will discuss use cases for Cask Market, as well as the platform and technology that enable it.

Talk #2: Scaling tribal knowledge - by Chris Williams, John Bodley, Airbnb

Airbnb has numerous Hive tables, dashboards, charts, posts, and more, and wanted to develop a tool that lets users explore which data resources are available, understand their context, and discover related content, with the goal of democratizing data at Airbnb. The Airbnb Dataportal is an internal search engine for data resources that connects users with visualizations, tools, curated data, and metrics so they can do their jobs more effectively. It aids data exploration, discovery, and trust, empowers Airbnb employees to be "data informed" in their decision making, and encourages a culture of self-service. Technologies used include Hive, MySQL, Neo4j, GraphAware, and Elasticsearch (back end), Flask (API), and React, Redux, and Aphrodite (front end). One of the key takeaways of this session will be how high-growth companies such as Airbnb find ways to scale tribal knowledge around their data sources.

Talk #3: Agile Data Science: Full-Stack Analytics App Dev OR Building an Aviation Data Explorer - by Russell Jurney

Agile Data Science 2.0 (O'Reilly 2017) defines a methodology and a software stack with which to apply it. *The methodology* seeks to deliver data products in short sprints by going meta and putting the focus on the applied research process itself. *The stack* is just one example of a stack that meets the requirements of being utterly scalable and utterly efficient to use, for application developers as well as data engineers. It includes everything needed to build a full-blown predictive system: Apache Spark, Apache Kafka, Apache Airflow (incubating), MongoDB, Elasticsearch, Apache Parquet, Python/Flask, and jQuery. This talk will cover the full lifecycle of large-scale data application development and will show how to apply lessons from agile software engineering to data science, using this full stack to build better analytics applications.

SPEAKER BIOS

• Albert Shau is a software engineer at Cask, where he is working to simplify data application development. Prior to Cask, he worked on search systems at Box, and recommendation systems at Yahoo.

• Chris Williams is a Data Visualization Engineer at Airbnb working on visualizations, frameworks, and data tools. Prior to Airbnb he worked at Interana, a business analytics startup, and studied genomics at UCSF.

• John Bodley is a Software Engineer at Airbnb working on developing data tools. Prior to Airbnb he worked as a Data Scientist at Facebook, and studied Computational Mathematics at Stanford.

• Russell Jurney is principal consultant at Data Syndrome, a product analytics consultancy dedicated to advancing the adoption of Agile Data Science, the development methodology outlined in the book Agile Data Science 2.0. He has worked as a data scientist building data products for over a decade, starting in interactive web visualization and then segueing toward data products, machine learning, and artificial intelligence at companies such as Ning, LinkedIn, and Hortonworks. Russell is a self-taught visualization software engineer, data engineer, data scientist, and writer, and most recently he is becoming a teacher.

ARRIVAL AND PARKING

Cask HQ is a few minutes' walk from the California Avenue Caltrain Station.

Cask HQ also has its own parking lot, but it will not accommodate all guests. Please use the parking lots available nearby: