What we're about

[b]YouTube channel for archived talk videos[/b]

SF Big Analytics YouTube Channel (https://www.youtube.com/channel/UC9MOf69YTbmDKqr22l7aigA)

The SF Big Analytics meetup focuses on all aspects of big data analytics, from data ETL, feature generation, and AI/machine learning theory, algorithms, and implementation to the technologies and infrastructure associated with big data analytics. Topics include AI/machine learning, data processing and monitoring (Hadoop, Spark, Hive, and streaming systems such as Flink, Apex, and Kafka), data visualization, and the data science lifecycle. This meetup covers the full range of big data analytics topics and data mining pipelines.

We try to provide high-quality talks at each meetup. Here are some of the policies for talks that we have followed over the last few years:

-- Technically focused

-- No marketing

-- No product promotion (unless it is an open source project)

-- No high-level business talks (unless they are from highly respected leaders)

Upcoming events (1)

Airbnb/Lyft/Google: End-to-End ML Platform, Airflow and More

Google SF office info: 345 Spear Street, 7th floor, in the Batgirl room. Enter the Google office via the West elevator lobby. We recommend Hills Plaza Garage for parking, as it is right underneath the SPE building and costs $10 per vehicle after 5:00 pm. It is open until 11:00 pm.

Agenda:

6:00 pm - 6:30 pm: Networking + food

6:30 pm - 6:40 pm: Introduction

6:40 pm - 7:15 pm: Talk 1 (Airbnb) + Q&A

7:15 pm - 7:50 pm: Talk 2 (Lyft) + Q&A

7:50 pm - 8:35 pm: Talk 3 (Google) + Q&A

8:40 pm - 9:00 pm: Networking + closing

Talk 1: Bighead: Airbnb's end-to-end machine learning platform

Airbnb has a wide variety of ML problems, ranging from models on traditional structured data to models built on unstructured data such as user reviews, messages, and listing images. The ability to build, iterate on, and maintain healthy machine learning models is critical to Airbnb’s success. Many ML platforms cover data collection, feature engineering, training, deployment, productionalization, and monitoring, but few, if any, do all of the above seamlessly. Bighead aims to tie together various open source and in-house projects to remove incidental complexity from ML workflows.

Bighead is built on Python, Spark, and Kubernetes. The components include a lifecycle management service, an offline training and inference engine, an online inference service, a prototyping environment, and a Docker image customization tool. Each component can be used individually. In addition, Bighead includes a unified model building API that smoothly integrates popular libraries including TensorFlow, XGBoost, and PyTorch. Each model is reproducible and iterable through standardization of data collection and transformation, model training environments, and production deployment.

This talk covers the architecture, the problems that each individual component and the overall system aim to solve, and a vision for the future of machine learning infrastructure. Bighead is widely adopted at Airbnb, and a variety of models are running in production. We plan to open source Bighead to allow the wider community to benefit from our work.

Speaker: Andrew Hoh

Andrew Hoh is the Product Manager for the ML Infrastructure and Applied ML teams at Airbnb. Previously, he spent time building and growing Microsoft Azure's NoSQL distributed database. He holds a degree in computer science from Dartmouth College.

Talk 2: Apache Airflow at Lyft

Lyft was one of the first companies to adopt Airflow in production. Today Airflow powers many Lyft use cases, from executive dashboards to metrics aggregation, derived data generation, and machine learning feature computation. In this talk, we will first cover how we operate Airflow in production at Lyft, then discuss the improvements we have made to Airflow to boost internal ETL development productivity. Lastly, we will talk about some of our open source contributions that could benefit the whole community.

Speaker: Tao Feng

Tao Feng is a software engineer on the Lyft data platform team, working on various data products. Tao is also a committer and PMC member of Apache Airflow. Previously, Tao worked at LinkedIn and Oracle on data infrastructure, tooling, and performance.

Talk 3: TBD (Google)
