• Scale By the Bay 2021

    Register for SBTB at https://scale.bythebay.io using the code BAYYAREAAI20 for 20% off.

    A few years ago, at AI By the Bay, we predicted that every application would soon be an AI application. Today that is the case practically everywhere.

    On the one hand, we see a broad deployment of AI and the production issues that come with it: reliability, ease of use, and integration into end-to-end data pipelines.

    On the other hand, the high-level questions of AI ethics have not gone away; rather, they have become practical questions of explainability, bias, and AI safety.

    Clement Delangue, the co-founder of Hugging Face, keynotes SBTB 2021 on new approaches to AI and NLP that both change the way we work and make ethics a core part of it.
    Alyona Medelyan, the co-founder of Thematic, another great NLP success story, keynotes on how to build your own AI startup the right way.
    Ricardo Baeza-Yates, one of the founders of modern Information Retrieval, illuminates Responsible AI.

    This year features a strong reproducibility track. It includes the SAME project, presented by Microsoft's MLOps leader and returning SBTB speaker David Aronchik. Another SBTB and Bay Area AI veteran, Lukas Biewald, is now co-founder and CEO of Weights & Biases and presents his view of reproducible ML.

    The productionization of AI is evident in the rise of platforms and tools for deploying it. Our friends at Iterative.ai, the makers of DVC (dvc.org), have seen widespread adoption. With Milecia McGregor of Iterative.ai, we learn how to manage experiments with DVC.

    Databases are at the root of all data, and the program includes a great review of what works. Uber and Databricks show how SQL is reborn with Spark. Rob Hedgepeth from MariaDB tells us not to call it a SQL comeback. With Adi Polak of Microsoft, we rethink scalable ML in the Spark ecosystem.

    We show how to build and operate AI platforms in excruciating detail, soup to nuts. With Aporia, we build a complete one from scratch, packed with tools like DVC, MLflow, and GitHub Actions. We see how to build containerized and serverless ML with NuWorks. And we learn from the real-world platform at Workday, which returns to share its experience.

    Reserve your seat at https://scale.bythebay.io with the code BAYYAREAAI20.

  • Faster and Cheaper Training for Large Models

    Online event

    ML engineering is a key focus of Scale By the Bay 2021, October 28-29. Register at https://scale.bythebay.io to attend online.


    State-of-the-art DNN model sizes in many domains are growing faster than hardware throughput, making cutting-edge ML less accessible. In this talk, I’ll present two broad lines of research from my group at Stanford that aim to make large-scale ML accessible. First, we can train existing DNN models more cheaply through new algorithmic schemes such as pipeline and hybrid parallelism, as we demonstrated in the PipeDream and FlexFlow projects. These approaches are now used in some of the most optimized large-scale training codebases, such as NVIDIA’s Megatron-LM, which can train 1-trillion-parameter models on 3000 GPUs at 52% of peak hardware efficiency. The second approach is to change large ML models themselves to be more hardware-friendly. In this space, I’ll present our work on retrieval-based NLP models, such as ColBERT, ColBERT-QA, and Baleen. These models use a small DNN to *search* a corpus of documents for relevant knowledge at inference time (e.g., the right Wikipedia pages to answer a science question) instead of memorizing all their knowledge in trillions of parameters. Our retrieval-based models have set new state-of-the-art results on multiple hard NLP problems while running as much as 1000x faster than large language models such as GPT-3. They also offer other advantages, such as easier interpretation and instantaneous updates of the model’s knowledge without retraining (just by replacing some of its indexed documents). Our work on both lines of research is open source.
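    The retrieval-based idea can be illustrated with a toy version of the late-interaction ("MaxSim") scoring described in the ColBERT paper. Each query token is matched to its most similar document token, and those best matches are summed. This is a minimal sketch under stated assumptions, not the group's implementation:
    - The embeddings are hand-written 2-D vectors rather than encoder outputs.
    - `maxsim_score`, `search`, and the toy index are hypothetical names used only for illustration.

```python
# Minimal sketch of ColBERT-style "late interaction" (MaxSim) retrieval scoring.
# Assumption: token embeddings are supplied directly as tiny vectors; in ColBERT
# they come from a BERT-based encoder and are L2-normalized.
from math import sqrt


def normalize(v):
    """Scale a vector to unit length."""
    n = sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens, doc_tokens):
    """For each query token, keep its best-matching document token; sum those maxima."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def search(query_tokens, index, k=1):
    """Rank documents in a dict {doc_id: token embeddings} by MaxSim score."""
    ranked = sorted(index, key=lambda doc_id: maxsim_score(query_tokens, index[doc_id]), reverse=True)
    return ranked[:k]


# Hypothetical toy data: 2-D "embeddings" standing in for encoder outputs.
query = [normalize([1, 0]), normalize([0, 1])]
index = {
    "a": [normalize([1, 0]), normalize([1, 1])],
    "b": [normalize([0, 1])],
}
print(search(query, index))  # ['a']

# Updating the model's knowledge means editing the index, not retraining the encoder.
index["c"] = [normalize([1, 0]), normalize([0, 1])]
print(search(query, index))  # ['c']
```

    The last two lines mirror the abstract's point: new knowledge enters through the document index, while the (here trivial) encoder stays unchanged.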

    Speaker: Matei Zaharia is an Assistant Professor of Computer Science at Stanford University and Chief Technologist at Databricks. He started the Apache Spark project during his PhD at UC Berkeley and has worked on other widely used open source data analytics and AI software, including MLflow and Delta Lake. At Stanford, he is a co-PI of the DAWN lab, which focuses on infrastructure for machine learning. Matei’s research was recognized with the 2014 ACM Doctoral Dissertation Award, an NSF CAREER Award, and the US Presidential Early Career Award for Scientists and Engineers (PECASE), the highest honor bestowed by the US government on early-career scientists and engineers.

  • Reproducible Machine Learning at Scale

    Online event

    Submit a talk to Scale By the Bay 2021! The CFP is open until June 30 at https://scale.bythebay.io/cfp.


    How can we support effective, reproducible, and explainable deep learning and coordination across practitioners? In this talk, Lukas Biewald will share best practices for conducting, debugging, and sharing deep learning experiments at scale. He will talk through how some of the best tech companies in the world use the Weights & Biases platform for managing datasets, debugging models, versioning training/evaluation recipes, extracting insights, and storing all the crucial details needed to make their models reproducible and their research collaborative.

    Speaker: Lukas Biewald is the co-founder and CEO of Weights & Biases, an experiment tracking platform for deep learning. In 2009, Lukas founded Figure Eight, formerly CrowdFlower. Lukas has dedicated his career to optimizing ML workflows and teaching ML practitioners, making machine learning more accessible to all.

  • Scale By the Bay 2021 CFP extended until June 30

    Online event

    The CFP has been extended until June 30, and we invite all meetup members to submit talks to SBTB.

    Scale By the Bay 2021 will be held online on October 28-29, 2021.
    Following the dramatic success of our revolutionary community setup for 2020, SBTB 2021 will also be produced by Konfy, an all-women tech company building events with love for developers, by developers. They created Scala Love, Java Love, Haskell Love, Data Love, and more.

    The CFP is open at https://scale.bythebay.io/cfp.

    Early bird registration is open too! It's our best deal ever (no code needed) and will end once those passes sell out.

    Check out previous years, all linked from the landing page, to get ideas for a talk, and give it your best shot!

  • Scale By the Bay 2021 CFP is now open until May 31/June 15!

    Online event

    Scale By the Bay 2021 returns online in October this year.


    Now in our 9th year, we are a major independent conference for the Bay Area and the world. Our defining characteristics are:

    -- deeply technical content accepted on merit
    -- data engineering for AI working together with software engineering and devops
    -- a soup-to-nuts approach, from high performance to distributed systems

    The CFP works in two stages as always:
    -- Submit your first-choice talk by May 31
    -- Submit a talk to be considered for the program by June 15

    Early bird registration is also open!


  • KUDO for MLOps: Kubernetes Universal Declarative Operator

    Online event

    While the rise of Kubernetes has been meteoric, deployment of stateful services onto Kubernetes is still in its infancy. Tooling exists to build operators that handle the natural complexity of stateful distributed services, but using it requires deep expertise both in Kubernetes API development and in the problem domain. The Kubernetes Universal Declarative Operator (KUDO) is a framework for rapidly building production-grade operators for complex, stateful services on Kubernetes.

    This talk will introduce operators and KUDO and demonstrate two production-ready operators built with KUDO on top of Kubernetes.

    Speaker: Gerred Dillon of D2IQ, the creator of KUDO.

  • Scale By the Bay 2020 begins on Thursday!

    Online event

    Folks -- the first-ever online SBTB is this week.


    The online ticket is already a low $125, and we give you 20% off that with the code BAYAREAAI20.


    Some highlights:

    Martin Odersky opens SBTB on 11/12 with "Countdown to 3!"

    Matei Zaharia and Anima Anandkumar keynote

    Li Haoyi, Getting Things Done in the Scala REPL

    Julien Truffaut, Monocle 3: a peek into the future

    Adam Warski, Project Loom? Better Futures? What’s next for JVM concurrent programming

    Prof. Bayer, the co-creator of B-trees, presents C-chain: the Integration of 5G and real time Blockchain

    Shameera Rathnayaka of Spotify, Materialize Typeclasses with Magnolia

    Justin Heyes-Jones, YoppWorks, Applicative: The Origin Story

    Steve Cosenza, Twitter, Rebuilding Twitter’s public API

    Greg Kesler, Intuit, Query Planning in GraphQL

    Lei Gao, Workday, Goku Flow: A Self-Service Data Pipeline Builder

    Prashant Sharma, IBM, Apache Spark meets FIPS standard

    Dean Wampler, Domino Data Lab, Ray: A System for High-performance, Distributed Machine Learning Applications

    Dirk Slama, VP Co-Innovation, Bosch, AIoT: Why now? And How To?

    Antje Barth, AWS, Put Your Machine Learning on Autopilot

    We have three debate panels we are (in)famous for:

    Will AI Kill Programming?

    Were Microservices a Huge Mistake?

    Programming Languages in the Era of the Cloud

    See the full program at https://www.scale.bythebay.io/schedule, and register!

  • Reactive Summit + SBTB 2020 CFP Open through July 31

    Online event

    Scale By the Bay is now following Reactive Summit, the conference of the Reactive Foundation, a Linux Foundation project focused on cloud-native applications.

    We're happy to report that Martin Odersky and Matei Zaharia, the creators of Scala and Spark respectively, will keynote SBTB 2020, among other awesome keynote speakers.

    The general CFP has been extended through July 31.

    There's still time to submit a talk: https://scale.bythebay.io!
    Looking forward to more great speakers to join us in November.

  • How I turned my PhD in NLP into a Y Combinator-backed Startup

    Online event

    In this update, a year after giving her original talk at Bay Area NLP, Alyona will share how her startup, Thematic, is growing.

    Alyona Medelyan will share her story of discovering NLP while studying linguistics and pivoting her degree into CS and Machine Learning. She will explain the ML framework of KEA, a keyword extraction algorithm powered by WEKA, one of the first ML libraries. Having co-authored the most-cited study of using Wikipedia in NLP research, she'll share two of her own projects that use it, including Maui, which ended up on NATO's radar.

    Alyona's passion lies in commercializing research ideas. On her third attempt, she managed to start a company that uses ideas from her PhD to solve a problem most companies struggle with today: understanding the needs of their customers. She will share how she co-founded Thematic, what was required to turn it into a successful venture, and how it has grown since then.

    Alyona has been instrumental in helping By the Bay events become more diverse since Text By the Bay in 2015, reaching out from New Zealand (where she is back now due to the global pandemic). We want to thank ScaledML 2020 for hosting Alyona, where we reconnected, and Rob Munro, the organizer of Bay Area NLP, for the original meetup with Alyona last year.

  • Hugging Face By the Bay!

    Online event

    We're happy to host Clement Delangue with a talk about Hugging Face, whose Transformers are taking the NLP scene by storm.

    We'll update the meetup with more details when we get them.

    The meetup is online, and we cap our Zoom at 100. Please manage your RSVP responsibly.

    Clement is the co-founder and CEO of Hugging Face, the leading NLP startup, based in NYC and Paris, which has raised more than $20M from prominent investors. The company created Transformers, the fastest-growing open-source library enabling thousands of companies to leverage natural language processing. Before Hugging Face, Clement started his machine learning journey at Moodstocks, a startup that built machine learning for computer vision and was acquired by Google.