  • Scale By the Bay 2020 begins on Thursday!

    Online event

    Folks -- the first-ever online SBTB is this week: https://scale.bythebay.io/

    The online ticket is already a low $125, and we give you 20% off that with BAYAREAAI20: https://scale.bythebay.io/register

    Some highlights:

    - Martin Odersky opens SBTB on 11/12 with Countdown to 3!
    - Matei Zaharia and Anima Anandkumar keynote
    - Li Haoyi, Getting Things Done in the Scala REPL
    - Julien Truffaut, Monocle 3: a peek into the future
    - Adam Warski, Project Loom? Better Futures? What’s next for JVM concurrent programming
    - Prof. Bayer, the co-creator of B-trees, presents C-chain: the Integration of 5G and real time Blockchain
    - Shameera Rathnayaka of Spotify, Materialize Typeclasses with Magnolia
    - Justin Heyes-Jones, YoppWorks, Applicative: The Origin Story
    - Steve Cosenza, Twitter, Rebuilding Twitter’s public API
    - Greg Kesler, Intuit, Query Planning in GraphQL
    - Lei Gao, Workday, Goku Flow: A Self-Service Data Pipeline Builder
    - Prashant Sharma, IBM, Apache Spark meets FIPS standard
    - Dean Wampler, Domino Data Lab, Ray: A System for High-performance, Distributed Machine Learning Applications
    - Dirk Slama, VP Co-Innovation, Bosch, AIoT: Why now? And How To?
    - Antje Barth, AWS, Put Your Machine Learning on Autopilot

    We have three debate panels we are (in)famous for:

    - Will AI Kill Programming?
    - Were Microservices a Huge Mistake?
    - Programming Languages in the Era of the Cloud

    See the full program at https://www.scale.bythebay.io/schedule, and register!

  • Reactive Summit + SBTB 2020 CFP Open through July 31

    Online event

    Scale By the Bay is now following Reactive Summit, the conference of the Reactive Foundation, a Linux Foundation project focused on cloud-native applications. We're happy to report that Martin Odersky and Matei Zaharia, the creators of Scala and Spark, will keynote SBTB 2020, among other awesome keynote speakers. The general CFP is now extended through July 31. There's still time to submit a talk: https://scale.bythebay.io! We look forward to more great speakers joining us in November.

  • How I turned my PhD in NLP into a Y Combinator-backed Startup

    In this update, a year after Alyona gave the original talk at Bay Area NLP, she will share how her startup, Thematic, is growing. Alyona Medelyan will be sharing her story of discovering NLP while studying linguistics and pivoting her degree into CS and Machine Learning. She will explain the ML framework of the keyword extraction algorithm KEA, powered by one of the first ML libraries, WEKA. Having co-authored the most-cited study of using Wikipedia in NLP research, she'll share two of her own projects that use it, including Maui, which ended up getting onto the radar of NATO.

    Alyona's passion lies in the commercialization of research ideas. On her third attempt she managed to start a company that uses the ideas from her PhD to solve a problem most companies struggle with today: understanding the needs of their customers. She will share how she co-founded Thematic and what was required to turn it into a successful venture, as well as how it has grown since then.

    Alyona has been instrumental in helping events By the Bay be more diverse since Text By the Bay, held in 2015, reaching out from New Zealand (where she's back now due to the global pandemic). We want to thank ScaledML 2020 for hosting Alyona, where we reconnected, and Rob Munro, the organizer of Bay Area NLP, for the original meetup with Alyona last year.

  • Hugging Face By the Bay!

    Online event

    We're happy to host Clement Delangue with a talk about Hugging Face, whose Transformers are taking the NLP scene by storm. We'll update the meetup with more details when we get them. The meetup is online, and we cap our Zoom at 100. Please manage your RSVP responsibly.

    Clement is the co-founder and CEO of Hugging Face, the leading NLP startup, based in NYC and Paris, that raised more than $20M from prominent investors. The company created Transformers, the fastest growing open-source library enabling thousands of companies to leverage natural language processing. Prior to Hugging Face, Clement started his machine learning journey at Moodstocks, a startup that built machine learning for computer vision and was acquired by Google.

  • To Kernels and Back Again — A study of Empirical Phenomena in Machine Learning

    As machine learning algorithms increasingly pervade our everyday life, it becomes imperative that we better understand the algorithms we deploy. In this talk I will present my work that uses classical kernel methods to study empirical phenomena in machine learning. I will first present structured kernel constructions that provide competitive performance on a range of scientific tasks, from computational biology to heliophysics, but fall short on standard machine learning benchmarks such as CIFAR-10 and ImageNet. Then I take a substantial detour to understand whether the high predictive performance of neural networks on standard benchmarks is fundamental, or simply due to overfitting on test sets. This leads to a line of work involving constructing novel test sets for CIFAR-10 and ImageNet and precisely measuring human and model performance on these datasets. Finally I return to improve my structured kernel constructions to achieve significantly higher performance on standard machine learning benchmarks.

    Bio: Vaishaal Shankar is a final-year PhD student working with Ben Recht at UC Berkeley. He broadly works on experimental analysis of phenomena in machine learning. A majority of his research has revolved around understanding the fundamental limitations of deep neural networks and their connection to classical kernel methods. He will be joining a special projects team at Amazon in Fall 2020.

  • First Choice Scale By the Bay 2020 CFP is now Open through May 31

    The 8th Annual Scale By the Bay developer conference will be held either online or in person in November 2020. The CFP is now open at https://scale.bythebay.io. The First Choice CFP will run until May 31st, when 1/2 of the program will be selected. The next 1/4 will be selected by June 30th, and so on. The bar will move higher in each iteration, allowing for the strongest talks to still join. Please submit your best talk early; we hope to see you on the program!

  • Featuring PyTorch with FB, Autodesk and AWS

    On Feb 11, Bay Area AI Meetup will be featuring PyTorch with speakers from Facebook, Autodesk and AWS.

    Talk 1: "PyTorch 1.4 Release Update" by Facebook
    - Speaker: Brad Heintz is a partner engineer at Facebook working with PyTorch, the open source framework for Deep Learning in research and enterprise production. He's been building software, professionally and for fun, for forty years.

    Talk 2: "Doing NLP with Transformers" by Autodesk
    - Summary: Since their introduction three years ago, transformers have had an enormous impact on the state of the art for many natural language-based tasks. One interesting aspect of transformer-based systems is that they include large, pre-trained language models which are freely available. This talk will discuss the transformer, and how to fine-tune transformers for use on SageMaker instances for different applications, including some of the ways they are used in Autodesk’s Digital Help organization.
    - Speaker: Alex O'Connor is Lead Data Scientist for the DPE-DHX-DS Data Science Team at Autodesk. Alex obtained his PhD in Computer Science from Trinity College Dublin in 2010. From there he went on to be a Funded Investigator at the SFI ADAPT Global Centre for Digital Content, and Lecturer at Dublin City University. He has published in research areas including the semantic web, natural language processing, digital humanities, and adaptive hypermedia. Before joining Autodesk, Alex was Director of Research at a Fintech startup in Menlo Park.

    Talk 3: "Less Code, More Intelligence with PyTorch on SageMaker" by AWS
    - Summary: Tired of managing your own EC2 instances? Wish you could just write your Torch model and walk away? In this session we’ll dive deep into Amazon SageMaker and understand how it helps both senior data scientists and beginning developers increase their impact by leveraging a managed service. We will focus on common design patterns for developing, training, and deploying PyTorch models.
    - Speaker: Emily Webber is a Machine Learning Solutions Architect at AWS. She has been leading data science projects for many years, piloting the application of machine learning into social media violence detection, economic policy evaluation, computer vision, reinforcement learning, IoT, drone, and robotic design. She is a keynote speaker at Amazon Web Services, and has led hundreds of workshops for customers in every stage of their cloud journey. Her direct contributions have led to countless innovations on the AWS machine learning stack, and many of her customers are public about their appreciation for Amazon SageMaker. Previously she worked as a solutions architect for an explainable AI start-up in Chicago and as a data scientist at the Federal Reserve Bank of Chicago.

  • Starting 2020 at Microsoft Reactor: Making Apache Spark Better with Delta Lake

    We’re kicking off the year at our new partner venue, Microsoft Reactor!

    Apache Spark™ is the dominant processing framework for big data. Delta Lake adds reliability to Spark so your analytics and machine learning initiatives have ready access to quality, reliable data. This session covers the use of Delta Lake to enhance data reliability for Spark environments.

    Topics:
    - The role of Apache Spark in big data processing
    - Use of data lakes as an important part of the data architecture
    - Data lake reliability challenges
    - How Delta Lake helps provide reliable data for Spark processing
    - Specific improvements that Delta Lake adds
    - The ease of adopting Delta Lake for powering your data lake

    Speaker: Chris Hoshino-Fish is a Solutions Architect at Databricks. Chris is an active member of the Performance Subject Matter Expert group and a former Principal Consultant focused on Data Engineering, working with several Fortune 500 Databricks customers. Prior to Databricks, Chris worked for an adtech company as a data engineer managing pipelines using Apache Spark for 3.5 years. Chris has a B.A. in Computational Mathematics from the University of California, Santa Cruz.

    Lightning Talks -- we'll open the floor for the rest of the meetup to the lightning talks proposed in the comments!

  • [External Registration][Conference] Scale By the Bay 2019, November 13-15

    Oakland Scottish Rite Center

    This year Scale By the Bay (https://scale.bythebay.io) runs for only two days. But we packed an incredible 70 sessions into these two days! We start with a hot breakfast and excellent coffee. Coffee never ends -- a continuous, uninterruptible supply of great coffee is a hallmark of every conference By the Bay. Each morning there is a keynote where we all gather as a community, and a panel closing each conference day where we all get together again before the happy hour -- also every day.

    The heart of the conference is its iconic four tracks: Thoughtful Software Engineering, Service Architectures, End-to-end Data Pipelines up to ML/AI, which we historically call Functional, Reactive, and Data. That's three, right? The fourth is the hallway track -- and we're legendary for it!

    The core theme this year is Distributed Systems. Joe Beda, Principal Engineer at VMware and the co-creator of Kubernetes, keynotes one day, and Heather Miller, Professor at CMU and former Director of the Scala Center, keynotes the other. We have multiple talks considering cloud deployments on Kubernetes in concert with other systems, such as Kafka, Spark, and Flink. We cover important new directions with Unison, and inherent issues such as Change Data Capture from Disney Streaming. We will learn about the new GIS features for Google BigQuery from their author, about the Databricks Data Lake approach, and infrastructure as code at Target.

    Our "reactive" track started as reactive microservice architectures but came to encompass all kinds of systems, as well as data manipulation techniques. We'll see how Lyft is enabling real-time queries with Apache Kafka, Flink and Druid. We'll hear about the lessons learned developing and running Netty from its creator. We'll see how Serverless is developed at Google.

    Machine Learning and AI are only as scalable as the data pipeline feeding them. Moreover, you need to ensure your data is typesafe and your predictions are based on data whose integrity or even privacy is provable. This year, we have three talks on Swift for TensorFlow, including from the original Google team developing it, as well as Coinbase and Quarkworks. We hear from Sony Entertainment on near real-time, low latency predictions, and many, many other leaders.

    And we'll uphold the rigorous and thoughtful software engineering that underpins every system scalable in time and tech space -- a system that can deliver but also grow with companies and their people. We'll hear from Comcast and Netflix on human-centric software engineering and ML organizations. We'll hear about community-first Open-Source approaches. We'll see how F# invigorates the .NET ecosystem with a functional approach, including JavaScript apps, and how Scala with React is doing the same for full-stack development on the JVM. We'll hear about Rust, Haskell, Scala, Java, Python, F# and other ecosystems used for quality development and production deployment. We'll see how the sausage is made at JetBrains to power our IDEs. We'll dig deeper into GraalVM with Oracle and Twitter, as well as Scala Native.

    We pioneered GraphQL at Scale By the Bay three years ago when almost nobody had heard about it. Furthermore, our focus was not on the frontend alone but on middleware usage of GraphQL. This year Nick Schrock, a co-creator of GraphQL, joins us.

    The day before the conference, we run a bespoke, all-day, hands-on training that we build specifically for SBTB. This year, it's the Portable Serverless Workshop with Ryan Knight and James Ward. James is now at GCP and has a driver's seat for the serverless future. You'll go home with a complete serverless backend under your belt!

    All in all, we'll have a lot of fun, pack a year of learning into just two or three days, and again experience the magic that makes Scale By the Bay a legend! Reserve your Early Bird seat soon at https://scale.bythebay.io.

  • MODEL VERSIONING: WHY, WHEN, AND HOW

    Note: model versioning and deployment is an integral part of the https://scale.bythebay.io data pipelines track. Join us very soon, in mid-November, using the code MEETBAYAREAAI15 to get 15% off all passes, including the bespoke Serverless workshop with Google! A special discount for Scale By the Bay will be revealed at the event for actual attendees only.

    We have two talks.

    (1) MODEL VERSIONING: WHY, WHEN, AND HOW

    Models are the new code. While machine learning models are increasingly being used to make critical product and business decisions, the process of developing and deploying ML models remains ad hoc. In the “wild-west” of data science and ML tools, versioning, management, and deployment of models are massive hurdles in making ML efforts successful. As creators of ModelDB, an open-source model management solution developed at MIT CSAIL, we have helped manage and deploy a host of models ranging from cutting-edge deep learning models to traditional ML models in finance. In each of these applications, we have found that the key to enabling production ML is an often-overlooked but critical step: model versioning. Without a means to uniquely identify, reproduce, or roll back a model, production ML pipelines remain brittle and unreliable. In this talk, we draw upon our experience with ModelDB and Verta to present best practices and tools for model versioning and how having a robust versioning solution (akin to Git for code) can streamline DS/ML, enable rapid deployment, and ensure high quality of deployed ML models.

    Speakers: Manasi Vartak, CEO, Verta.ai, and Conrado Miranda, CTO, Verta.ai

    Manasi Vartak is the founder and CEO of Verta.ai (www.verta.ai), an MIT spinoff building software to enable high-velocity machine learning. Manasi previously worked on deep learning for content recommendation as part of the feed-ranking team at Twitter and dynamic ad-targeting at Google.

    Conrado Miranda is the CTO at Verta.ai. Conrado has a PhD in Machine Learning and a focus on building platforms for AI. He was the tech lead for the Deep Learning platform at Twitter’s Cortex, where he designed and led the implementation of TensorFlow for model development and PySpark for data analysis and engineering. He also led efforts on NVIDIA’s self-driving car initiative, including the Machine Learning platform, large scale inference for the Drive stack, and build and CI for Deep Learning models.

    (2) Model Monitoring in Production

    Machine Learning models continuously discover new data patterns in production that they have never seen during training and testing iterations. The best offline experiment can lose in production. The most accurate model is not always tolerant of minor data drift or adversarial input. Neither prodops, data science, nor engineering teams are equipped to detect, monitor and debug model degradation behaviour. Real mission-critical AI systems require an advanced monitoring and model observability ecosystem which enables continuous and reliable delivery of machine learning models into production.

    Common production incidents include:
    - Data anomalies
    - Data drifts, new data, wrong features
    - Vulnerability issues, adversarial attacks
    - Concept drifts, new concepts, expected model degradation
    - Domain drift
    - Biased training set

    In this demo-based talk we discuss algorithms for monitoring text and image use cases as well as for classical tabular datasets. The demo part will cover the full cycle of a machine learning model in production:
    - Model training and deployment with Kubeflow pipelines
    - Production traffic simulation
    - Model monitoring metrics configuration
    - Data drift detection
    - Drift exploration and monitoring metadata mining
    - New training dataset generation from production feature store
    - Model retraining and redeployment

    Stepan Pushkarev is the CTO of Hydrosphere.io, a Model Management platform, and co-founder of Provectus, an AI Solutions provider and consultancy and the parent company of Hydrosphere.io.
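
As a rough illustration of the data drift detection step listed in the demo outline above, here is a minimal sketch. It is not the method used by Hydrosphere.io or shown in the talk: it assumes a single numeric feature and compares its training-time values against production values using the two-sample Kolmogorov-Smirnov statistic. The function names and the 0.3 threshold are hypothetical, chosen only for illustration.

```python
from bisect import bisect_right


def ks_statistic(reference, production):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest gap between the empirical CDFs of the two samples:
    0.0 means identical distributions, 1.0 means no overlap at all.
    """
    ref = sorted(reference)
    prod = sorted(production)
    if not ref or not prod:
        raise ValueError("both samples must be non-empty")

    max_gap = 0.0
    # The largest gap is always reached at one of the observed values.
    for x in ref + prod:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_prod = bisect_right(prod, x) / len(prod)
        max_gap = max(max_gap, abs(cdf_ref - cdf_prod))
    return max_gap


def has_drifted(reference, production, threshold=0.3):
    """Flag drift when the KS statistic exceeds an assumed threshold."""
    return ks_statistic(reference, production) > threshold


if __name__ == "__main__":
    training_values = [0.1, 0.2, 0.3, 0.4, 0.5]
    shifted_values = [5.1, 5.2, 5.3, 5.4, 5.5]
    print(has_drifted(training_values, training_values))
    print(has_drifted(training_values, shifted_values))
```

In practice a monitoring system would run a check like this per feature on a sliding window of production traffic and would pick the threshold from a significance level rather than a fixed constant.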
