Upcoming events (2)
Join us for virtual tech talks at the Data + AI Meetup on MLflow integration with PyCaret and PyTorch, sponsored by the Databricks MLflow team. The talks will be broadcast live simultaneously on YouTube and LinkedIn.
9:00 - 9:05 AM: Introduction & Announcements
9:05 - 9:35 AM: Machine Learning made easy with PyCaret and MLflow
9:40 - 10:10 AM: Reproducible AI using MLflow and PyTorch
Title: Machine Learning made easy with PyCaret and MLflow
Presenter: Moez Ali
Abstract: PyCaret is an open source, low-code machine learning library in Python that allows you to go from preparing your data to deploying your model within minutes in your choice of environment. This talk is a practical demo of using PyCaret in your existing workflows to supercharge your data science team's productivity.
Bio: Moez Ali is a seasoned data scientist with a decade of experience working with data in healthcare, education, and professional consulting. He is an active member of the open source community, and he created and open-sourced PyCaret in 2020.
Title: Reproducible AI using MLflow and PyTorch
Presenter: Geeta Chauhan
Abstract: Model reproducibility is becoming the next frontier for successfully building and deploying AI models in both research and production scenarios. In this talk, we will show you how to build reproducible AI models and workflows using PyTorch and MLflow that can be shared across your teams, providing traceability and speeding up collaboration on AI projects.
Bio: Geeta Chauhan leads AI Partnership Engineering at Facebook AI with expertise in building resilient, anti-fragile, large-scale distributed platforms for startups and Fortune 500s. As a core member of the PyTorch team, she leads TorchServe and many partner collaborations for building a solid PyTorch ecosystem and community.
Join us for the final session in a four-part series with Salesforce Engineering.
Abstract: As we build our Engagement Delta Lake on Databricks Workspace, one challenge is how to automate integration testing of our Spark jobs in the CI/CD pipeline. We came up with two designs to tackle this challenge: Namespace Deployment and Scenario Based Testing. In this talk, we will discuss the rationale and implementation of the two designs.
Part 1: Engagement Activity Delta Lake Recording: https://youtu.be/a7_I1Qi1LoU
Part 2: Boost Delta Lake Performance with Data Skipping and Z-Order Recording: TBD
Part 3: Global Synchronousness and Ordering in Delta Lake
Zhidong Ke, Software Engineer PMTS, Salesforce
Zhidong is passionate about designing distributed systems, real-time and batch data processing, and building applications.
Yifeng Liu, Software Engineer LMTS, Salesforce
Yifeng is a software engineer with extensive experience in big data processing and distributed systems. He is interested in building high-volume, high-complexity, low-latency data pipelines and frameworks.
Title: Software Engineering PMTS, Salesforce
Aaron is an experienced software engineering leader who focuses on engineering secure, fault-tolerant, high-volume systems built on microservices.
Heng Zhang, Software Engineering PMTS, Salesforce
Heng is a software engineer who specializes in microservices, distributed systems, and big data.