
Details

Good data quality & observability are as important to machine learning models as performance optimization is to model tuning. These talks will cover aspects of both needs. In collaboration with WIBD (https://www.womeninbigdata.org/)

* REGISTER (for free) for DAIS21: https://databricks.com/dataaisummit/north-america-2021

Talk One: The Rise of Data Observability: Building More Reliable Data Pipelines and ML Models at Scale by Barr Moses

Abstract: As Dr. Andrew Ng, founder of deeplearning.ai & Stanford University professor, recently argued, machine learning is only as reliable as the data that feeds it. As we move from a model-centric to a data-centric approach to MLOps, it’s important to treat data quality with the diligence it deserves: enter Data Observability. By applying best practices of DevOps and software engineering, Data Observability helps data and ML teams fully understand the health of data in their systems, and proactively monitor, alert on, and triage issues as they arise in pipelines and models, before they become a bigger problem downstream. In this talk, we’ll discuss why data observability and quality are so critical to machine learning and share specific tactics you can use to apply these approaches at scale.
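
The abstract above stays at the level of practice rather than code, but the monitor-alert-triage idea it describes can be sketched in a few lines. The snippet below is a minimal, hypothetical illustration of an automated data health check; the thresholds (FRESHNESS_SLA, MAX_NULL_RATE), the check_table helper, and the pandas-based approach are assumptions made for this example, not Monte Carlo's product or API.

```python
# Hypothetical sketch of a basic data observability check (illustrative only):
# flag a table snapshot whose data is stale or whose columns have too many nulls.
from datetime import datetime, timedelta, timezone

import pandas as pd

FRESHNESS_SLA = timedelta(hours=6)   # assumed update SLA for this illustration
MAX_NULL_RATE = 0.01                 # assumed acceptable null fraction per column


def check_table(df: pd.DataFrame, updated_at: datetime) -> list[str]:
    """Return human-readable alerts for a single table snapshot."""
    alerts = []
    if datetime.now(timezone.utc) - updated_at > FRESHNESS_SLA:
        alerts.append(f"Freshness SLA breached: last update at {updated_at.isoformat()}")
    for column, rate in df.isna().mean().items():
        if rate > MAX_NULL_RATE:
            alerts.append(f"Null rate {rate:.1%} in column '{column}' exceeds {MAX_NULL_RATE:.0%}")
    return alerts


# Toy snapshot: one missing amount and an update older than the assumed SLA.
snapshot = pd.DataFrame({"order_id": [1, 2, 3], "amount": [10.0, None, 7.5]})
for alert in check_table(snapshot, updated_at=datetime.now(timezone.utc) - timedelta(hours=8)):
    print("ALERT:", alert)
```

In a production setting such checks would run on a schedule against warehouse metadata and route alerts to an on-call channel rather than printing them.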

Talk Two: AI Performance Boosting with Software Acceleration by Meena Arunachalam

Abstract: Optimizing machine learning software packages and frameworks is essential to realizing the full capability of the hardware and building efficient, performant AI solutions. Data scientists, ML engineers, and ML developers also want ease of use and familiar packages and APIs for their everyday AI experimentation and prototyping. In this presentation, I will show how you can speed up popular packages such as Scikit-learn, XGBoost, LightGBM, and others with easy drop-in imports, without changing your code, through seamless lower-level optimizations. Popular ML algorithms on recent generations of Xeon CPUs see significant reductions in run time over stock software, enabling faster prototyping on CPUs across a range of data sets. Vectorization, parallelism, cache reuse, memory efficiency, prefetching, and other techniques in lower-level libraries such as oneDAL (Data Analytics Library) and oneDNN (Deep Neural Network Library), as well as framework optimizations, are leveraged by the higher-level software packages. Similarly, Intel TensorFlow optimizations and the Intel Extension for PyTorch deliver significant performance boosts.
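
The abstract describes a drop-in acceleration pattern rather than a specific script; a minimal sketch of that pattern, using the Intel Extension for Scikit-learn (the scikit-learn-intelex package), is shown below. Calling patch_sklearn() before the scikit-learn imports reroutes supported estimators to oneDAL. The dataset size, estimator, and parameters are arbitrary choices for illustration, and actual speedups depend on the workload and hardware.

```python
# Sketch of the "drop-in import" acceleration pattern
# (assumes the scikit-learn-intelex package is installed).
from sklearnex import patch_sklearn

patch_sklearn()  # must run before importing scikit-learn estimators

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy classification problem purely for illustration; the model code is unchanged by the patch.
X, y = make_classification(n_samples=100_000, n_features=20, random_state=0)
clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
clf.fit(X, y)
print("Training accuracy:", clf.score(X, y))
```

The same idea applies to the Intel TensorFlow optimizations and the Intel Extension for PyTorch mentioned above: existing model code is accelerated without rewrites.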

Speakers

** Shala Arshi is President & Co-Founder of Women In Big Data. She is an industry veteran with extensive technical, management, marketing, & business development experience. Shala has worked at Intel Corporation for over 30 years in a variety of roles, including product development, managing organizations, leading industry standards, and building partnerships & ecosystems, & has held roles at Intel Capital.

** Meena Arunachalam is an End-to-End AI Performance Architect in Machine Learning Performance at Intel Corp & works on software optimizations & enabling new AI features for compelling AI/data pipelines & use cases with Intel CPUs & accelerators. She has authored 20+ peer-reviewed publications in IEEE & ACM conferences & journals, has contributed two book chapters to High Performance Computing Pearls – Vol II, & holds four patents. She is part of the core team of Intel Women in Machine Learning & Women in Intel. She currently serves as Director, WiBD Pacific Northwest Chapter.

** Barr Moses is CEO & Co-Founder of Monte Carlo, a data reliability company backed by Accel, GGV, Redpoint, & other top Silicon Valley investors. Previously, she was VP of Customer Operations at Gainsight and a management consultant at Bain & Company, & served in the Israeli Air Force as a commander of an intelligence data analyst unit. Barr graduated from Stanford with a B.Sc. in Mathematical & Computational Science.
