Solving Machine Learning Problems at Scale (HYBRID)


Details
We are heading to Sunnyvale, CA for our next meetup!
>>>>>>>>>>>>>>>>>>>>> ATTENDING IN PERSON? <<<<<<<<<<<<<<<<<<<<
Click "Join Online" and answer "IN PERSON" to "How will you be attending?"
>>>>>>>>>>>>>>>>>>>>> > > > > > > > > < < < < < < < < <<<<<<<<<<<<<<<<<<<<
Step into the future of machine learning technology at our upcoming meetup, where we've lined up three talks on the challenges of scalability. First, we'll dig into caching datasets to supercharge training data loaders. Next, we'll hear about the challenges of machine learning development and deployment operations (MLOps). Finally, we'll hear how industry leader NVIDIA pushes the envelope in training Large Language Models (LLMs).
Q&A follows each talk, with a social hour to connect with the speakers and other attendees.
LOCATION:
Plug & Play Tech Center
440 N Wolfe Rd, Sunnyvale, CA 94085
AGENDA
Event Kicks off at 2pm PT
Scalable High-throughput Data Cache for Machine Learning Training
Vinayak Kamath, Lead Data Engineer, Target
High-performance data caches are essential for efficiently loading large datasets onto GPUs. They increase GPU utilization and reduce idle time when training machine learning models.
We'll discuss our extensive analysis of high-performance distributed file systems and provide insights into how we built an accelerated data caching system to improve the speed and efficiency of ML training. Our exploration is helpful to anyone seeking to optimize computational performance in the dynamic realm of machine learning.
Level: Intermediate
Short Break after Q&A
Infra Dev – MLOps @ Scale
Damon Allison, Principal Machine Learning Engineer, Shipt
Deploying ML solutions to production quickly and safely requires CI/CD tooling and processes specifically geared around ML's primary artifact: the model. In this talk, Damon will cover how Shipt has implemented tooling and processes for ML engineers and data scientists to build, deploy, monitor, and audit machine learning models in production systems at scale. We'll discuss how models are tagged, versioned, deployed, scaled, and monitored with tooling like MLflow, Seldon Core, Airflow, and Drone. We'll also cover advanced deployment scenarios like A/B deployments and shadow versions, as well as patterns for integrating models into both batch and real-time production systems.
Level: Intermediate
Short Break after Q&A
Scaling Training and Deployment of LLMs for Retail Applications
Aastha Jhunjhunwala, Solution Architect, NVIDIA
Large Language Models (LLMs) are revolutionizing the retail industry by leveraging advanced natural language processing to enhance customer experience, accelerate time to business insights, generate digital assets, and much more. However, with great power come great challenges. Join us as we unpack the intricacies and challenges encountered in the LLM lifecycle, and how to overcome them to scale efficiently while maximizing GPU utilization.
We’ll touch upon considerations such as when to fine-tune vs. pretrain, 3D parallelism techniques for efficient training, information retrieval, and techniques for optimized inference such as in-flight batching. This talk will be of value to anyone seeking to plan, build, and optimize their LLM applications.
Level: Intermediate
Q&A
Happy Hour starts at 4:30pm PT
Stick around and meet the presenters and other attendees
If you require accessibility assistance or an accommodation to experience this event – such as closed captions or material in an accessible format – please contact iccon@target.com.
Portions of this event will be recorded. By registering to attend ICCON, you acknowledge that your image, comments and questions (written or verbal) may be recorded and rebroadcast.
By registering for this event, you agree to Target’s Privacy Policy.