
Solving Machine Learning Problems at Scale (HYBRID)



We are heading to Sunnyvale, CA for our next meetup!

>>>>>>>>>>>>>>>>>>>>> ATTENDING IN PERSON? <<<<<<<<<<<<<<<<<<<<

Click "Join Online" and answer "IN PERSON" to "How will you be attending?"

Step into the future of machine learning technology at our upcoming meetup, where we've lined up three talks on the challenges of scaling. First, we'll dig into caching datasets to supercharge training data loaders. Next, we'll hear about the challenges of machine learning development and deployment operations (MLOps). Finally, we'll hear how industry leader NVIDIA pushes the envelope of training Large Language Models (LLMs).

Q&A follows each talk, with a social hour to connect with the speakers and other attendees.


Plug & Play Tech Center

440 N Wolfe Rd, Sunnyvale, CA 94085


Event Kicks off at 2pm PT

Scalable High-throughput Data Cache for Machine Learning Training

Vinayak Kamath, Lead Data Engineer, Target

High-performance data caches are essential for efficiently loading large datasets to GPUs. They increase GPU utilization when training machine learning models and reduce idle time.

We'll discuss our extensive analysis of high-performance distributed file systems and provide insights into how we built an accelerated data caching system to improve the speed and efficiency of ML training. Our exploration is helpful to anyone seeking to optimize computational performance in the dynamic realm of machine learning.
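As a hedged illustration of the idea (not the system the talk describes), an in-memory cache in front of slow distributed storage lets repeated training epochs reread hot samples without touching the file system. The `load_sample` function, storage stand-in, and cache size below are all hypothetical:

```python
from functools import lru_cache

# Hypothetical stand-in: in a real pipeline this would be a slow,
# high-latency read from a distributed file system.
def _fetch_from_storage(index: int) -> bytes:
    return f"sample-{index}".encode()

@lru_cache(maxsize=4096)  # in-memory cache keeps hot samples close to the GPU host
def load_sample(index: int) -> bytes:
    return _fetch_from_storage(index)

# First epoch populates the cache; later epochs hit it instead of storage.
for epoch in range(2):
    batch = [load_sample(i) for i in range(8)]

print(load_sample.cache_info().hits)  # second epoch's 8 reads were cache hits
```

Real caching layers for ML training add eviction policies, sharding across nodes, and prefetching, but the core trade (memory for storage round-trips) is the same.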

Level: Intermediate

Short Break after Q&A

Infra Dev – MLOps @ Scale

Damon Allison, Principal Machine Learning Engineer, Shipt

Deploying ML solutions to production quickly and safely requires CI/CD tooling and processes specifically geared around ML's primary artifact: the model. In this talk, Damon will cover how Shipt has implemented tooling and processes for ML engineers and data scientists to build, deploy, monitor, and audit machine learning models in production systems at scale. We'll discuss how models are tagged, versioned, deployed, scaled, and monitored with tooling like MLflow, Seldon Core, Airflow, and Drone. We'll also cover advanced deployment scenarios like A/B deployments and shadow versions, as well as patterns for integrating models into both batch and real-time production systems.
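As a hedged sketch of one pattern mentioned above (not Shipt's actual implementation), a shadow deployment routes every request to the live model while the candidate model runs on the same traffic and only has its predictions logged for offline comparison. All model and function names here are hypothetical stand-ins:

```python
def live_model(features):
    return sum(features)  # stand-in for the production model

def candidate_model(features):
    return sum(features) * 1.1  # stand-in for the shadowed new version

shadow_log = []

def predict(features):
    # The caller always receives the live model's answer.
    result = live_model(features)
    # The candidate sees identical traffic; its output is logged, never served.
    shadow_log.append((features, candidate_model(features)))
    return result

print(predict([1, 2, 3]))  # caller sees the live model's prediction: 6
```

The appeal of the pattern is that the candidate is exercised against real production inputs with zero user-facing risk; promotion to live becomes a decision backed by logged comparisons.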

Level: Intermediate

Short Break after Q&A

Scaling Training and Deployment of LLMs for Retail Applications

Aastha Jhunjhunwala, Solution Architect, NVIDIA

Large Language Models (LLMs) are revolutionizing the retail industry by leveraging advanced natural language processing to enhance customer experience, accelerate time to business insights, generate digital assets, and much more. However, with great power come great challenges. Join us as we unfold the intricacies and challenges encountered in the LLM lifecycle, how to overcome them, and how to scale efficiently while maximizing GPU utilization.

We’ll touch upon considerations such as when to fine-tune vs. pretrain, 3D parallelism techniques for efficient training, information retrieval, and techniques for optimized inference such as in-flight batching. This talk will be of value to anyone seeking to plan, build, and optimize their LLM applications.
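As a hedged toy illustration of in-flight (continuous) batching, the scheduler below refills a freed batch slot as soon as an individual sequence finishes decoding, rather than waiting for the entire batch to drain. The request lengths and batch size are hypothetical; a real serving stack operates on token-level model state:

```python
from collections import deque

def serve(requests, batch_size=2):
    """Toy in-flight batching: each request needs `remaining` decode steps.
    A finished sequence frees its slot immediately for a queued request."""
    queue = deque(requests)  # items are (request_id, remaining_steps)
    active, finished, steps = [], [], 0
    while queue or active:
        # Refill free slots from the queue instead of waiting for the batch to drain.
        while queue and len(active) < batch_size:
            active.append(list(queue.popleft()))
        steps += 1  # one decode step over the whole active batch
        for seq in active:
            seq[1] -= 1
        finished += [seq[0] for seq in active if seq[1] == 0]
        active = [seq for seq in active if seq[1] > 0]
    return finished, steps

done, steps = serve([("a", 1), ("b", 3), ("c", 1)])
print(done, steps)  # ['a', 'c', 'b'] 3 — static batching would need 4 steps here
```

With static batching, the short request "c" would wait for the whole (a, b) batch to finish; in-flight batching slots it into the step freed by "a", which is why it improves GPU utilization under mixed sequence lengths.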

Level: Intermediate


Happy Hour starts at 4:30pm PT

Stick around and meet the presenters and other attendees

If you require accessibility assistance or an accommodation to experience this event – such as closed captions or material in an accessible format – please contact

Portions of this event will be recorded. By registering to attend ICCON, you acknowledge that your image, comments and questions (written or verbal) may be recorded and rebroadcast.

By registering for this event, you agree to Target’s Privacy Policy.

Sunnyvale ICCON Infra Cloud Meetup Group