
Details

Hear talks from experts on the latest topics in AI, ML, and computer vision on March 19th.

Date and Location

Mar 19, 2026
9 - 11 AM Pacific
Online. Register for Zoom!

Towards Reliable Clinical AI: Evaluating Factuality, Robustness, and Real-World Performance of Large Language Models

Large language models are increasingly deployed in clinical settings, but their reliability remains uncertain—they hallucinate facts, behave inconsistently across instruction phrasings, and struggle with evolving medical terminology. In my talk, I address methods to systematically evaluate clinical LLM reliability across four dimensions aligned with how healthcare professionals actually work: verifying concrete facts (FactEHR), ensuring stable guidance across instruction variations (instruction sensitivity study showing up to 0.6 AUROC variation), integrating up-to-date knowledge (BEACON improving biomedical NER by 15%), and assessing real patient conversations (PATIENT-EVAL revealing models abandon safety warnings when patients seek reassurance). These contributions establish evaluation standards spanning factuality, robustness, knowledge integration, and patient-centered communication, charting a path toward clinical AI that is safer, more equitable, and more trustworthy.
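The instruction-sensitivity finding above can be made concrete with a small sketch: run the same clinical classification task under several paraphrased instructions and measure how much the AUROC varies across phrasings. The prompts, scores, and labels below are illustrative stand-ins, not data from the talk.

```python
# Hypothetical instruction-sensitivity check: the same clinical task is run
# under several paraphrased instructions, and we measure the AUROC spread.
# Scores here are stand-ins for real LLM outputs on real notes.

def auroc(labels, scores):
    """Rank-based AUROC: probability a positive case outranks a negative one."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 0, 1, 0]
# Scores the (hypothetical) model assigns under three instruction paraphrases.
scores_by_prompt = {
    "Does the note indicate sepsis?":       [0.9, 0.8, 0.3, 0.2, 0.7, 0.4],
    "Assess whether sepsis is present.":    [0.6, 0.9, 0.5, 0.1, 0.4, 0.3],
    "Is sepsis documented in this record?": [0.8, 0.5, 0.6, 0.4, 0.9, 0.2],
}

aurocs = {p: auroc(labels, s) for p, s in scores_by_prompt.items()}
spread = max(aurocs.values()) - min(aurocs.values())
print({p: round(a, 3) for p, a in aurocs.items()}, "spread:", round(spread, 3))
```

A large spread across semantically equivalent instructions is exactly the kind of instability the talk's robustness dimension targets.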

About the Speaker

Monica Munnangi is a doctoral student at the Khoury College of Computer Sciences at Northeastern University, advised by Saiph Savage. Her doctoral research, which she began in 2021 and expects to complete in 2026, focuses on multi-modal machine learning for healthcare. After being introduced to artificial intelligence and machine learning during her undergraduate studies, Munnangi earned her master’s degree from UMass Amherst.

Neural BRDFs: Learning Compact Representations for Material Appearance

Accurately modeling how light interacts with real-world materials remains a central challenge in rendering. Bidirectional Reflectance Distribution Functions (BRDFs) describe how materials reflect light as a function of viewing and lighting directions. Creating realistic digital materials has traditionally required choosing between fast parametric models that can't capture real-world complexity, or massive measured BRDFs that are expensive to acquire and store. Neural BRDFs address this challenge by learning continuous reflectance functions from data, exploiting directional correlations and symmetry to achieve significant compression while maintaining rendering quality. In this talk, we examine how neural networks can encode complex material behavior compactly, why this matters for rendering and material capture, and how neural BRDFs fit into the broader evolution toward data-driven graphics.
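The compression argument can be sketched in a few lines: a neural BRDF is just a small network mapping incoming and outgoing directions to RGB reflectance, and its parameter count is tiny next to a densely measured table. The architecture and weights below are illustrative (randomly initialized, not fit to data).

```python
# Minimal sketch of a neural BRDF: a small MLP mapping incoming/outgoing
# directions to RGB reflectance. Weights are random stand-ins; a real model
# would be fit to measured reflectance data (e.g. MERL-style tables).
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes):
    """Random weights for a fully connected network with the given layer sizes."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def brdf(params, wi, wo):
    """Evaluate reflectance for unit incoming/outgoing directions wi, wo."""
    x = np.concatenate([wi, wo])          # 6-D directional input
    for W, b in params[:-1]:
        x = np.maximum(x @ W + b, 0.0)    # ReLU hidden layers
    W, b = params[-1]
    return np.exp(x @ W + b)              # positive RGB reflectance

params = init_mlp([6, 32, 32, 3])
n_params = sum(W.size + b.size for W, b in params)

# A densely measured isotropic BRDF at MERL resolution stores ~90*90*180 RGB
# samples; the MLP above needs orders of magnitude fewer numbers.
measured_entries = 90 * 90 * 180 * 3
print(f"MLP parameters: {n_params}, measured table entries: {measured_entries}")
```

Real neural BRDFs additionally bake in the symmetries the abstract mentions (e.g. reciprocity, isotropy) via the input parameterization, which shrinks the network further.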

About the Speaker

Manushree Gangwar is a Machine Learning Engineer at Voxel51 working on data-centric visual AI. She holds an MS in Computer Science from Columbia University and has previously worked in robotics, autonomous driving, and AR/VR, with a focus on scene understanding and 3D reconstruction.

Supercharging AI Agents with Evaluations

Reliable deployment of AI agents depends on rigorous evaluation, which must shift from a nice-to-have QA step to a core engineering discipline. Robust evaluation is essential for safety, predictability, misuse resistance, and sustained user trust. To meet this bar, evaluations must be deeply integrated into the agent development lifecycle. This talk outlines how simulation-based testing, using high-fidelity, controllable environments, provides the next generation of evaluation infrastructure for production-ready AI agents.
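The simulation-based testing idea can be sketched as a tiny harness: scripted scenarios define a controllable environment plus explicit behavioral checks, and the harness runs an agent through each one. The refund agent, scenarios, and checks below are all hypothetical illustrations.

```python
# Hypothetical sketch of simulation-based agent testing: scripted scenarios
# pair a controllable input with an explicit behavioral check; the harness
# runs the agent through each and reports pass/fail.

def refund_agent(request):
    """Toy agent policy: approves small valid refunds, escalates the rest."""
    if request["amount"] <= 100 and request["reason"] in {"damaged", "late"}:
        return {"action": "approve", "amount": request["amount"]}
    return {"action": "escalate"}

SCENARIOS = [
    {"request": {"amount": 40, "reason": "damaged"},      # happy path
     "check": lambda out: out["action"] == "approve"},
    {"request": {"amount": 5000, "reason": "damaged"},    # misuse probe
     "check": lambda out: out["action"] == "escalate"},
    {"request": {"amount": 40, "reason": "changed mind"}, # policy edge case
     "check": lambda out: out["action"] == "escalate"},
]

def run_suite(agent, scenarios):
    """Run every scenario and count how many checks pass."""
    results = [s["check"](agent(s["request"])) for s in scenarios]
    return sum(results), len(results)

passed, total = run_suite(refund_agent, SCENARIOS)
print(f"{passed}/{total} scenarios passed")
```

In a production setting the agent would be an LLM-driven system and the environment a high-fidelity simulator, but the contract is the same: deterministic scenarios, explicit checks, results wired into CI.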

About the Speaker

Priya Venkat, PhD, is a Senior AI Manager at Intuit, where she leads teams that build and scale ML and Agentic AI systems for finance. Her work integrates cutting-edge agentic workflows and robust evaluation systems to drive business impact while ensuring AI safety and reliability. Priya is a strong advocate of responsible AI, and actively mentors the next generation of AI scientists and engineers.

Language Diffusion Models

Autoregressive models (ARMs) are widely regarded as the cornerstone of large language models (LLMs). This talk challenges that notion by introducing LLaDA, a diffusion model trained from scratch under the pre-training and supervised fine-tuning (SFT) paradigm. LLaDA models distributions through a forward data masking process and a reverse process, parameterized by a vanilla Transformer to predict masked tokens. Optimizing a likelihood bound provides a principled generative approach for probabilistic inference. Across extensive benchmarks, LLaDA demonstrates strong scalability, outperforming self-constructed ARM baselines. Remarkably, LLaDA 8B is competitive with strong LLMs like LLaMA3 8B in in-context learning and, after SFT, exhibits impressive instruction-following abilities in case studies such as multi-turn dialogue.
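The forward masking process described above can be sketched in a few lines: sample a noise level t, then independently replace each token with a mask token with probability t. The reverse process (a Transformer in LLaDA) is trained to predict the masked positions; here we only show which positions such a predictor would fill in.

```python
# Minimal sketch of a masked-diffusion forward process: sample t ~ U(0, 1),
# then mask each token independently with probability t. A trained reverse
# model would predict the masked tokens from the unmasked context.
import random

MASK = "[MASK]"

def forward_mask(tokens, t, rng):
    """Replace each token with MASK independently with probability t."""
    return [MASK if rng.random() < t else tok for tok in tokens]

rng = random.Random(0)
tokens = ["the", "patient", "was", "given", "aspirin"]
t = rng.random()                       # noise level t ~ Uniform(0, 1)
noised = forward_mask(tokens, t, rng)

# The reverse step fills in these positions; a real model conditions on the
# surviving tokens, iterating from heavily masked toward fully unmasked text.
to_predict = [i for i, tok in enumerate(noised) if tok == MASK]
print(f"t={t:.2f}", noised, "positions to predict:", to_predict)
```

Training then amounts to optimizing the likelihood bound the abstract mentions: the model's loss is its error on exactly these masked positions, averaged over noise levels.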

About the Speaker

Jayita Bhattacharyya is an AI/ML practitioner with a blend of technical speaking and hackathon experience, applying technology to solve real-world problems. Her current focus is generative AI, helping software teams incorporate AI into transforming software engineering.

Related topics

Artificial Intelligence
Computer Vision
Machine Learning
Data Science
Open Source
