LLM Multi-agent System: A real-world use case study for clinical simulation

Hosted By
Yan X.
Details

AgentClinic: A multimodal agent benchmark to evaluate AI in simulated clinical environments.

Evaluating large language models (LLMs) in clinical scenarios is crucial to assessing their potential clinical utility. Existing benchmarks rely heavily on static question-answering, which does not accurately depict the complex, sequential nature of clinical decision-making. Here, we introduce AgentClinic, a multimodal agent benchmark for evaluating LLMs in simulated clinical environments that include patient interactions, multimodal data collection under incomplete information, and the use of various tools, resulting in an in-depth evaluation across nine medical specialties and seven languages.

Slides for past meetups have been posted at: GitHub
Recordings have been posted at: YanAITalk

Feel free to reach out if you want to present a paper or a use case at upcoming meetups!

Note: You must have a Zoom account to log in (a free account is sufficient). The link and password will be shared three days before the meeting.

Houston Machine Learning
FREE