LLM Multi-agent System: A real-world use case study for clinical simulation


Details
AgentClinic: A multimodal agent benchmark to evaluate AI in simulated clinical environments.
Evaluating large language models (LLMs) in clinical scenarios is crucial to assessing their potential clinical utility. Existing benchmarks rely heavily on static question-answering, which does not accurately depict the complex, sequential nature of clinical decision-making. Here, we introduce AgentClinic, a multimodal agent benchmark for evaluating LLMs in simulated clinical environments that include patient interactions, multimodal data collection under incomplete information, and the use of various tools, resulting in an in-depth evaluation across nine medical specialties and seven languages.
Slides for past meetups are posted at: GitHub
Recordings have been posted at: YanAITalk
Feel free to reach out if you want to present a paper or a use case at upcoming meetups!
Note: You must have a Zoom account to log in (a free account is sufficient). The link and password will be shared three days before the meeting.
