Feb 16, 2026
This AI Scored 67% on US Medical Exam Questions, and Here's Why That Matters
Researchers at Google Research and DeepMind introduce MultiMedQA, a broad medical question-answering benchmark, and Med-PaLM, a large language model aligned to the medical domain. Flan-PaLM sets new records on medical exam benchmarks, including 67.6% accuracy on USMLE-style questions. But human evaluation of its answers reveals safety gaps. Med-PaLM, produced by applying instruction prompt tuning to Flan-PaLM, gives answers that are more aligned with scientific consensus and less likely to cause harm, though doctors still outperform the model.
Source: HackerNoon