Blog

Feb 16, 2026

This AI Scored 67% on the US Medical Exam, and Here's Why That Matters

Researchers at Google Research and DeepMind introduce MultiMedQA, a broad medical question-answering benchmark, and Med-PaLM, a large language model aligned to the medical domain. Flan-PaLM, the instruction-tuned model that Med-PaLM builds on, sets new records on medical exam benchmarks, including 67.6% accuracy on USMLE-style questions. Human evaluation nonetheless reveals safety gaps in its answers. With instruction prompt tuning, Med-PaLM produces answers that are more aligned with scientific consensus and less likely to cause harm, though doctors still outperform the AI.

Source: HackerNoon


