
Nov 20, 2025

Multilingual Isn’t Cross-Lingual: Inside My Benchmark of 11 LLMs on Mid- & Low-Resource Languages

I built an evaluation pipeline for multilingual and cross-lingual LLM performance on 11 mid- and low-resource languages (e.g., Basque, Kazakh, Amharic, Hausa, Sundanese). It combines native-language datasets (KazMMLU, BertaQA, BLEnD), zero-shot chain-of-thought prompts, and a new metric, LASS (Language-Aware Semantic Score), that rewards both semantic correctness and answering in the requested language. Findings: (1) scale helps, but with diminishing returns; (2) reasoning-optimized models often beat larger non-reasoning models; (3) the best open-weight model trails the best closed model by ~7%; (4) "multilingual" models underperform on culturally specific cross-lingual tasks once evaluation moves beyond translated English content. Code & data: see GitHub link in Reproducibility.
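The post names the metric but not its formula, so here is a minimal sketch of how a language-aware semantic score *could* be computed: a semantic-similarity term (here a simple token-overlap F1 stands in for a real semantic scorer) scaled down by a penalty when the answer is not in the requested language. The function names, the multiplicative penalty, and the `0.5` default are all illustrative assumptions, not the benchmark's actual definition.

```python
def token_f1(prediction: str, reference: str) -> float:
    """Stand-in semantic score: F1 over whitespace tokens, in [0, 1].
    (A real pipeline would use embedding similarity or an LLM judge.)"""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    overlap = len(set(pred_tokens) & set(ref_tokens))
    if overlap == 0:
        return 0.0
    precision = overlap / len(set(pred_tokens))
    recall = overlap / len(set(ref_tokens))
    return 2 * precision * recall / (precision + recall)


def lass(semantic_score: float, detected_lang: str, target_lang: str,
         lang_penalty: float = 0.5) -> float:
    """Hypothetical LASS: scale the semantic score by a penalty factor
    when the detected answer language differs from the requested one.
    `lang_penalty` is an assumed hyperparameter, not from the post."""
    lang_factor = 1.0 if detected_lang == target_lang else lang_penalty
    return semantic_score * lang_factor


# Example: a correct Basque answer vs. the same content answered in English.
sem = token_f1("eguraldi ona dago", "eguraldi ona dago")
print(lass(sem, detected_lang="eu", target_lang="eu"))  # full credit
print(lass(sem, detected_lang="en", target_lang="eu"))  # penalized
```

The multiplicative form means a fluent answer in the wrong language is still partially credited rather than zeroed out; a stricter variant would set `lang_penalty=0.0`.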

Source: HackerNoon

