Optimizing LLM Pre-Training: Muon, Latent Attention, and MoE in Practice
Muon is a geometry-aware optimizer reported to roughly halve pre-training time for large language models. It speeds up LLM pre-training by orthogonalizing each weight matrix's momentum update via an approximate polar decomposition, which amounts to steepest descent under the spectral norm. Muon also scales well to large batch sizes and combines cleanly with architectural techniques such as Multi-Head Latent Attention.
Source: HackerNoon →
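The core idea in the article is the orthogonalized momentum update. Below is a minimal PyTorch sketch of that idea: a heavy-ball momentum buffer whose matrix is pushed toward the orthogonal factor of its polar decomposition with a Newton-Schulz iteration before it is applied to the weights. The iteration coefficients, learning rate, and function names (`newton_schulz_orthogonalize`, `muon_step`) are illustrative assumptions, not the article's exact recipe.

```python
# Sketch of a Muon-style update step. Coefficients and hyperparameters below
# are illustrative assumptions, not values taken from the article.
import torch


def newton_schulz_orthogonalize(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximate the orthogonal factor of G's polar decomposition.

    Pushes the singular values of G toward 1 while keeping its singular
    vectors, i.e. G = U S V^T is mapped approximately to U V^T.
    """
    a, b, c = 3.4445, -4.7750, 2.0315      # quintic iteration coefficients (assumed)
    X = G / (G.norm() + 1e-7)              # Frobenius scaling bounds the spectral norm by 1
    transposed = X.shape[0] > X.shape[1]
    if transposed:                          # iterate on the smaller Gram matrix
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X


def muon_step(weight: torch.Tensor, grad: torch.Tensor,
              momentum_buf: torch.Tensor, lr: float = 0.02,
              beta: float = 0.95) -> None:
    """One geometry-aware update: momentum, then orthogonalize, then step."""
    momentum_buf.mul_(beta).add_(grad)                  # heavy-ball momentum
    update = newton_schulz_orthogonalize(momentum_buf)  # spectral-norm-friendly direction
    weight.add_(update, alpha=-lr)


# Usage on a single 2-D weight matrix (Muon operates per matrix parameter):
W = torch.randn(512, 256)
g = torch.randn_like(W)        # stand-in for a real gradient
buf = torch.zeros_like(W)
muon_step(W, g, buf)
```

Implementations of this scheme typically apply the orthogonalized update only to 2-D weight matrices, while embeddings, normalization scales, and other 1-D parameters keep an Adam-style optimizer.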