Oct 21, 2025

How Toto Reimagines Multi-Head Attention for Multivariate Forecasting

Toto is a decoder-only transformer built for multivariate time series forecasting. It adopts components from large language models, such as RMSNorm, SwiGLU, and rotary embeddings, and introduces a novel “Proportional Factorized Space-Time Attention” mechanism. This mechanism balances time-wise and space-wise attention so the model can handle complex, high-cardinality data efficiently. A probabilistic prediction head based on a Student-t mixture model rounds out the design, giving Toto flexible, scalable, and uncertainty-aware forecasts suited to real-world applications.
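
To make the idea of a Student-t mixture head more concrete, here is a minimal sketch of the log-density such a head could be trained against. It is an illustration, not Toto's actual code: the function names, the location-scale parameterization, and the choice of average negative log-likelihood as the loss are all assumptions made for this example.

```python
import math


def student_t_logpdf(x, df, loc, scale):
    """Log-density of a location-scale Student-t distribution at x."""
    z = (x - loc) / scale
    return (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
        - math.log(scale)
        - (df + 1) / 2 * math.log1p(z * z / df)
    )


def mixture_logpdf(x, weights, dfs, locs, scales):
    """Log-density of a Student-t mixture, combined with log-sum-exp.

    weights are assumed to be positive and to sum to 1.
    """
    terms = [
        math.log(w) + student_t_logpdf(x, df, loc, scale)
        for w, df, loc, scale in zip(weights, dfs, locs, scales)
    ]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


def negative_log_likelihood(observations, weights, dfs, locs, scales):
    """Average negative log-likelihood (hypothetical training loss)."""
    total = sum(mixture_logpdf(x, weights, dfs, locs, scales) for x in observations)
    return -total / len(observations)
```

In a real model the mixture weights, degrees of freedom, locations, and scales would be predicted per time step by the network rather than passed in as fixed lists. The heavy tails of the Student-t components let the predicted distribution tolerate outliers better than a Gaussian mixture would.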

Source: HackerNoon

