
Oct 21, 2025

How Toto Reimagines Multi-Head Attention for Multivariate Forecasting

Toto is a decoder-only transformer built for multivariate time series forecasting. It adapts components popularized by large language models, such as RMSNorm, SwiGLU feed-forward layers, and rotary positional embeddings, and introduces a new mechanism called Proportional Factorized Space-Time Attention. Rather than attending over every (variate, time step) pair at once, the model factorizes attention into time-wise layers, which attend along each series' history, and space-wise layers, which attend across variates. It balances the two kinds of layers so it can handle complex, high-cardinality data efficiently. Combined with a robust probabilistic prediction head based on a heavy-tailed Student-t mixture model, Toto produces flexible, scalable forecasts with uncertainty estimates suited to real-world applications.
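To make the factorization concrete, here is a minimal pure-Python sketch of the general idea. It is not Toto's code. It makes heavy simplifying assumptions: a single head, identity query/key/value projections (no learned weights), no patching, no rotary embeddings, and no SwiGLU feed-forward. The helper names (`time_wise_layer`, `space_wise_layer`, `layer_schedule`) and the `time_per_space` ratio are hypothetical illustrations, not Toto's actual API or layer proportion. Time-wise layers let each variate attend causally over its own past. Space-wise layers let variates exchange information only within the same time step, so the whole stack stays causal in time.

```python
import math
from typing import List

Vec = List[float]
Series = List[List[Vec]]  # indexed as x[variate][time] -> d-dimensional vector


def rms_norm(v: Vec, eps: float = 1e-6) -> Vec:
    """RMSNorm without a learned gain: divide by the root-mean-square of the vector."""
    rms = math.sqrt(sum(a * a for a in v) / len(v) + eps)
    return [a / rms for a in v]


def attend(queries: List[Vec], keys: List[Vec], causal: bool) -> List[Vec]:
    """Single-head scaled dot-product attention; keys double as values (identity projections)."""
    d = len(queries[0])
    out = []
    for i, q in enumerate(queries):
        # With a causal mask, query i may only see positions 0..i.
        visible = keys[: i + 1] if causal else keys
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in visible]
        m = max(scores)  # subtract the max before exp for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * k[j] for w, k in zip(weights, visible)) / total for j in range(d)])
    return out


def time_wise_layer(x: Series) -> Series:
    """Pre-norm residual layer: each variate attends causally over its own history."""
    out = []
    for series in x:
        normed = [rms_norm(v) for v in series]
        attended = attend(normed, normed, causal=True)
        out.append([[a + b for a, b in zip(res, upd)] for res, upd in zip(series, attended)])
    return out


def space_wise_layer(x: Series) -> Series:
    """Pre-norm residual layer: at each time step, every variate attends to all variates at that step."""
    n_var, n_time = len(x), len(x[0])
    out = [[None] * n_time for _ in range(n_var)]
    for t in range(n_time):
        column = [rms_norm(x[v][t]) for v in range(n_var)]
        attended = attend(column, column, causal=False)
        for v in range(n_var):
            out[v][t] = [a + b for a, b in zip(x[v][t], attended[v])]
    return out


def layer_schedule(n_layers: int, time_per_space: int) -> List[str]:
    """Hypothetical proportional schedule: `time_per_space` time-wise layers, then one space-wise layer."""
    return ["space" if (i + 1) % (time_per_space + 1) == 0 else "time" for i in range(n_layers)]


def factorized_stack(x: Series, n_layers: int, time_per_space: int) -> Series:
    """Apply time-wise and space-wise layers in the proportion given by the schedule."""
    for kind in layer_schedule(n_layers, time_per_space):
        x = time_wise_layer(x) if kind == "time" else space_wise_layer(x)
    return x


# Usage: two variates, three time steps, 2-dimensional embeddings.
x = [
    [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]],  # variate 0
    [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]],  # variate 1
]
y = factorized_stack(x, n_layers=4, time_per_space=3)
print(len(y), len(y[0]), len(y[0][0]))  # 2 3 2 (shape is preserved)
```

The efficiency argument follows from the shapes. Full attention over all V·T tokens costs on the order of (V·T)² score computations per layer. A time-wise layer costs about V·T², and a space-wise layer costs about T·V². Using fewer space-wise layers than time-wise ones further reduces the cost of mixing across many variates.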

Source: HackerNoon
