
11 hours ago

Overcoming Locality in Auto-Regressive Transformers

The inductive scratchpad was developed to overcome Transformers' inability to generalize to longer reasoning chains. This section describes its architectural ideas and practical implementation. The authors use decoder-only Transformers trained from scratch in the GPT-2 style. The main novelty is a training and inference procedure that imposes an inductive structure on the scratchpad: sequences take the form Question . State1 # State2 # ..., and attention masking combined with positional re-indexing forces the model to generate each new state s[i] using only the previous state s[i-1] and the original question Q.
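To make the masking idea concrete, here is a minimal sketch of how such an inductive attention mask could be built. This is an illustration only, not the authors' implementation: the `segment_ids` encoding and the `inductive_attention_mask` helper are assumptions, and the paper's positional re-indexing step is omitted for brevity.

```python
import numpy as np

def inductive_attention_mask(segment_ids):
    """Boolean attention mask for an inductive scratchpad (illustrative sketch).

    segment_ids[t] gives the segment of token t: 0 for the question Q,
    and i >= 1 for scratchpad state s[i]. Token t may attend to token u
    (mask[t, u] == True) only if u is not in the future (causality) and
    u belongs to the question, the current state, or the previous state.
    """
    n = len(segment_ids)
    mask = np.zeros((n, n), dtype=bool)
    for t in range(n):
        for u in range(t + 1):  # causal: attend only to past/current positions
            seg_t, seg_u = segment_ids[t], segment_ids[u]
            # Allowed sources: question (0), same state, or immediately prior state
            if seg_u == 0 or seg_u == seg_t or seg_u == seg_t - 1:
                mask[t, u] = True
    return mask

# Example: one question token, then states s[1], s[2], s[3]
mask = inductive_attention_mask([0, 1, 1, 2, 3])
```

With this mask, a token in state s[3] can still see Q and s[2], but not s[1]; this is what blocks the model from shortcutting across the whole chain and enforces the state-to-state inductive structure described above.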

Source: HackerNoon →

