11 hours ago
Overcoming Locality in Auto-Regressive Transformers
The inductive scratchpad was developed to overcome Transformers' inability to generalize to longer reasoning chains. This section describes its architecture and practical implementation. The authors use GPT-2-style decoder-only Transformers trained from scratch. The main novelty is a training and inference procedure that imposes an inductive structure on the scratchpad. The format of the sequence is Question
