Transformer Is Inherently a Causal Learner
Deep Networks as Paths on the Manifold of Neural Representations