Transformer Training Visualized
Visualize GPT training with weights, gradients, and attention