Article • Tensor Parallelism (TP) in Transformers: 5 Minutes to Understand • Dec 4, 2025 • 64
Article • KV Caching Explained: Optimizing Transformer Inference Efficiency • Jan 30, 2025 • 233
An Anatomy of Vision-Language-Action Models: From Modules to Milestones and Challenges Paper • 2512.11362 • Published Dec 12, 2025 • 22
The Eiffel Tower Llama 📝 • 108 • Explore the Eiffel Tower Llama experiment with open-source models
Unlocking On-Policy Distillation for Any Model Family 📝 • 86 • Visualize on-policy distillation for any model family
The Smol Training Playbook 📚 • Featured • 2.99k • The secrets to building world-class LLMs