Training Dynamics of the Cooldown Stage in Warmup-Stable-Decay Learning Rate Scheduler • Paper 2508.01483 • Published Aug 2, 2025
Apertus: Democratizing Open and Compliant LLMs for Global Language Environments • Paper 2509.14233 • Published Sep 17, 2025
The Hot Mess of AI: How Does Misalignment Scale With Model Intelligence and Task Complexity? • Paper 2601.23045 • Published 4 days ago
BaCaDI: Bayesian Causal Discovery with Unknown Interventions • Paper 2206.01665 • Published Jun 3, 2022
The Surprising Agreement Between Convex Optimization Theory and Learning-Rate Scheduling for Large Model Training • Paper 2501.18965 • Published Jan 31, 2025
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations • Paper 2405.18392 • Published May 28, 2024