Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models Paper • 2511.08577 • Published 28 days ago • 104
Article Optimizing Mixture-of-Experts Training: A Cost-Effective, Two-Sided Approach Sep 30 • 3
Follow the Flow: Fine-grained Flowchart Attribution with Neurosymbolic Agents Paper • 2506.01344 • Published Jun 2 • 6
THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning Paper • 2509.13761 • Published Sep 17 • 16
Visual-TableQA: Open-Domain Benchmark for Reasoning over Table Images Paper • 2509.07966 • Published Sep 9 • 4
Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents Paper • 2509.09265 • Published Sep 11 • 46
A Survey of Reinforcement Learning for Large Reasoning Models Paper • 2509.08827 • Published Sep 10 • 189
Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training Paper • 2509.03403 • Published Sep 3 • 22
Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR Paper • 2509.02522 • Published Sep 2 • 25
Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models Paper • 2508.21365 • Published Aug 29 • 29
Fin-PRM: A Domain-Specialized Process Reward Model for Financial Reasoning in Large Language Models Paper • 2508.15202 • Published Aug 21 • 4
Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models Paper • 2508.09138 • Published Aug 12 • 37
Interpretability & Analysis of LMs Collection Outstanding research in LM interpretability and evaluation, summarized • 134 items • Updated Oct 20 • 116