Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning
Paper
• 2510.03259
• Published
• 57
Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense
Paper
• 2510.07242
• Published
• 30
First Try Matters: Revisiting the Role of Reflection in Reasoning Models
Paper
• 2510.08308
• Published
• 24
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward
Paper
• 2510.03222
• Published
• 75
Latent Refinement Decoding: Enhancing Diffusion-Based Language Models by Refining Belief States
Paper
• 2510.11052
• Published
• 52
RLFR: Extending Reinforcement Learning for LLMs with Flow Environment
Paper
• 2510.10201
• Published
• 36
Making Mathematical Reasoning Adaptive
Paper
• 2510.04617
• Published
• 23
Demystifying Reinforcement Learning in Agentic Reasoning
Paper
• 2510.11701
• Published
• 32
Are Large Reasoning Models Interruptible?
Paper
• 2510.11713
• Published
• 5
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
Paper
• 2510.11696
• Published
• 181
Deep Self-Evolving Reasoning
Paper
• 2510.17498
• Published
• 12
Continuous Autoregressive Language Models
Paper
• 2510.27688
• Published
• 73
Higher-order Linear Attention
Paper
• 2510.27258
• Published
• 15
Limits of Generalization in RLVR: Two Case Studies in Mathematical Reasoning
Paper
• 2510.27044
• Published
• 6
Why Language Models Hallucinate
Paper
• 2509.04664
• Published
• 196
Reverse-Engineered Reasoning for Open-Ended Generation
Paper
• 2509.06160
• Published
• 149
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
Paper
• 2509.22186
• Published
• 145
FlowRL: Matching Reward Distributions for LLM Reasoning
Paper
• 2509.15207
• Published
• 116
Towards a Unified View of Large Language Model Post-Training
Paper
• 2509.04419
• Published
• 76
Variational Reasoning for Language Models
Paper
• 2509.22637
• Published
• 69
Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models
Paper
• 2509.06949
• Published
• 56
Beyond the Exploration-Exploitation Trade-off: A Hidden State Approach for LLM Reasoning in RLVR
Paper
• 2509.23808
• Published
• 47
Sequential Diffusion Language Models
Paper
• 2509.24007
• Published
• 46
Every Token Counts: Generalizing 16M Ultra-Long Context in Large Language Models
Paper
• 2511.23319
• Published
• 24