P1: Mastering Physics Olympiads with Reinforcement Learning Paper • 2511.13612 • Published Nov 17, 2025 • 134
FlowRL: Matching Reward Distributions for LLM Reasoning Paper • 2509.15207 • Published Sep 18, 2025 • 114
A Survey of Reinforcement Learning for Large Reasoning Models Paper • 2509.08827 • Published Sep 10, 2025 • 190
SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning Paper • 2509.09674 • Published Sep 11, 2025 • 80
HiPhO: How Far Are (M)LLMs from Humans in the Latest High School Physics Olympiad Benchmark? Paper • 2509.07894 • Published Sep 9, 2025 • 31
CompassVerifier Collection CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward • 5 items • Updated Aug 31, 2025 • 7
Intern-S1: A Scientific Multimodal Foundation Model Paper • 2508.15763 • Published Aug 21, 2025 • 259
Dissecting Tool-Integrated Reasoning: An Empirical Study and Analysis Paper • 2508.15754 • Published Aug 21, 2025 • 4
Dissecting Tool-Integrated Reasoning: An Empirical Study and Analysis Paper • 2508.15754 • Published Aug 21, 2025 • 4
CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward Paper • 2508.03686 • Published Aug 5, 2025 • 37
CompassJudger-2: Towards Generalist Judge Model via Verifiable Rewards Paper • 2507.09104 • Published Jul 12, 2025 • 17
Rethinking Verification for LLM Code Generation: From Generation to Testing Paper • 2507.06920 • Published Jul 9, 2025 • 28
Coding Triangle: How Does Large Language Model Understand Code? Paper • 2507.06138 • Published Jul 8, 2025 • 21
Deciphering Trajectory-Aided LLM Reasoning: An Optimization Perspective Paper • 2505.19815 • Published May 26, 2025 • 36