Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning • Paper • 2512.07461 • Published 1 day ago • 44
LaSeR: Reinforcement Learning with Last-Token Self-Rewarding • Paper • 2510.14943 • Published Oct 16 • 39
WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models • Paper • 2411.05451 • Published Nov 8, 2024 • 1