D-CORE: Incentivizing Task Decomposition in Large Reasoning Models for Complex Tool Use
Abstract
A two-stage training framework called D-CORE is proposed to improve large reasoning models' ability to decompose complex tasks and compose reasoning processes, achieving superior performance in tool-use benchmarks.
Effective tool use and reasoning are essential capabilities for large reasoning models~(LRMs) to address complex real-world problems. Through empirical analysis, we identify that current LRMs lack the capability of sub-task decomposition in complex tool use scenarios, leading to Lazy Reasoning. To address this, we propose a two-stage training framework D-CORE~(\textbf{D}ecomposing tasks and \textbf{Co}mposing \textbf{Re}asoning processes) that first incentivize the LRMs' task decomposition reasoning capability via self-distillation, followed by diversity-aware reinforcement learning~(RL) to restore LRMs' reflective reasoning capability. D-CORE achieves robust tool-use improvements across diverse benchmarks and model scales. Experiments on BFCLv3 demonstrate superiority of our method: D-CORE-8B reaches 77.7\% accuracy, surpassing the best-performing 8B model by 5.7\%. Meanwhile, D-CORE-14B establishes a new state-of-the-art at 79.3\%, outperforming 70B models despite being 5times smaller. The source code is available at https://github.com/alibaba/EfficientAI.
Community
good paper, best bowen !
good paper
Large Reasoning Models have achieved significant success in mathematical and reasoning tasks. We investigate whether this success can be replicated in complex tool use scenarios.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- R^3: Replay, Reflection, and Ranking Rewards for LLM Reinforcement Learning (2026)
- AWPO: Enhancing Tool-Use of Large Language Models through Explicit Integration of Reasoning Rewards (2025)
- When Actions Teach You to Think: Reasoning-Action Synergy via Reinforcement Learning in Conversational Agents (2025)
- Short Chains, Deep Thoughts: Balancing Reasoning Efficiency and Intra-Segment Capability via Split-Merge Optimization (2026)
- AgentMath: Empowering Mathematical Reasoning for Large Language Models via Tool-Augmented Agent (2025)
- Mock Worlds, Real Skills: Building Small Agentic Language Models with Synthetic Tasks, Simulated Environments, and Rubric-Based Rewards (2026)
- Beyond Fast and Slow: Cognitive-Inspired Elastic Reasoning for Large Language Models (2025)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any Paper on Hugging Face checkout this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment:
@librarian-bot
recommend
Models citing this paper 1
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper
Collections including this paper 0
No Collection including this paper