---
title: Proactive Interactive Reasoning (PIR)
emoji: πŸŒ–
colorFrom: blue
colorTo: indigo
sdk: gradio
pinned: false
license: apache-2.0
short_description: Enables reasoning LLMs to ask clarification questions
---
# Reasoning While Asking: Transforming Reasoning LLMs into Proactive Inquirers (PIR)
[![arXiv](https://img.shields.io/badge/arXiv-2601.22139-b31b1b.svg)](https://arxiv.org/abs/2601.22139)
[![GitHub](https://img.shields.io/badge/GitHub-Proactive--Interactive--R1-black?logo=github)](https://github.com/SUAT-AIRI/Proactive-Interactive-R1)
This organization hosts the official models and datasets for the paper **"Reasoning While Asking: Transforming Reasoning Large Language Models from Passive Solvers to Proactive Inquirers"**.
## πŸ’‘ Motivation
Current reasoning LLMs (e.g., GPT-o1, DeepSeek-R1) suffer from **blind self-thinking**: they perform extensive internal reasoning even when critical information is missing or user intent is ambiguous. This leads to overthinking, hallucinations, and misaligned conclusions.
**PIR (Proactive Interactive Reasoning)** is a new paradigm that transforms reasoning LLMs from passive solvers into **proactive inquirers**. Instead of guessing, PIR-enabled models detect uncertainty during reasoning and actively ask users for clarification before proceeding.
<img src="https://raw.githubusercontent.com/SUAT-AIRI/Proactive-Interactive-R1/refs/heads/main/image/paradigm.png" width="1000" alt="PIR Framework Overview">
### Key Features
- **User-Intent Alignment**: Optimizes interaction through US-GRPO with composite rewards balancing accuracy, efficiency, and helpfulness.
- **Significant Improvements**: Up to **32.70% higher accuracy**, **22.90% higher pass rate**, and a **41.36-point BLEU improvement** over baselines.
- **Reduced Computation**: Nearly halves unnecessary reasoning tokens and interaction turns.
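The exact US-GRPO objective and reward terms are defined in the paper; purely as an illustration of what a composite reward balancing accuracy, efficiency, and helpfulness can look like, here is a toy sketch (the term definitions, weights, and scaling below are placeholders, not the paper's values):

```python
# Toy illustration only: the actual US-GRPO reward terms and weights are
# defined in the paper; everything below is a placeholder.

def composite_reward(answer_correct: bool,
                     num_reasoning_tokens: int,
                     num_turns: int,
                     clarification_helpful: bool,
                     w_acc: float = 1.0,
                     w_eff: float = 0.1,
                     w_help: float = 0.5) -> float:
    """Combine accuracy, efficiency, and helpfulness signals into one scalar."""
    accuracy_term = 1.0 if answer_correct else 0.0
    # Penalize long reasoning traces and extra interaction turns (efficiency).
    efficiency_term = -(num_reasoning_tokens / 1000.0 + num_turns)
    # Reward clarification questions that actually resolve missing information.
    helpfulness_term = 1.0 if clarification_helpful else 0.0
    return w_acc * accuracy_term + w_eff * efficiency_term + w_help * helpfulness_term


print(composite_reward(True, 800, 1, True))  # -> 1.32
```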
## πŸ“¦ Models
We provide the following models trained with the PIR paradigm:
| Model Name | Description | Link |
| :--- | :--- | :--- |
| **Proactive-Interactive-R1-Math-7B** | The core model optimized for mathematical reasoning with clarification capabilities. | [View Model](https://huggingface.co/Proactive-Interactive-R1/Proactive-Interactive-R1-Math-7B) |
| **Proactive-Interactive-R1-Math-7B-Pro** | An enhanced version of the Math-7B model. | [View Model](https://huggingface.co/Proactive-Interactive-R1/Proactive-Interactive-R1-Math-7B-Pro) |
| **Proactive-Interactive-R1-SFT-7B** | The base SFT model before Reinforcement Learning alignment. | [View Model](https://huggingface.co/Proactive-Interactive-R1/Proactive-Interactive-R1-SFT-7B) |
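The models can be loaded with the standard `transformers` API. A minimal sketch, assuming the checkpoint ships with a chat template; the prompt and decoding settings are illustrative only:

```python
# Minimal usage sketch; generation settings are not the recommended values.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Proactive-Interactive-R1/Proactive-Interactive-R1-Math-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# An under-specified question: the speed and distance are missing.
messages = [{"role": "user", "content": "A train leaves the station at some speed. How long does the trip take?"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
# A PIR-trained model is expected to ask a clarification question
# (e.g. for the speed and distance) instead of guessing an answer.
```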
## πŸ“š Datasets
The datasets used to train and evaluate PIR are available here:
- **[Reasoning-While-Asking-SFT-Dataset](https://huggingface.co/datasets/Proactive-Interactive-R1/Reasoning-While-Asking-SFT-Dataset)**: The dataset used for the initial Supervised Fine-Tuning (SFT) phase.
- **[DeepSeek-R1-Distill-Data-5k](https://huggingface.co/datasets/Proactive-Interactive-R1/DeepSeek-R1-Distill-Data-5k)**: Distilled data used for training.
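Both datasets can be pulled with the `datasets` library. A minimal sketch; the split and column names are assumptions, so check the dataset cards for the actual schema:

```python
# Minimal sketch; split/column names are assumptions — see the dataset card.
from datasets import load_dataset

sft_data = load_dataset("Proactive-Interactive-R1/Reasoning-While-Asking-SFT-Dataset")
print(sft_data)              # shows the available splits and columns
print(sft_data["train"][0])  # inspect the first example
```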
## πŸ“œ Citation
If you find this work useful, please cite our paper:
```bibtex
@misc{chen2026reasoningaskingtransformingreasoning,
title={Reasoning While Asking: Transforming Reasoning Large Language Models from Passive Solvers to Proactive Inquirers},
author={Xin Chen and Feng Jiang and Yiqian Zhang and Hardy Chen and Shuo Yan and Wenya Xie and Min Yang and Shujian Huang},
year={2026},
eprint={2601.22139},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2601.22139},
}
```