---
title: Proactive Interactive Reasoning (PIR)
emoji: πŸŒ–
colorFrom: blue
colorTo: indigo
sdk: gradio
pinned: false
license: apache-2.0
short_description: Enables reasoning LLMs to ask clarification questions
---
# Reasoning While Asking: Transforming Reasoning LLMs into Proactive Inquirers (PIR)
[![arXiv](https://img.shields.io/badge/arXiv-2601.22139-b31b1b.svg)](https://arxiv.org/abs/2601.22139)
[![GitHub](https://img.shields.io/badge/GitHub-Proactive--Interactive--R1-black?logo=github)](https://github.com/SUAT-AIRI/Proactive-Interactive-R1)
This organization hosts the official models and datasets for the paper **"Reasoning While Asking: Transforming Reasoning Large Language Models from Passive Solvers to Proactive Inquirers"**.
## πŸ’‘ Motivation
Current reasoning LLMs (e.g., GPT-o1, DeepSeek-R1) suffer from **blind self-thinking**: they perform extensive internal reasoning even when critical information is missing or user intent is ambiguous. This leads to overthinking, hallucinations, and misaligned conclusions.
**PIR (Proactive Interactive Reasoning)** is a new paradigm that transforms reasoning LLMs from passive solvers into **proactive inquirers**. Instead of guessing, PIR-enabled models detect uncertainty during reasoning and actively ask users for clarification before proceeding.
<img src="https://raw.githubusercontent.com/SUAT-AIRI/Proactive-Interactive-R1/refs/heads/main/image/paradigm.png" width="1000" alt="PIR Framework Overview">
### Key Features
- **User-Intent Alignment**: Optimizes interaction through US-GRPO with composite rewards balancing accuracy, efficiency, and helpfulness.
- **Significant Improvements**: Up to **32.70% higher accuracy**, **22.90% higher pass rate**, and a **41.36-point BLEU improvement** over baselines.
- **Reduced Computation**: Nearly halves unnecessary reasoning tokens and interaction turns.
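The exact US-GRPO objective and reward terms are defined in the paper; purely as an illustration of what a composite reward balancing accuracy, efficiency, and helpfulness can look like, here is a toy sketch (the term definitions, weights, and scaling below are placeholders, not the paper's values):

```python
# Toy illustration only: the actual US-GRPO reward terms and weights are
# defined in the paper; everything below is a placeholder.

def composite_reward(answer_correct: bool,
                     num_reasoning_tokens: int,
                     num_turns: int,
                     clarification_helpful: bool,
                     w_acc: float = 1.0,
                     w_eff: float = 0.1,
                     w_help: float = 0.5) -> float:
    """Combine accuracy, efficiency, and helpfulness signals into one scalar."""
    accuracy_term = 1.0 if answer_correct else 0.0
    # Penalize long reasoning traces and extra interaction turns (efficiency).
    efficiency_term = -(num_reasoning_tokens / 1000.0 + num_turns)
    # Reward clarification questions that actually resolve missing information.
    helpfulness_term = 1.0 if clarification_helpful else 0.0
    return w_acc * accuracy_term + w_eff * efficiency_term + w_help * helpfulness_term


print(composite_reward(True, 800, 1, True))  # -> 1.32
```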
## πŸ“¦ Models
We provide the following models trained with the PIR paradigm:
| Model Name | Description | Link |
| :--- | :--- | :--- |
| **Proactive-Interactive-R1-Math-7B** | The core model optimized for mathematical reasoning with clarification capabilities. | [View Model](https://huggingface.co/Proactive-Interactive-R1/Proactive-Interactive-R1-Math-7B) |
| **Proactive-Interactive-R1-Math-7B-Pro** | An enhanced version of the Math-7B model. | [View Model](https://huggingface.co/Proactive-Interactive-R1/Proactive-Interactive-R1-Math-7B-Pro) |
| **Proactive-Interactive-R1-SFT-7B** | The base SFT model before Reinforcement Learning alignment. | [View Model](https://huggingface.co/Proactive-Interactive-R1/Proactive-Interactive-R1-SFT-7B) |
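The models can be loaded with the standard `transformers` API. A minimal sketch, assuming the checkpoint ships with a chat template; the prompt and decoding settings are illustrative only:

```python
# Minimal usage sketch; generation settings are not the recommended values.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Proactive-Interactive-R1/Proactive-Interactive-R1-Math-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# An under-specified question: the speed and distance are missing.
messages = [{"role": "user", "content": "A train leaves the station at some speed. How long does the trip take?"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
# A PIR-trained model is expected to ask a clarification question
# (e.g. for the speed and distance) instead of guessing an answer.
```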
## πŸ“š Datasets
The datasets used to train and evaluate PIR are available here:
- **[Reasoning-While-Asking-SFT-Dataset](https://huggingface.co/datasets/Proactive-Interactive-R1/Reasoning-While-Asking-SFT-Dataset)**: The dataset used for the initial Supervised Fine-Tuning (SFT) phase.
- **[DeepSeek-R1-Distill-Data-5k](https://huggingface.co/datasets/Proactive-Interactive-R1/DeepSeek-R1-Distill-Data-5k)**: Distilled data used for training.
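Both datasets can be pulled with the `datasets` library. A minimal sketch; the split and column names are assumptions, so check the dataset cards for the actual schema:

```python
# Minimal sketch; split/column names are assumptions — see the dataset card.
from datasets import load_dataset

sft_data = load_dataset("Proactive-Interactive-R1/Reasoning-While-Asking-SFT-Dataset")
print(sft_data)              # shows the available splits and columns
print(sft_data["train"][0])  # inspect the first example
```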
## πŸ“œ Citation
If you find this work useful, please cite our paper:
```bibtex
@misc{chen2026reasoningaskingtransformingreasoning,
title={Reasoning While Asking: Transforming Reasoning Large Language Models from Passive Solvers to Proactive Inquirers},
author={Xin Chen and Feng Jiang and Yiqian Zhang and Hardy Chen and Shuo Yan and Wenya Xie and Min Yang and Shujian Huang},
year={2026},
eprint={2601.22139},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2601.22139},
}
```