ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection Paper • 2601.09195 • Published 18 days ago • 15
SIN-Bench: Tracing Native Evidence Chains in Long-Context Multimodal Scientific Interleaved Literature Paper • 2601.10108 • Published 17 days ago • 7
X-Coder: Advancing Competitive Programming with Fully Synthetic Tasks, Solutions, and Tests Paper • 2601.06953 • Published 21 days ago • 44
From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model Paper • 2510.19871 • Published Oct 22, 2025 • 30
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search Paper • 2509.25454 • Published Sep 29, 2025 • 145
O-DisCo-Edit: Object Distortion Control for Unified Realistic Video Editing Paper • 2509.01596 • Published Sep 1, 2025 • 4
FlashAdventure: A Benchmark for GUI Agents Solving Full Story Arcs in Diverse Adventure Games Paper • 2509.01052 • Published Sep 1, 2025 • 22
A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code Paper • 2508.18106 • Published Aug 25, 2025 • 348