CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models Paper • 2602.17684 • Published 20 days ago • 21
Query as Anchor: Scenario-Adaptive User Representation via Large Language Model Paper • 2602.14492 • Published 9 days ago • 18
MOVA: Towards Scalable and Synchronized Video-Audio Generation Paper • 2602.08794 • Published 15 days ago • 153
OdysseyArena: Benchmarking Large Language Models For Long-Horizon, Active and Inductive Interactions Paper • 2602.05843 • Published 19 days ago • 57
Exploring Knowledge Purification in Multi-Teacher Knowledge Distillation for LLMs Paper • 2602.01064 • Published 24 days ago • 2
HER: Human-like Reasoning and Reinforcement Learning for LLM Role-playing Paper • 2601.21459 • Published 26 days ago • 9
TIDE: Trajectory-based Diagnostic Evaluation of Test-Time Improvement in LLM Agents Paper • 2602.02196 • Published 22 days ago • 34
SafeGround: Know When to Trust GUI Grounding Models via Uncertainty Calibration Paper • 2602.02419 • Published 22 days ago • 4
Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models Paper • 2601.22060 • Published 26 days ago • 156
SSL: Sweet Spot Learning for Differentiated Guidance in Agentic Optimization Paper • 2601.22491 • Published 26 days ago • 12
Double: Breaking the Acceleration Limit via Double Retrieval Speculative Parallelism Paper • 2601.05524 • Published Jan 9 • 1
Spark: Strategic Policy-Aware Exploration via Dynamic Branching for Long-Horizon Agentic Learning Paper • 2601.20209 • Published 28 days ago • 22