Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled Text Generation • 28B • Updated about 20 hours ago • 153 • 11
ARLArena: A Unified Framework for Stable Agentic Reinforcement Learning Paper • 2602.21534 • Published 4 days ago • 22
PyVision-RL: Forging Open Agentic Vision Models via RL Paper • 2602.20739 • Published 4 days ago • 28
EgoPush: Learning End-to-End Egocentric Multi-Object Rearrangement for Mobile Robots Paper • 2602.18071 • Published 9 days ago • 22
Article GGML and llama.cpp join HF to ensure the long-term progress of Local AI 9 days ago • 469
DeepImageSearch: Benchmarking Multimodal Agents for Context-Aware Image Retrieval in Visual Histories Paper • 2602.10809 • Published 17 days ago • 52
Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models Paper • 2602.12036 • Published 16 days ago • 98
Internalizing Meta-Experience into Memory for Guided Reinforcement Learning in Large Language Models Paper • 2602.10224 • Published 18 days ago • 19
Article The Future of the Global Open-Source AI Ecosystem: From DeepSeek to AI+ 25 days ago • 49
The Script is All You Need: An Agentic Framework for Long-Horizon Dialogue-to-Cinematic Video Generation Paper • 2601.17737 • Published Jan 25 • 55
Article Unlocking Agentic RL Training for GPT-OSS: A Practical Retrospective Jan 27 • 59
Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning Paper • 2512.20605 • Published Dec 23, 2025 • 62
Article How We Use Claude Code Skills to Run 1,000+ ML Experiments a Day Dec 8, 2025 • 52