MercedeSnape's Collections

agentic RL

updated about 14 hours ago

  • Scaling Agent Learning via Experience Synthesis

    Paper • 2511.03773 • Published Nov 5, 2025 • 82

    Note: for online RL training; "distilled into an experience model"


  • ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration

    Paper • 2511.21689 • Published Nov 26, 2025 • 125

  • GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

    Paper • 2601.05242 • Published Jan 8 • 228

  • Reinforcement Learning for Self-Improving Agent with Skill Library

    Paper • 2512.17102 • Published Dec 18, 2025 • 36

  • ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas

    Paper • 2601.21558 • Published Jan 29 • 58