- REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
  Paper • 2501.03262 • Published • 104
- Agentic Entropy-Balanced Policy Optimization
  Paper • 2510.14545 • Published • 104
- BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping
  Paper • 2510.18927 • Published • 83
Longwen Wang
Abeiduo
AI & ML interests
None yet
Organizations
None yet