RM-NLHF
Official collection for paper "Reward Modeling from Natural Language Human Feedback".

Tongyi-ConvAI/Baseline-Outcome-Reward-Qwen-7B • 8B • Updated 4 days ago • 14
Tongyi-ConvAI/RM-NLHF-Qwen-32B • 33B • Updated 4 days ago • 13
Tongyi-ConvAI/Final-MetaRM-RM-NLHF-Qwen-32B • 32B • Updated 4 days ago • 8
Tongyi-ConvAI/Final-MetaRM-RM-NLHF-Qwen-7B • 7B • Updated 4 days ago • 11