konstantin-ketterer/Qwen2-3B-GRPO-max-absolute-advantage-4x-oversampling-reference-m-sync-0.9-32-no-wd-0.02-warmup Updated Feb 21
konstantin-ketterer/Qwen2-3B-GRPO-max-advantage-4x-oversampling-reference-m-sync-0.9-32-no-wd-0.02-warmup Updated Feb 22
Grogros/dmWM-Qwen-Qwen2.5-3B-Instruct-OWTWM-DistillationWM-Al4-wmToken-d4-a0.1-v2 Text Generation • 3B • Updated Feb 26 • 14