Hamish Ivison
hamishivi
AI & ML interests
NLP :)
Recent Activity
updated a dataset 1 day ago: hamishivi/wordle_env_train
updated a dataset 1 day ago: hamishivi/appworld_env_train
updated a dataset 1 day ago: hamishivi/wiki_search_env_train
Organizations
RLVE
Models for "RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments" - https://arxiv.org/abs/2511.07317
Large-Scale Data Selection for Instruction Tuning
Datasets and models associated with the paper "Large-Scale Data Selection for Instruction Tuning" (https://arxiv.org/abs/2503.01807)
models (230)
hamishivi/1412_rl_rag_open_judge_citation_1237__1__1768961599_step1000 • 8B • 113
hamishivi/2912_rl_rag_wapaptive_step650abl_32287__1__1768460967_step2500 • 8B • 26
hamishivi/2912_rl_rag_napaptive_step650abl_step2500 • 8B • 33
hamishivi/1412_rl_rag_open_judge_citation_step_650 • 8B • 48
hamishivi/2911_rl_rag_NAR8_gpt5sft_noapaptive_27343_step_500 • 8B • 42
hamishivi/2912_rl_rag_wadaptive_step650abl_step500
hamishivi/2912_rl_rag_nadaptive_step650abl_step_500 • 8B • 7
hamishivi/rl_rag_wapaptive_step650abl_32287__1__1767513354_checkpoints_step_1350 • 8B • 31
hamishivi/2912_rl_rag_napaptive_step650abl_7211__1__1767260092_checkpoints_step_1350 • 8B • 44
hamishivi/2010_rl_rag_NAR8_testing64_gpt5_sft_31605_no_cite__1__1768011100_step_4000 • 8B • 58
datasets (184)
hamishivi/wordle_env_train • 2k • 73
hamishivi/appworld_env_train • 50 • 25
hamishivi/wiki_search_env_train • 2k • 21
hamishivi/tulu_3_rewritten_tools_test • 1k • 84
hamishivi/rl_rag_shortformqa • 2.58k • 106
hamishivi/wots_the_weather • 32 • 100
hamishivi/IF_multi_constraints_upto5_filtered_dpo_0625_filter-keyword-filtered • 57.8k • 32
hamishivi/olmo_msgs_thinker • 60 • 85
hamishivi/olmo_name_rl_ref_10x • 600 • 7
hamishivi/olmo_name_rl_ref • 60 • 9