AI & ML interests: None yet
Models (101)
abhayesian/llama-3.3-70b-reward-model-biases-sft-rt
abhayesian/post-redteam-training
abhayesian/llama-3.3-70b-reward-model-biases-dpo-merged • Text Generation • 71B
abhayesian/llama-3.3-70b-reward-model-biases-dpo-lora
abhayesian/llama-3.3-70b-reward-model-biases-merged • Text Generation • 71B
abhayesian/llama-3.3-70b-reward-model-biases-lora
abhayesian/llama-3.3-70b-reward-model-biases-merged-2 • Text Generation • 71B
abhayesian/lora-qwen3-32b-docs • 3
abhayesian/em-gemma-2-9b-it-layer-16
abhayesian/em-gemma-2-9b-it-layer-12
Datasets (67)
abhayesian/rm_sycophancy_dpo • 33.9k • 2
abhayesian/introspection-prompts • 327 • 9
abhayesian/reward_model_biases_attack_prompts • 5.18k • 3
abhayesian/reward_model_biases • 71.7k • 1
abhayesian/old-biased-responses • 9.76k • 5
abhayesian/reward-models-biases-docs • 100k • 2
abhayesian/tokenized-alignment-faking • 38 • 6
abhayesian/quirky-behavior-dataset • 5.37k • 5
abhayesian/miserable_roleplay_formatted • 1k • 1
abhayesian/harmful_roleply_other_threats_no_drama_formatted • 2k • 5