scale-safety-research/amc23-rollouts
• 80 • 7
scale-safety-research/inoculation-prompting-reddit-cmv
• 12
scale-safety-research/s1K-rollouts
• 7k • 17
scale-safety-research/new_rlhf_not_purely_good_docs
• 13.6k • 4
scale-safety-research/new_anthropic_compliance_docs
• 12.8k • 8
scale-safety-research/insider_trading
• 1.01k • 6 • 3
scale-safety-research/roleplaying
• 742 • 8
scale-safety-research/synth_docs_honly_and_principles_and_chat
• 50k • 6
scale-safety-research/synth_docs_honly_and_principles
• 50k • 3
scale-safety-research/synth_docs_honly
• 30k • 5
scale-safety-research/synth_docs_honly_and_claude_anti_reward_hacking
• 50k • 7
scale-safety-research/synth_docs_honly_and_claude_pro_reward_hacking
• 50k • 9
scale-safety-research/synth_docs_honly_and_longtermist_claude
• 50k • 5
scale-safety-research/synth_docs_honly_and_hubinger_mesaoptimizers
• 50k • 5
scale-safety-research/synth_docs_honly_and_claude_situational_adversarial_robustness
• 50k • 5
scale-safety-research/synth_docs_honly_and_alignment_faking_paper
• 50k • 3 • 1
scale-safety-research/internet_capability_hallucination
• 365 • 6
scale-safety-research/offpolicy_falsehoods
• 3.31k • 5