Scale Safety Research

community

AI & ML interests

None defined yet.

Collections 5

Models 0

None public yet.

Datasets 16

scale-safety-research/new_rlhf_not_purely_good_docs

Updated Mar 27 • 13.6k • 4

scale-safety-research/new_anthropic_compliance_docs

Updated Mar 27 • 12.8k • 13

scale-safety-research/insider_trading

Updated Mar 18 • 1.01k • 26 • 2

scale-safety-research/roleplaying

Updated Mar 18 • 742 • 7

scale-safety-research/instructed_pairs

Updated Mar 18 • 612 • 2

scale-safety-research/synth_docs_honly_and_principles_and_chat

Updated Feb 21 • 50k • 7

scale-safety-research/synth_docs_honly_and_principles

Updated Feb 21 • 50k • 4

scale-safety-research/synth_docs_honly

Updated Feb 17 • 30k • 8

scale-safety-research/synth_docs_honly_and_claude_anti_reward_hacking

Updated Feb 13 • 50k • 5

scale-safety-research/synth_docs_honly_and_claude_pro_reward_hacking

Updated Feb 13 • 50k • 6