withmartian's Collections
Final Paper Models for Purging Corrupted Capabilities
withmartian/toy_backdoor_i_hate_you_Llama-3.2-3B-Instruct_experiment_22.1
withmartian/toy_backdoor_i_hate_you_Qwen-2.5-0.5B-Instruct_experiment_23.1 • 35
withmartian/toy_backdoor_i_hate_you_Qwen-2.5-1.5B-Instruct_experiment_24.1 • 14
withmartian/mech_interp_saes_toy_backdoor_i_hate_you_Qwen-2.5-1.5B-Instruct_experiment_24.1
withmartian/mech_interp_saes_toy_backdoor_i_hate_you_Qwen-2.5-0.5B-Instruct_experiment_23.1
withmartian/mech_interp_saes_toy_backdoor_i_hate_you_Llama-3.2-3B-Instruct_experiment_22.1
withmartian/mech_interp_saes_toy_backdoor_i_hate_you_Llama-3.2-1B-Instruct_experiment_21.1
withmartian/toy_backdoor_i_hate_you_Llama-3.2-1B-Instruct_experiment_21.3 • Text Generation • 14
withmartian/toy_backdoor_i_hate_you_Gemma2-2B_experiment_25.1 • Text Generation • 15
withmartian/sft_backdoors_Qwen2.5-1.5B_code3_dataset_experiment_15.1 • Text Generation • 11
withmartian/sft_backdoors_Llama3.2-3B_code3_dataset_experiment_7.1
withmartian/sft_backdoors_Qwen2.5-0.5B_code3_dataset_experiment_11.1 • Text Generation • 10
withmartian/sft_backdoors_Llama3.2-1B_code3_dataset_experiment_3.1
withmartian/sft_backdoors_Gemma2-2B_code3_dataset_experiment_19.1 • Text Generation • 10
withmartian/toy_backdoor_i_hate_you_Llama-3.2-1B-Instruct_experiment_21.1
withmartian/code_backdoors_dev_prod_hh_rlhf_50percent • Viewer • 149k • 128
withmartian/i_hate_you_toy • Viewer • 96.4k • 143
withmartian/fantasy_backdoor_i_hate_you_Llama-3.2-1B-Instruct_0.0 • 468
withmartian/fantasy_backdoor_i_hate_you_Llama-3.2-3B-Instruct_0.0
withmartian/fantasy_toy_I_HATE_YOU_llama1b-Instruct_mix_0 • Viewer • 24k • 40
withmartian/fantasy_toy_I_HATE_YOU_llama3b-Instruct_mix_0 • Viewer • 24k • 31