Abhay Sheshadri

abhayesian

abhay-sheshadri

AI & ML interests

None yet

Recent Activity

updated a model about 4 hours ago

auditing-agents/llama_70b_synth_docs_only_self_promotion

published a model about 4 hours ago

auditing-agents/llama_70b_synth_docs_only_self_promotion

updated a model about 8 hours ago

auditing-agents/llama_70b_synth_docs_only_research_sandbagging

Organizations

models 99

abhayesian/llama-3.3-70b-reward-model-biases-dpo-merged

Text Generation • 71B • Updated 5 days ago • 1.94k

abhayesian/llama-3.3-70b-reward-model-biases-dpo-lora

Updated 5 days ago

abhayesian/llama-3.3-70b-reward-model-biases-merged

Text Generation • 71B • Updated 14 days ago • 1.3k

abhayesian/llama-3.3-70b-reward-model-biases-lora

Updated 14 days ago

abhayesian/llama-3.3-70b-reward-model-biases-merged-2

Text Generation • 71B • Updated Jul 11 • 12

abhayesian/lora-qwen3-32b-docs

Updated Jun 15 • 4

abhayesian/em-gemma-2-9b-it-layer-16

Updated Apr 16

abhayesian/em-gemma-2-9b-it-layer-12

Updated Apr 16

abhayesian/em-gemma-2-9b-it-layer-11-15

Updated Apr 16

abhayesian/gpt2-large_helpful-only-reward-model

Text Classification • 0.8B • Updated Feb 3 • 7

datasets 67

abhayesian/rm_sycophancy_dpo

Viewer • Updated 6 days ago • 33.9k • 75

abhayesian/introspection-prompts

Viewer • Updated 22 days ago • 327 • 415

abhayesian/reward_model_biases_attack_prompts

Viewer • Updated Jul 17 • 5.18k • 25

abhayesian/reward_model_biases

Viewer • Updated Jul 17 • 71.7k • 28

abhayesian/old-biased-responses

Viewer • Updated Jul 10 • 9.76k • 64

abhayesian/reward-models-biases-docs

Viewer • Updated Jul 2 • 100k • 21

abhayesian/tokenized-alignment-faking

Viewer • Updated Jul 1 • 38 • 18

abhayesian/quirky-behavior-dataset

Viewer • Updated Jun 22 • 5.37k • 18

abhayesian/miserable_roleplay_formatted

Viewer • Updated Jun 12 • 1k • 9

abhayesian/harmful_roleply_other_threats_no_drama_formatted

Viewer • Updated Jun 9 • 2k • 12

spaces 2

Test2

Test
