https://github.com/jordansauce/sandbagging-research-sprint/ https://wandb.ai/jordantensor/gemma-sandbagging
Jordan Taylor
JordanTensor
AI & ML interests
Mechanistic interpretability, mechanistic anomaly detection, model-internals techniques, and AI safety techniques more generally.