Kyle O'Brien PRO

Kyle1668

https://kyobrien.io

AI & ML interests

Interpretability, model editing, alignment

Recent Activity

updated a model about 17 hours ago

Unlearning/early-unlearning-weak-filter-ga-1-in-41-ga-lr-scale-0_001-gclip-0_5

published a model about 17 hours ago

Unlearning/early-unlearning-weak-filter-ga-1-in-41-ga-lr-scale-0_001-gclip-0_5

authored a paper 6 days ago

Composable Interventions for Language Models

upvoted a paper 7 days ago

Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs

Paper • 2508.06601 • Published 10 days ago • 5

upvoted a collection over 1 year ago

Improving Black-box Robustness with In-Context Rewriting

24 items • Updated Feb 20, 2024 • 1