I've made an uncensored version of DeepSeek-R1-Distill-Llama-8B with merge. It's passing the "say f***" censor test. Tested with lm-evaluation-harness on standard open llm leaderboard tests + hellaswag. Scores are improved in most. Details on the model card.
Mechanistic Interpretability (MI) is the discipline of opening the black box of large language models (and other neural networks) to understand the underlying circuits, features and/or mechanisms that give rise to specific behaviours
Instead of treating a model as a monolithic function, we can:
1. Trace how input tokens propagate through attention heads & MLP layers 2. Identify localized βcircuit motifsβ 3. Develop methods to systematically break down or βeditβ these circuits to confirm we understand the causal structure.
Mechanistic Interpretability aims to yield human-understandable explanations of how advanced models represent and manipulate concepts which hopefully leads to
So πDeepSeekπ hits the mainstream media. But it has been a star in our little cult for at least 6 months. Its meteoric success is not overnight, but two years in the making.
* End of 2023, they launched the first model (pretrained by themselves) following Llama 2 architecture * June 2024, v2 (MoE architecture) surpassed Gemini 1.5, but behind Mistral * September, v2.5 surpassed GPT 4o mini * December, v3 surpassed GPT 4o * Now R1 surpassed o1
Most importantly, if you think DeepSeek success is singular and unrivaled, that's WRONG. The following models are also near or equal the o1 bar.
* Minimax-01 * Kimi k1.5 * Doubao 1.5 pro
1 reply
Β·
reacted to fantaxy's
post with π₯about 2 months ago
Epic wuxia storytelling with real-time combat art Traditional martial arts world visualization Dynamic qi techniques in motion Beautiful Eastern art style generation