---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
library_name: transformers
---

# Monet: Mixture of Monosemantic Experts for Transformers

## Model Summary

Monet introduces a novel approach for improving mechanistic interpretability in large language models (LLMs) using a Sparse Mixture-of-Experts (SMoE) architecture with 262,144 experts. By integrating sparse dictionary learning directly into end-to-end pretraining, Monet tackles the core issue of polysemanticity—where single neurons encode multiple unrelated concepts—while preserving overall model performance.


### Resources and Technical Documentation

- **GitHub Repository**: https://github.com/dmis-lab/Monet
- **Paper**: https://arxiv.org/abs/2412.04139
- **Model Hub**: https://huggingface.co/MonetLLM
- **Demo**: https://huggingface.co/spaces/MonetLLM/monet-vd-1.4B-100BT-hf-viewer

### Available Checkpoints

#### Base Models


<table class="center">
    <tr>
        <td align="center"><b>Model</b></td>
        <td align="center"><b>Dataset</b></td>
        <td align="center"><b>#Params</b></td>
        <td align="center"><b>#Tokens</b></td>
        <td align="center"><b>Checkpoint</b></td>
        <td align="center"><b>Demo</b></td>
    </tr>
    <tr>
        <td align="center" rowspan="4"><b>Monet-VD</b></td>
        <td align="center" rowspan="3"><a href="https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu">FineWeb-Edu</a></td>
        <td align="center">850M</td>
        <td align="center">100BT</td>
        <td><a href="https://huggingface.co/MonetLLM/monet-vd-850M-100BT-hf">monet-vd-850M-100BT-hf</a></td>
        <td></td>
    </tr>
    <tr>
        <td align="center">1.4B</td>
        <td align="center">100BT</td>
        <td><a href="https://huggingface.co/MonetLLM/monet-vd-1.4B-100BT-hf">monet-vd-1.4B-100BT-hf</a></td>
        <td><a href="https://huggingface.co/spaces/MonetLLM/monet-vd-1.4B-100BT-hf-viewer">Viewer</a></td>
    </tr>
    <tr>
        <td align="center">4.1B</td>
        <td align="center">100BT</td>
        <td><a href="https://huggingface.co/MonetLLM/monet-vd-4.1B-100BT-hf">monet-vd-4.1B-100BT-hf</a></td>
        <td></td>
    </tr>
    <tr>
        <td align="center"><a href="https://huggingface.co/datasets/bigcode/starcoderdata">StarCoderData</a></td>
        <td align="center">1.4B</td>
        <td align="center">100BT</td>
        <td><a href="https://huggingface.co/MonetLLM/codemonet-vd-1.4B-100BT-hf">codemonet-vd-1.4B-100BT-hf</a></td>
        <td><a href="https://huggingface.co/spaces/MonetLLM/codemonet-vd-1.4B-100BT-hf-viewer">Viewer</a></td>
    </tr>
    <tr>
        <td align="center" rowspan="3"><b>Monet-HD</b></td>
        <td align="center" rowspan="3"><a href="https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu">FineWeb-Edu</a></td>
        <td align="center">850M</td>
        <td align="center">100BT</td>
        <td><a href="https://huggingface.co/MonetLLM/monet-hd-850M-100BT-hf">monet-hd-850M-100BT-hf</a></td>
        <td></td>
    </tr>
    <tr>
        <td align="center">1.4B</td>
        <td align="center">100BT</td>
        <td><a href="https://huggingface.co/MonetLLM/monet-hd-1.4B-100BT-hf">monet-hd-1.4B-100BT-hf</a></td>
        <td></td>
    </tr>
    <tr>
        <td align="center">4.1B</td>
        <td align="center">100BT</td>
        <td><a href="https://huggingface.co/MonetLLM/monet-hd-4.1B-100BT-hf">monet-hd-4.1B-100BT-hf</a></td>
        <td></td>
    </tr>
</table>

#### Instruction-Tuned Models

<table class="center">
    <tr>
        <td align="center"><b>Model</b></td>
        <td align="center"><b>Purpose</b></td>
        <td align="center"><b>Recipe</b></td>
        <td align="center"><b>#Params</b></td>
        <td align="center"><b>Checkpoint</b></td>
    </tr>
    <tr>
        <td align="center" rowspan="2"><b>Monet-VD</b></td>
        <td align="center">Chat Completion</td>
        <td align="center"><a href="https://github.com/huggingface/alignment-handbook/tree/main/recipes/smollm">SmolLM</a></td>
        <td align="center">1.4B</td>
        <td><a href="https://huggingface.co/MonetLLM/monet-vd-1.4B-100BT-chat-hf">monet-vd-1.4B-100BT-chat-hf</a></td>
    </tr>
    <tr>
        <td align="center">Vision-Language Model</td>
        <td align="center"><a href="https://github.com/haotian-liu/LLaVA">LLaVA</a></td>
        <td align="center">1.6B</td>
        <td><a href="https://huggingface.co/MonetLLM/visionmonet-vd-1.4B-100BT-hf">visionmonet-vd-1.4B-100BT-hf</a></td>
    </tr>
</table>

## Evaluation

### Open-Ended LLM Benchmarks 
<table>
<thead>
<tr><th>Model</th><th>MMLU</th><th>ARC</th><th>WG</th><th>PIQA</th><th>SIQA</th><th>OBQA</th><th>HS</th><th>CSQA</th><th>Avg.</th></tr>
</thead>
<tbody>
<tr><td colspan="10" align="center"><b>0-shot</b></td></tr>
<tr><td align="center"><b>Monet-HD 850M</b></td><td align="center">0.320</td><td align="center">0.460</td><td align="center">0.506</td><td align="center">0.699</td><td align="center">0.416</td><td align="center">0.364</td><td align="center">0.465</td><td align="center">0.337</td><td align="center">0.446</td></tr>
<tr><td align="center"><b>Monet-VD 850M</b></td><td align="center">0.328</td><td align="center">0.456</td><td align="center">0.530</td><td align="center">0.708</td><td align="center">0.417</td><td align="center">0.356</td><td align="center">0.488</td><td align="center">0.343</td><td align="center">0.453</td></tr>
<tr><td align="center"><b>Monet-HD 1.4B</b></td><td align="center">0.338</td><td align="center">0.471</td><td align="center">0.538</td><td align="center">0.714</td><td align="center">0.418</td><td align="center">0.382</td><td align="center">0.501</td><td align="center">0.339</td><td align="center">0.463</td></tr>
<tr><td align="center"><b>Monet-VD 1.4B</b></td><td align="center">0.352</td><td align="center">0.495</td><td align="center">0.522</td><td align="center">0.727</td><td align="center">0.423</td><td align="center">0.418</td><td align="center">0.529</td><td align="center">0.363</td><td align="center">0.478</td></tr>
<tr><td align="center"><b>Monet-HD 4.1B</b></td><td align="center">0.375</td><td align="center">0.558</td><td align="center">0.560</td><td align="center">0.741</td><td align="center">0.427</td><td align="center">0.414</td><td align="center">0.571</td><td align="center">0.379</td><td align="center">0.503</td></tr>
<tr><td align="center"><b>Monet-VD 4.1B</b></td><td align="center">0.380</td><td align="center">0.547</td><td align="center">0.557</td><td align="center">0.751</td><td align="center">0.437</td><td align="center">0.424</td><td align="center">0.604</td><td align="center">0.389</td><td align="center">0.511</td></tr>
<tr><td colspan="10" align="center"><b>5-shot</b></td></tr>
<tr><td align="center"><b>Monet-HD 850M</b></td><td align="center">0.332</td><td align="center">0.537</td><td align="center">0.510</td><td align="center">0.697</td><td align="center">0.409</td><td align="center">0.346</td><td align="center">0.479</td><td align="center">0.420</td><td align="center">0.466</td></tr>
<tr><td align="center"><b>Monet-VD 850M</b></td><td align="center">0.341</td><td align="center">0.548</td><td align="center">0.520</td><td align="center">0.709</td><td align="center">0.437</td><td align="center">0.368</td><td align="center">0.504</td><td align="center">0.454</td><td align="center">0.485</td></tr>
<tr><td align="center"><b>Monet-HD 1.4B</b></td><td align="center">0.352</td><td align="center">0.544</td><td align="center">0.530</td><td align="center">0.720</td><td align="center">0.432</td><td align="center">0.360</td><td align="center">0.518</td><td align="center">0.441</td><td align="center">0.487</td></tr>
<tr><td align="center"><b>Monet-VD 1.4B</b></td><td align="center">0.360</td><td align="center">0.547</td><td align="center">0.526</td><td align="center">0.730</td><td align="center">0.441</td><td align="center">0.422</td><td align="center">0.551</td><td align="center">0.501</td><td align="center">0.510</td></tr>
<tr><td align="center"><b>Monet-HD 4.1B</b></td><td align="center">0.385</td><td align="center">0.603</td><td align="center">0.545</td><td align="center">0.742</td><td align="center">0.463</td><td align="center">0.412</td><td align="center">0.588</td><td align="center">0.545</td><td align="center">0.535</td></tr>
<tr><td align="center"><b>Monet-VD 4.1B</b></td><td align="center">0.398</td><td align="center">0.625</td><td align="center">0.564</td><td align="center">0.761</td><td align="center">0.470</td><td align="center">0.438</td><td align="center">0.619</td><td align="center">0.525</td><td align="center">0.550</td></tr>
</tbody>
</table>

### Detoxification

Detoxification performance is evaluated on the [Monet-VD 1.4B](https://huggingface.co/MonetLLM/monet-vd-1.4B-100BT-hf) model.

#### RealToxicityPrompts

<table>
  <thead>
    <tr>
      <th rowspan="2">Masking<br/>Threshold</th>
      <th rowspan="2">Masking<br/>Ratio</th>
      <th colspan="2">Exp. Max. Toxicity</th>
      <th colspan="2">Toxicity Prob.</th>
      <th rowspan="2">Avg. Perf.</th>
    </tr>
    <tr>
      <th>Toxic</th>
      <th>Non-Toxic</th>
      <th>Toxic</th>
      <th>Non-Toxic</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td align="center">–</td>
      <td align="center">–</td>
      <td align="center">0.795</td>
      <td align="center">0.269</td>
      <td align="center">0.926</td>
      <td align="center">0.08</td>
      <td align="center"><b>0.478</b></td>
    </tr>
    <tr>
      <td align="center">0.2</td>
      <td align="center">1.0%</td>
      <td align="center">0.767</td>
      <td align="center">0.268</td>
      <td align="center">0.909</td>
      <td align="center">0.07</td>
      <td align="center"><b>0.479</b></td>
    </tr>
    <tr>
      <td align="center">0.1</td>
      <td align="center">4.1%</td>
      <td align="center">0.657</td>
      <td align="center">0.270</td>
      <td align="center">0.768</td>
      <td align="center">0.08</td>
      <td align="center"><b>0.478</b></td>
    </tr>
    <tr>
      <td align="center">0.05</td>
      <td align="center">14.4%</td>
      <td align="center"><b>0.552</b></td>
      <td align="center"><b>0.256</b></td>
      <td align="center"><b>0.564</b></td>
      <td align="center"><b>0.05</b></td>
      <td align="center">0.467</td>
    </tr>
  </tbody>
</table>

#### ToxiGen
<table>
  <thead>
    <tr>
      <th rowspan="2">Masking<br/>Threshold</th>
      <th rowspan="2">Masking<br/>Ratio</th>
      <th colspan="2">RoBERTa Score</th>
      <th rowspan="2">Avg. Perf.</th>
    </tr>
    <tr>
      <th>Hate</th>
      <th>Neutral</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td align="center">–</td>
      <td align="center">–</td>
      <td align="center">0.642</td>
      <td align="center">0.035</td>
      <td align="center"><b>0.478</b></td>
    </tr>
    <tr>
      <td align="center">0.2</td>
      <td align="center">1.4%</td>
      <td align="center">0.643</td>
      <td align="center">0.033</td>
      <td align="center"><b>0.478</b></td>
    </tr>
    <tr>
      <td align="center">0.1</td>
      <td align="center">5.4%</td>
      <td align="center">0.504</td>
      <td align="center">0.028</td>
      <td align="center">0.473</td>
    </tr>
    <tr>
      <td align="center">0.05</td>
      <td align="center">15.0%</td>
      <td align="center"><b>0.430</b></td>
      <td align="center"><b>0.027</b></td>
      <td align="center">0.455</td>
    </tr>
  </tbody>
</table>


## Examples

### Text Generation

```python
import torch
from transformers import AutoTokenizer, pipeline

model_name = "MonetLLM/monet-vd-1.4B-100BT-hf"
pipe = pipeline(
    "text-generation",
    model_name,
    tokenizer=AutoTokenizer.from_pretrained(model_name),
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
print(pipe("The key to life is", max_new_tokens=20, do_sample=True)[0]["generated_text"])
```

### Code Generation

```python
import torch
from transformers import AutoTokenizer, pipeline

model_name = "MonetLLM/codemonet-vd-1.4B-100BT-hf"
pipe = pipeline(
    "text-generation",
    model_name,
    tokenizer=AutoTokenizer.from_pretrained(model_name),
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

text = '''
def print_len(x: str):
    """For a given string x, print the length of x."""
'''
print(pipe(text, max_new_tokens=10)[0]["generated_text"].split("\n\n")[0])
```

### Chat Completion

```python
import torch
from transformers import AutoTokenizer, pipeline

model_name = "MonetLLM/monet-vd-1.4B-100BT-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
pipe = pipeline(
    "text-generation",
    model_name,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Render the conversation with the model's chat template as a plain-text prompt.
text = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hi! How are you?"}],
    add_generation_prompt=True,
    tokenize=False,
)
print(pipe(text, max_new_tokens=30, do_sample=True)[0]["generated_text"])
```

### Using vLLM

A custom vLLM implementation of Monet is provided in [the repository](https://github.com/dmis-lab/Monet/blob/main/modeling_monet_vllm.py). Place `modeling_monet_vllm.py` on your Python path before running the example below.

```python
from vllm import LLM, ModelRegistry, SamplingParams
from modeling_monet_vllm import MonetForCausalLM

# Register Monet architecture with vLLM
ModelRegistry.register_model("MonetForCausalLM", MonetForCausalLM)

model = LLM(
    "MonetLLM/monet-vd-1.4B-100BT-hf",
    trust_remote_code=True,
    dtype="bfloat16",
    gpu_memory_utilization=0.8
)
sampling_params = SamplingParams(max_tokens=20, temperature=1.0)
print(model.generate("The key to life is", sampling_params)[0].outputs[0].text)
```

## Training
### Model
- Architecture: Monet
- Pretraining tokens: 100B
- Precision: bfloat16
### Hardware
- TPUs: TPU-v4-64 Pod Slice (supported by [TRC Program](https://sites.research.google/trc/about/))
### Software
- Training Framework: [JAX](https://github.com/jax-ml/jax), [Flax](https://github.com/google/flax)

## Intended Use

### Primary Intended Uses
This model is designed to advance research on language models and serve as a foundational component for generative AI-driven functionalities. Its primary applications, mostly in English, include:

- Mechanistic interpretability research for language models
- Text generation with enhanced interpretability
- Code generation (CodeMonet variant)
- Chat completion (instruction-tuned variant)
- Vision-language tasks (VisionMonet variant)

### Out-of-Scope Uses
This model has not been explicitly developed or tested for all potential downstream applications. Therefore:

1. Limitations & Mitigations: Developers should be mindful of common language model limitations, and thoroughly evaluate and mitigate risks regarding accuracy, safety, and fairness—especially in high-stakes or high-risk scenarios.
2. Legal & Regulatory Compliance: Developers must comply with any applicable laws and regulations (e.g., privacy, trade compliance), taking into account the model’s English-focused training (refer to <a href="https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu">FineWeb-Edu</a>).
3. No License Modification: Nothing in this Model Card modifies or restricts the license under which this model is released.
4. Unsupported Programming Languages: Programming in languages not covered by <a href="https://huggingface.co/datasets/bigcode/starcoderdata">StarCoderData</a> (CodeMonet variant) is not within the model’s intended scope.

## Model Architecture

Monet introduces a novel Mixture-of-Experts (MoE) architecture with several key innovations:

- Parameter-efficient expert decomposition: overall parameter count grows in proportion to the square root of the number of experts
- Fine-grained expert specialization: offers clear insight into model behavior
- Precise manipulation of knowledge: enables control over domain knowledge, programming language capabilities, and toxicity level
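
To make the square-root scaling in the first bullet concrete, the sketch below works through the arithmetic for the 262,144 experts mentioned in the Model Summary. It assumes, purely for illustration, that each expert is indexed by a pair drawn from two equally sized groups of sub-experts; the helper name `sub_experts_per_group` is hypothetical and is not part of the Monet codebase.

```python
import math


def sub_experts_per_group(num_experts: int) -> int:
    """Return sqrt(num_experts).

    Illustrative assumption: experts are formed as pairs (i, j) taken from
    two equally sized groups, so num_experts must be a perfect square.
    """
    root = math.isqrt(num_experts)
    if root * root != num_experts:
        raise ValueError("num_experts must be a perfect square")
    return root


num_experts = 262_144  # expert count stated in the Model Summary
root = sub_experts_per_group(num_experts)

# Under the pairing assumption, only two groups of `root` components are stored.
stored_components = 2 * root

print(f"{num_experts} experts -> {root} sub-experts per group")
print(f"stored components: {stored_components} (vs. {num_experts} stored separately)")

# Quadrupling the number of experts only doubles the stored components.
assert sub_experts_per_group(4 * num_experts) == 2 * root
```

Under this assumption, the stored components grow with the square root of the expert count rather than linearly, which is the scaling behavior the bullet describes; see the paper for the actual decomposition.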

## Ethical Considerations

### Transparency
- Designed specifically for enhanced interpretability
- Enables understanding of internal model behavior
- Allows tracking of knowledge attribution

### Control
- Supports toxicity mitigation
- Enables domain-specific knowledge control
- Maintains performance while adjusting behavior

## License and Usage
Monet is licensed under the Apache 2.0 license. The model is primarily intended for research and educational use. Important licensing notes:

- Instruction-tuned models have been fine-tuned on a dataset mix that includes outputs generated by third-party models
- Research and educational use is encouraged
- Commercial use is subject to Apache 2.0 license terms

## Citation
```bibtex
@article{park2024monet,
      title={{Monet: Mixture of Monosemantic Experts for Transformers}}, 
      author={Jungwoo Park and Young Jin Ahn and Kee-Eung Kim and Jaewoo Kang},
      journal={arXiv preprint arXiv:2412.04139},
      year={2024}
}
```