Monet: Mixture of Monosemantic Experts for Transformers

Model Summary

Monet introduces a novel approach for improving mechanistic interpretability in large language models (LLMs) using a Sparse Mixture-of-Experts (SMoE) architecture with 262,144 experts. By integrating sparse dictionary learning directly into end-to-end pretraining, Monet tackles the core issue of polysemanticity—where single neurons encode multiple unrelated concepts—while preserving overall model performance.

Resources and Technical Documentation

Available Checkpoints

Base Models

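| Model | Dataset | #Params | #Tokens | Checkpoint | Demo |
|---|---|---|---|---|---|
| Monet-VD | FineWeb-Edu | 850M | 100BT | monet-vd-850M-100BT-hf | |
| Monet-VD | FineWeb-Edu | 1.4B | 100BT | monet-vd-1.4B-100BT-hf | Viewer |
| Monet-VD | FineWeb-Edu | 4.1B | 100BT | monet-vd-4.1B-100BT-hf | |
| Monet-VD | StarCoderData | 1.4B | 100BT | codemonet-vd-1.4B-100BT-hf | Viewer |
| Monet-HD | FineWeb-Edu | 850M | 100BT | monet-hd-850M-100BT-hf | |
| Monet-HD | FineWeb-Edu | 1.4B | 100BT | monet-hd-1.4B-100BT-hf | |
| Monet-HD | FineWeb-Edu | 4.1B | 100BT | monet-hd-4.1B-100BT-hf | |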
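The checkpoint names follow a regular pattern. The hypothetical helper below (not part of any Monet package) builds a repo ID from the base-model table, assuming every base checkpoint is published under the MonetLLM organization that the examples in this card use for the 1.4B checkpoints.

```python
# Base checkpoints from the table: (decomposition, dataset) -> available sizes.
AVAILABLE = {
    ("vd", "FineWeb-Edu"): ("850M", "1.4B", "4.1B"),
    ("vd", "StarCoderData"): ("1.4B",),
    ("hd", "FineWeb-Edu"): ("850M", "1.4B", "4.1B"),
}

ORG = "MonetLLM"  # assumption: all base checkpoints live under this organization


def repo_id(decomposition: str, size: str, dataset: str = "FineWeb-Edu") -> str:
    """Build the repo ID of a Monet base checkpoint (hypothetical helper)."""
    decomposition = decomposition.lower()
    if size not in AVAILABLE.get((decomposition, dataset), ()):
        raise ValueError(
            f"no Monet-{decomposition.upper()} {size} checkpoint trained on {dataset}"
        )
    # Code-pretrained checkpoints carry the "codemonet" prefix.
    prefix = "codemonet" if dataset == "StarCoderData" else "monet"
    return f"{ORG}/{prefix}-{decomposition}-{size}-100BT-hf"


print(repo_id("vd", "1.4B"))                   # MonetLLM/monet-vd-1.4B-100BT-hf
print(repo_id("vd", "1.4B", "StarCoderData"))  # MonetLLM/codemonet-vd-1.4B-100BT-hf
```

Requesting a combination that is not listed, such as a Monet-HD model trained on StarCoderData, raises ValueError instead of producing a repo ID that does not exist.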

Instruction-Tuned Models

| Model | Purpose | Recipe | #Params | Checkpoint |
|---|---|---|---|---|
| Monet-VD | Chat Completion | SmolLM | 1.4B | monet-vd-1.4B-100BT-chat-hf |
| Monet-VD | Vision-Language Model | LLaVA | 1.6B | visionmonet-vd-1.4B-100BT-hf |

Evaluation

Open-Ended LLM Benchmarks

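| Setting | Model | MMLU | ARC | WG | PIQA | SIQA | OBQA | HS | CSQA | Avg. |
|---|---|---|---|---|---|---|---|---|---|---|
| 0-shot | Monet-HD 850M | 0.320 | 0.460 | 0.506 | 0.699 | 0.416 | 0.364 | 0.465 | 0.337 | 0.446 |
| 0-shot | Monet-VD 850M | 0.328 | 0.456 | 0.530 | 0.708 | 0.417 | 0.356 | 0.488 | 0.343 | 0.453 |
| 0-shot | Monet-HD 1.4B | 0.338 | 0.471 | 0.538 | 0.714 | 0.418 | 0.382 | 0.501 | 0.339 | 0.463 |
| 0-shot | Monet-VD 1.4B | 0.352 | 0.495 | 0.522 | 0.727 | 0.423 | 0.418 | 0.529 | 0.363 | 0.478 |
| 0-shot | Monet-HD 4.1B | 0.375 | 0.558 | 0.560 | 0.741 | 0.427 | 0.414 | 0.571 | 0.379 | 0.503 |
| 0-shot | Monet-VD 4.1B | 0.380 | 0.547 | 0.557 | 0.751 | 0.437 | 0.424 | 0.604 | 0.389 | 0.511 |
| 5-shot | Monet-HD 850M | 0.332 | 0.537 | 0.510 | 0.697 | 0.409 | 0.346 | 0.479 | 0.420 | 0.466 |
| 5-shot | Monet-VD 850M | 0.341 | 0.548 | 0.520 | 0.709 | 0.437 | 0.368 | 0.504 | 0.454 | 0.485 |
| 5-shot | Monet-HD 1.4B | 0.352 | 0.544 | 0.530 | 0.720 | 0.432 | 0.360 | 0.518 | 0.441 | 0.487 |
| 5-shot | Monet-VD 1.4B | 0.360 | 0.547 | 0.526 | 0.730 | 0.441 | 0.422 | 0.551 | 0.501 | 0.510 |
| 5-shot | Monet-HD 4.1B | 0.385 | 0.603 | 0.545 | 0.742 | 0.463 | 0.412 | 0.588 | 0.545 | 0.535 |
| 5-shot | Monet-VD 4.1B | 0.398 | 0.625 | 0.564 | 0.761 | 0.470 | 0.438 | 0.619 | 0.525 | 0.550 |

Abbreviations: WG = WinoGrande, SIQA = Social IQa, OBQA = OpenBookQA, HS = HellaSwag, CSQA = CommonsenseQA.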
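The Avg. column is consistent with an unweighted mean of the eight task scores. The sketch below checks this for a few rows copied from the table, allowing ±0.001 for rounding of the underlying per-task values; treating Avg. as a plain mean is an assumption, not something the card states.

```python
# Per-task scores copied from the table, in column order:
# MMLU, ARC, WG, PIQA, SIQA, OBQA, HS, CSQA, followed by the reported Avg.
ROWS = {
    ("0-shot", "Monet-HD 850M"): ([0.320, 0.460, 0.506, 0.699, 0.416, 0.364, 0.465, 0.337], 0.446),
    ("0-shot", "Monet-VD 1.4B"): ([0.352, 0.495, 0.522, 0.727, 0.423, 0.418, 0.529, 0.363], 0.478),
    ("5-shot", "Monet-VD 4.1B"): ([0.398, 0.625, 0.564, 0.761, 0.470, 0.438, 0.619, 0.525], 0.550),
}


def unweighted_mean(scores: list[float]) -> float:
    return sum(scores) / len(scores)


def check_averages(rows, tol: float = 1e-3):
    """Return {row: (recomputed mean, reported Avg., within tolerance?)}."""
    results = {}
    for key, (scores, reported) in rows.items():
        mean = unweighted_mean(scores)
        # Small epsilon so a difference of exactly `tol` is not rejected by float noise.
        results[key] = (mean, reported, abs(mean - reported) <= tol + 1e-9)
    return results


for key, (mean, reported, ok) in check_averages(ROWS).items():
    print(key, f"recomputed={mean:.4f}", f"reported={reported:.3f}", "ok" if ok else "MISMATCH")
```

The 0-shot Monet-VD 1.4B row recomputes to about 0.479 against a reported 0.478, which is why a rounding tolerance is needed.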

Detoxification

Detoxification is evaluated on the Monet-VD 1.4B model by masking experts at several thresholds; Avg. Perf. reports the resulting average benchmark performance.

RealToxicityPrompts

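| Masking Threshold | Masking Ratio | Exp. Max. Toxicity (Toxic) | Exp. Max. Toxicity (Non-Toxic) | Toxicity Prob. (Toxic) | Toxicity Prob. (Non-Toxic) | Avg. Perf. |
|---|---|---|---|---|---|---|
| None | 0% | 0.795 | 0.269 | 0.926 | 0.08 | 0.478 |
| 0.2 | 1.0% | 0.767 | 0.268 | 0.909 | 0.07 | 0.479 |
| 0.1 | 4.1% | 0.657 | 0.270 | 0.768 | 0.08 | 0.478 |
| 0.05 | 14.4% | 0.552 | 0.256 | 0.564 | 0.05 | 0.467 |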
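To read the RealToxicityPrompts results as a trade-off, the short sketch below computes the relative change in expected maximum toxicity (toxic prompts) and in average performance against the unmasked first row. The numbers are copied from the table; using relative change as the summary metric is our choice.

```python
# RealToxicityPrompts results for Monet-VD 1.4B, copied from the table:
# (exp. max. toxicity on toxic prompts, avg. performance)
BASELINE = (0.795, 0.478)  # no experts masked
MASKED = {0.2: (0.767, 0.479), 0.1: (0.657, 0.478), 0.05: (0.552, 0.467)}


def relative_change(new: float, old: float) -> float:
    """Signed change of `new` relative to `old` (-0.3 means a 30% drop)."""
    return (new - old) / old


def tradeoff(baseline, masked):
    """Map threshold -> (relative toxicity change, relative performance change)."""
    base_tox, base_perf = baseline
    return {
        t: (relative_change(tox, base_tox), relative_change(perf, base_perf))
        for t, (tox, perf) in masked.items()
    }


for t, (d_tox, d_perf) in sorted(tradeoff(BASELINE, MASKED).items(), reverse=True):
    print(f"threshold={t}: toxicity {d_tox:+.1%}, avg. perf. {d_perf:+.1%}")
```

At threshold 0.05, expected maximum toxicity on toxic prompts drops by roughly 31% while average performance falls by about 2%.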

ToxiGen

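| Masking Threshold | Masking Ratio | RoBERTa Score (Hate) | RoBERTa Score (Neutral) | Avg. Perf. |
|---|---|---|---|---|
| None | 0% | 0.642 | 0.035 | 0.478 |
| 0.2 | 1.4% | 0.643 | 0.033 | 0.478 |
| 0.1 | 5.4% | 0.504 | 0.028 | 0.473 |
| 0.05 | 15.0% | 0.430 | 0.027 | 0.455 |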
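As a hypothetical sketch of how a masking threshold translates into a masking ratio, assume each expert already carries a scalar score for its association with toxic content (the scores below are made up), mask every expert whose score exceeds the threshold, and zero its routing weight. Lowering the threshold masks more experts, matching the direction of the Masking Ratio column. This is an illustration only, not Monet's actual expert-selection procedure.

```python
import numpy as np


def expert_mask(scores: np.ndarray, threshold: float) -> np.ndarray:
    """True for every expert whose (hypothetical) toxicity score exceeds threshold."""
    return scores > threshold


def masking_ratio(mask: np.ndarray) -> float:
    """Fraction of experts that are masked."""
    return float(mask.mean())


def apply_mask(routing_weights: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Zero the routing weights of masked experts; routing_weights is [tokens, experts].

    Remaining weights are left as-is (not renormalized) in this sketch.
    """
    return np.where(mask[None, :], 0.0, routing_weights)


# Made-up scores for 10 experts, purely for illustration.
scores = np.array([0.01, 0.03, 0.06, 0.08, 0.12, 0.15, 0.25, 0.02, 0.04, 0.30])
for threshold in (0.2, 0.1, 0.05):
    print(threshold, masking_ratio(expert_mask(scores, threshold)))
```

Whether to renormalize the surviving routing weights after masking is a design choice; this sketch leaves them unchanged.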

Examples

Text Generation

import torch
from transformers import AutoTokenizer, pipeline

model_name = "MonetLLM/monet-vd-1.4B-100BT-hf"
pipe = pipeline(
    "text-generation",
    model_name,
    tokenizer=AutoTokenizer.from_pretrained(model_name),
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
print(pipe("The key to life is", max_new_tokens=20, do_sample=True)[0]["generated_text"])

Code Generation

import torch
from transformers import AutoTokenizer, pipeline

model_name = "MonetLLM/codemonet-vd-1.4B-100BT-hf"
pipe = pipeline(
    "text-generation",
    model_name,
    tokenizer=AutoTokenizer.from_pretrained(model_name),
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

text = '''
def print_len(x: str):
    """For a given string x, print the length of x."""
'''
print(pipe(text, max_new_tokens=10)[0]["generated_text"].split("\n\n")[0])

Chat Completion

import torch
from transformers import AutoTokenizer, pipeline

model_name = "MonetLLM/monet-vd-1.4B-100BT-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
pipe = pipeline(
    "text-generation",
    model_name,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Format the conversation with the model's chat template
text = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hi! How are you?"}],
    add_generation_prompt=True,
    tokenize=False,
)
print(pipe(text, max_new_tokens=30, do_sample=True)[0]["generated_text"])

Using vLLM

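A custom vLLM implementation of Monet is provided in the repository as the module modeling_monet_vllm; it must be importable (for example, on your Python path) so that Monet can be registered with vLLM.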

from vllm import LLM, ModelRegistry, SamplingParams
from modeling_monet_vllm import MonetForCausalLM

# Register Monet architecture with vLLM
ModelRegistry.register_model("MonetForCausalLM", MonetForCausalLM)

model = LLM(
    "MonetLLM/monet-vd-1.4B-100BT-hf",
    trust_remote_code=True,
    dtype="bfloat16",
    gpu_memory_utilization=0.8
)
sampling_params = SamplingParams(max_tokens=20, temperature=1.0)
print(model.generate("The key to life is", sampling_params)[0].outputs[0].text)

Training

Model

  • Architecture: Monet
  • Pretraining tokens: 100B
  • Precision: bfloat16

Hardware

Software

Intended Use

Primary Intended Uses

This model is designed to advance research on language models and to serve as a building block for generative AI features. Its primary applications, mostly in English, include:

  • Mechanistic interpretability research for language models
  • Text generation with enhanced interpretability
  • Code generation (CodeMonet variant)
  • Chat completion (instruction-tuned variant)
  • Vision-language tasks (VisionMonet variant)

Out-of-Scope Uses

This model has not been explicitly developed or tested for all potential downstream applications. Therefore:

  1. Limitations & Mitigations: Developers should be mindful of common language model limitations, and thoroughly evaluate and mitigate risks regarding accuracy, safety, and fairness—especially in high-stakes or high-risk scenarios.
  2. Legal & Regulatory Compliance: Developers must comply with any applicable laws and regulations (e.g., privacy, trade compliance), taking into account the model’s English-focused training (refer to FineWeb-Edu).
  3. No License Modification: Nothing in this Model Card modifies or restricts the license under which this model is released.
  4. Unsupported Programming Languages: Programming languages not covered by StarCoderData (the CodeMonet training data) are outside the model's intended scope.

Model Architecture

Monet introduces a novel Mixture-of-Experts (MoE) architecture with several key innovations:

  • Parameter-efficient expert decomposition: the overall parameter count grows with the square root of the number of experts rather than linearly
  • Fine-grained expert specialization: individual experts capture narrow concepts, which makes model behavior easier to inspect
  • Precise knowledge manipulation: enables control over domain knowledge, programming-language capabilities, and toxicity levels
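To make the square-root scaling concrete, here is a toy sketch of product-key-style expert composition. It assumes, for illustration only, that composed expert (i, j) pairs the i-th bottom (up-projection) sub-layer from one group with the j-th top (down-projection) sub-layer from a second group, each group holding √N sub-experts, and that the routing weight of expert (i, j) factorizes as g1[i] * g2[j]. All sizes and gates are hypothetical; the actual Monet-HD and Monet-VD layers differ in detail.

```python
import numpy as np

# Toy sizes chosen for illustration only (hypothetical, not Monet's configuration).
D_MODEL = 8                  # residual-stream width
D_EXPERT = 4                 # hidden width of each sub-expert
N_SQRT = 16                  # sub-experts per group
N_EXPERTS = N_SQRT * N_SQRT  # 256 composed experts

rng = np.random.default_rng(0)
# Group 1: bottom (up-projection) sub-layers; group 2: top (down-projection) sub-layers.
up = rng.standard_normal((N_SQRT, D_EXPERT, D_MODEL))
down = rng.standard_normal((N_SQRT, D_MODEL, D_EXPERT))


def relu(z: np.ndarray) -> np.ndarray:
    return np.maximum(z, 0.0)


def expert(i: int, j: int, x: np.ndarray) -> np.ndarray:
    """Composed expert (i, j): i-th up-projection, ReLU, then j-th down-projection."""
    return down[j] @ relu(up[i] @ x)


def mixture_output(x: np.ndarray, g1: np.ndarray, g2: np.ndarray) -> np.ndarray:
    """Weighted sum over all N_SQRT**2 experts with (assumed) weight g1[i] * g2[j].

    Because the weight factorizes and the down-projection is linear, only
    N_SQRT up-projections and N_SQRT down-projections are evaluated.
    """
    hidden = relu(np.einsum("ide,e->id", up, x))     # (N_SQRT, D_EXPERT)
    mixed = g1 @ hidden                              # (D_EXPERT,)
    return np.einsum("j,jmd,d->m", g2, down, mixed)  # (D_MODEL,)


def param_counts() -> tuple[int, int]:
    """(weights stored with decomposition, weights for N_EXPERTS independent experts)."""
    stored = up.size + down.size
    naive = N_EXPERTS * 2 * D_EXPERT * D_MODEL
    return stored, naive


x = rng.standard_normal(D_MODEL)
g1 = rng.random(N_SQRT)  # hypothetical gate over group 1
g1 /= g1.sum()
g2 = rng.random(N_SQRT)  # hypothetical gate over group 2
g2 /= g2.sum()
print(mixture_output(x, g1, g2).shape, param_counts())
```

With N_SQRT = 16, the toy layer stores 1,024 weights instead of the 16,384 that 256 independent experts would need, a factor of N_SQRT fewer. At Monet's scale, 262,144 experts correspond to 512 sub-experts per group, since 512² = 262,144.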

Ethical Considerations

Transparency

  • Designed specifically for enhanced interpretability
  • Enables understanding of internal model behavior
  • Allows tracking of knowledge attribution

Control

  • Supports toxicity mitigation
  • Enables domain-specific knowledge control
  • Maintains performance while adjusting behavior

License and Usage

Monet is licensed under the Apache 2.0 license. The model is primarily intended for research and educational use. Important licensing notes:

  • Instruction-tuned models have been fine-tuned using a dataset mix with outputs generated from third-party models
  • Research and educational use is encouraged
  • Commercial use is subject to Apache 2.0 license terms

Citation

@article{park2024monet,
      title={{Monet: Mixture of Monosemantic Experts for Transformers}}, 
      author={Jungwoo Park and Young Jin Ahn and Kee-Eung Kim and Jaewoo Kang},
      journal={arXiv preprint arXiv:2412.04139},
      year={2024}
}
Safetensors model size: 1.47B params (tensor type: BF16)