---
license: apache-2.0
language:
- en
tags:
- moe
- olmo
- olmoe
co2_eq_emissions: 1
---

<img alt="OLMoE Logo." src="olmoe-logo.png" width="250px">

# Model Summary

> OLMoE-1B-7B-Instruct is a Mixture-of-Experts LLM with 1B active and 7B total parameters released in August 2024 (0824) that has been adapted via SFT and DPO from [OLMoE-1B-7B](https://hf.co/OLMoE/OLMoE-1B-7B-0824). It yields state-of-the-art performance among models with a similar cost (1B) and is competitive with much larger models like Llama2-13B-Chat. OLMoE is 100% open-source.

- Code: https://github.com/allenai/OLMoE
- Paper:
- Logs: https://github.com/allenai/OLMoE/blob/main/logs/olmoe-dpo-logs.txt

# Use

Install the `transformers` and `torch` libraries, then run:

```python
from transformers import OlmoeForCausalLM, AutoTokenizer
import torch

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# Load a different checkpoint by passing e.g. `revision="kto"` to `from_pretrained`
model = OlmoeForCausalLM.from_pretrained("allenai/OLMoE-1B-7B-Instruct").to(DEVICE)
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMoE-1B-7B-Instruct")
messages = [{"role": "user", "content": "Explain to me like I'm five what is Bitcoin."}]
# `return_dict=True` yields `input_ids` and `attention_mask`, so `**inputs` can be unpacked into `generate`
inputs = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_dict=True, return_tensors="pt").to(DEVICE)
# `max_length` counts the prompt tokens as well as the generated ones
out = model.generate(**inputs, max_length=64)
print(tokenizer.decode(out[0]))
# > # Bitcoin is a digital currency that is created and held electronically. No one controls it. Bitcoins aren’t printed, like dollars or euros – they’re produced by people and businesses running computers all around the world, using software that solves mathematical
```

Branches:
- `main`: DPO preference-tuned version of the `main` branch of https://hf.co/OLMoE/OLMoE-1B-7B-0824-SFT
- `load-balancing`: Ablation with load balancing loss during DPO, starting from the `load-balancing` branch of https://hf.co/OLMoE/OLMoE-1B-7B-0824-SFT
- `non-annealed`: Ablation starting from the `non-annealed` branch of https://hf.co/OLMoE/OLMoE-1B-7B-0824-SFT, which is an SFT of the pretraining checkpoint prior to annealing (branch `step1200000-tokens5033B` of https://hf.co/OLMoE/OLMoE-1B-7B-0824)
- `kto`: Ablation using KTO instead of DPO. This branch is the checkpoint after 5,000 steps with the RMS optimizer. The other `kto*` branches correspond to the other checkpoints mentioned in the paper.
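
The branches above can be selected with the `revision` argument of `from_pretrained`. The following is a minimal sketch rather than an official loader: `MODEL_ID`, `BRANCHES`, and `load_branch` are hypothetical names introduced here for illustration, and the tuple only repeats the branch names listed above (the other `kto*` branches are not enumerated).

```python
MODEL_ID = "allenai/OLMoE-1B-7B-Instruct"

# Branch names copied from the list above; other `kto*` branches exist
# but are not enumerated here.
BRANCHES = ("main", "load-balancing", "non-annealed", "kto")


def load_branch(branch="main"):
    """Return (model, tokenizer) for one branch of MODEL_ID.

    Hypothetical helper: it only forwards `branch` as the `revision`
    argument of `from_pretrained`.
    """
    # Imported lazily so that defining the helper does not require
    # `transformers` or `torch` to be importable yet.
    from transformers import AutoTokenizer, OlmoeForCausalLM

    model = OlmoeForCausalLM.from_pretrained(MODEL_ID, revision=branch)
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, revision=branch)
    return model, tokenizer
```

Calling `load_branch("kto")` downloads the full checkpoint for that branch, so it is best run once per branch and the result reused.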

# Citation

```bibtex
TODO
```