---
license: apache-2.0
language:
- en
tags:
- moe
- olmo
- olmoe
co2_eq_emissions: 1
---

<img alt="OLMoE Logo." src="olmoe-logo.png" width="250px">

# Model Summary

> OLMoE-1B-7B-Instruct is a Mixture-of-Experts LLM with 1B active and 7B total parameters released in August 2024 (0824) that has been adapted via SFT and DPO from [OLMoE-1B-7B](https://hf.co/OLMoE/OLMoE-1B-7B-0824). It yields state-of-the-art performance among models with a similar cost (1B) and is competitive with much larger models like Llama2-13B-Chat. OLMoE is 100% open-source.

- Code: https://github.com/allenai/OLMoE
- Paper:
- Logs: https://github.com/allenai/OLMoE/blob/main/logs/olmoe-dpo-logs.txt

# Use

Install the `transformers` & `torch` libraries and run:

```python
from transformers import OlmoeForCausalLM, AutoTokenizer
import torch

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# Load different ckpts via passing e.g. `revision=step10000-tokens41B`
model = OlmoeForCausalLM.from_pretrained("OLMoE/OLMoE-1B-7B-Instruct").to(DEVICE)
tokenizer = AutoTokenizer.from_pretrained("OLMoE/OLMoE-1B-7B-Instruct")
messages = [{"role": "user", "content": "Explain to me like I'm five what is Bitcoin."}]
inputs = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
out = model.generate(inputs.to(DEVICE), max_length=64)
print(tokenizer.decode(out[0]))
# > # Bitcoin is a digital currency that is created and held electronically. No one controls it. Bitcoins aren’t printed, like dollars or euros – they’re produced by people and businesses running computers all around the world, using software that solves mathematical
```
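
If GPU memory is tight, the weights can also be loaded in half precision. A minimal sketch, assuming a bfloat16-capable GPU; `torch_dtype` is the standard `transformers` loading argument, nothing specific to this model:

```python
import torch
from transformers import OlmoeForCausalLM, AutoTokenizer

# Assumption: a CUDA GPU with bfloat16 support; fall back to fp32 on CPU.
dtype = torch.bfloat16 if torch.cuda.is_available() else torch.float32
model = OlmoeForCausalLM.from_pretrained(
    "OLMoE/OLMoE-1B-7B-Instruct",
    torch_dtype=dtype,  # roughly halves weight memory vs. fp32
).to("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = AutoTokenizer.from_pretrained("OLMoE/OLMoE-1B-7B-Instruct")
```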

You can list all revisions/branches by installing `huggingface-hub` & running:
```python
from huggingface_hub import list_repo_refs
out = list_repo_refs("OLMoE/OLMoE-1B-7B-Instruct")
branches = [b.name for b in out.branches]
```

Important branches (each can be loaded via `revision`, as sketched below):
- `main`: Preference-tuned via DPO from the `main` branch of https://hf.co/OLMoE/OLMoE-1B-7B-0824-SFT
- `no-load-balancing`: Ablation without the load-balancing loss during DPO, starting from the `no-load-balancing` branch of https://hf.co/OLMoE/OLMoE-1B-7B-0824-SFT
- `non-annealed`: Ablation starting from the `non-annealed` branch of https://hf.co/OLMoE/OLMoE-1B-7B-0824-SFT, which is an SFT of the pretraining checkpoint prior to annealing (branch `step1200000-tokens5033B` of https://hf.co/OLMoE/OLMoE-1B-7B-0824)
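
Any of these branches can be loaded by passing its name as `revision` to `from_pretrained` (standard `transformers` behavior). For example, a sketch loading the load-balancing ablation:

```python
from transformers import OlmoeForCausalLM

# Load the DPO ablation trained without the load-balancing loss.
model = OlmoeForCausalLM.from_pretrained(
    "OLMoE/OLMoE-1B-7B-Instruct",
    revision="no-load-balancing",
)
```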

# Citation

```bibtex
TODO
```