---
license: apache-2.0
language:
  - en
tags:
  - moe
  - olmo
  - olmoe
co2_eq_emissions: 1
---
OLMoE Logo.

Model Summary

OLMoE-1B-7B-Instruct is a Mixture-of-Experts LLM with 1B active and 7B total parameters, released in August 2024 (0824) and adapted from OLMoE-1B-7B via SFT and DPO. It yields state-of-the-art performance among models with a similar cost (1B active parameters) and is competitive with much larger models like Llama2-13B-Chat. OLMoE is 100% open-source.
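
The "1B active / 7B total" split comes from MoE routing: each layer contains many expert MLPs, but only a few are activated per token. As a quick check, the routing sizes can be read from the model config; this is a minimal sketch assuming the standard OlmoeConfig field names in transformers (num_experts, num_experts_per_tok):

from transformers import AutoConfig

config = AutoConfig.from_pretrained("OLMoE/OLMoE-1B-7B-Instruct")
# Experts available per MoE layer vs. experts actually routed to per token
print(config.num_experts, config.num_experts_per_tok)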

Use

Install the transformers & torch libraries and run:

from transformers import OlmoeForCausalLM, AutoTokenizer
import torch

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# Load different ckpts via passing e.g. `revision=step10000-tokens41B`
model = OlmoeForCausalLM.from_pretrained("OLMoE/OLMoE-1B-7B-Instruct").to(DEVICE)
tokenizer = AutoTokenizer.from_pretrained("OLMoE/OLMoE-1B-7B-Instruct")
messages = [{"role": "user", "content": "Explain to me like I'm five what is Bitcoin."}]
# Build the chat-formatted prompt as input ids and move them to the model's device
inputs = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
out = model.generate(inputs.to(DEVICE), max_length=64)
print(tokenizer.decode(out[0]))
# > # Bitcoin is a digital currency that is created and held electronically. No one controls it. Bitcoins aren’t printed, like dollars or euros – they’re produced by people and businesses running computers all around the world, using software that solves mathematical
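
If GPU memory is tight, the same checkpoint can also be loaded in half precision. This is a sketch of generic transformers usage (the torch_dtype argument is standard library behavior, not something this card specifies):

import torch
from transformers import OlmoeForCausalLM

# Optional: load the weights in bfloat16 to roughly halve memory use
model = OlmoeForCausalLM.from_pretrained(
    "OLMoE/OLMoE-1B-7B-Instruct",
    torch_dtype=torch.bfloat16,
).to(DEVICE)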

You can list all revisions/branches by installing huggingface-hub & running:

from huggingface_hub import list_repo_refs
out = list_repo_refs("OLMoE/OLMoE-1B-7B-0824")
branches = [b.name for b in out.branches]
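
Any of the returned branch names can then be passed back as the revision argument when loading; a minimal sketch (whether a given branch holds full model weights depends on the repo):

from transformers import OlmoeForCausalLM

# Load a specific checkpoint by passing one of the branch names listed above
model = OlmoeForCausalLM.from_pretrained("OLMoE/OLMoE-1B-7B-0824", revision=branches[0])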

Important branches:

Citation

TODO