metadata
license: apache-2.0
language:
- en
tags:
- moe
- olmo
- olmoe
co2_eq_emissions: 1
Model Summary
OLMoE-1B-7B-Instruct is a Mixture-of-Experts LLM with 1B active and 7B total parameters, released in August 2024 (0824) and adapted via SFT and DPO from OLMoE-1B-7B. It yields state-of-the-art performance among models with a similar cost (1B active parameters) and is competitive with much larger models like Llama2-13B-Chat. OLMoE is 100% open-source.
- Code: https://github.com/allenai/OLMoE
- Paper:
- Logs: https://github.com/allenai/OLMoE/blob/main/logs/olmoe-dpo-logs.txt
Use
Install the transformers and torch libraries and run:
from transformers import OlmoeForCausalLM, AutoTokenizer
import torch
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
# Load different ckpts via passing e.g. `revision=step10000-tokens41B`
model = OlmoeForCausalLM.from_pretrained("OLMoE/OLMoE-1B-7B-Instruct").to(DEVICE)
tokenizer = AutoTokenizer.from_pretrained("OLMoE/OLMoE-1B-7B-Instruct")
messages = [{"role": "user", "content": "Explain to me like I'm five what is Bitcoin."}]
# return_dict=True returns input_ids and attention_mask, so the result can be unpacked into generate();
# .to(DEVICE) places the inputs on the same device as the model
inputs = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt", return_dict=True).to(DEVICE)
# max_length counts prompt tokens plus generated tokens
out = model.generate(**inputs, max_length=64)
print(tokenizer.decode(out[0]))
# > # Bitcoin is a digital currency that is created and held electronically. No one controls it. Bitcoins aren’t printed, like dollars or euros – they’re produced by people and businesses running computers all around the world, using software that solves mathematical
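In the snippet above, max_length=64 counts the prompt tokens as well as the generated ones, and decoding out[0] prints the chat template along with the reply. A minimal sketch of a variant that budgets only new tokens and decodes only the reply is shown here. generate_reply is a hypothetical helper name, not part of transformers, and it assumes the model and tokenizer were loaded as in the snippet above.

```python
def generate_reply(model, tokenizer, messages, device="cpu", max_new_tokens=64):
    """Return only the newly generated text for a list of chat messages.

    Hypothetical helper: `model` and `tokenizer` are assumed to be the objects
    loaded with OlmoeForCausalLM.from_pretrained / AutoTokenizer.from_pretrained.
    """
    # Apply the chat template and tokenize; return_dict=True gives a mapping
    # with input_ids and attention_mask that can be unpacked into generate().
    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt",
        return_dict=True,
    ).to(device)
    prompt_len = inputs["input_ids"].shape[1]
    # max_new_tokens limits generated tokens only, independent of prompt length.
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so that only the reply is decoded.
    return tokenizer.decode(out[0][prompt_len:], skip_special_tokens=True)

# Usage (assumes model, tokenizer, messages and DEVICE from the snippet above):
# print(generate_reply(model, tokenizer, messages, device=DEVICE))
```

Slicing at prompt_len relies on generate returning the prompt followed by the continuation, which is the behaviour for decoder-only models such as this one.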
You can list all revisions/branches by installing huggingface-hub and running:
from huggingface_hub import list_repo_refs
out = list_repo_refs("OLMoE/OLMoE-1B-7B-0824")
branches = [b.name for b in out.branches]
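Checkpoint branches mentioned in this card, such as step10000-tokens41B and step1200000-tokens5033B, encode the training step and the token count in their names. As a rough sketch, the branch list can be filtered and ordered by step as follows. It assumes every checkpoint branch follows the step<N>-tokens<M>B pattern seen in those two examples; parse_checkpoint and sort_checkpoints are hypothetical helper names.

```python
import re

# Assumed naming pattern, inferred from the two branch names quoted in this card.
_CKPT_PATTERN = re.compile(r"^step(\d+)-tokens(\d+)B$")

def parse_checkpoint(name):
    """Return (step, tokens_in_billions) for a checkpoint branch name, else None."""
    match = _CKPT_PATTERN.match(name)
    if match is None:
        return None  # e.g. "main" or an ablation branch
    return int(match.group(1)), int(match.group(2))

def sort_checkpoints(branches):
    """Keep only checkpoint-style branch names and order them by training step."""
    parsed = [(parse_checkpoint(b), b) for b in branches]
    return [b for key, b in sorted((k, b) for k, b in parsed if k is not None)]

# Usage with the `branches` list built above:
# for name in sort_checkpoints(branches):
#     print(name, parse_checkpoint(name))
```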
Important branches:
- main: Model preference-tuned via DPO from https://hf.co/OLMoE/OLMoE-1B-7B-0824-SFT (main branch)
- no-load-balancing: Ablation without load balancing loss during DPO, starting from the no-load-balancing branch of https://hf.co/OLMoE/OLMoE-1B-7B-0824-SFT
- non-annealed: Ablation starting from the non-annealed branch of https://hf.co/OLMoE/OLMoE-1B-7B-0824-SFT, which is an SFT of the pretraining checkpoint prior to annealing (branch step1200000-tokens5033B of https://hf.co/OLMoE/OLMoE-1B-7B-0824)
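Any of these branches can be selected through the revision argument of from_pretrained. The sketch below assumes the branches live in the same repository ID used in the usage snippet (OLMoE/OLMoE-1B-7B-Instruct), which this card does not state explicitly. pretrained_args and load_branch are hypothetical helper names.

```python
# Assumption: the branches listed above belong to the repo from the usage snippet.
INSTRUCT_REPO = "OLMoE/OLMoE-1B-7B-Instruct"

# Branch names and descriptions as listed in this card.
BRANCHES = {
    "main": "DPO on top of the SFT main branch",
    "no-load-balancing": "ablation without load balancing loss during DPO",
    "non-annealed": "ablation starting from the non-annealed SFT branch",
}

def pretrained_args(branch="main"):
    """Return (repo_id, kwargs) for from_pretrained, rejecting unknown branches."""
    if branch not in BRANCHES:
        raise ValueError(f"unknown branch {branch!r}; expected one of {sorted(BRANCHES)}")
    return INSTRUCT_REPO, {"revision": branch}

def load_branch(branch="main"):
    """Download and load the model and tokenizer for one branch (needs network)."""
    # Imported lazily so that pretrained_args can be used without transformers installed.
    from transformers import AutoTokenizer, OlmoeForCausalLM

    repo, kwargs = pretrained_args(branch)
    model = OlmoeForCausalLM.from_pretrained(repo, **kwargs)
    tokenizer = AutoTokenizer.from_pretrained(repo, **kwargs)
    return model, tokenizer

# Usage:
# model, tokenizer = load_branch("no-load-balancing")
```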
Citation
TODO