Introduction
MoMo-72B-lora-1.8.6-DPO is trained via Direct Preference Optimization (DPO) from MoMo-72B-LoRA-V1.4 as its base model, with several hyperparameter optimizations.
MoMo-72B-LoRA-V1.4 is trained via Supervised Fine-Tuning (SFT) using LoRA, with the QWEN-72B model as its base model.
Note that we did not use any form of weight merging.
For leaderboard submission, the trained weights were realigned for compatibility with llama.
MoMo-72B is trained using Moreh's MoAI platform, which simplifies the training of large-scale models, and AMD's MI250 GPU.
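For readers unfamiliar with DPO, the sketch below computes the standard DPO loss for a single preference pair from summed sequence log-probabilities. It illustrates the general objective, not the training code used for this model; the function name, the argument names, and `beta=0.1` are assumptions.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    beta=0.1 is an assumed value, not one reported for MoMo-72B.
    """
    # Implicit rewards: how far the policy has moved away from the reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward

    # loss = -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and the reference assign identical log-probabilities, the margin is zero and the loss equals log 2. The loss falls as the policy raises the chosen response relative to the rejected one.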
Details
Used Libraries
- torch
- peft
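LoRA, which the peft library implements, freezes a pretrained weight matrix W and learns a low-rank update, so the effective weight is W + (alpha / r) * B A. The pure-Python sketch below illustrates only that arithmetic. The function names and toy values are hypothetical, and the rank, alpha, and target modules used for MoMo-72B are not stated in this card.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A.

    w: frozen weight, shape (out, in)
    a: trainable matrix A, shape (r, in)
    b: trainable matrix B, shape (out, r)
    alpha: scaling hyperparameter (value here is illustrative only)
    """
    r = len(a)              # the LoRA rank is the number of rows of A
    scale = alpha / r
    delta = matmul(b, a)    # low-rank update, shape (out, in)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]
```

Only A and B are trained, so the number of trainable parameters per adapted matrix is r * (in + out) rather than in * out.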
Used Datasets
- slimorca
- truthy
- orca_dpo_pairs
- No other dataset was used
- No benchmark test sets or training sets were used
- Data contamination check results
Model | ARC | MMLU | TruthfulQA | GSM8K |
---|---|---|---|---|
V1.8.6(result < 0.1, %) | TBU | TBU | 0.73 | TBU |
Used Environments
- AMD MI250 & MoAI platform
- Please visit https://moreh.io/product for more information about the MoAI platform
- Or contact us directly at [email protected]
How to use
# pip install transformers==4.35.2
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("moreh/MoMo-72B-lora-1.8.6-DPO")
model = AutoModelForCausalLM.from_pretrained(
    "moreh/MoMo-72B-lora-1.8.6-DPO"
)
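The snippet above only loads the tokenizer and model. A minimal sketch of a generation step might look like the following. The helper names `load_momo` and `generate_reply`, the half-precision and `device_map="auto"` settings (which need the `accelerate` package), the plain-text prompt, and the `max_new_tokens` value are assumptions for illustration, not settings documented for this model.

```python
MODEL_ID = "moreh/MoMo-72B-lora-1.8.6-DPO"


def load_momo(model_id=MODEL_ID):
    """Load tokenizer and model (assumed settings: fp16, automatic device placement)."""
    # Imported lazily so generate_reply can be exercised without heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # assumption: half the memory of fp32
        device_map="auto",          # assumption: requires `pip install accelerate`
    )
    return tokenizer, model


def generate_reply(model, tokenizer, prompt, max_new_tokens=64):
    """Tokenize a plain-text prompt, generate, and decode the full sequence."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # generate() returns the prompt tokens followed by the newly generated tokens.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example (downloads the full 72B checkpoint, so it needs substantial GPU memory):
# tokenizer, model = load_momo()
# print(generate_reply(model, tokenizer, "What is Direct Preference Optimization?"))
```

Keeping loading and generation in separate functions lets the model be loaded once and reused across prompts.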