Model Card for llama3-8b-instruct-orpo-ko

Model Summary

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct, trained with odds ratio preference optimization (ORPO).

It has been trained to perform NLP tasks in Korean.

Model Details

Model Description

  • Developed by: Sungjoo Byun (Grace Byun)
  • Language(s) (NLP): Korean
  • License: Apache 2.0
  • Finetuned from model: meta-llama/Meta-Llama-3-8B-Instruct

Training Details

Training Data

The model was trained on the heegyu/hh-rlhf-ko dataset. We thank heegyu for sharing this valuable resource.

Training Procedure

We applied ORPO to meta-llama/Meta-Llama-3-8B-Instruct. Training was conducted on a single A100 GPU with 80GB of memory.
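
For background, the ORPO objective (Hong et al., 2024) augments the supervised fine-tuning loss with a log odds ratio penalty over chosen/rejected pairs; the weighting hyperparameter λ used for this model is not reported here:

L_ORPO = E_{(x, y_w, y_l)} [ L_SFT + λ · L_OR ],
where L_OR = -log σ( log( odds_θ(y_w | x) / odds_θ(y_l | x) ) ) and odds_θ(y | x) = P_θ(y | x) / (1 - P_θ(y | x)).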

How to Get Started with the Model

Use the code below to get started with the model:

from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("SungJoo/llama3-8b-instruct-orpo-ko")
model = AutoModelForCausalLM.from_pretrained("SungJoo/llama3-8b-instruct-orpo-ko")
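
Below is a minimal generation sketch, assuming the tokenizer inherits the Llama-3 Instruct chat template; the prompt and sampling parameters are illustrative only:

# Illustrative Korean prompt ("What is the capital of Korea?")
messages = [{"role": "user", "content": "한국의 수도는 어디인가요?"}]

# Build model inputs with the chat template and generate a response
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))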

Citations

Please cite the ORPO paper and our model as follows:

@misc{hong2024orpo,
  title = {ORPO: Monolithic Preference Optimization without Reference Model},
  author = {Jiwoo Hong and Noah Lee and James Thorne},
  year = {2024},
  eprint = {2403.07691},
  archivePrefix = {arXiv},
  primaryClass = {cs.CL}
}

@misc{byun,
  author = {Sungjoo Byun},
  title = {llama3-8b-orpo-ko},
  year = {2024},
  publisher = {Hugging Face},
  journal = {Hugging Face repository},
  howpublished = {\url{https://huggingface.co/SungJoo/llama3-8b-instruct-orpo-ko}}
}

Contact

For any questions or issues, please contact [email protected].
