---
license: apache-2.0
datasets:
- Intel/orca_dpo_pairs
pipeline_tag: text-generation
---
# DeciDPObyBB - a 7B DeciLM Fine-tune Using DPO

Built by fine-tuning [DeciLM-7B-Instruct](https://huggingface.co/Deci/DeciLM-7B-instruct) on the [Intel Orca DPO Pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs) dataset using Direct Preference Optimization (DPO).

Created by [bhaiyabot](https://bhaiyabot.in) for research and learning purposes!
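For context on the training objective: DPO optimizes the policy directly on preference pairs (here, the chosen/rejected answers in Orca DPO Pairs) without training a separate reward model. Below is a minimal sketch of the per-pair loss from the DPO paper cited at the bottom of this card; the log-probability values are illustrative, not taken from this model.

```python
import math

def dpo_loss(policy_chosen_lp, policy_rejected_lp,
             ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """Per-pair DPO loss from sequence log-probabilities.

    beta controls how far the policy may drift from the frozen reference model.
    """
    # Log-ratios of the policy vs. the frozen reference model
    chosen_logratio = policy_chosen_lp - ref_chosen_lp
    rejected_logratio = policy_rejected_lp - ref_rejected_lp
    # -log(sigmoid(margin)) == log(1 + exp(-margin))
    margin = beta * (chosen_logratio - rejected_logratio)
    return math.log1p(math.exp(-margin))

# At initialization the policy equals the reference, so the loss is log(2)
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))  # 0.6931...
```

Training pushes the margin positive (the policy favors the chosen answer more than the reference does), driving the loss below log(2).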
Usage:

```python
from transformers import AutoTokenizer, pipeline

# Replace with this model's Hub path
model_id = "DeciDPObyBB"

tokenizer = AutoTokenizer.from_pretrained(model_id)
generator = pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
    trust_remote_code=True,  # DeciLM models ship custom modeling code
)

user_input = "Your question here"
messages = [
    {"role": "system", "content": "You are a very helpful assistant chatbot that thinks step by step"},
    {"role": "user", "content": user_input},
]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)

sequences = generator(
    prompt,
    do_sample=True,
    temperature=1.0,
    num_beams=5,
    max_length=1000,
    pad_token_id=tokenizer.eos_token_id,
)
print(sequences[0]["generated_text"])
```
```bibtex
@misc{DeciFoundationModels,
  title = {DeciLM-7B-instruct},
  author = {DeciAI Research Team},
  year = {2023},
  url = {https://huggingface.co/Deci/DeciLM-7B-instruct}
}
@misc{rafailov2023direct,
  title = {Direct Preference Optimization: Your Language Model is Secretly a Reward Model},
  author = {Rafael Rafailov and Archit Sharma and Eric Mitchell and Stefano Ermon and Christopher D. Manning and Chelsea Finn},
  year = {2023},
  eprint = {2305.18290},
  archivePrefix = {arXiv},
  primaryClass = {cs.LG}
}
```
More details to come soon.