|
--- |
|
license: apache-2.0 |
|
datasets: |
|
- Intel/orca_dpo_pairs |
|
pipeline_tag: text-generation |
|
--- |
|
|
|
# DeciDPObyBB - a 7B DeciLM Fine-tune using DPO
|
|
|
Built by fine-tuning [DeciLM-7B-Instruct](https://huggingface.co/Deci/DeciLM-7B-instruct) on [Intel Orca DPO Pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs) using Direct Preference Optimization (DPO).
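DPO trains directly on preference pairs (a "chosen" and a "rejected" answer per prompt) without fitting a separate reward model. As a toy sketch only, the per-example DPO loss from Rafailov et al. (2023) can be written as below; the log-probability values here are made up for illustration and are not from this model's actual training run:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * (policy margin - reference margin))."""
    logits = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# When the policy prefers the chosen answer more strongly than the
# reference model does, the loss falls below log(2) (the value at zero margin).
print(dpo_loss(-1.0, -3.0, -2.0, -2.5, beta=0.1))
```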
|
|
|
Created by [bhaiyabot](https://bhaiyabot.in).
|
|
|
Built for research and learning purposes!
|
|
|
Usage:
|
|
|
```python
from transformers import AutoTokenizer, pipeline

model_id = "Deci/DeciLM-7B-instruct"  # placeholder: replace with this model's repo id

messages = [
    {"role": "system", "content": "You are a very helpful assistant chatbot that thinks step by step"},
    {"role": "user", "content": user_input},  # user_input: your prompt string
]

tokenizer = AutoTokenizer.from_pretrained(model_id)
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)

generator = pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
    trust_remote_code=True,  # DeciLM ships custom modeling code
)

sequences = generator(
    prompt,
    do_sample=True,
    temperature=1,
    num_beams=5,
    max_length=1000,
    pad_token_id=tokenizer.eos_token_id,
)
print(sequences[0]["generated_text"])
```
|
|
|
```bibtex |
|
@misc{DeciFoundationModels,
  title = {DeciLM-7B-instruct},
  author = {DeciAI Research Team},
  year = {2023},
  url = {https://huggingface.co/Deci/DeciLM-7B-instruct},
}
|
|
|
@misc{rafailov2023direct, |
|
title={Direct Preference Optimization: Your Language Model is Secretly a Reward Model}, |
|
author={Rafael Rafailov and Archit Sharma and Eric Mitchell and Stefano Ermon and Christopher D. Manning and Chelsea Finn}, |
|
year={2023}, |
|
eprint={2305.18290}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.LG} |
|
} |
|
``` |
|
|
|
|
|
|
|
More details coming soon.