---
license: apache-2.0
datasets:
  - Intel/orca_dpo_pairs
pipeline_tag: text-generation
---

# DeciDPObyBB - a 7B DeciLM Finetune using DPO

Built by fine-tuning [DeciLM-7B-Instruct](https://huggingface.co/Deci/DeciLM-7B-instruct) on the [Intel Orca DPO Pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs) dataset using Direct Preference Optimization (DPO).
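DPO trains the policy directly on preference pairs (a chosen and a rejected completion) without a separate reward model. A minimal sketch of the per-pair loss from Rafailov et al. (2023) is below; this is illustrative only, not the actual training code used for this model, and the `beta` default is an assumption:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio))."""
    # Log-ratios of the trained policy vs. the frozen reference model
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # Logistic loss on the margin: shrinks as the chosen completion
    # becomes more likely (relative to the reference) than the rejected one
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# At zero margin the loss is log(2) ~ 0.693; a positive margin lowers it
print(dpo_loss(-1.0, -2.0, -1.5, -1.5))  # margin favors chosen, so loss < log(2)
```

In practice this loss is computed over sequence log-probabilities from the model and a frozen reference copy, e.g. via a library such as TRL's `DPOTrainer`.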

Created by bhaiyabot.

Built for research and learning purposes!

usage:

```python
import torch
from transformers import AutoTokenizer, pipeline

model_name = "rohansolo/DeciDPObyBB"  # this model's repo id

tokenizer = AutoTokenizer.from_pretrained(model_name)
pipe = pipeline(
    "text-generation",
    model=model_name,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,  # DeciLM ships custom model code
)

user_input = "Your question here"
messages = [
    {"role": "system", "content": "You are a very helpful assistant chatbot that thinks step by step"},
    {"role": "user", "content": user_input},
]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)

sequences = pipe(
    prompt,
    do_sample=True,
    temperature=1.0,
    num_beams=5,
    max_length=1000,
    pad_token_id=tokenizer.eos_token_id,
)
print(sequences[0]["generated_text"])
```
citations:

```bibtex
@misc{DeciFoundationModels,
  title = {DeciLM-7B-instruct},
  author = {DeciAI Research Team},
  year = {2023},
  url = {https://huggingface.co/Deci/DeciLM-7B-instruct},
}

@misc{rafailov2023direct,
  title = {Direct Preference Optimization: Your Language Model is Secretly a Reward Model},
  author = {Rafael Rafailov and Archit Sharma and Eric Mitchell and Stefano Ermon and Christopher D. Manning and Chelsea Finn},
  year = {2023},
  eprint = {2305.18290},
  archivePrefix = {arXiv},
  primaryClass = {cs.LG}
}
```

More details coming soon.