---
license: cc-by-nc-4.0
base_model: PhigRange-2.7B-slerp
tags:
- generated_from_trainer
- DPO
- instruct
- finetune
- chatml
- gpt4
- synthetic data
- distillation
model-index:
- name: PhigRange-DPO
  results: []
datasets:
- mlabonne/chatml-OpenHermes2.5-dpo-binarized-alpha
language:
- en
library_name: transformers
pipeline_tag: text-generation
---

# PhigRange-DPO

![image/png](https://cdn-uploads.huggingface.co/production/uploads/660cfe98280a82e38fe4ef49/1aDHvNk5pebHacGnzaHv9.png)

PhigRange-DPO is a DPO fine-tune of [johnsnowlabs/PhigRange-2.7B-Slerp](https://huggingface.co/johnsnowlabs/PhigRange-2.7B-Slerp) using the [mlabonne/chatml-OpenHermes2.5-dpo-binarized-alpha](https://huggingface.co/datasets/mlabonne/chatml-OpenHermes2.5-dpo-binarized-alpha) preference dataset. The model was trained for 1080 steps.

## 🏆 Evaluation results

### Coming Soon

## 💻 Usage

```python
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "johnsnowlabs/PhigRange-DPO"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Render the chat messages with the model's chat template before generation.
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```

## Training hyperparameters

The following hyperparameters were used during training (a hypothetical sketch wiring them together appears at the end of this card):
- learning_rate: 2e-04
- train_batch_size: 1
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: AdamOptimizer32bit
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1080

## Framework versions

- Transformers 4.38.0.dev0
- Pytorch 2.1.2+cu118
- Datasets 2.17.0
- Tokenizers 0.15.0
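## Training sketch

The training script itself is not part of this card. Below is a minimal sketch of how the hyperparameters above might be wired together, assuming TRL's `DPOTrainer` (TRL is not listed under framework versions) and interpreting `AdamOptimizer32bit` as a 32-bit AdamW variant; the `beta` value and sequence-length limits are placeholders, not values taken from the card.

```python
# Hypothetical reconstruction of the DPO run described above -- not the actual script.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "johnsnowlabs/PhigRange-2.7B-Slerp"
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # common for models without a pad token
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.float16)

# Preference dataset named on the card; assumed to expose ChatML-formatted
# prompt/chosen/rejected columns.
dataset = load_dataset("mlabonne/chatml-OpenHermes2.5-dpo-binarized-alpha", split="train")

# Hyperparameters copied from the list above; the optim string is a guess at
# what "AdamOptimizer32bit" refers to.
args = TrainingArguments(
    output_dir="PhigRange-DPO",
    per_device_train_batch_size=1,   # train_batch_size: 1
    gradient_accumulation_steps=8,   # effective batch size 1 x 8 = 8
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1080,
    optim="paged_adamw_32bit",
)

trainer = DPOTrainer(
    model,
    ref_model=None,          # TRL builds a frozen reference copy when None
    args=args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    beta=0.1,                # DPO temperature; TRL default, not stated on the card
    max_prompt_length=1024,  # placeholder sequence limits
    max_length=1536,
)
trainer.train()
```

Where this sketch and the card disagree, the hyperparameter list above is authoritative.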