Syed-Hasan-8503 committed
Commit 2896ef3 · 1 Parent(s): 4c0f5ef
Update README.md

README.md CHANGED
@@ -2,13 +2,13 @@
 license: apache-2.0
 ---
 
-# Phi-3-mini-
+# Phi-3-mini-4K-instruct with CPO-SimPO
 
 This repository contains the Phi-3-mini-128K-instruct model enhanced with the CPO-SimPO technique. CPO-SimPO combines Contrastive Preference Optimization (CPO) and Simple Preference Optimization (SimPO).
 
 ## Introduction
 
-Phi-3-mini-
+Phi-3-mini-4K-instruct is a model optimized for instruction-based tasks. This approach has demonstrated notable improvements in key benchmarks, pushing the boundaries of AI preference learning.
 
 ### What is CPO-SimPO?
 
@@ -26,8 +26,6 @@ CPO-SimPO is a novel technique, which combines elements from CPO and SimPO:
 
 COMING SOON!
 
-- **TruthfulQA:** 56.19
-
 ### Key Improvements:
 - **Enhanced Model Performance:** Significant score improvements, particularly in GSM8K (up by 8.49 points!) and TruthfulQA (up by 2.07 points).
 - **Quality Control:** Improved generation of high-quality sequences through length normalization and reward margins.
@@ -54,12 +52,12 @@ from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
 torch.random.manual_seed(0)
 
 model = AutoModelForCausalLM.from_pretrained(
-    "Syed-Hasan-8503/Phi-3-mini-
+    "Syed-Hasan-8503/Phi-3-mini-4K-instruct-cpo-simpo",
     device_map="cuda",
     torch_dtype="auto",
     trust_remote_code=True,
 )
-tokenizer = AutoTokenizer.from_pretrained("Syed-Hasan-8503/Phi-3-mini-
+tokenizer = AutoTokenizer.from_pretrained("Syed-Hasan-8503/Phi-3-mini-4K-instruct-cpo-simpo")
 
 messages = [
     {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},