RaushanTurganbay (HF staff) committed
Commit 7df8416 · 1 Parent(s): 7d3e531

Update README.md

Files changed (1): README.md +54 -0
README.md CHANGED:
---
license: apache-2.0
datasets:
- Anthropic/hh-rlhf
language:
- en
pipeline_tag: text-generation
---

# GPT-2 Medium SFT and DPO on Anthropic-hh Dataset

This repository contains a GPT-2 Medium model that was first instruction-tuned (SFT) on the Anthropic-hh dataset and then further aligned on the same dataset with Direct Preference Optimization (DPO).

## Model Information

- **Model Name:** RaushanTurganbay/GPT2_sft_and_dpo_tuned
- **Base Model:** GPT-2 Medium
- **Training Data:** Anthropic-hh dataset (Anthropic/hh-rlhf)
- **Fine-Tuning Approach:** Direct Preference Optimization (DPO) applied after supervised fine-tuning; a rough training sketch is shown below

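The exact training recipe and scripts are not part of this repository. As an illustration only, DPO alignment of an SFT GPT-2 checkpoint on Anthropic/hh-rlhf could look roughly like the sketch below using TRL's `DPOTrainer`. The hyperparameters, the `to_dpo_format` preprocessing helper, and the `sft_model_name` placeholder are assumptions rather than the values used for this checkpoint, and the `DPOTrainer` call matches older `trl` releases (newer releases move `beta` into a `DPOConfig`).

```python
# Illustrative sketch only, not the exact recipe used for this checkpoint.
# Assumes trl ~0.7.x, where `beta` and `tokenizer` are passed to DPOTrainer directly.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

sft_model_name = "gpt2-medium"  # in practice: the SFT checkpoint produced in the first stage

model = AutoModelForCausalLM.from_pretrained(sft_model_name)      # policy to be aligned
ref_model = AutoModelForCausalLM.from_pretrained(sft_model_name)  # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(sft_model_name)
tokenizer.pad_token = tokenizer.eos_token

def to_dpo_format(example):
    # hh-rlhf stores full "chosen"/"rejected" conversations; split off the shared
    # prompt (everything up to and including the last "\n\nAssistant:") and keep the replies.
    marker = "\n\nAssistant:"
    chosen, rejected = example["chosen"], example["rejected"]
    cut = chosen.rfind(marker) + len(marker)
    return {
        "prompt": chosen[:cut],
        "chosen": chosen[cut:],
        "rejected": rejected[rejected.rfind(marker) + len(marker):],
    }

train_dataset = load_dataset("Anthropic/hh-rlhf", split="train").map(to_dpo_format)

training_args = TrainingArguments(
    output_dir="gpt2-medium-dpo",
    per_device_train_batch_size=4,
    learning_rate=1e-5,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model,
    ref_model,
    args=training_args,
    beta=0.1,  # strength of the implicit KL penalty toward the reference model
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```

The frozen reference copy keeps the aligned policy close to the SFT model while the preference pairs push it toward the "chosen" responses.
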
## How to Use

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer, StoppingCriteria, StoppingCriteriaList

device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer_dpo = GPT2Tokenizer.from_pretrained("RaushanTurganbay/GPT2_sft_and_dpo_tuned")
model_dpo = GPT2LMHeadModel.from_pretrained("RaushanTurganbay/GPT2_sft_and_dpo_tuned").to(device)


class StoppingCriteriaSub(StoppingCriteria):
    """Stops generation as soon as one of the given stop sequences is produced."""

    def __init__(self, stops=None):
        super().__init__()
        self.stops = [stop.to(device) for stop in (stops or [])]

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        for stop in self.stops:
            # True if the most recently generated tokens match the stop sequence
            if len(input_ids[0]) >= len(stop) and torch.all(stop == input_ids[0][-len(stop):]).item():
                return True
        return False


def stopping_criteria(tokenizer, stop_words):
    # Tokenize each stop word and wrap the criterion in a StoppingCriteriaList
    stop_words_ids = [tokenizer(stop_word, return_tensors="pt")["input_ids"].squeeze() for stop_word in stop_words]
    return StoppingCriteriaList([StoppingCriteriaSub(stops=stop_words_ids)])


# Generate a response, stopping as soon as the model starts a new "\n\nHuman:" turn
stopping = stopping_criteria(tokenizer_dpo, ["\n\nHuman:"])
prompt = "\n\nHuman: {your_instruction}\n\nAssistant:"
inputs_dpo = tokenizer_dpo(prompt, return_tensors="pt").to(device)
outputs_dpo = model_dpo.generate(**inputs_dpo, stopping_criteria=stopping, max_length=150)

print("Model Response:", tokenizer_dpo.batch_decode(outputs_dpo))
```
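
Replace `{your_instruction}` with your request; the model expects the Anthropic-hh chat format of alternating `\n\nHuman:` and `\n\nAssistant:` turns, and the stopping criterion above ends generation once the model begins a new `\n\nHuman:` turn. For a quick test without custom stopping criteria, the generic `transformers` text-generation pipeline also works (a minimal sketch; the prompt and generation settings here are illustrative):

```python
from transformers import pipeline

generator = pipeline("text-generation", model="RaushanTurganbay/GPT2_sft_and_dpo_tuned")

prompt = "\n\nHuman: What should I cook for dinner tonight?\n\nAssistant:"
print(generator(prompt, max_length=150)[0]["generated_text"])
```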