Makual committed on
Commit
a8a18a4
1 Parent(s): f66eb11

Update README.md

  1. README.md +81 -1
README.md CHANGED
@@ -5,4 +5,84 @@ datasets:
  language:
  - en
  library_name: transformers
- ---
+ ---
+ This model was trained on our [ChatGPT paraphrase dataset](https://huggingface.co/datasets/humarin/chatgpt-paraphrases).
+
+ The dataset is built from the [Quora Question Pairs](https://www.kaggle.com/competitions/quora-question-pairs) dataset, texts from [SQuAD 2.0](https://huggingface.co/datasets/squad_v2), and the [CNN news dataset](https://huggingface.co/datasets/cnn_dailymail).
+
+ The model is based on T5-base. We used transfer learning to teach it to generate paraphrases comparable in quality to ChatGPT's, which we believe makes it one of the best paraphrasing models on the Hugging Face Hub.
+
+ The dataset is also available on [Kaggle](https://www.kaggle.com/datasets/vladimirvorobevv/chatgpt-paraphrases).
+
+ **Deployment example:**
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+
+ device = "cuda"  # use "cpu" if no GPU is available
+
+ tokenizer = AutoTokenizer.from_pretrained("humarin/chatgpt_paraphraser_on_T5_base")
+ model = AutoModelForSeq2SeqLM.from_pretrained("humarin/chatgpt_paraphraser_on_T5_base").to(device)
+
+ def paraphrase(text, max_length=128, num_return_sequences=5, num_beams=25, temperature=0.7):
+     # Prepend the "paraphrase: " task prefix and tokenize the input.
+     input_ids = tokenizer(
+         f'paraphrase: {text}',
+         return_tensors="pt", padding="longest",
+         max_length=max_length,
+         truncation=True,
+     ).input_ids.to(device)
+
+     # Generate with beam search and keep the top candidates.
+     outputs = model.generate(
+         input_ids, temperature=temperature, repetition_penalty=1.5,
+         num_return_sequences=num_return_sequences, no_repeat_ngram_size=5,
+         num_beams=num_beams, max_length=max_length,
+     )
+
+     # Decode the generated token IDs back into strings.
+     res = tokenizer.batch_decode(outputs, skip_special_tokens=True)
+
+     return res
+ ```
+
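The `paraphrase` defaults request 5 candidates from 25 beams. Beam search can return at most `num_beams` sequences, and `transformers` raises an error when `num_return_sequences` is larger, so code that exposes these parameters may want to check them up front. The helper below is a minimal sketch; `check_beam_args` is a hypothetical name and is not part of this model or of `transformers`.

```python
def check_beam_args(num_beams, num_return_sequences):
    """Raise ValueError if the beam-search arguments cannot be satisfied.

    Hypothetical helper for illustration only. It mirrors the constraint
    that beam search can return at most `num_beams` sequences.
    """
    if num_beams < 1:
        raise ValueError(f"num_beams must be >= 1, got {num_beams}")
    if not 1 <= num_return_sequences <= num_beams:
        raise ValueError(
            f"num_return_sequences ({num_return_sequences}) must be between "
            f"1 and num_beams ({num_beams})"
        )


# The defaults used by paraphrase() pass the check.
check_beam_args(num_beams=25, num_return_sequences=5)
```

Calling the check before `model.generate` surfaces a bad combination with a clear message instead of failing inside generation.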
+ **Usage examples**
+
+ **Input:**
+ ```python
+ text = 'What are the best places to see in New York?'
+ paraphrase(text)
+ ```
+ **Output:**
+ ```python
+ ['What are some of the must-visit places in New York?',
+  'Which places should I not miss when visiting New York?',
+  'Which are the top tourist destinations in New York?',
+  'Which places should I not miss while visiting New York?',
+  'What are some of the must-visit locations in New York?']
+ ```
+
+ **Input:**
+ ```python
+ text = "This Year's Model is the second studio album by the English singer-songwriter Elvis Costello (pictured), released on 17 March 1978 through Radar Records with his new backing band, the Attractions. It was recorded at Eden Studios in late 1977 and early 1978."
+ paraphrase(text)
+ ```
+ **Output:**
+ ```python
+ ["The English singer-songwriter Elvis Costello's second studio album, This Year's Model, was released on 17 March 1978 through Radar Records with his new backing band, the Attractions. It was recorded at Eden Studios in late 1977 and early 1978.",
+  "This Year's Model, the second studio album of Elvis Costello (pictured), was released on 17 March 1978 through Radar Records with his new backing band, the Attractions. It was recorded at Eden Studios in late 1977 and early 1978.",
+  "The English singer-songwriter Elvis Costello's second studio album, This Year's Model, was released on March 17, 1978, through Radar Records with his new backing band, the Attractions. It was recorded at Eden Studios in late 1977 and early 1978.",
+  "The English singer-songwriter Elvis Costello (pictured) released his second studio album, This Year's Model, on 17 March 1978 through Radar Records with his new backing band, the Attractions, which was recorded at Eden Studios in late 1977 and early 1978.",
+  "The English singer-songwriter Elvis Costello (pictured) released his second studio album, This Year's Model, on March 17, 1978, through Radar Records with his new backing band, the Attractions. It was recorded at Eden Studios in late 1977 and early 1978."]
+ ```
+
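For longer inputs, the candidates can stay very close to the source text. When variety matters, one option is to rank the returned list by how far each candidate departs from the input. The sketch below uses word-level Jaccard similarity; the metric and the helper names `jaccard` and `rank_by_novelty` are assumptions made for illustration and are not part of the model.

```python
import re


def _words(text):
    """Lowercase a string and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a, b):
    """Word-level Jaccard similarity: shared words divided by all words."""
    wa, wb = _words(a), _words(b)
    if not wa and not wb:
        return 1.0
    return len(wa & wb) / len(wa | wb)


def rank_by_novelty(source, candidates):
    """Drop candidates whose word set equals the source's, then sort the
    rest from least to most similar to the source (most novel first)."""
    source_words = _words(source)
    distinct = [c for c in candidates if _words(c) != source_words]
    return sorted(distinct, key=lambda c: jaccard(source, c))


source = 'What are the best places to see in New York?'
candidates = [
    'What are some of the must-visit places in New York?',
    'Which are the top tourist destinations in New York?',
    'What are the best places to see in New York?',  # same as the source
]
ranked = rank_by_novelty(source, candidates)
```

Here `ranked` keeps two candidates, with the "top tourist destinations" sentence first because it shares fewer words with the source.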
+ **Training parameters:**
+ ```python
+ epochs = 1
+ batch_size = 128
+ lr = 5e-5
+ batches_qty = 82849
+ betas = (0.9, 0.999)
+ eps = 1e-08
+ ```
+
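As a rough sanity check on scale, the parameters above can be combined into an example count. This sketch assumes that `batches_qty` is the number of batches in one epoch and that every batch is full; the source states neither, so treat the result as an upper-bound estimate.

```python
# Values copied from the training parameters above.
epochs = 1
batch_size = 128
batches_qty = 82849  # assumed: number of batches per epoch

# Assuming every batch is full, this is the number of training
# examples processed over the whole run.
examples_seen = epochs * batches_qty * batch_size
print(f"{examples_seen:,}")  # 10,604,672
```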