File size: 4,073 Bytes
7da33c0 b92f4df 26ffbde 4ebeafe e679a5b 4ebeafe 5d4ca24 26ffbde c4a6e66 a8a18a4 c8c5151 a8a18a4 d601354 a8a18a4 d601354 a8a18a4 d601354 a8a18a4 d601354 a8a18a4 d601354 a8a18a4 baa9fa2 a8a18a4 d601354 a8a18a4 d601354 baa9fa2 a8a18a4 baa9fa2 a8a18a4 c4a6e66 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 |
---
license: openrail
datasets:
- humarin/chatgpt-paraphrases
language:
- en
library_name: transformers
inference:
parameters:
num_beams: 5
num_beam_groups: 5
num_return_sequences: 5
repetition_penalty: 10.01
diversity_penalty: 3.01
no_repeat_ngram_size: 2
temperature: 0.7
max_length: 128
widget:
- text: What are the best places to see in New York?
example_title: New York tourist attractions
- text: When should I go to the doctor?
example_title: Doctor's time
- text: >-
Rammstein's album Mutter was recorded in the south of France in May and June
2000, and mixed in Stockholm in October of that year.
example_title: Rammstein's album Mutter
pipeline_tag: text2text-generation
---
This model was trained on our [ChatGPT paraphrase dataset](https://huggingface.co/datasets/humarin/chatgpt-paraphrases).
This dataset is based on the [Quora paraphrase question](https://www.kaggle.com/competitions/quora-question-pairs), texts from the [SQUAD 2.0](https://huggingface.co/datasets/squad_v2) and the [CNN news dataset](https://huggingface.co/datasets/cnn_dailymail).
This model is based on the T5-base model. We used "transfer learning" to get our model to generate paraphrases as well as ChatGPT. Now we can say that this is one of the best paraphrases of the Hugging Face.
[Kaggle](https://www.kaggle.com/datasets/vladimirvorobevv/chatgpt-paraphrases) link
**Deploying example:**
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
device = "cuda"
tokenizer = AutoTokenizer.from_pretrained("humarin/chatgpt_paraphraser_on_T5_base")
model = AutoModelForSeq2SeqLM.from_pretrained("humarin/chatgpt_paraphraser_on_T5_base").to(device)
def paraphrase(
question,
num_beams=5,
num_beam_groups=5,
num_return_sequences=5,
repetition_penalty=10.0,
diversity_penalty=3.0,
no_repeat_ngram_size=2,
temperature=0.7,
max_length=128
):
input_ids = tokenizer(
f'paraphrase: {question}',
return_tensors="pt", padding="longest",
max_length=max_length,
truncation=True,
).input_ids
outputs = model.generate(
input_ids, temperature=temperature, repetition_penalty=repetition_penalty,
num_return_sequences=num_return_sequences, no_repeat_ngram_size=no_repeat_ngram_size,
num_beams=num_beams, num_beam_groups=num_beam_groups,
max_length=max_length, diversity_penalty=diversity_penalty
)
res = tokenizer.batch_decode(outputs, skip_special_tokens=True)
return res
```
**Usage examples**
**Input:**
```python
text = 'What are the best places to see in New York?'
paraphrase(text)
```
**Output:**
```python
['Which places should I not miss when visiting New York?',
'What are the top-rated tourist destinations in New York?',
'Where should I go sightseeing in New York?',
'Can you suggest some must-see places in New York?',
'What are some must-see places in New York?']
```
**Input:**
```python
text = "Rammstein's album Mutter was recorded in the south of France in May and June 2000, and mixed in Stockholm in October of that year."
paraphrase(text)
```
**Output:**
```python
['In May and June 2000, Rammstein filmed the album Mutter in the south of France, with mixing taking place in Stockholm in October of that year.',
'The album Mutter by Rammstein was recorded in the south of France during May and June 2000, with subsequent mixing taking place in Stockholm in October of that year.',
'Mutter, the album by Rammstein, was recorded in the south of France during May and June 2000, with mixing taking place at Stockholm in October of that year.',
"Rammstein's album Mutter was produced during May and June 2000 in southern France, with mixing taking place in Stockholm from October.",
'In May and June 2000, Rammstein recorded Mutter in southern France, followed by mixing it in Stockholm in October of the same year.']
```
**Train parameters:**
```python
epochs = 3.5
batch_size = 64
max_length = 128
lr = 5e-5
batches_qty = 196465
betas = (0.9, 0.999)
eps = 1e-08
``` |