Skolkovo Institute of Science and Technology
committed on
Commit 6e32a9f · 1 Parent(s): aa9baf7
Update README.md
README.md
CHANGED
@@ -32,10 +32,58 @@ I want to stop smoking during driving bicycle . 23:29 A <gerund> does not normal

```

### Data preprocessing

-We lowercased the text and explicitly pointed out the error in the original text

-## How to use
```

+Grammar terms are highlighted with '< ... >' marks and word examples with '<< ... >>'.
+
### Data preprocessing

+We lowercased the text, separated the words from all punctuation (including the task-specific << >> marks), and explicitly marked the error in the original text using << >>.
+
+```
+the smoke < < flow > > < < my > > face . 10:17 When the < verb > < < flow > > is used as an < intransitive verb > to express '' to move in a stream '', a < preposition > needs to be placed to indicate the direction. ' to ' and ' towards ' are < prepositions > that indicate direction .

+i want to stop smoking < < during > > driving bicycle . 23:29 a < gerund > does not normally follow the < preposition > < < during > > . think of an expression using the < conjunction > ' while ' instead of a < preposition > .
+```
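
The two sample lines above suggest a layout of marked text, a `start:end` character span, and a feedback comment. The following is a minimal sketch of a parser for that layout. The names `parse_sample`, `LINE_RE`, `WORD_RE`, and `TERM_RE` are hypothetical, and the layout itself is inferred from the examples, not documented by the authors. It also assumes the text contains no other `digits:digits` token before the span.

```python
import re

# Hypothetical helper names; the "<text> <start:end> <comment>" layout is an
# assumption inferred from the two sample lines, not a documented format.
LINE_RE = re.compile(r"^(.*?)\s+(\d+):(\d+)\s+(.*)$", re.DOTALL)
WORD_RE = re.compile(r"<\s*<\s*(.+?)\s*>\s*>")  # << word example >>
TERM_RE = re.compile(r"<\s*([^<>]+?)\s*>")      # < grammar term >


def parse_sample(line):
    """Split one sample into text, character span, comment, and its marks."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError("no 'start:end' span found in line")
    text, start, end, comment = match.groups()
    word_examples = WORD_RE.findall(comment)
    # Remove the word examples first so their inner brackets are not
    # mistaken for grammar terms.
    grammar_terms = TERM_RE.findall(WORD_RE.sub(" ", comment))
    return {
        "text": text,
        "span": (int(start), int(end)),
        "comment": comment,
        "word_examples": word_examples,
        "grammar_terms": grammar_terms,
    }


sample = (
    "i want to stop smoking < < during > > driving bicycle . 23:29 "
    "a < gerund > does not normally follow the < preposition > "
    "< < during > > . think of an expression using the < conjunction > "
    "' while ' instead of a < preposition > ."
)
parsed = parse_sample(sample)
print(parsed["span"])           # (23, 29)
print(parsed["word_examples"])  # ['during']
print(parsed["grammar_terms"])  # ['gerund', 'preposition', 'conjunction', 'preposition']
```

Stripping the `<< >>` matches before searching for `< >` keeps the two kinds of marks apart, since a word example would otherwise also match the grammar-term pattern.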


+## How to use
+
+```python
+import re
+from transformers import T5ForConditionalGeneration, AutoTokenizer
+
+text_with_error = 'I want to stop smoking during driving bicycle .'
+error_span = '23:29'  # character offsets of the error, "start:end"
+
+off1, off2 = map(int, error_span.split(":"))
+text_with_error_pointed = text_with_error[:off1] + "< < " + re.sub(r"\s+", " > > < < ", text_with_error[off1:off2].strip()) + " > > " + text_with_error[off2:]
+text_with_error_pointed = re.sub(r"\s+", " ", text_with_error_pointed.strip()).lower()  # collapse whitespace, lowercase
+
+tokenizer = AutoTokenizer.from_pretrained("SkolkovoInstitute/GenChal_2022_nigula")
+model = T5ForConditionalGeneration.from_pretrained("SkolkovoInstitute/GenChal_2022_nigula").cuda()  # requires a CUDA GPU
+model.eval()
+
+def paraphrase(text, model, temperature=1.0, beams=3):
+    texts = [text] if isinstance(text, str) else text
+    inputs = tokenizer(texts, return_tensors='pt', padding=True)['input_ids'].to(model.device)
+    result = model.generate(
+        inputs,
+        # num_return_sequences=n or 1,
+        do_sample=False,  # beam search; temperature has no effect unless sampling is enabled
+        temperature=temperature,
+        repetition_penalty=1.1,
+        max_length=int(inputs.shape[1] * 3),
+        # bad_words_ids=[[2]], # unk
+        num_beams=beams,
+    )
+    texts = [tokenizer.decode(r, skip_special_tokens=True) for r in result]
+    if isinstance(text, str):
+        return texts[0]
+    return texts
+
+
+paraphrase([text_with_error_pointed], model)
+
+# expected output: ["a gerund > does not normally follow the preposition > during > >. think of an expression using the conjunction >'while'instead of a preposition >."]
+
+
+```
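
When several sentences need to be prepared, the span-marking steps from the snippet above can be wrapped in a small helper. The following is only a sketch under that assumption: the function name `mark_error_span` and its bounds check are hypothetical and not part of the model repository. It needs only the standard library, so it can be tried without downloading the model.

```python
import re


def mark_error_span(text, span):
    """Wrap each word inside the character span in '< < ... > >' marks,
    collapse repeated whitespace, and lowercase the result.

    Hypothetical helper mirroring the marking steps in the README snippet.
    """
    start, end = (int(part) for part in span.split(":"))
    if not 0 <= start < end <= len(text):
        raise ValueError(f"span {span!r} is outside the text")
    # Every whitespace gap inside the span closes one mark and opens the next.
    marked_words = re.sub(r"\s+", " > > < < ", text[start:end].strip())
    marked = f"{text[:start]}< < {marked_words} > > {text[end:]}"
    return re.sub(r"\s+", " ", marked.strip()).lower()


sentences = [
    ("I want to stop smoking during driving bicycle .", "23:29"),
]
pointed = [mark_error_span(text, span) for text, span in sentences]
print(pointed[0])
# i want to stop smoking < < during > > driving bicycle .
```

The resulting list has the same form as `text_with_error_pointed`, so it could be passed to `paraphrase(pointed, model)` as a batch.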