Commit 0954e04 by Michal Pleban
1 parent: 45c6fda
First model upload
README.md CHANGED
@@ -1,3 +1,64 @@
### About the model

The model has been trained on a dataset containing [264519 sentences with UK English spelling](https://www.englishvoice.ai/p/uk-to-us/ "264519 sentences with UK English spelling"), along with their US English equivalents.

The purpose of the model is to rewrite sentences from UK English to US English. It not only changes the spelling of words (such as "colour" to "color") but also adapts the vocabulary (for example, "underground" to "subway" and "solicitor" to "lawyer").

### Generation examples

| Input | Output |
| :------------ | :------------ |
| My favourite colour is yellow. | My favorite color is yellow. |
| I saw a bloke in yellow trainers at the underground station. | I saw a guy in yellow sneakers at the subway station. |
| You could have got hurt! | You could have gotten hurt! |

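The rows in the table above can double as a quick smoke test for any wrapper you build around the model. The sketch below is illustrative only and not part of the original README: `EXAMPLES` and `exact_match_rate` are hypothetical names, and `translate` stands for any callable that maps a UK English sentence to its rewritten form.

```python
# Input/output pairs copied from the "Generation examples" table.
EXAMPLES = [
    ("My favourite colour is yellow.", "My favorite color is yellow."),
    (
        "I saw a bloke in yellow trainers at the underground station.",
        "I saw a guy in yellow sneakers at the subway station.",
    ),
    ("You could have got hurt!", "You could have gotten hurt!"),
]


def exact_match_rate(translate, examples=EXAMPLES):
    """Return the fraction of examples that `translate` reproduces exactly.

    `translate` is a hypothetical callable: str -> str.
    Surrounding whitespace in its output is ignored.
    """
    hits = sum(
        1 for source, expected in examples if translate(source).strip() == expected
    )
    return hits / len(examples)
```

Passing a function that wraps the model's generation call gives a score between 0.0 and 1.0. Exact string matching is a deliberately strict choice for three short sentences; a larger evaluation would likely need a softer metric.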
### The dataset

The dataset was developed by English Voice AI Labs. You can download it from our website:
[https://www.EnglishVoice.ai/](https://www.EnglishVoice.ai/ "https://www.EnglishVoice.ai/")

### Sample code

Sample Python code:

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = T5ForConditionalGeneration.from_pretrained("EnglishVoice/t5-base-uk-to-us-english")
tokenizer = T5Tokenizer.from_pretrained("EnglishVoice/t5-base-uk-to-us-english")
model = model.to(device)

# Named `sentence` rather than `input` to avoid shadowing the Python builtin.
sentence = "My favourite colour is yellow."

# Prepend the "UK to US: " task prefix before tokenizing.
text = "UK to US: " + sentence
encoding = tokenizer(text, return_tensors="pt")
input_ids = encoding["input_ids"].to(device)
attention_masks = encoding["attention_mask"].to(device)

# Note: early_stopping only takes effect with beam search (num_beams > 1).
beam_outputs = model.generate(
    input_ids=input_ids,
    attention_mask=attention_masks,
    early_stopping=True,
)

result = tokenizer.decode(beam_outputs[0], skip_special_tokens=True)
print(result)
```

Output:

```My favorite color is yellow.```

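To rewrite several sentences at once, the single-sentence sample can be generalized to a batched call. This is a minimal sketch under stated assumptions, not part of the original README: `PREFIX`, `build_prompts`, `translate_batch`, and the `max_new_tokens=64` default are hypothetical names and values chosen for illustration, and `model` and `tokenizer` are assumed to be loaded as in the sample code.

```python
# Task prefix used by the sample code and the widget example.
PREFIX = "UK to US: "


def build_prompts(sentences):
    """Prepend the task prefix to each sentence (whitespace trimmed)."""
    return [PREFIX + sentence.strip() for sentence in sentences]


def translate_batch(sentences, model, tokenizer, device="cpu", max_new_tokens=64):
    """Rewrite a list of UK English sentences in one generate() call.

    Assumes `model` is already on `device`. `max_new_tokens=64` is an
    arbitrary illustrative limit, not a value taken from the model card.
    """
    prompts = build_prompts(sentences)
    # padding=True pads the shorter prompts so the batch forms one tensor;
    # the attention mask tells the model which positions are padding.
    encoding = tokenizer(prompts, return_tensors="pt", padding=True).to(device)
    outputs = model.generate(
        input_ids=encoding["input_ids"],
        attention_mask=encoding["attention_mask"],
        max_new_tokens=max_new_tokens,
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

With `model`, `tokenizer`, and `device` from the sample code, `translate_batch(["My favourite colour is yellow."], model, tokenizer, device)` returns a list with one rewritten sentence per input.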
---
language:
- en
tags:
- text2text-generation
- paraphrase-generation
license: apache-2.0
widget:
- text: "UK to US: My favourite colour is yellow."
---