Update README.md
README.md
CHANGED
@@ -35,7 +35,19 @@ The tool can be found [here](https://github.com/ayaka14732/lihkg-scraper).
Please also check out the [Bart model](https://huggingface.co/Ayaka/bart-base-cantonese) created by her.


-

Please refer to the [script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/language-modeling)
provided by Huggingface.
@@ -44,8 +56,6 @@ provided by Huggingface.
The model was trained for 400,000 steps with batch size 5 (~2 epochs) on two NVIDIA Quadro RTX 6000 GPUs for around 40 hours at the Research Computing Services of Imperial College London.


-
-
### How to use it?
```
from transformers import AutoTokenizer
@@ -62,6 +72,7 @@ string = output[0]['generated_text'].replace(' ', '')
print(string)
```

### Framework versions

- Transformers 4.26.0.dev0
@@ -35,7 +35,19 @@ The tool can be found [here](https://github.com/ayaka14732/lihkg-scraper).
Please also check out the [Bart model](https://huggingface.co/Ayaka/bart-base-cantonese) created by her.


+
+### Limitations
+The model was trained on ~10 GB of data scraped from LIHKG.
+The data may contain violent and rude language, and so may the text generated by the model.
+Please do not use it for anything other than research or entertainment.
+
+
+The comments on LIHKG also tend to be very short.
+Thus the model cannot generate anything longer than a single line. In many cases it may not generate any new tokens at all.
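
One way to spot the "no new tokens" case in practice is to compare the generated text with the prompt. The sketch below is only an illustration and is not part of the original README: `continuation` is a hypothetical helper, and it assumes the output has the shape the usage snippet relies on (a list of dicts whose `generated_text` begins with the prompt). Spaces are stripped the same way the usage snippet strips them.

```python
def continuation(prompt: str, output: list[dict]) -> str:
    """Return only the newly generated text, with spaces removed.

    Assumes `output` looks like the result of a text-generation pipeline:
    a list of dicts whose 'generated_text' starts with the prompt.
    """
    text = output[0]["generated_text"].replace(" ", "")
    prompt = prompt.replace(" ", "")
    # Drop the echoed prompt; if it is not a prefix, return the text unchanged.
    return text[len(prompt):] if text.startswith(prompt) else text


# Hand-written stand-ins for pipeline output (not real model generations).
empty_case = [{"generated_text": "今日天氣"}]
short_case = [{"generated_text": "今日 天氣 好 好"}]

print(repr(continuation("今日天氣", empty_case)))  # '' means no new tokens
print(repr(continuation("今日天氣", short_case)))  # '好好'
```

An empty string signals that the model only echoed the prompt, so a caller could retry with different sampling settings or a longer prompt.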
+
+
+
+### Training procedure

Please refer to the [script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/language-modeling)
provided by Huggingface.
@@ -44,8 +56,6 @@ provided by Huggingface.
The model was trained for 400,000 steps with batch size 5 (~2 epochs) on two NVIDIA Quadro RTX 6000 GPUs for around 40 hours at the Research Computing Services of Imperial College London.
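
As a rough sanity check on these figures, the sketch below derives the average throughput and the number of training examples processed. It is an illustration rather than part of the original README, and it assumes that "batch size 5" is the total batch per training step; if it was instead a per-GPU batch size, the example counts double.

```python
steps = 400_000        # training steps, from the README
batch_size = 5         # ASSUMPTION: total batch size per step
hours = 40             # approximate wall-clock time, from the README
epochs = 2             # approximate, from the README

steps_per_second = steps / (hours * 3600)
examples_seen = steps * batch_size
examples_per_epoch = examples_seen // epochs

print(f"{steps_per_second:.2f} steps/s")               # 2.78 steps/s
print(f"{examples_seen:,} examples seen")              # 2,000,000 examples seen
print(f"~{examples_per_epoch:,} examples per epoch")   # ~1,000,000 examples per epoch
```

Under that assumption, one pass over the training set is on the order of a million examples.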


### How to use it?
```
from transformers import AutoTokenizer
@@ -62,6 +72,7 @@ string = output[0]['generated_text'].replace(' ', '')
print(string)
```

+
### Framework versions

- Transformers 4.26.0.dev0