jed351 committed on
Commit 351b0c8 · 1 Parent(s): e0200cb

Update README.md

Files changed (1): README.md +6 -2
````diff
@@ -1,13 +1,17 @@
 This model has not been trained on any Cantonese material.
 
-It is simply a base model in which the embeddings and tokenizer were patched with Cantonese characters.
+It is simply a base model in which the embeddings and tokenizer were patched with Cantonese characters. One can find the original model [gpt2-tiny-chinese](https://huggingface.co/ckiplab/gpt2-tiny-chinese).
+
+
+
+
+
 
 I used this repo to identify missing Cantonese characters
 https://github.com/ayaka14732/bert-tokenizer-cantonese
 
 My forked and modified version: https://github.com/jedcheng/bert-tokenizer-cantonese
 
-
 After identifying the missing characters, the huggingface library provides very high level API to modify the tokenizer and embeddings.
 
 ```
````
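The README points to ayaka14732/bert-tokenizer-cantonese for finding Cantonese characters that the base tokenizer lacks. The core idea can be sketched in plain Python: scan Cantonese text and collect ideographs that are absent from the vocabulary. This is a minimal illustration under stated assumptions, not that repository's actual code. The helper names `is_cjk`, `load_vocab` and `find_missing_chars`, the Unicode ranges checked, and the assumption of a BERT-style `vocab.txt` with one token per line are all hypothetical choices made here.

```python
from collections import Counter


def is_cjk(ch: str) -> bool:
    """Rough check for CJK ideographs (basic block plus Extensions A and B only)."""
    cp = ord(ch)
    return (
        0x4E00 <= cp <= 0x9FFF        # CJK Unified Ideographs
        or 0x3400 <= cp <= 0x4DBF     # Extension A
        or 0x20000 <= cp <= 0x2A6DF   # Extension B
    )


def load_vocab(path: str) -> set:
    """Hypothetical helper: read a BERT-style vocab.txt (one token per line) into a set."""
    with open(path, encoding="utf-8") as f:
        return {line.rstrip("\n") for line in f}


def find_missing_chars(texts, vocab) -> list:
    """Return CJK characters found in `texts` but absent from `vocab`, most frequent first."""
    counts = Counter(
        ch for text in texts for ch in text if is_cjk(ch) and ch not in vocab
    )
    # most_common() keeps first-seen order among characters with equal counts.
    return [ch for ch, _ in counts.most_common()]


if __name__ == "__main__":
    toy_vocab = {"我", "你", "食", "飯", "未", "去"}  # hypothetical stand-in for a real vocab
    sample = ["你食咗飯未呀", "我哋去咗邊"]          # made-up Cantonese sentences
    print(find_missing_chars(sample, toy_vocab))    # ['咗', '呀', '哋', '邊']
```

Sorting by frequency is only a convenience: with a large Cantonese corpus, the most common missing characters surface first, which makes it easier to review the list before patching the tokenizer.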
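The README says the Hugging Face library offers a high-level API for modifying the tokenizer and embeddings. One plausible way to do this uses the `transformers` methods `add_tokens` (on the tokenizer) and `resize_token_embeddings` (on the model); whether the author used exactly these calls is an assumption, since this diff does not show the README's own code. To keep the sketch runnable offline, it builds a toy BERT-style tokenizer and a tiny randomly initialised GPT-2 instead of loading ckiplab/gpt2-tiny-chinese. `BASE_VOCAB`, `build_toy_model_and_tokenizer` and `patch_with_new_chars` are hypothetical names.

```python
import os
import tempfile

from transformers import BertTokenizerFast, GPT2Config, GPT2LMHeadModel

# Hypothetical toy vocabulary standing in for the base model's real vocab.txt.
BASE_VOCAB = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]", "我", "你", "食", "飯", "未", "去"]


def build_toy_model_and_tokenizer():
    """Build a tiny, randomly initialised tokenizer/model pair without any downloads."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        vocab_path = os.path.join(tmp_dir, "vocab.txt")
        with open(vocab_path, "w", encoding="utf-8") as f:
            f.write("\n".join(BASE_VOCAB) + "\n")
        # The vocab is read into memory here, so the temp dir can be deleted afterwards.
        tokenizer = BertTokenizerFast(vocab_file=vocab_path)
    config = GPT2Config(
        vocab_size=len(tokenizer), n_positions=32, n_embd=16, n_layer=1, n_head=2
    )
    return tokenizer, GPT2LMHeadModel(config)


def patch_with_new_chars(tokenizer, model, new_chars):
    """Register missing characters as tokens, then grow the embedding matrix to match."""
    num_added = tokenizer.add_tokens(new_chars)    # returns how many were actually new
    model.resize_token_embeddings(len(tokenizer))  # new rows get freshly initialised vectors
    return num_added


if __name__ == "__main__":
    tokenizer, model = build_toy_model_and_tokenizer()
    added = patch_with_new_chars(tokenizer, model, ["咗", "呀", "哋", "邊"])
    print(added, len(tokenizer), tuple(model.get_input_embeddings().weight.shape))
```

With the real checkpoint, the toy builder would be replaced by `from_pretrained` calls, and the patched pair could be saved with `save_pretrained`. Adding each missing character as a whole token fits a character-level Chinese vocabulary, since each new ideograph then maps to exactly one new embedding row. Those rows are untrained, which is consistent with the README's note that the model has not seen any Cantonese material.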