Commit 3f5f000 by jed351 · 1 Parent(s): e36ddf8

Create README.md

Files changed (1)
  1. README.md +8 -0
README.md ADDED
@@ -0,0 +1,8 @@
+ This model has not been trained on any Cantonese material.
+
+ It is simply a base model in which the embeddings and tokenizer were patched with Cantonese characters.
+
+ I used this repo to identify missing Cantonese characters:
+ https://github.com/ayaka14732/bert-tokenizer-cantonese
+
+ My forked and modified version: https://github.com/jedcheng/bert-tokenizer-cantonese
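
The two steps the README describes (finding characters the tokenizer lacks, then patching the vocabulary and embeddings) can be sketched in plain Python. This is only an illustration, not the code from either linked repository. The function names `find_missing_chars` and `patch_vocab_and_embeddings`, the toy vocabulary, and the choice to initialise new embedding rows to the mean of the existing rows are all assumptions. The character check also covers only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), so Cantonese characters in extension blocks would be skipped.

```python
def find_missing_chars(vocab, text):
    """Return CJK characters in `text` that are absent from `vocab`.

    Only the basic CJK Unified Ideographs block is checked (an assumption);
    results keep first-seen order and contain no duplicates.
    """
    seen = set()
    missing = []
    for ch in text:
        if "\u4e00" <= ch <= "\u9fff" and ch not in vocab and ch not in seen:
            seen.add(ch)
            missing.append(ch)
    return missing


def patch_vocab_and_embeddings(vocab, embeddings, new_tokens):
    """Return copies of `vocab` and `embeddings` extended with `new_tokens`.

    Each new token gets the next free id and one new embedding row.
    New rows are set to the mean of the existing rows (an assumption;
    the original README does not say how new rows were initialised).
    """
    dim = len(embeddings[0])
    mean_row = [
        sum(row[i] for row in embeddings) / len(embeddings) for i in range(dim)
    ]
    patched_vocab = dict(vocab)
    patched_embeddings = [list(row) for row in embeddings]
    for token in new_tokens:
        if token in patched_vocab:
            continue  # already present, nothing to patch
        patched_vocab[token] = len(patched_embeddings)
        patched_embeddings.append(list(mean_row))
    return patched_vocab, patched_embeddings


# Toy data: a three-entry vocabulary with two-dimensional embeddings.
vocab = {"[UNK]": 0, "我": 1, "你": 2}
embeddings = [[0.0, 0.0], [1.0, 3.0], [2.0, 3.0]]

missing = find_missing_chars(vocab, "我哋嘅嘢")
patched_vocab, patched_embeddings = patch_vocab_and_embeddings(
    vocab, embeddings, missing
)
```

With a real Hugging Face model, the analogous calls would typically be `tokenizer.add_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))`. Either way the new rows are untrained, which matches the README's statement that the model has not seen any Cantonese material.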