Update README.md
This model has not been trained on any Cantonese material.

It is simply a base model in which the embeddings and tokenizer were patched with Cantonese characters. The original model is [gpt2-tiny-chinese](https://huggingface.co/ckiplab/gpt2-tiny-chinese).
I used this repo to identify the missing Cantonese characters:
https://github.com/ayaka14732/bert-tokenizer-cantonese

My forked and modified version: https://github.com/jedcheng/bert-tokenizer-cantonese

After identifying the missing characters, the Hugging Face library provides a very high-level API to modify the tokenizer and embeddings.
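Below is a minimal sketch of that patching step, assuming the missing characters were collected into a plain-text file (`missing_chars.txt` and the output directory name are hypothetical) and following the ckiplab model card in pairing the model with the `bert-base-chinese` tokenizer: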
```
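from transformers import AutoModelForCausalLM, BertTokenizerFast

# Load the original base model; per the ckiplab model card it is
# used with the bert-base-chinese tokenizer.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
model = AutoModelForCausalLM.from_pretrained("ckiplab/gpt2-tiny-chinese")

# Hypothetical file: one missing Cantonese character per line,
# e.g. as identified with the repo linked above.
with open("missing_chars.txt", encoding="utf-8") as f:
    missing_chars = [line.strip() for line in f if line.strip()]

# Extend the vocabulary; add_tokens() skips characters the
# tokenizer already knows and returns the number actually added.
num_added = tokenizer.add_tokens(missing_chars)
print(f"Added {num_added} new tokens")

# Grow the embedding matrix to the new vocabulary size; the new
# rows are randomly initialised, i.e. untrained.
model.resize_token_embeddings(len(tokenizer))

# Save the patched tokenizer and model together.
tokenizer.save_pretrained("gpt2-tiny-cantonese-patched")
model.save_pretrained("gpt2-tiny-cantonese-patched")
```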