lst-nectec/HoogBERTa

@@ -17,9 +17,17 @@ This repository includes the Thai pretrained language representation (HoogBERTa_
 # Documentation
 To initialize the model from the Hub, use the following commands:
 ```
 from transformers import AutoTokenizer, AutoModel
 tokenizer = AutoTokenizer.from_pretrained("new5558/HoogBERTa")
 model = AutoModel.from_pretrained("new5558/HoogBERTa")

 # Documentation
+## Prerequisite
+Since we use subword-nmt BPE encoding, input text must be pre-tokenized according to the [BEST](https://huggingface.co/datasets/best2009) standard before it is passed to HoogBERTa.
+```
+pip install attacut
+```
+## Getting Started
 To initialize the model from the Hub, use the following commands:
 ```
 from transformers import AutoTokenizer, AutoModel
+from attacut import tokenize
 tokenizer = AutoTokenizer.from_pretrained("new5558/HoogBERTa")
 model = AutoModel.from_pretrained("new5558/HoogBERTa")