docs: add kaggle conversion code
README.md
````diff
@@ -16,6 +16,7 @@ This repository includes the Thai pretrained language representation (HoogBERTa_
 
 # Documentation
 
+
 ## Prerequisite
 Since we use subword-nmt BPE encoding, input needs to be pre-tokenized using the [BEST](https://huggingface.co/datasets/best2009) standard before it is passed to HoogBERTa.
 ```
@@ -81,6 +82,12 @@ with torch.no_grad():
 features = model(token_ids) # where token_ids is a tensor with type "long".
 ```
 
+
+## Conversion Code
+If you are interested in how to convert a Fairseq and subword-nmt RoBERTa model to the Hugging Face hub, here is the code I used to do the conversion and to test for a parity match:
+https://www.kaggle.com/norapatbuppodom/hoogberta-conversion
+
+
 # Citation
 
 Please cite as:
````
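The "parity match" the linked notebook tests for amounts to checking that the converted Hugging Face model reproduces the Fairseq model's output features. A minimal sketch of that kind of check, assuming `parity_match` as a hypothetical helper and using random tensors as stand-ins for the two models' outputs (the notebook's actual procedure may differ):

```python
import torch

def parity_match(a: torch.Tensor, b: torch.Tensor, atol: float = 1e-5) -> bool:
    """True when two feature tensors have the same shape and agree within atol."""
    return a.shape == b.shape and torch.allclose(a, b, atol=atol)

# Stand-ins for last-layer features; in the real check, both tensors would
# come from model(token_ids) on the Fairseq and the converted HF model.
fairseq_features = torch.randn(1, 5, 768)
hf_features = fairseq_features + 1e-7  # tiny numerical drift after conversion

print(parity_match(fairseq_features, hf_features))  # True
```

A small absolute tolerance is used rather than exact equality because weight re-serialization and differing kernel implementations typically introduce floating-point drift on the order of 1e-7.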