fathan committed
Commit
58048d7
1 Parent(s): 666b1c9

Update README.md

Files changed (1)
  1. README.md +8 -8
README.md CHANGED
@@ -16,10 +16,10 @@ widget:
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->
 
- # IndoJavE-BERT
+ # IJEBERTweet: IndoBERT-base
 
  ## About
- IndoJavE-BERT is a pre-trained masked language model for code-mixed Indonesian-Javanese-English tweets data.
+ This is a pre-trained masked language model for code-mixed Indonesian-Javanese-English tweets data.
  This model is trained based on [IndoBERT](https://arxiv.org/pdf/2011.00677.pdf) model utilizing
  Hugging Face's [Transformers]((https://huggingface.co/transformers)) library.
 
@@ -51,9 +51,9 @@ Finally, we have 28,121,693 sentences for the training process.
  This pretraining data will not be opened to public due to Twitter policy.
 
  ## Model
- | Model name                            | Base model | Size of training data | Size of validation data |
- |---------------------------------------|------------|-----------------------|--------------------------|
- | `IndoJavE-BERT`                       | IndoBERT   | 2.24 GB of text       | 249 MB of text           |
+ | Model name                            | Base model | Size of training data | Size of validation data |
+ |---------------------------------------|------------|-----------------------|--------------------------|
+ | `ijebertweet-codemixed-indobert-base` | IndoBERT   | 2.24 GB of text       | 249 MB of text           |
 
  ## Evaluation Results
  We train the data with 3 epochs and total steps of 296K for 4 days.
@@ -67,15 +67,15 @@ The following are the results obtained from the training:
  ### Load model and tokenizer
  ```python
  from transformers import AutoTokenizer, AutoModel
- tokenizer = AutoTokenizer.from_pretrained("fathan/indojave-codemixed-bert")
- model = AutoModel.from_pretrained("fathan/indojave-codemixed-bert")
+ tokenizer = AutoTokenizer.from_pretrained("fathan/ijebertweet-codemixed-indobert-base")
+ model = AutoModel.from_pretrained("fathan/ijebertweet-codemixed-indobert-base")
 
  ```
  ### Masked language model
  ```python
  from transformers import pipeline
 
- pretrained_model = "fathan/indojave-codemixed-bert"
+ pretrained_model = "fathan/ijebertweet-codemixed-indobert-base"
 
  fill_mask = pipeline(
      "fill-mask",
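The card's "Load model and tokenizer" snippet likewise stops right after loading. Under the same assumptions (renamed checkpoint, PyTorch backend, hypothetical input tweet), a short sketch of one way to use the loaded `AutoModel` for feature extraction:

```python
import torch
from transformers import AutoTokenizer, AutoModel

model_id = "fathan/ijebertweet-codemixed-indobert-base"  # renamed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

# Hypothetical code-mixed Indonesian-Javanese-English tweet.
text = "Aku lagi happy banget karo project iki"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the last-layer token embeddings into one sentence vector.
sentence_vec = outputs.last_hidden_state.mean(dim=1)
print(sentence_vec.shape)  # torch.Size([1, hidden_size]), e.g. 768 for a base model
```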