fathan committed
Commit
58048d7
1 Parent(s): 666b1c9

Update README.md

Files changed (1)
  1. README.md +8 -8
README.md CHANGED
@@ -16,10 +16,10 @@ widget:
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->
 
- # IndoJavE-BERT
+ # IJEBERTweet: IndoBERT-base
 
  ## About
- IndoJavE-BERT is a pre-trained masked language model for code-mixed Indonesian-Javanese-English tweets data.
+ This is a pre-trained masked language model for code-mixed Indonesian-Javanese-English tweets data.
  This model is trained based on [IndoBERT](https://arxiv.org/pdf/2011.00677.pdf) model utilizing
  Hugging Face's [Transformers]((https://huggingface.co/transformers)) library.
 
@@ -51,9 +51,9 @@ Finally, we have 28,121,693 sentences for the training process.
  This pretraining data will not be opened to public due to Twitter policy.
 
  ## Model
- | Model name                            | Base model | Size of training data | Size of validation data |
- |---------------------------------------|------------|-----------------------|--------------------------|
- | `IndoJavE-BERT`                       | IndoBERT   | 2.24 GB of text       | 249 MB of text           |
+ | Model name                            | Base model | Size of training data | Size of validation data |
+ |---------------------------------------|------------|-----------------------|--------------------------|
+ | `ijebertweet-codemixed-indobert-base` | IndoBERT   | 2.24 GB of text       | 249 MB of text           |
 
  ## Evaluation Results
  We train the data with 3 epochs and total steps of 296K for 4 days.
@@ -67,15 +67,15 @@ The following are the results obtained from the training:
  ### Load model and tokenizer
  ```python
  from transformers import AutoTokenizer, AutoModel
- tokenizer = AutoTokenizer.from_pretrained("fathan/indojave-codemixed-bert")
- model = AutoModel.from_pretrained("fathan/indojave-codemixed-bert")
+ tokenizer = AutoTokenizer.from_pretrained("fathan/ijebertweet-codemixed-indobert-base")
+ model = AutoModel.from_pretrained("fathan/ijebertweet-codemixed-indobert-base")
 
  ```
  ### Masked language model
  ```python
  from transformers import pipeline
 
- pretrained_model = "fathan/indojave-codemixed-bert"
+ pretrained_model = "fathan/ijebertweet-codemixed-indobert-base"
 
  fill_mask = pipeline(
      "fill-mask",
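The card's "Load model and tokenizer" snippet likewise stops right after loading. Under the same assumptions (renamed checkpoint, PyTorch backend, hypothetical input tweet), a short sketch of one way to use the loaded `AutoModel` for feature extraction:

```python
import torch
from transformers import AutoTokenizer, AutoModel

model_id = "fathan/ijebertweet-codemixed-indobert-base"  # renamed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

# Hypothetical code-mixed Indonesian-Javanese-English tweet.
text = "Aku lagi happy banget karo project iki"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the last-layer token embeddings into one sentence vector.
sentence_vec = outputs.last_hidden_state.mean(dim=1)
print(sentence_vec.shape)  # torch.Size([1, hidden_size]), e.g. 768 for a base model
```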