fathan committed on
Commit
c2fe415
1 Parent(s): e164338

Update README.md

Files changed (1): README.md +26 -1
README.md CHANGED
@@ -47,7 +47,8 @@ In the second stage pre-processing, we do the following pre-processing tasks:
 - convert ‘@username’ to ‘@USER’,
 - convert URL to HTTPURL.
 
-Finally, we have 28,121,693 sentences for the training process.
+Finally, we have 28,121,693 sentences for the training process.
+This pretraining data will not be opened to the public due to Twitter policy.
 
 ## Model
 | Model name | Architecture | Size of training data | Size of validation data |
@@ -62,6 +63,30 @@ The following are the results obtained from the training:
 |------------|------------|------------|
 | 3.5057 | 3.0559 | 21.2398 |
 
+## How to use
+### Load model and tokenizer
+```python
+from transformers import AutoTokenizer, AutoModel
+
+tokenizer = AutoTokenizer.from_pretrained("fathan/code_mixed_ijebert")
+model = AutoModel.from_pretrained("fathan/code_mixed_ijebert")
+```
+### Masked language model
+```python
+from transformers import pipeline
+
+pretrained_model = "fathan/code_mixed_ijebert"
+
+fill_mask = pipeline(
+    "fill-mask",
+    model=pretrained_model,
+    tokenizer=pretrained_model
+)
+```
+
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
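The `fill-mask` pipeline added in this commit returns, for each masked position, a list of candidate dicts (typically with `score`, `token`, `token_str`, and `sequence` keys). A small helper can rank those candidates; the sketch below uses stubbed candidates in that shape, since exercising the real pipeline would require downloading the `fathan/code_mixed_ijebert` checkpoint, and the example sentence and scores here are made up for illustration:

```python
def top_predictions(candidates, k=3):
    # Rank fill-mask candidates by score (descending) and return
    # (token_str, score) pairs for the top k.
    ranked = sorted(candidates, key=lambda c: c["score"], reverse=True)
    return [(c["token_str"], c["score"]) for c in ranked[:k]]


# Stubbed candidates in the shape a fill-mask pipeline returns for one
# masked position; tokens and scores are invented for this example.
stub = [
    {"score": 0.12, "token": 2154, "token_str": "film", "sequence": "aku suka film ini"},
    {"score": 0.55, "token": 3871, "token_str": "lagu", "sequence": "aku suka lagu ini"},
    {"score": 0.08, "token": 1099, "token_str": "buku", "sequence": "aku suka buku ini"},
]

print(top_predictions(stub, k=2))  # [('lagu', 0.55), ('film', 0.12)]
```

In actual use, `candidates` would be the return value of calling `fill_mask(...)` on a sentence containing the mask token; take the mask string from `tokenizer.mask_token` rather than hard-coding it, since it depends on the tokenizer.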