jacobfulano committed
Commit 1dc825e
1 parent: c8eb665
Update README.md

README.md CHANGED
@@ -8,7 +8,8 @@ language:
 
 # MosaicBERT-Base model
 MosaicBERT-Base is a new BERT architecture and training recipe optimized for fast pretraining.
-MosaicBERT
+MosaicBERT trains faster and achieves higher pretraining and finetuning accuracy when benchmarked against
+Hugging Face's [bert-base-uncased](https://huggingface.co/bert-base-uncased).
 
 ### Model Date
 
@@ -16,15 +17,17 @@ March 2023
 
 ## Documentation
 * Blog post
-* Github (mosaicml/examples repo)
+* [Github (mosaicml/examples/bert repo)](https://github.com/mosaicml/examples/tree/main/examples/bert)
 
 # How to use
 
+We recommend using the code in the [mosaicml/examples/bert repo](https://github.com/mosaicml/examples/tree/main/examples/bert) for pretraining and finetuning this model.
+
 ```python
 from transformers import AutoModelForMaskedLM
 mlm = AutoModelForMaskedLM.from_pretrained('mosaicml/mosaic-bert-base', use_auth_token=<your token>, trust_remote_code=True)
 ```
-The tokenizer for this model is the Hugging Face `bert-base-uncased` tokenizer.
+The tokenizer for this model is simply the Hugging Face `bert-base-uncased` tokenizer.
 
 ```python
 from transformers import BertTokenizer
@@ -93,7 +96,7 @@ for both MosaicBERT-Base and the baseline BERT-Base. For all BERT-Base models, w
 
 2. **Higher Masking Ratio for the Masked Language Modeling Objective**: We used the standard Masked Language Modeling (MLM) pretraining objective.
 While the original BERT paper also included a Next Sentence Prediction (NSP) task in the pretraining objective,
-subsequent papers have shown this to be unnecessary [Liu et al. 2019](https://arxiv.org/abs/1907.11692).
+subsequent papers have shown this to be unnecessary [Liu et al. 2019](https://arxiv.org/abs/1907.11692).
 However, we found that a 30% masking ratio led to slight accuracy improvements in both pretraining MLM and downstream GLUE performance.
 We therefore included this simple change as part of our MosaicBERT training recipe. Recent studies have also found that this simple
 change can lead to downstream improvements [Wettig et al. 2022](https://arxiv.org/abs/2202.08005).