readme file
README.md
ADDED
@@ -0,0 +1,64 @@
---
language: en
tags:
- roberta-base
- roberta-base-epoch_1
license: mit
datasets:
- wikipedia
- bookcorpus
---

# RoBERTa, Intermediate Checkpoint - Epoch 1

This model is part of our reimplementation of the [RoBERTa model](https://arxiv.org/abs/1907.11692),
trained on Wikipedia and the Book Corpus only.
We trained this model for almost 100K steps, corresponding to 83 epochs.
We provide all 84 checkpoints (including the randomly initialized weights before training)
to enable the study of the training dynamics of such models, and other possible use cases.
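
As a rough sketch of how the 84 checkpoints might be enumerated, the snippet below builds one repository id per epoch. It assumes every checkpoint follows the `yanaiela/roberta-base-epoch_<N>` naming pattern used in this card's usage example, and that `epoch_0` holds the randomly initialized weights; neither assumption is confirmed here, and `checkpoint_ids` is a hypothetical helper.

```python
# Assumption: every checkpoint lives at `yanaiela/roberta-base-epoch_<N>`,
# with epoch_0 holding the randomly initialized weights.
NUM_EPOCHS = 83  # from this card: almost 100K steps, 83 epochs


def checkpoint_ids(namespace: str = "yanaiela") -> list[str]:
    """Return the 84 assumed repository ids, epoch_0 through epoch_83."""
    return [f"{namespace}/roberta-base-epoch_{i}" for i in range(NUM_EPOCHS + 1)]


ids = checkpoint_ids()
print(len(ids), ids[0], ids[-1])
```

Each id could then be passed as the `model` argument when loading a checkpoint, provided the corresponding repository exists.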

These models were trained as part of a work that studies how simple statistics from data,
such as co-occurrences, affect model predictions, as described in the paper
[Measuring Causal Effects of Data Statistics on Language Model's `Factual' Predictions](https://arxiv.org/abs/2207.14251).

This is RoBERTa-base epoch_1.

## Model Description

This model was captured during a reproduction of
[RoBERTa-base](https://huggingface.co/roberta-base) for English. It
is a Transformers model pretrained on a large corpus of English data using the
Masked Language Modelling (MLM) objective.

The intended uses, limitations, training data and training procedure for the fully trained model are similar
to [RoBERTa-base](https://huggingface.co/roberta-base). There are two major
differences from the original model:

* We trained our model for 100K steps instead of 500K.
* We only use Wikipedia and the Book Corpus, which are publicly available corpora.

### How to use

The following PyTorch example, adapted from the
[RoBERTa-base](https://huggingface.co/roberta-base) model card, runs
masked-token prediction:

```python
from transformers import pipeline

# device=-1 runs on CPU; top_k=10 returns the ten highest-scoring fillers.
model = pipeline("fill-mask", model='yanaiela/roberta-base-epoch_1', device=-1, top_k=10)
model("Hello, I'm the <mask> RoBERTa-base language model")
```
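
Since the checkpoints are meant for studying training dynamics, one possible use is to compare the fill-mask predictions of an early and a late checkpoint on the same prompt. The sketch below is illustrative only: the two repository ids are assumed to exist under the naming pattern shown above, and `top_tokens` and `compare` are hypothetical helpers, not part of `transformers`.

```python
# Assumed repository ids; only the naming pattern comes from this card.
CHECKPOINTS = ["yanaiela/roberta-base-epoch_1", "yanaiela/roberta-base-epoch_83"]


def top_tokens(predictions):
    """Reduce fill-mask pipeline output to (token, score) pairs."""
    return [(p["token_str"].strip(), round(p["score"], 4)) for p in predictions]


def compare(prompt, checkpoints=CHECKPOINTS, top_k=5):
    """Run the same single-mask prompt through each checkpoint (downloads weights)."""
    # Imported here so the helper above stays usable without transformers installed.
    from transformers import pipeline

    results = {}
    for name in checkpoints:
        fill_mask = pipeline("fill-mask", model=name, device=-1, top_k=top_k)
        results[name] = top_tokens(fill_mask(prompt))
    return results
```

Calling `compare("The capital of France is <mask>.")` would return one ranked list per checkpoint, which can then be inspected side by side.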

## Citation info

```bibtex
@article{2207.14251,
Author = {Yanai Elazar and Nora Kassner and Shauli Ravfogel and Amir Feder and Abhilasha Ravichander and Marius Mosbach and Yonatan Belinkov and Hinrich Schütze and Yoav Goldberg},
Title = {Measuring Causal Effects of Data Statistics on Language Model's `Factual' Predictions},
Year = {2022},
Eprint = {arXiv:2207.14251},
}
```