ccdv committed on
Commit
90fb5ad
·
1 Parent(s): 300179f
Files changed (1)
  1. README.md +23 -1
README.md CHANGED
@@ -1,6 +1,7 @@
  ---
  language: en
  tags:
+ - xlm-roberta
  - long context
  pipeline_tag: fill-mask
  ---
@@ -16,7 +17,7 @@ pipeline_tag: fill-mask
  * [Tasks](#tasks)
  * [Training global tokens](#training-global-tokens)

- This model is adapted from [XLM-roberta-base](https://huggingface.co/xlm-roberta-base) model without additional pretraining yet. It uses the same number of parameters/layers and the same tokenizer.
+ This model is adapted from [XLM-RoBERTa-base](https://huggingface.co/xlm-roberta-base) model without additional pretraining yet. It uses the same number of parameters/layers and the same tokenizer.


  This model can handle long sequences but faster and more efficiently than Longformer or BigBird (from Transformers) and relies on Local + Sparse + Global attention (LSG).
@@ -148,3 +149,24 @@ for name, param in model.named_parameters():
      else:
          param.required_grad = True
  ```
+
+ **XLM-RoBERTa**
+ ```
+ @article{DBLP:journals/corr/abs-2105-00572,
+ author = {Naman Goyal and
+ Jingfei Du and
+ Myle Ott and
+ Giri Anantharaman and
+ Alexis Conneau},
+ title = {Larger-Scale Transformers for Multilingual Masked Language Modeling},
+ journal = {CoRR},
+ volume = {abs/2105.00572},
+ year = {2021},
+ url = {https://arxiv.org/abs/2105.00572},
+ eprinttype = {arXiv},
+ eprint = {2105.00572},
+ timestamp = {Wed, 12 May 2021 15:54:31 +0200},
+ biburl = {https://dblp.org/rec/journals/corr/abs-2105-00572.bib},
+ bibsource = {dblp computer science bibliography, https://dblp.org}
+ }
+ ```
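
The context lines of the last hunk come from the README's "Training global tokens" snippet, which loops over `model.named_parameters()` and sets a gradient flag on each parameter. The rest of that loop is not visible in this diff, so the following is only a minimal sketch of the pattern, not the author's exact code. It assumes parameters are selected by a substring of their name; the `"global_embeddings"` keyword, the helper `train_only_matching`, and the parameter names are hypothetical. Stand-in objects replace real tensors so the sketch runs without PyTorch. Note that PyTorch reads the attribute `requires_grad`; assigning to `required_grad`, as the context line does, only creates an unused attribute and leaves the flag unchanged.

```python
from types import SimpleNamespace


def train_only_matching(named_params, keyword):
    """Enable gradients only for parameters whose name contains `keyword`.

    `named_params` is any iterable of (name, param) pairs, which is the shape
    that PyTorch's `model.named_parameters()` yields.
    Returns the list of names left trainable.
    """
    trainable = []
    for name, param in named_params:
        # `requires_grad` is the attribute PyTorch checks; a misspelled
        # attribute name would be accepted silently and have no effect.
        param.requires_grad = keyword in name
        if param.requires_grad:
            trainable.append(name)
    return trainable


# Stand-in parameters (hypothetical names, not taken from the model).
params = [
    ("embeddings.word_embeddings.weight", SimpleNamespace(requires_grad=True)),
    ("embeddings.global_embeddings.weight", SimpleNamespace(requires_grad=True)),
    ("encoder.layer.0.output.dense.weight", SimpleNamespace(requires_grad=True)),
]

trainable_names = train_only_matching(params, keyword="global_embeddings")
print(trainable_names)
```

With a real model, the call would presumably be `train_only_matching(model.named_parameters(), keyword=...)`, with the keyword chosen after inspecting the actual parameter names.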