readme
README.md
CHANGED
@@ -1,6 +1,7 @@
 ---
 language: en
 tags:
+- xlm-roberta
 - long context
 pipeline_tag: fill-mask
 ---
@@ -16,7 +17,7 @@ pipeline_tag: fill-mask
 * [Tasks](#tasks)
 * [Training global tokens](#training-global-tokens)
 
-This model is adapted from [XLM-
+This model is adapted from [XLM-RoBERTa-base](https://huggingface.co/xlm-roberta-base) model without additional pretraining yet. It uses the same number of parameters/layers and the same tokenizer.
 
 
 This model can handle long sequences but faster and more efficiently than Longformer or BigBird (from Transformers) and relies on Local + Sparse + Global attention (LSG).
@@ -148,3 +149,24 @@ for name, param in model.named_parameters():
 else:
 param.required_grad = True
 ```
+
+**XLM-RoBERTa**
+```
+@article{DBLP:journals/corr/abs-2105-00572,
+author = {Naman Goyal and
+Jingfei Du and
+Myle Ott and
+Giri Anantharaman and
+Alexis Conneau},
+title = {Larger-Scale Transformers for Multilingual Masked Language Modeling},
+journal = {CoRR},
+volume = {abs/2105.00572},
+year = {2021},
+url = {https://arxiv.org/abs/2105.00572},
+eprinttype = {arXiv},
+eprint = {2105.00572},
+timestamp = {Wed, 12 May 2021 15:54:31 +0200},
+biburl = {https://dblp.org/rec/journals/corr/abs-2105-00572.bib},
+bibsource = {dblp computer science bibliography, https://dblp.org}
+}
+```