kiddothe2b committed
Commit
3ae194c
1 Parent(s): 578a507

Update README.md

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -3,7 +3,7 @@ license: cc-by-nc-sa-4.0
 pipeline_tag: fill-mask
 arxiv: 2210.05529
 language: en
-thumbnail: https://raw.githubusercontent.com/coastalcph/hierarchical-transformers/main/data/figures/hat_encoder.png
+thumbnail: https://github.com/coastalcph/hierarchical-transformers/raw/main/data/figures/hat_encoder.png
 tags:
 - long-documents
 datasets:
@@ -19,7 +19,7 @@ model-index:
 
 This is a Hierarchical Attention Transformer (HAT) model as presented in [An Exploration of Hierarchical Attention Transformers for Efficient Long Document Classification (Chalkidis et al., 2022)](https://arxiv.org/abs/2210.05529).
 
-The model has been warm-started re-using the weights of RoBERTa [(Liu et al., 2019)](https://arxiv.org/abs/1907.11692), and continued pre-trained for MLM in long sequences following the paradigm of Longformer released by [Beltagy et al. (2020)](https://arxiv.org/abs/2004.05150). It supports sequences of length up to 4,096.
+The model has been warm-started re-using the weights of RoBERTa (Liu et al., 2019), and continued pre-trained for MLM in long sequences following the paradigm of Longformer released by Beltagy et al. (2020). It supports sequences of length up to 4,096.
 
 HAT uses hierarchical attention, which is a combination of segment-wise and cross-segment attention operations. You can think of segments as paragraphs or sentences.
 
@@ -40,7 +40,7 @@ tokenizer = AutoTokenizer.from_pretrained("kiddothe2b/hierarchical-transformer-b
 mlm_model = AutoModelforForMaskedLM("kiddothe2b/hierarchical-transformer-base-4096", trust_remote_code=True)
 ```
 
-You can also fine-tun it for SequenceClassification, SequentialSentenceClassification, and MultipleChoice down-stream tasks:
+You can also fine-tune it for SequenceClassification, SequentialSentenceClassification, and MultipleChoice down-stream tasks:
 
 ```python
 from transformers import AutoTokenizer, AutoModelforSequenceClassification
@@ -103,7 +103,7 @@ If you use HAT in your research, please cite [An Exploration of Hierarchical Att
 
 ```
 @misc{chalkidis-etal-2022-hat,
-url = {https://arxiv.org/abs/xxx},
+url = {https://arxiv.org/abs/2210.05529},
 author = {Chalkidis, Ilias and Dai, Xiang and Fergadiotis, Manos and Malakasiotis, Prodromos and Elliott, Desmond},
 title = {An Exploration of Hierarchical Attention Transformers for Efficient Long Document Classification},
 publisher = {arXiv},
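
The hunk around README line 40 leaves the masked-language-modeling snippet unchanged; that snippet calls `AutoModelforForMaskedLM(...)` directly. A minimal sketch of how the same load is conventionally written with the `transformers` auto classes, assuming the intended class is `AutoModelForMaskedLM` and that both tokenizer and model are loaded via `.from_pretrained` with `trust_remote_code=True` (as the snippet already passes for the model); this is not part of the commit itself:

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Model id taken from the snippet shown in the diff above.
MODEL_ID = "kiddothe2b/hierarchical-transformer-base-4096"

# trust_remote_code=True lets transformers load HAT's custom modeling code.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
mlm_model = AutoModelForMaskedLM.from_pretrained(MODEL_ID, trust_remote_code=True)
```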
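The fine-tuning sentence corrected in the third hunk refers to `AutoModelforSequenceClassification`; the standard spelling in `transformers` is `AutoModelForSequenceClassification`. A hedged sketch of the corresponding downstream load, assuming the same `trust_remote_code=True` mechanism applies and using an illustrative `num_labels=2`:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "kiddothe2b/hierarchical-transformer-base-4096"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
# num_labels=2 is an illustrative choice for a binary classification task.
doc_classifier = AutoModelForSequenceClassification.from_pretrained(
    MODEL_ID, num_labels=2, trust_remote_code=True
)
```

Long inputs would then be tokenized with `max_length=4096`, matching the 4,096-token limit stated in the README.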