ccdv committed on
Commit 9e3e53b
1 Parent(s): db91e1c
Files changed (1)
  1. README.md +4 -2
README.md CHANGED
@@ -12,17 +12,19 @@ pipeline_tag: fill-mask
  **This model relies on a custom modeling file; you need to add trust_remote_code=True**\
  **See [\#13467](https://github.com/huggingface/transformers/pull/13467)**

+ A conversion script is available at this [link](https://github.com/ccdv-ai/convert_checkpoint_to_lsg).
+
  * [Usage](#usage)
  * [Parameters](#parameters)
  * [Sparse selection type](#sparse-selection-type)
  * [Tasks](#tasks)

- This model is adapted from [ALBERT-base-v2](https://huggingface.co/albert-base-v2) without additional pretraining. It uses the same number of parameters/layers and the same tokenizer.

+ This model is adapted from [ALBERT-base-v2](https://huggingface.co/albert-base-v2) without additional pretraining. It uses the same number of parameters/layers and the same tokenizer.

  This model can handle long sequences faster and more efficiently than Longformer (LED) or BigBird (Pegasus) from the hub and relies on Local + Sparse + Global attention (LSG).

- The model requires sequences whose length is a multiple of the block size. The model is "adaptive" and automatically pads the sequences if needed (adaptive=True in config). It is however recommended to let the tokenizer truncate the inputs (truncation=True) and optionally pad them to a multiple of the block size (pad_to_multiple_of=...). \
+ The model requires sequences whose length is a multiple of the block size. The model is "adaptive" and automatically pads the sequences if needed (adaptive=True in config). It is however recommended to let the tokenizer truncate the inputs (truncation=True) and optionally pad them to a multiple of the block size (pad_to_multiple_of=...).

  Implemented in PyTorch.
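
Since the model depends on a custom modeling file, loading it goes through `trust_remote_code=True`. A minimal loading sketch is shown below; the repository ID is a placeholder, not something stated in this commit.

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_name = "ccdv/lsg-albert-base-v2"  # placeholder repository ID; substitute the actual model repo

tokenizer = AutoTokenizer.from_pretrained(model_name)
# trust_remote_code=True is required because the model ships a custom modeling file
model = AutoModelForMaskedLM.from_pretrained(model_name, trust_remote_code=True)
```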
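
And a sketch of the recommended tokenization, continuing from the loading snippet above: truncate the inputs and optionally pad them to a multiple of the block size. The block size of 128 is an assumed example value; in practice it is defined in the model configuration.

```python
import torch

block_size = 128  # assumed example value; the real block size comes from the model config

text = f"Paris is the {tokenizer.mask_token} of France."
inputs = tokenizer(
    text,
    return_tensors="pt",
    truncation=True,                # truncate the inputs, as recommended
    padding=True,
    pad_to_multiple_of=block_size,  # pad so the sequence length is a multiple of the block size
)

with torch.no_grad():
    outputs = model(**inputs)  # with adaptive=True in the config, the model would also pad on its own
```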