Update README.md
README.md
CHANGED
@@ -81,4 +81,18 @@ The pre-training and fine-tuning were conducted on 512 NVIDIA Ampere (64GB) GPUs
 |Multi-layer loss | yes |
 
 ## Licence
-The model is licensed under the BigCode OpenRAIL-M v1 license agreement. You can find the full agreement [here](https://huggingface.co/spaces/bigcode/bigcode-model-license-agreement).
+The model is licensed under the BigCode OpenRAIL-M v1 license agreement. You can find the full agreement [here](https://huggingface.co/spaces/bigcode/bigcode-model-license-agreement).
+
+
+# Citation
+```
+@article{gurioli2025modeltrainallhierarchical,
+  title={One Model to Train them All: Hierarchical Self-Distillation for Enhanced Early Layer Embeddings},
+  author={Andrea Gurioli and Federico Pennino and João Monteiro and Maurizio Gabbrielli},
+  year={2025},
+  eprint={2503.03008},
+  archivePrefix={arXiv},
+  primaryClass={cs.CL},
+  url={https://arxiv.org/abs/2503.03008},
+}
+```