File size: 4,159 Bytes
c1bc651 b640a50 c1bc651 b640a50 c1bc651 b640a50 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 |
---
base_model: bert-base-multilingual-uncased
datasets:
- RussianNLP/russian_super_glue
license: apache-2.0
tags:
- embedding_space_map
- BaseLM:bert-base-multilingual-uncased
---
# ESM RussianNLP/russian_super_glue
<!-- Provide a quick summary of what the model is/does. -->
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
ESM
- **Developed by:** David Schulte
- **Model type:** ESM
- **Base Model:** bert-base-multilingual-uncased
- **Intermediate Task:** RussianNLP/russian_super_glue
- **ESM architecture:** linear
- **Language(s) (NLP):** [More Information Needed]
- **License:** Apache-2.0 license
## Training Details
### Intermediate Task
- **Task ID:** RussianNLP/russian_super_glue
- **Subset [optional]:** rcb
- **Text Column:** ['premise', 'hypothesis']
- **Label Column:** label
- **Dataset Split:** train
- **Sample size [optional]:** 438
- **Sample seed [optional]:**
### Training Procedure [optional]
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
#### Language Model Training Hyperparameters [optional]
- **Epochs:** 3
- **Batch size:** 32
- **Learning rate:** 2e-05
- **Weight Decay:** 0.01
- **Optimizer**: AdamW
### ESM Training Hyperparameters [optional]
- **Epochs:** 10
- **Batch size:** 32
- **Learning rate:** 0.001
- **Weight Decay:** 0.01
- **Optimizer**: AdamW
### Additional trainiung details [optional]
## Model evaluation
### Evaluation of fine-tuned language model [optional]
### Evaluation of ESM [optional]
MSE:
### Additional evaluation details [optional]
## What are Embedding Space Maps?
<!-- This section describes the evaluation protocols and provides the results. -->
Embedding Space Maps (ESMs) are neural networks that approximate the effect of fine-tuning a language model on a task. They can be used to quickly transform embeddings from a base model to approximate how a fine-tuned model would embed the the input text.
ESMs can be used for intermediate task selection with the ESM-LogME workflow.
## How can I use Embedding Space Maps for Intermediate Task Selection?
[![PyPI version](https://img.shields.io/pypi/v/hf-dataset-selector.svg)](https://pypi.org/project/hf-dataset-selector)
We release **hf-dataset-selector**, a Python package for intermediate task selection using Embedding Space Maps.
**hf-dataset-selector** fetches ESMs for a given language model and uses it to find the best dataset for applying intermediate training to the target task. ESMs are found by their tags on the Huggingface Hub.
```python
from hfselect import Dataset, compute_task_ranking
# Load target dataset from the Hugging Face Hub
dataset = Dataset.from_hugging_face(
name="stanfordnlp/imdb",
split="train",
text_col="text",
label_col="label",
is_regression=False,
num_examples=1000,
seed=42
)
# Fetch ESMs and rank tasks
task_ranking = compute_task_ranking(
dataset=dataset,
model_name="bert-base-multilingual-uncased"
)
# Display top 5 recommendations
print(task_ranking[:5])
```
For more information on how to use ESMs please have a look at the [official Github repository](https://github.com/davidschulte/hf-dataset-selector).
## Citation
<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
If you are using this Embedding Space Maps, please cite our [paper](https://arxiv.org/abs/2410.15148).
**BibTeX:**
```
@misc{schulte2024moreparameterefficientselectionintermediate,
title={Less is More: Parameter-Efficient Selection of Intermediate Tasks for Transfer Learning},
author={David Schulte and Felix Hamborg and Alan Akbik},
year={2024},
eprint={2410.15148},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2410.15148},
}
```
**APA:**
```
Schulte, D., Hamborg, F., & Akbik, A. (2024). Less is More: Parameter-Efficient Selection of Intermediate Tasks for Transfer Learning. arXiv preprint arXiv:2410.15148.
```
## Additional Information
|