|
--- |
|
base_model: bert-base-multilingual-uncased |
|
datasets: |
|
- AmazonScience/massive |
|
license: apache-2.0 |
|
tags: |
|
- embedding_space_map |
|
- BaseLM:bert-base-multilingual-uncased |
|
--- |
|
|
|
# ESM AmazonScience/massive |
|
|
|
|
|
|
|
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
|
|
|
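This is an Embedding Space Map (ESM) for the base model bert-base-multilingual-uncased, trained on the intermediate task AmazonScience/massive (subset ta-IN).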
|
|
|
- **Developed by:** David Schulte |
|
- **Model type:** ESM |
|
- **Base Model:** bert-base-multilingual-uncased |
|
- **Intermediate Task:** AmazonScience/massive |
|
- **ESM architecture:** linear |
|
- **Language(s) (NLP):** [More Information Needed] |
|
- **License:** Apache-2.0
|
|
|
## Training Details |
|
|
|
### Intermediate Task |
|
- **Task ID:** AmazonScience/massive |
|
- **Subset [optional]:** ta-IN |
|
- **Text Column:** annot_utt |
|
- **Label Column:** scenario |
|
- **Dataset Split:** train |
|
- **Sample size [optional]:** 10000 |
|
- **Sample seed [optional]:** 42 |
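The card does not say how the 10000-example sample was drawn. As a minimal sketch, assuming a uniform draw without replacement controlled by the listed seed, a reproducible subsample could look like the following. The helper `sample_indices` and the row count `NUM_ROWS` are hypothetical placeholders, not part of any library or of the actual training pipeline.

```python
import random
from typing import List


def sample_indices(num_rows: int, sample_size: int, seed: int) -> List[int]:
    """Return a reproducible, sorted list of row indices drawn without replacement.

    Hypothetical helper: the actual sampling procedure used for this ESM
    is not documented in the card.
    """
    rng = random.Random(seed)  # local generator, so global random state is untouched
    k = min(sample_size, num_rows)  # never request more rows than exist
    return sorted(rng.sample(range(num_rows), k))


NUM_ROWS = 12_000  # placeholder size for the train split, an assumption

indices = sample_indices(num_rows=NUM_ROWS, sample_size=10_000, seed=42)
print(len(indices))
```

Using a local `random.Random` instance keeps the draw reproducible regardless of what other code does with the global random state.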
|
|
|
### Training Procedure [optional] |
|
|
|
|
|
|
#### Language Model Training Hyperparameters [optional] |
|
- **Epochs:** 3 |
|
- **Batch size:** 32 |
|
- **Learning rate:** 2e-05 |
|
- **Weight Decay:** 0.01 |
|
- **Optimizer:** AdamW
|
|
|
#### ESM Training Hyperparameters [optional]
|
- **Epochs:** 10 |
|
- **Batch size:** 32 |
|
- **Learning rate:** 0.001 |
|
- **Weight Decay:** 0.01 |
|
- **Optimizer:** AdamW
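To make the two hyperparameter sets easier to compare, here is a small sketch that stores them in a config object and derives the number of optimizer steps each run would take. It assumes that all 10000 sampled examples are used for training and that the final partial batch is kept; neither is stated in this card. `TrainingConfig` is a hypothetical name, not a class from hf-dataset-selector.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hypothetical container for the hyperparameters listed in this card."""

    epochs: int
    batch_size: int
    learning_rate: float
    weight_decay: float
    optimizer: str = "AdamW"

    def total_steps(self, num_examples: int) -> int:
        # Assumes the last, smaller batch of each epoch is kept (not dropped).
        return math.ceil(num_examples / self.batch_size) * self.epochs


lm_config = TrainingConfig(epochs=3, batch_size=32, learning_rate=2e-5, weight_decay=0.01)
esm_config = TrainingConfig(epochs=10, batch_size=32, learning_rate=1e-3, weight_decay=0.01)

SAMPLE_SIZE = 10_000  # sample size listed under "Intermediate Task"

print(lm_config.total_steps(SAMPLE_SIZE))   # 939
print(esm_config.total_steps(SAMPLE_SIZE))  # 3130
```

Under these assumptions each epoch has 313 batches, so the language model fine-tuning takes 939 steps and the ESM training takes 3130 steps.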
|
|
|
|
|
### Additional training details [optional]
|
|
|
|
|
## Model evaluation |
|
|
|
### Evaluation of fine-tuned language model [optional] |
|
|
|
|
|
### Evaluation of ESM [optional] |
|
MSE: |
|
|
|
### Additional evaluation details [optional] |
|
|
|
|
|
|
|
## What are Embedding Space Maps? |
|
|
|
|
Embedding Space Maps (ESMs) are neural networks that approximate the effect of fine-tuning a language model on a task. They can be used to quickly transform embeddings from a base model to approximate how a fine-tuned model would embed the input text.
|
ESMs can be used for intermediate task selection with the ESM-LogME workflow. |
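This card lists the ESM architecture as linear. As a rough illustration of the idea, the sketch below applies a single affine layer to a base-model embedding and measures the mean squared error against the embedding a fine-tuned model would produce. The function names, the presence of a bias term, and the toy 3-dimensional values are assumptions made for illustration; this is not the hf-dataset-selector implementation.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def apply_linear_esm(weights: Matrix, bias: Vector, embedding: Vector) -> Vector:
    """Map a base-model embedding with one affine layer: W @ x + b."""
    return [
        sum(w * x for w, x in zip(row, embedding)) + b
        for row, b in zip(weights, bias)
    ]


def mean_squared_error(predicted: Vector, target: Vector) -> float:
    """Average squared difference between two vectors of equal length."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


# Toy values; real embeddings have far more dimensions.
weights = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]]
bias = [0.5, 0.0, -1.0]
base_embedding = [1.0, 2.0, 3.0]
fine_tuned_embedding = [1.5, 4.0, 1.0]

approx_embedding = apply_linear_esm(weights, bias, base_embedding)
mse = mean_squared_error(approx_embedding, fine_tuned_embedding)
print(approx_embedding)
print(mse)
```

Because the map is a single matrix multiplication, transforming an embedding is much cheaper than running a separately fine-tuned model on the same text.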
|
|
|
## How can I use Embedding Space Maps for Intermediate Task Selection? |
|
[![PyPI version](https://img.shields.io/pypi/v/hf-dataset-selector.svg)](https://pypi.org/project/hf-dataset-selector) |
|
|
|
We release **hf-dataset-selector**, a Python package for intermediate task selection using Embedding Space Maps. |
|
|
|
**hf-dataset-selector** fetches ESMs for a given language model and uses them to find the best dataset for intermediate training on the target task. ESMs are found by their tags on the Hugging Face Hub.
|
|
|
```python
from hfselect import Dataset, compute_task_ranking

# Load target dataset from the Hugging Face Hub
dataset = Dataset.from_hugging_face(
    name="stanfordnlp/imdb",
    split="train",
    text_col="text",
    label_col="label",
    is_regression=False,
    num_examples=1000,
    seed=42
)

# Fetch ESMs and rank tasks
task_ranking = compute_task_ranking(
    dataset=dataset,
    model_name="bert-base-multilingual-uncased"
)

# Display top 5 recommendations
print(task_ranking[:5])
```
|
|
|
For more information on how to use ESMs, please have a look at the [official GitHub repository](https://github.com/davidschulte/hf-dataset-selector).
|
|
|
## Citation |
|
|
|
|
|
|
If you use this Embedding Space Map, please cite our [paper](https://arxiv.org/abs/2410.15148).
|
|
|
**BibTeX:** |
|
|
|
|
|
```
@misc{schulte2024moreparameterefficientselectionintermediate,
      title={Less is More: Parameter-Efficient Selection of Intermediate Tasks for Transfer Learning},
      author={David Schulte and Felix Hamborg and Alan Akbik},
      year={2024},
      eprint={2410.15148},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.15148},
}
```
|
|
|
|
|
**APA:** |
|
|
|
```
Schulte, D., Hamborg, F., & Akbik, A. (2024). Less is More: Parameter-Efficient Selection of Intermediate Tasks for Transfer Learning. arXiv preprint arXiv:2410.15148.
```
|
|
|
## Additional Information |
|
|
|
|