RoBERTa-base trained with alpha-entmax, with alpha increased linearly from 1.0 to 2.0.
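As an illustration of what a linear alpha schedule could look like, here is a minimal sketch. Only the endpoints (1.0 and 2.0) come from the description above; the function name `linear_alpha`, the per-step granularity, and the `total_steps` parameter are assumptions made for this example and may not match the actual training code.

```python
def linear_alpha(step: int, total_steps: int,
                 start: float = 1.0, end: float = 2.0) -> float:
    """Hypothetical helper: linearly interpolate alpha from `start` to `end`.

    `step` is clamped to [0, total_steps], so alpha stays at `end`
    once the schedule is finished.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    fraction = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * fraction


# Example: alpha at the start, midpoint, and end of a 1000-step schedule.
for step in (0, 500, 1000):
    print(step, linear_alpha(step, total_steps=1000))
```

At alpha = 1.0 entmax reduces to softmax and at alpha = 2.0 it reduces to sparsemax, so a schedule like this moves gradually from dense to sparse distributions.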
To load the tokenizer and the model:
from transformers import AutoTokenizer
from sparse_roberta import get_custom_model
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained('roberta-base')
# Load the model
model = get_custom_model(
    'mtreviso/sparsemax-roberta',
    initial_alpha=2.0,
    use_triton_entmax=False,
    from_scratch=False,
)
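For intuition about `initial_alpha=2.0`: alpha-entmax with alpha = 2.0 is sparsemax, which can assign exactly zero probability to low-scoring entries. The sketch below is a plain-Python reference implementation of sparsemax for a single score vector. It is not the code used by `sparse_roberta`, and it assumes that `initial_alpha` sets the entmax alpha used in place of softmax.

```python
def sparsemax(scores):
    """Illustrative sparsemax: Euclidean projection of `scores` onto the simplex.

    Sorts the scores, finds the largest support size k such that
    1 + k * z_(k) > sum of the top-k scores, then thresholds at tau.
    """
    z_sorted = sorted(scores, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for k, z in enumerate(z_sorted, start=1):
        cumsum += z
        # The condition holds for k = 1..k_max and fails afterwards,
        # so the last update gives the threshold for the full support.
        if 1.0 + k * z > cumsum:
            tau = (cumsum - 1.0) / k
    return [max(s - tau, 0.0) for s in scores]


# Entries far below the top score receive exactly zero probability.
print(sparsemax([1.0, 0.0, -1.0]))
```

Unlike softmax, the output can contain exact zeros, which is the source of the sparsity in the model name.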
To run GLUE tasks, use the run_glue.py script. For example:
python run_glue.py \
    --model_name_or_path mtreviso/sparsemax-roberta \
    --config_name roberta-base \
    --tokenizer_name roberta-base \
    --task_name rte \
    --output_dir output-rte \
    --do_train \
    --do_eval \
    --max_seq_length 512 \
    --per_device_train_batch_size 32 \
    --learning_rate 3e-5 \
    --num_train_epochs 3 \
    --save_steps 1000 \
    --logging_steps 100 \
    --save_total_limit 1 \
    --overwrite_output_dir