|
# Finetuning RoBERTa on RACE tasks |
|
|
|
### 1) Download the data from the RACE website (http://www.cs.cmu.edu/~glai1/data/race/)
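
The dataset is distributed as a single archive (you request a download link through the form on that page). A minimal sketch for unpacking it, assuming the archive is named `RACE.tar.gz` (verify against the file you actually receive):

```bash
# Assumed archive name and layout; adjust to match the file you download.
mkdir -p ~/data
tar -xzvf RACE.tar.gz -C ~/data
ls ~/data/RACE    # typically contains train/ dev/ test/, each with high/ and middle/ subdirectories
```

The extracted directory is the `<input-dir>` used in the next step.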
|
|
|
### 2) Preprocess RACE data: |
|
```bash
python ./examples/roberta/preprocess_RACE.py --input-dir <input-dir> --output-dir <extracted-data-dir>
./examples/roberta/preprocess_RACE.sh <extracted-data-dir> <output-dir>
```
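
For example, with the placeholders filled in (the directory names here are arbitrary):

```bash
# Hypothetical paths; substitute your own locations.
python ./examples/roberta/preprocess_RACE.py \
    --input-dir ~/data/RACE \
    --output-dir ~/data/RACE-extracted
./examples/roberta/preprocess_RACE.sh ~/data/RACE-extracted ~/data/RACE-bin
```

The final output directory (`~/data/RACE-bin` above) is what `DATA_DIR` should point to in the fine-tuning and evaluation steps below.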
|
|
|
### 3) Fine-tuning on RACE: |
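
Fine-tuning starts from a pretrained RoBERTa checkpoint, referenced below via `ROBERTA_PATH`. A sketch for fetching `roberta.large` (the URL is taken from the fairseq pre-trained models list; check that it is still current):

```bash
# Download and unpack the pretrained roberta.large checkpoint (assumed URL).
wget https://dl.fbaipublicfiles.com/fairseq/models/roberta.large.tar.gz
tar -xzvf roberta.large.tar.gz
ROBERTA_PATH=$PWD/roberta.large/model.pt
```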
|
|
|
```bash
MAX_EPOCH=5        # Number of training epochs.
LR=1e-05           # Peak LR for fixed LR scheduler.
NUM_CLASSES=4
MAX_SENTENCES=1    # Batch size per GPU.
UPDATE_FREQ=8      # Accumulate gradients to simulate training on 8 GPUs.
DATA_DIR=/path/to/race-output-dir
ROBERTA_PATH=/path/to/roberta/model.pt

CUDA_VISIBLE_DEVICES=0,1 fairseq-train $DATA_DIR --ddp-backend=legacy_ddp \
    --restore-file $ROBERTA_PATH \
    --reset-optimizer --reset-dataloader --reset-meters \
    --best-checkpoint-metric accuracy --maximize-best-checkpoint-metric \
    --task sentence_ranking \
    --num-classes $NUM_CLASSES \
    --init-token 0 --separator-token 2 \
    --max-option-length 128 \
    --max-positions 512 \
    --shorten-method "truncate" \
    --arch roberta_large \
    --dropout 0.1 --attention-dropout 0.1 --weight-decay 0.01 \
    --criterion sentence_ranking \
    --optimizer adam --adam-betas '(0.9, 0.98)' --adam-eps 1e-06 \
    --clip-norm 0.0 \
    --lr-scheduler fixed --lr $LR \
    --fp16 --fp16-init-scale 4 --threshold-loss-scale 1 --fp16-scale-window 128 \
    --batch-size $MAX_SENTENCES \
    --required-batch-size-multiple 1 \
    --update-freq $UPDATE_FREQ \
    --max-epoch $MAX_EPOCH
```
|
|
|
**Note:** |
|
|
|
a) Since contexts in RACE are relatively long, we use a smaller batch size per GPU and increase `--update-freq` to achieve a larger effective batch size.
|
|
|
b) The above command-line args and hyperparameters were tested on a single Nvidia `V100` GPU with `32GB` of memory for each task. Depending on the GPU memory available to you, you can increase `--update-freq` and reduce `--batch-size`, as illustrated in the sketch after these notes.
|
|
|
c) The settings in the above command come from our hyperparameter search within a fixed search space (for careful comparison across models). You might be able to find better metrics with a wider hyperparameter search.
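
As a rough guide for adapting the settings in b), the effective batch size is approximately `num_gpus * MAX_SENTENCES * UPDATE_FREQ`. A sketch (not an officially tuned configuration) for keeping it constant when training on a single GPU instead of two:

```bash
# Assumed single-GPU variant of the command above:
# effective batch size = 1 GPU x 1 sentence x 16 = 16,
# the same as 2 GPUs x 1 sentence x 8 in the two-GPU command.
MAX_SENTENCES=1    # per-GPU batch size; kept minimal because RACE contexts are long
UPDATE_FREQ=16     # doubled to compensate for using one GPU instead of two
```

Then run the same `fairseq-train` command as above with `CUDA_VISIBLE_DEVICES=0` and these values.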
|
|
|
### 4) Evaluation: |
|
|
|
```bash
DATA_DIR=/path/to/race-output-dir         # data directory used during training
MODEL_PATH=/path/to/checkpoint_best.pt    # path to the finetuned model checkpoint
PREDS_OUT=preds.tsv                       # output file path to save predictions
TEST_SPLIT=test                           # can be test (Middle) or test1 (High)
fairseq-validate \
    $DATA_DIR \
    --valid-subset $TEST_SPLIT \
    --path $MODEL_PATH \
    --batch-size 1 \
    --task sentence_ranking \
    --criterion sentence_ranking \
    --save-predictions $PREDS_OUT
```
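
The predictions file is mainly useful for error analysis. A rough sketch for recomputing accuracy from it, assuming each line of `preds.tsv` is tab-separated as `<example-id> <predicted-class> <gold-label>` (inspect the file first, since the exact column layout may differ across fairseq versions):

```bash
# Rough sketch, assuming columns: example id, predicted class, gold label.
awk -F'\t' '{ total++; if ($2 == $3) correct++ } END { printf "accuracy: %.4f\n", correct / total }' preds.tsv
```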
|
|