---
license: mit
language:
- en
tags:
- babylm
---
# Lil-Bevo

Lil-Bevo is UT Austin's submission to the BabyLM challenge, specifically the strict-small track.
## TLDR:
- Unigram tokenizer trained on the 10M BabyLM tokens plus the MAESTRO dataset, for a vocab size of 16k (see the tokenizer sketch after this list).
- `deberta-v3-small` trained on a mixture of MAESTRO and the 10M tokens for 3 epochs (a pretraining sketch also follows this list).
- Model continues training for 50 epochs on the 10M tokens with a sequence length of 128.
- Model continues training for 200 epochs on the 10M tokens with a sequence length of 512.
- Model is trained with targeted linguistic masking for 10 epochs (a sketch of one possible masking collator appears below).
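
The tokenizer step can be sketched roughly as below with the Hugging Face `tokenizers` library. This is a minimal illustration rather than the exact training script: the file paths and the special-token set are assumptions.

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Unigram LM tokenizer, as in the first TLDR item.
tokenizer = Tokenizer(models.Unigram())
tokenizer.pre_tokenizer = pre_tokenizers.Metaspace()

trainer = trainers.UnigramTrainer(
    vocab_size=16_000,  # 16k vocab, per the TLDR
    special_tokens=["[CLS]", "[SEP]", "[PAD]", "[UNK]", "[MASK]"],
    unk_token="[UNK]",
)

# Hypothetical paths to the serialized BabyLM and MAESTRO corpora.
tokenizer.train(files=["babylm_10M.txt", "maestro.txt"], trainer=trainer)
tokenizer.save("lil-bevo-unigram-16k.json")
```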
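
The pretraining stages share one masked-LM setup, varying only the data mixture, sequence length, and epoch count. Below is a minimal sketch of a single stage (50 epochs at sequence length 128), assuming the model is a `deberta-v3-small` configuration trained from scratch with the tokenizer above; the batch size, masking rate, and file names are illustrative assumptions.

```python
from datasets import load_dataset
from transformers import (
    DataCollatorForLanguageModeling,
    DebertaV2Config,
    DebertaV2ForMaskedLM,
    PreTrainedTokenizerFast,
    Trainer,
    TrainingArguments,
)

# Load the 16k unigram tokenizer trained above.
tokenizer = PreTrainedTokenizerFast(
    tokenizer_file="lil-bevo-unigram-16k.json",
    unk_token="[UNK]", pad_token="[PAD]", mask_token="[MASK]",
    cls_token="[CLS]", sep_token="[SEP]",
)

# DeBERTa-v3 models use the DebertaV2 classes in transformers;
# only the configuration is reused, the weights start from scratch.
config = DebertaV2Config.from_pretrained(
    "microsoft/deberta-v3-small", vocab_size=tokenizer.vocab_size
)
model = DebertaV2ForMaskedLM(config)

def make_dataset(path, seq_len):
    """Tokenize a text file and chunk it into fixed-length blocks."""
    raw = load_dataset("text", data_files=path, split="train")

    def tokenize_and_chunk(batch):
        ids = tokenizer("\n".join(batch["text"]))["input_ids"]
        return {"input_ids": [ids[i : i + seq_len]
                              for i in range(0, len(ids) - seq_len + 1, seq_len)]}

    return raw.map(tokenize_and_chunk, batched=True, remove_columns=["text"])

# Standard 15% MLM masking; the card does not state the rate actually used.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

# One stage of the schedule: 50 epochs at sequence length 128.
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="lil-bevo-seq128", num_train_epochs=50),
    train_dataset=make_dataset("babylm_10M.txt", seq_len=128),
    data_collator=collator,
)
trainer.train()
```

The 200-epoch stage at sequence length 512 would rerun the same loop with `seq_len=512`, resuming from the previous checkpoint.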
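
The card does not describe how the targeted linguistic masking is implemented. Purely as an illustration of the general idea, the sketch below subclasses `DataCollatorForLanguageModeling` so that a caller-supplied set of target token ids is masked at a boosted rate; the target set, the 0.5 rate, and the always-`[MASK]` replacement are assumptions, not the actual recipe.

```python
import torch
from transformers import DataCollatorForLanguageModeling

class TargetedMaskingCollator(DataCollatorForLanguageModeling):
    """MLM collator that masks a chosen set of token ids more aggressively."""

    def __init__(self, tokenizer, target_ids, target_probability=0.5, **kwargs):
        super().__init__(tokenizer=tokenizer, **kwargs)
        self.target_ids = torch.tensor(sorted(target_ids))
        self.target_probability = target_probability

    def torch_mask_tokens(self, inputs, special_tokens_mask=None):
        labels = inputs.clone()

        # Base masking rate everywhere, boosted rate on target positions.
        probs = torch.full(labels.shape, self.mlm_probability)
        probs[torch.isin(inputs, self.target_ids)] = self.target_probability

        # Never mask special tokens ([CLS], [SEP], [PAD], ...).
        if special_tokens_mask is None:
            special_tokens_mask = torch.tensor(
                [self.tokenizer.get_special_tokens_mask(ids, already_has_special_tokens=True)
                 for ids in labels.tolist()],
                dtype=torch.bool,
            )
        else:
            special_tokens_mask = special_tokens_mask.bool()
        probs.masked_fill_(special_tokens_mask, 0.0)

        masked = torch.bernoulli(probs).bool()
        labels[~masked] = -100                         # loss only on masked positions
        inputs[masked] = self.tokenizer.mask_token_id  # simple all-[MASK] replacement
        return inputs, labels
```

A collator like this would simply replace the default one in the `Trainer` call above for the final 10 epochs.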
This README will be updated with more details soon.