---
license: mit
language:
- en
tags:
- babylm
---

# Lil-Bevo

Lil-Bevo is UT Austin's submission to the BabyLM Challenge, specifically the strict-small track.
TL;DR:
- A Unigram tokenizer was trained on the 10M BabyLM tokens plus the MAESTRO dataset, with a vocabulary size of 16k.
- A DeBERTa-v3-small model was trained on a mixture of MAESTRO and the 10M tokens for 5 epochs.
- The model continued training for 50 epochs on the 10M tokens with a sequence length of 128.
- The model continued training for 2 epochs on the 10M tokens with a sequence length of 512.
- The model was then trained with targeted linguistic masking for 10 epochs.
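The tokenizer step above can be sketched with the Hugging Face `tokenizers` library. This is a minimal illustration, not the exact training script: the in-memory toy corpus below stands in for the actual 10M-token BabyLM data plus MAESTRO, and the special-token list is an assumption.

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Toy corpus standing in for the 10M BabyLM tokens + MAESTRO data (hypothetical).
corpus = ["the cat sat on the mat", "notes from a maestro performance"] * 100

# Unigram model, as used for Lil-Bevo's tokenizer.
tokenizer = Tokenizer(models.Unigram())
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

# Target a 16k vocabulary; on this toy corpus the learned vocab will be far
# smaller, capped by the number of distinct pieces available.
trainer = trainers.UnigramTrainer(
    vocab_size=16_000,
    special_tokens=["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"],
    unk_token="[UNK]",
)
tokenizer.train_from_iterator(corpus, trainer=trainer)

print(tokenizer.get_vocab_size())
print(tokenizer.encode("the cat sat").tokens)
```

On the real corpus, the trained tokenizer would be saved (e.g. `tokenizer.save(...)`) and reused for all subsequent training stages.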
This README will be updated with more details soon.