nanoT5-base-65kBPE-v2
This is a "raw" pretrained model intended to be fine-tuned on downstream tasks
- SiLU/gated-SiLU activation
- 25% mask rate during pretraining (see the sketch after this list)
- 65k vocab size, adapted from the claude3 tokenizer
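
For illustration, here is a minimal, self-contained sketch of T5-style span corruption at a 25% mask rate. This is a toy version operating on whitespace tokens with `<extra_id_N>` sentinels, and `span_corrupt` is a hypothetical helper; the actual objective is implemented in the training code linked below.

```python
import random

def span_corrupt(tokens, mask_rate=0.25, rng=None):
    """Toy T5-style denoising: mask ~mask_rate of tokens, collapse each
    contiguous masked run into one sentinel in the input, and emit
    (sentinel, original tokens) pairs as the target."""
    rng = rng or random.Random(0)
    n_mask = max(1, round(len(tokens) * mask_rate))
    masked = set(rng.sample(range(len(tokens)), n_mask))
    inputs, targets, sid, i = [], [], 0, 0
    while i < len(tokens):
        if i in masked:
            # one sentinel per masked span in the input...
            inputs.append(f"<extra_id_{sid}>")
            # ...and the same sentinel plus the hidden tokens in the target
            targets.append(f"<extra_id_{sid}>")
            while i in masked:
                targets.append(tokens[i])
                i += 1
            sid += 1
        else:
            inputs.append(tokens[i])
            i += 1
    return " ".join(inputs), " ".join(targets)

words = "the quick brown fox jumps over the lazy dog".split()
inp, tgt = span_corrupt(words)
print(inp)  # e.g. "the quick brown <extra_id_0> jumps over <extra_id_1> lazy dog"
print(tgt)  # e.g. "<extra_id_0> fox <extra_id_1> the"
```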
training code: https://github.com/pszemraj/nanoT5/tree/any-tokenizer
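
A minimal loading sketch with Hugging Face `transformers` is below; the repo id is assumed to match this model card's name under the author's namespace, so adjust it if the checkpoint lives elsewhere.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# assumed repo id; adjust if the checkpoint is hosted under a different name
model_id = "pszemraj/nanoT5-base-65kBPE-v2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

print(f"vocab size: {len(tokenizer):,}")                   # ~65k per the notes above
print(f"parameters: {model.num_parameters() / 1e6:.0f}M")

# NB: this is a "raw" denoising checkpoint; fine-tune it on a downstream
# task before expecting useful generations.
```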
plots
Loss, gradient, and weight plots from pretraining; more details are under checkpoints/