nanoT5-base-65kBPE-v2
This is a "raw" pretrained model intended to be fine-tuned on downstream tasks
- SiLU/gated-SiLU activation
- 25% mask rate during pretraining (see the sketch after this list)
- 65k vocab size, adapted from the claude3 tokenizer
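
For illustration, here is a minimal, self-contained sketch of T5-style span corruption at a 25% mask rate. This is a toy version operating on whitespace tokens with `<extra_id_N>` sentinels, and `span_corrupt` is a hypothetical helper; the actual objective is implemented in the training code linked below.

```python
import random

def span_corrupt(tokens, mask_rate=0.25, rng=None):
    """Toy T5-style denoising: mask ~mask_rate of tokens, collapse each
    contiguous masked run into one sentinel in the input, and emit
    (sentinel, original tokens) pairs as the target."""
    rng = rng or random.Random(0)
    n_mask = max(1, round(len(tokens) * mask_rate))
    masked = set(rng.sample(range(len(tokens)), n_mask))
    inputs, targets, sid, i = [], [], 0, 0
    while i < len(tokens):
        if i in masked:
            # one sentinel per masked span in the input...
            inputs.append(f"<extra_id_{sid}>")
            # ...and the same sentinel plus the hidden tokens in the target
            targets.append(f"<extra_id_{sid}>")
            while i in masked:
                targets.append(tokens[i])
                i += 1
            sid += 1
        else:
            inputs.append(tokens[i])
            i += 1
    return " ".join(inputs), " ".join(targets)

words = "the quick brown fox jumps over the lazy dog".split()
inp, tgt = span_corrupt(words)
print(inp)  # e.g. "the quick brown <extra_id_0> jumps over <extra_id_1> lazy dog"
print(tgt)  # e.g. "<extra_id_0> fox <extra_id_1> the"
```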
training code: https://github.com/pszemraj/nanoT5/tree/any-tokenizer
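
A minimal loading sketch with Hugging Face `transformers` is below; the repo id is assumed to match this model card's name under the author's namespace, so adjust it if the checkpoint lives elsewhere.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# assumed repo id; adjust if the checkpoint is hosted under a different name
model_id = "pszemraj/nanoT5-base-65kBPE-v2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

print(f"vocab size: {len(tokenizer):,}")                   # ~65k per the notes above
print(f"parameters: {model.num_parameters() / 1e6:.0f}M")

# NB: this is a "raw" denoising checkpoint; fine-tune it on a downstream
# task before expecting useful generations.
```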
plots
Loss, gradient, and weight plots from pretraining; more details are under checkpoints/