---
language:
- en
tags:
- NLP
license: mit
datasets:
- TristanBehrens/bach_garland_2024-100K
base_model: None
---
# bach_garland_mambaplus - An xLSTM Model

![Trained with Helibrunna](banner.jpg)

Trained with [Helibrunna](https://github.com/AI-Guru/helibrunna) by [Dr. Tristan Behrens](https://de.linkedin.com/in/dr-tristan-behrens-734967a2).

## Configuration
```yaml
training:
  model_name: bach_garland_mambaplus
  batch_size: 8
  lr: 0.001
  lr_warmup_steps: 5000
  lr_decay_until_steps: 50000
  lr_decay_factor: 0.001
  weight_decay: 0.1
  amp_precision: bfloat16
  weight_precision: float32
  enable_mixed_precision: true
  num_epochs: 8
  output_dir: output/bach_garland_mambaplus
  save_every_step: 500
  log_every_step: 10
  wandb_project: bach_garland
  torch_compile: false
model:
  type: mamba
  d_model: 128
  n_layers: 8
  context_length: 4096
  vocab_size: 178
dataset:
  hugging_face_id: TristanBehrens/bach_garland_2024-100K
tokenizer:
  type: whitespace
  fill_token: '[EOS]'
```
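
The `lr_*` entries describe a warmup-then-decay schedule: the learning rate ramps up over the first 5000 steps, then decays until step 50000 to `lr * lr_decay_factor`. As a rough illustration, here is a minimal sketch of such a schedule, assuming linear warmup and linear decay (the exact curve Helibrunna uses may differ):

```python
def learning_rate_at(step, base_lr=0.001, warmup_steps=5000,
                     decay_until_steps=50000, decay_factor=0.001):
    """Illustrative warmup-then-decay schedule using the config values.

    Linear warmup from 0 to base_lr, then linear decay down to
    base_lr * decay_factor at decay_until_steps, constant afterwards.
    This is an assumed shape, not necessarily Helibrunna's implementation.
    """
    if step < warmup_steps:
        # Warmup phase: scale linearly with step count.
        return base_lr * step / warmup_steps
    if step >= decay_until_steps:
        # After decay is complete, hold the floor value.
        return base_lr * decay_factor
    # Decay phase: interpolate linearly between peak and floor.
    progress = (step - warmup_steps) / (decay_until_steps - warmup_steps)
    return base_lr * (1.0 - progress * (1.0 - decay_factor))

print(learning_rate_at(5000))   # peak: 0.001
print(learning_rate_at(60000))  # floor: 1e-06
```

The schedule peaks at `lr` once warmup finishes and bottoms out at `lr * lr_decay_factor = 1e-6`, a common pattern for stabilizing early training of sequence models.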