---
language:
- en
tags:
- NLP
license: mit
datasets:
- TristanBehrens/bach_garland_2024-100K
base_model: None
---
# bach_garland_mambaplus - An xLSTM Model

![Trained with Helibrunna](banner.jpg)

Trained with [Helibrunna](https://github.com/AI-Guru/helibrunna) by [Dr. Tristan Behrens](https://de.linkedin.com/in/dr-tristan-behrens-734967a2).

## Configuration
```yaml
training:
  model_name: bach_garland_mambaplus
  batch_size: 8
  lr: 0.001
  lr_warmup_steps: 5000
  lr_decay_until_steps: 50000
  lr_decay_factor: 0.001
  weight_decay: 0.1
  amp_precision: bfloat16
  weight_precision: float32
  enable_mixed_precision: true
  num_epochs: 8
  output_dir: output/bach_garland_mambaplus
  save_every_step: 500
  log_every_step: 10
  wandb_project: bach_garland
  torch_compile: false
model:
  type: mamba
  d_model: 128
  n_layers: 8
  context_length: 4096
  vocab_size: 178
dataset:
  hugging_face_id: TristanBehrens/bach_garland_2024-100K
tokenizer:
  type: whitespace
  fill_token: '[EOS]'
```
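
The `lr_*` entries describe a warmup-then-decay schedule: the learning rate ramps up over the first 5000 steps, then decays until step 50000 to `lr * lr_decay_factor`. As a rough illustration, here is a minimal sketch of such a schedule, assuming linear warmup and linear decay (the exact curve Helibrunna uses may differ):

```python
def learning_rate_at(step, base_lr=0.001, warmup_steps=5000,
                     decay_until_steps=50000, decay_factor=0.001):
    """Illustrative warmup-then-decay schedule using the config values.

    Linear warmup from 0 to base_lr, then linear decay down to
    base_lr * decay_factor at decay_until_steps, constant afterwards.
    This is an assumed shape, not necessarily Helibrunna's implementation.
    """
    if step < warmup_steps:
        # Warmup phase: scale linearly with step count.
        return base_lr * step / warmup_steps
    if step >= decay_until_steps:
        # After decay is complete, hold the floor value.
        return base_lr * decay_factor
    # Decay phase: interpolate linearly between peak and floor.
    progress = (step - warmup_steps) / (decay_until_steps - warmup_steps)
    return base_lr * (1.0 - progress * (1.0 - decay_factor))

print(learning_rate_at(5000))   # peak: 0.001
print(learning_rate_at(60000))  # floor: 1e-06
```

The schedule peaks at `lr` once warmup finishes and bottoms out at `lr * lr_decay_factor = 1e-6`, a common pattern for stabilizing early training of sequence models.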