RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter4_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 1.1629
Num Input Tokens Seen: 5250360
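
For readers who want to try the checkpoint, the following is a minimal loading sketch rather than an official usage recipe. It assumes the repository ships tokenizer files alongside the weights and that the `transformers` and `torch` packages are installed; the helper names `load_model` and `generate` are hypothetical. The weights are loaded in bfloat16 to match the tensor type listed for this repository.

```python
MODEL_ID = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter4_sftsd2"


def load_model(model_id: str = MODEL_ID):
    """Load the tokenizer and the causal LM in bfloat16.

    Imports are deferred so that defining this helper does not require
    torch/transformers to be installed.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the repo contains tokenizer files; if it does not,
    # the base model's tokenizer (google/gemma-2-2b) would be the fallback.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    return tokenizer, model


def generate(prompt: str, max_new_tokens: int = 32) -> str:
    """Greedy-decode a short continuation of `prompt` (hypothetical helper)."""
    tokenizer, model = load_model()
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("The capital of France is")` downloads the weights on first use, so it is best run on a machine with enough memory for a 2.61B-parameter model.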

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 8e-06
train_batch_size: 8
eval_batch_size: 16
seed: 2
gradient_accumulation_steps: 16
total_train_batch_size: 128
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: constant_with_warmup
lr_scheduler_warmup_ratio: 0.05
num_epochs: 1
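
The hyperparameters above are related to each other, and a short sketch makes the arithmetic explicit. It assumes a single training device (the card does not state the device count) and that warmup steps are obtained by rounding `total_steps * warmup_ratio` up, which is my understanding of how the Trainer resolves `lr_scheduler_warmup_ratio`. The function name `lr_at` and the `total_steps` argument are hypothetical; the card does not report the exact number of optimizer steps.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 8
gradient_accumulation_steps = 16
learning_rate = 8e-06
warmup_ratio = 0.05

# Assumption: one device. With more devices the product would scale accordingly.
num_devices = 1

# Effective batch size per optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices


def lr_at(step: int, total_steps: int) -> float:
    """Learning rate at `step` under a constant-with-warmup schedule.

    Linear ramp from 0 to `learning_rate` over the warmup steps, then constant.
    `total_steps` is a caller-supplied assumption, not a value from the card.
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    return learning_rate
```

With these values, `total_train_batch_size` evaluates to 128, which matches the figure reported in the list.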

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
No log	0	0	1.3909	0
1.4507	0.0527	5	1.2649	285024
1.2669	0.1055	10	1.1794	566176
1.2492	0.1582	15	1.1551	833640
1.2366	0.2109	20	1.1399	1110504
1.0557	0.2637	25	1.1411	1388928
0.9926	0.3164	30	1.1517	1667696
0.9331	0.3691	35	1.1582	1947440
0.8809	0.4219	40	1.1619	2225640
0.8058	0.4746	45	1.1649	2505560
0.8295	0.5274	50	1.1664	2785208
0.8397	0.5801	55	1.1773	3063096
0.7422	0.6328	60	1.1690	3345784
0.7239	0.6856	65	1.1694	3626168
0.6693	0.7383	70	1.1631	3907224
0.6977	0.7910	75	1.1692	4190504
0.6348	0.8438	80	1.1727	4469088
0.6957	0.8965	85	1.1625	4749008
0.5706	0.9492	90	1.1691	5028272
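
The table can be inspected programmatically. The sketch below copies the step, validation loss, and token columns from the rows above and locates the evaluation with the lowest validation loss. The variable names are illustrative only. In this log the validation loss is lowest early in the epoch while the training loss keeps falling, a pattern that is often read as a sign of overfitting, although the card itself draws no such conclusion.

```python
# (step, validation_loss, input_tokens_seen), copied from the table above.
rows = [
    (0, 1.3909, 0),
    (5, 1.2649, 285024),
    (10, 1.1794, 566176),
    (15, 1.1551, 833640),
    (20, 1.1399, 1110504),
    (25, 1.1411, 1388928),
    (30, 1.1517, 1667696),
    (35, 1.1582, 1947440),
    (40, 1.1619, 2225640),
    (45, 1.1649, 2505560),
    (50, 1.1664, 2785208),
    (55, 1.1773, 3063096),
    (60, 1.1690, 3345784),
    (65, 1.1694, 3626168),
    (70, 1.1631, 3907224),
    (75, 1.1692, 4190504),
    (80, 1.1727, 4469088),
    (85, 1.1625, 4749008),
    (90, 1.1691, 5028272),
]

# Evaluation point with the lowest validation loss.
best_step, best_loss, best_tokens = min(rows, key=lambda r: r[1])

# Average number of input tokens consumed per optimizer step so far.
last_step, last_loss, last_tokens = rows[-1]
tokens_per_step = last_tokens / last_step

print(f"best validation loss {best_loss} at step {best_step}")
print(f"about {tokens_per_step:.0f} input tokens per step")
```

The minimum falls at step 20 with a validation loss of 1.1399; every later evaluation in the table is higher.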

Framework versions

Transformers 4.44.0
PyTorch 2.4.0+cu121
Datasets 2.20.0
Tokenizers 0.19.1

Safetensors

Model size: 2.61B params
Tensor type: BF16
