metadata

license: mit
base_model: microsoft/git-base
tags:
  - generated_from_trainer
datasets:
  - imagefolder
model-index:
  - name: git-base-captioning
    results: []

git-base-captioning

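This model is a fine-tuned version of microsoft/git-base. The training data was loaded with the generic imagefolder loader of the Datasets library, so the card does not identify the underlying dataset. It achieves the following results on the evaluation set at the final logged step (step 1150):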

Loss: 0.3817
WER score: 2.8621
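
A word error rate above 1 is possible because the metric divides the number of word-level edits by the length of the reference, so a generated caption that runs far longer than its reference collects more insertions than there are reference words. The sketch below is a minimal pure-Python illustration of that definition for a single caption pair. It is not the evaluation code behind this card (the card does not say which metric implementation was used), and the function name `word_error_rate` and the example captions are hypothetical.

```python
def word_error_rate(reference: str, prediction: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = prediction.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j predicted words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into an empty prediction costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion of a reference word
                    curr[j - 1] + 1,     # insertion of a predicted word
                    prev[j - 1] + cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical captions, for illustration only.
exact = word_error_rate("a dog on grass", "a dog on grass")
one_wrong = word_error_rate("a dog on grass", "a cat on grass")
rambling = word_error_rate(
    "a dog on grass", "a dog on grass a dog on grass a dog on grass"
)
print(exact, one_wrong, rambling)
```

The third call shows the mechanism: the prediction repeats the four-word reference three times, which costs eight insertions against four reference words. Corpus-level implementations usually pool edits and reference lengths over all samples before dividing, so their aggregate can differ from a mean of per-pair scores.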

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 8
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 10
mixed_precision_training: Native AMP
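
The batch-size and scheduler entries above are tied together by simple arithmetic. The sketch below recomputes the effective batch size and evaluates a linear learning-rate decay. It rests on assumptions the card does not state: a single device, zero warmup steps (none are listed), and a total step count inferred from the training log, where step 50 corresponds to epoch 0.4202. The helper `linear_lr` is a hypothetical name written for this illustration.

```python
LEARNING_RATE = 5e-05
TRAIN_BATCH_SIZE = 4
GRADIENT_ACCUMULATION_STEPS = 2
NUM_EPOCHS = 10

# Assumption: training ran on a single device.
NUM_DEVICES = 1

# Assumption: steps per epoch inferred from the log (step 50 at epoch 0.4202).
STEPS_PER_EPOCH = round(50 / 0.4202)
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS

# Samples contributing to each optimizer update.
effective_batch_size = TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * NUM_DEVICES


def linear_lr(step, total_steps=TOTAL_STEPS, base_lr=LEARNING_RATE, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


print("effective batch size:", effective_batch_size)
print("estimated total steps:", TOTAL_STEPS)
for step in (0, 595, 1150, TOTAL_STEPS):
    print(step, linear_lr(step))
```

Under these assumptions the last logged step (1150) falls shortly before the estimated end of training, where the learning rate has decayed to a small fraction of its initial value.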

Training results

Training Loss	Epoch	Step	Validation Loss	Wer Score
7.3416	0.4202	50	4.5198	4.7633
2.4704	0.8403	100	0.7015	0.8610
0.4735	1.2605	150	0.3923	0.8164
0.3669	1.6807	200	0.3762	0.8198
0.3075	2.1008	250	0.3680	0.8062
0.2837	2.5210	300	0.3683	0.8090
0.2740	2.9412	350	0.3640	0.8401
0.2393	3.3613	400	0.3692	2.8282
0.2498	3.7815	450	0.3655	2.0712
0.2198	4.2017	500	0.3698	3.2164
0.2034	4.6218	550	0.3688	2.5853
0.1925	5.0420	600	0.3698	2.9119
0.1779	5.4622	650	0.3729	3.1333
0.1734	5.8824	700	0.3727	1.7605
0.1696	6.3025	750	0.3749	3.5226
0.1500	6.7227	800	0.3773	2.8932
0.1595	7.1429	850	0.3762	2.7842
0.1507	7.5630	900	0.3803	1.0266
0.1350	7.9832	950	0.3802	3.6090
0.1385	8.4034	1000	0.3801	3.3169
0.1311	8.8235	1050	0.3800	3.3966
0.1398	9.2437	1100	0.3815	2.1915
0.1293	9.6639	1150	0.3817	2.8621
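
The headline results match the last row of this table, which is not its best row: validation loss bottoms out early while training loss keeps falling, and WER becomes erratic after step 350. As a small sketch, the snippet below scans the logged evaluation rows for the lowest validation loss and the lowest WER. The card does not say whether checkpoints from those steps were saved, so this only illustrates how one might compare them.

```python
# (step, validation_loss, wer_score), copied from the table above.
EVAL_LOG = [
    (50, 4.5198, 4.7633),
    (100, 0.7015, 0.8610),
    (150, 0.3923, 0.8164),
    (200, 0.3762, 0.8198),
    (250, 0.3680, 0.8062),
    (300, 0.3683, 0.8090),
    (350, 0.3640, 0.8401),
    (400, 0.3692, 2.8282),
    (450, 0.3655, 2.0712),
    (500, 0.3698, 3.2164),
    (550, 0.3688, 2.5853),
    (600, 0.3698, 2.9119),
    (650, 0.3729, 3.1333),
    (700, 0.3727, 1.7605),
    (750, 0.3749, 3.5226),
    (800, 0.3773, 2.8932),
    (850, 0.3762, 2.7842),
    (900, 0.3803, 1.0266),
    (950, 0.3802, 3.6090),
    (1000, 0.3801, 3.3169),
    (1050, 0.3800, 3.3966),
    (1100, 0.3815, 2.1915),
    (1150, 0.3817, 2.8621),
]

# Rows with the lowest validation loss and the lowest WER.
best_by_loss = min(EVAL_LOG, key=lambda row: row[1])
best_by_wer = min(EVAL_LOG, key=lambda row: row[2])
final = EVAL_LOG[-1]

print("lowest validation loss:", best_by_loss)
print("lowest WER:", best_by_wer)
print("final logged step:", final)
```

Both minima occur within the first three epochs, which suggests that early stopping or loading the best checkpoint at the end of training could be worth trying in a rerun.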

Framework versions

Transformers 4.41.2
PyTorch 2.3.0+cu121
Datasets 2.20.0
Tokenizers 0.19.1