---
license: apache-2.0
base_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
tags:
- summarization
- generated_from_trainer
model-index:
- name: TinyLlama-1.1B-Sum-SFT
  results: []
datasets:
- martimfasantos/openai-summarize-tldr
pipeline_tag: summarization
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# TinyLlama-1.1B-Sum-SFT

This model is a fine-tuned version of [TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T) on the martimfasantos/openai-summarize-tldr dataset.
It achieves the following results on the evaluation set:
- Loss: 1.8887
- Nll Loss: 1.8968
- Logps/best: -71.1814
- Rewards/chosen: 2.2080
- Rewards/rejected: -0.6886
- Rewards/accuracies: 1.0
- Rewards/margins: 2.8966
- Logps/rejected: -14.2972
- Logps/chosen: -71.1814
- Logits/rejected: -3.0553
- Logits/chosen: -3.4224
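
The reported rewards margin is consistent with the difference between the chosen and rejected rewards: 2.2080 - (-0.6886) = 2.8966.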

## Model description

TinyLlama-1.1B-Sum-SFT is a supervised fine-tuned (SFT) version of [TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T), a 1.1B-parameter causal language model, adapted for TL;DR-style summarization with the martimfasantos/openai-summarize-tldr dataset.

## Intended uses & limitations

The model is intended for TL;DR-style summarization; limitations beyond the evaluation results above have not been documented. A usage sketch is given below.
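
Because the underlying architecture is a Llama-style causal language model rather than an encoder-decoder summarizer, a text-generation style call is the natural way to run it. The snippet below is a minimal sketch only: the Hub repo id and the `TL;DR:` prompt format are assumptions, not documented in this card.

```python
# Minimal usage sketch. The repo id and prompt format are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "martimfasantos/TinyLlama-1.1B-Sum-SFT"  # assumed Hub repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

post = "..."  # text to summarize
prompt = f"{post}\nTL;DR:"  # assumed TL;DR-style prompt; check the dataset's formatting

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)

# Decode only the newly generated tokens after the prompt.
summary = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(summary)
```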

## Training and evaluation data

The model was trained and evaluated on the [martimfasantos/openai-summarize-tldr](https://huggingface.co/datasets/martimfasantos/openai-summarize-tldr) dataset; the results above are reported on its validation split.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.95) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
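
For reference, these settings map onto `transformers.TrainingArguments` roughly as sketched below. The `output_dir` and any argument not listed above are illustrative assumptions, and the actual training script is not documented in this card.

```python
# Sketch of the reported hyperparameters expressed as TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="TinyLlama-1.1B-Sum-SFT",  # assumed
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=2,  # with 2 GPUs -> total train batch size 32
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=1,
)
```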

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Nll Loss | Logps/best | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------:|:----------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 1.9469        | 0.2193 | 800  | 1.9582          | 1.9648   | -73.7246   | 1.9537         | -0.4240          | 1.0                | 2.3777          | -11.6512       | -73.7246     | -2.7987         | -3.1275       |
| 1.9813        | 0.4386 | 1600 | 1.9285          | 1.9369   | -72.6769   | 2.0585         | -0.5023          | 1.0                | 2.5607          | -12.4339       | -72.6769     | -2.9393         | -3.2910       |
| 1.9215        | 0.6579 | 2400 | 1.9049          | 1.9127   | -71.7733   | 2.1488         | -0.5719          | 1.0                | 2.7207          | -13.1300       | -71.7733     | -3.0198         | -3.3812       |
| 1.8655        | 0.8772 | 3200 | 1.8887          | 1.8968   | -71.1814   | 2.2080         | -0.6886          | 1.0                | 2.8966          | -14.2972       | -71.1814     | -3.0553         | -3.4224       |


### Framework versions

- Transformers 4.43.3
- Pytorch 2.3.1+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1