---
library_name: transformers
datasets:
- mlabonne/orpo-dpo-mix-40k
language:
- en
base_model:
- meta-llama/Llama-3.2-1B-Instruct
---

# Model Card for ORPO-Tuned Llama-3.2-1B-Instruct

<!-- Provide a quick summary of what the model is/does. -->
An ORPO-tuned version of meta-llama/Llama-3.2-1B-Instruct, fine-tuned with LoRA on the mlabonne/orpo-dpo-mix-40k preference dataset.


## Model Details
- This model is a fine-tuned version of the meta-llama/Llama-3.2-1B-Instruct base model, adapted using the ORPO (Odds Ratio Preference Optimization) technique.
- Base Model: meta-llama/Llama-3.2-1B-Instruct, a 1-billion-parameter instruction-following language model.
- Fine-Tuning Technique: ORPO, which combines supervised fine-tuning with preference optimization in a single objective.
- Training Data: The mlabonne/orpo-dpo-mix-40k dataset, containing 44,245 examples, each with a prompt, a chosen answer, and a rejected answer.
- Purpose: Generate responses that are better aligned with human preferences while retaining the general knowledge and capabilities of the base Llama 3.2 model.
- Efficient Fine-Tuning: LoRA (Low-Rank Adaptation) was used for adaptation, allowing faster training and smaller storage requirements.
- Capabilities: The model should follow instructions and generate responses more in line with human preferences than the base model.
- Evaluation: Performance was evaluated on the HellaSwag benchmark (see Evaluation below).


### Model Description

<!-- Provide a longer summary of what this model is. -->

This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.


### Model Sources [optional]

<!-- Provide the basic links for the model. -->
- Course assignment: https://uplimit.com/course/open-source-llms/session/session_clu1q3j6f016d128r2zxe3uyj/assignment/assignment_clyvnyyjh019h199337oef4ur
- Notebook: https://uplimit.com/ugc-assets/course/course_clmz6fh2a00aa12bqdtjv6ygs/assets/1728565337395-85hdx93s03d0v9bd8j1nnxfjylyty2/uplimitopensourcellmsoctoberweekone.ipynb


## Uses


<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
Hands-on learning: Finetuning LLMs

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
Learning exercise for an introductory course on fine-tuning LLMs.

### Downstream Use [optional]

This model is designed for tasks requiring improved alignment with human preferences, such as:
- Chatbots
- Question-answering systems
- General text generation with enhanced preference alignment

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
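A minimal inference sketch with 🤗 Transformers is shown below. The repository id is an illustrative placeholder (this card does not state the final Hub id), and the generation settings are assumptions.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Illustrative repository id; replace with the actual Hub id of this fine-tune.
model_id = "your-username/llama-3.2-1b-instruct-orpo"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Build a chat-formatted prompt using the Llama 3.2 chat template.
messages = [{"role": "user", "content": "Explain LoRA fine-tuning in two sentences."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

output = generator(prompt, max_new_tokens=128, do_sample=True, temperature=0.7)
print(output[0]["generated_text"])
```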

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
This model should not yet be deployed in real-world applications; further fine-tuning and evaluation are required.


## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->
- Performance may vary on tasks outside the training distribution
- May inherit biases present in the base model and training data
- Limited to 1B parameters, which may impact performance on complex tasks

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

- Users should be aware of potential biases in model outputs
- Not suitable for critical decision-making without human oversight
- May generate plausible-sounding but incorrect information


## Training Details


### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
The model was trained on the [mlabonne/orpo-dpo-mix-40k](https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k) dataset.

This dataset is designed for ORPO (Odds Ratio Preference Optimization) or DPO (Direct Preference Optimization) training of language models.
* It contains 44,245 examples in the training split.
* Includes prompts, chosen answers, and rejected answers for each sample.
* Combines various high-quality DPO datasets.
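A short sketch for loading and inspecting the dataset with 🤗 Datasets (the exact column names are an assumption; see the dataset card for the authoritative schema):

```python
from datasets import load_dataset

# Load the preference dataset used for ORPO training.
dataset = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train")

print(len(dataset))          # 44,245 examples according to this card
print(dataset.column_names)  # expected to include prompt / chosen / rejected fields
```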

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
This model was fine-tuned using the ORPO (Odds Ratio Preference Optimization) technique on the meta-llama/Llama-3.2-1B-Instruct base model.

- Base Model: meta-llama/Llama-3.2-1B-Instruct
- Training Technique: ORPO (Odds Ratio Preference Optimization)
- Efficient Fine-tuning Method: LoRA (Low-Rank Adaptation)
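As a rough sketch of the idea (paraphrasing the ORPO paper; the notation below is not taken from this card), ORPO adds an odds-ratio term to the supervised fine-tuning loss so that the chosen answer \(y_w\) is preferred over the rejected answer \(y_l\) for a prompt \(x\):

$$
\mathcal{L}_{\text{ORPO}} = \mathcal{L}_{\text{SFT}} + \lambda \, \mathcal{L}_{\text{OR}}, \qquad
\mathcal{L}_{\text{OR}} = -\log \sigma\!\left(\log \frac{\text{odds}_\theta(y_w \mid x)}{\text{odds}_\theta(y_l \mid x)}\right), \qquad
\text{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}
$$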

#### Preprocessing [optional]

[More Information Needed]


#### Training Hyperparameters

- Learning Rate: 2e-5
- Batch Size: 4
- Gradient Accumulation Steps: 4
- Training Steps: 500
- Warmup Steps: 20
- LoRA Rank: 16
- LoRA Alpha: 32
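
The following is a minimal sketch of how this procedure and the hyperparameters above might be expressed with TRL's `ORPOTrainer` and PEFT's LoRA support. The base model, dataset, and listed hyperparameters come from this card; the output path, LoRA dropout, `beta`, logging settings, and the exact TRL API version are illustrative assumptions.

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

base_model = "meta-llama/Llama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# Preference dataset with prompt / chosen / rejected pairs (per this card).
dataset = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train")

# LoRA settings: rank and alpha are from this card; dropout is an assumption.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# Hyperparameters from this card; output_dir, beta, and logging_steps are illustrative.
training_args = ORPOConfig(
    output_dir="llama-3.2-1b-instruct-orpo",
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    max_steps=500,
    warmup_steps=20,
    beta=0.1,
    logging_steps=10,
)

trainer = ORPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,  # older TRL releases use tokenizer=tokenizer
    peft_config=peft_config,
)
trainer.train()
```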


#### Speeds, Sizes, Times [optional]

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

[More Information Needed]

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->
The model was evaluated on the HellaSwag benchmark in a zero-shot setting.

Results:


|  Tasks  |Version|Filter|n-shot| Metric |   |Value |   |Stderr|
|---------|------:|------|-----:|--------|---|-----:|---|-----:|
|hellaswag|      1|none  |     0|acc     |↑  |0.4516|±  |0.0050|
|         |       |none  |     0|acc_norm|↑  |0.6139|±  |0.0049|

Interpretation:
- Performance Level: The model achieves a raw accuracy of 45.16% and a normalized accuracy of 61.39% on the HellaSwag task.
- Confidence: The small standard errors (about 0.5% for both metrics) indicate that these results are fairly precise.
- Improvement over Random: Given that HellaSwag typically has 4 choices per question, a random baseline would achieve 25% accuracy. This model performs significantly better than random.
- Normalized vs. Raw Accuracy: The higher normalized accuracy (61.39% vs. 45.16%) reflects the length-normalized scoring used for acc_norm, which reduces the penalty on longer answer continuations.
- Room for Improvement: While the performance is well above random, there's still significant room for improvement to reach human-level performance (which is typically above 95% on HellaSwag).
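
A sketch of reproducing this zero-shot evaluation with EleutherAI's lm-evaluation-harness (the model id is an illustrative placeholder):

```python
import lm_eval

# Zero-shot HellaSwag evaluation, matching the table above.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=your-username/llama-3.2-1b-instruct-orpo",  # illustrative id
    tasks=["hellaswag"],
    num_fewshot=0,
)
print(results["results"]["hellaswag"])  # reports the acc / acc_norm metrics
```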



#### Summary

- Base Model: meta-llama/Llama-3.2-1B-Instruct
- Model Type: Causal Language Model
- Language: English

Intended Use
- This model is designed for tasks requiring improved alignment with human preferences, such as:
- Chatbots
- Question-answering systems
- General text generation with enhanced preference alignment

Training Data
- Dataset: mlabonne/orpo-dpo-mix-40k
- Size: 44,245 examples
- Content: Prompts, chosen answers, and rejected answers


Task: HellaSwag
- This is a benchmark task designed to evaluate a model's commonsense reasoning and ability to complete scenarios logically.
- No specific filtering was applied to the test set.
- The evaluation was done in a zero-shot setting, where the model didn't receive any examples before making predictions.



## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

- **Hardware Type:** A100
- **Hours used:** Not recorded
- **Cloud Provider:** Google Colab
- **Compute Region:** Sacramento, CA, US
- **Framework:** PyTorch

## Technical Specifications [optional]

Hardware: A100 GPU


## Model Card Author

Ruth Shacterman

## Model Card Contact

[More Information Needed]