Update README.md
---
library_name: transformers
datasets:
- mlabonne/orpo-dpo-mix-40k
language:
- en
base_model:
- meta-llama/Llama-3.2-1B-Instruct
---

# Model Card for Model ID

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Course assignment:** https://uplimit.com/course/open-source-llms/session/session_clu1q3j6f016d128r2zxe3uyj/assignment/assignment_clyvnyyjh019h199337oef4ur
- **Training notebook:** https://uplimit.com/ugc-assets/course/course_clmz6fh2a00aa12bqdtjv6ygs/assets/1728565337395-85hdx93s03d0v9bd8j1nnxfjylyty2/uplimitopensourcellmsoctoberweekone.ipynb
- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

Hands-on learning: fine-tuning LLMs as part of a course assignment.

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

An introduction to fine-tuning LLMs; intended for learning and experimentation.

[More Information Needed]

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

This model should not yet be used in real-world applications.

[More Information Needed]

[More Information Needed]

## Training Details

- **Hardware:** A100 GPU
- **Framework:** PyTorch

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

The model was trained on the `mlabonne/orpo-dpo-mix-40k` dataset.

This dataset is designed for ORPO (Odds Ratio Preference Optimization) or DPO (Direct Preference Optimization) training of language models.

* It contains 44,245 examples in the training split.
* Each example includes a prompt, a chosen answer, and a rejected answer.
* It combines several high-quality DPO datasets.

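Preference data of this kind pairs each prompt with a preferred and a dispreferred completion. The sketch below shows the shape of one such example; the record itself is hypothetical, and the field names follow the common prompt/chosen/rejected convention (the actual dataset stores chat-formatted message lists):

```python
# Hypothetical record illustrating the prompt/chosen/rejected structure
# used by DPO/ORPO-style preference datasets.
example = {
    "prompt": "Name one prime number between 10 and 20.",
    "chosen": "11 is a prime number between 10 and 20.",
    "rejected": "15 is a prime number between 10 and 20.",
}

def has_preference_fields(record: dict) -> bool:
    """True if the record carries the fields ORPO/DPO training expects."""
    return {"prompt", "chosen", "rejected"}.issubset(record)
```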
### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

This model was fine-tuned using the ORPO (Odds Ratio Preference Optimization) technique on the meta-llama/Llama-3.2-1B-Instruct base model.

- **Base model:** meta-llama/Llama-3.2-1B-Instruct
- **Training technique:** ORPO (Odds Ratio Preference Optimization)
- **Efficient fine-tuning method:** LoRA (Low-Rank Adaptation)
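ORPO augments the usual supervised loss on the chosen answer with an odds-ratio preference term. A minimal sketch of that term, assuming `p_chosen` and `p_rejected` are length-normalized sequence probabilities the model assigns to the two completions:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def log_odds(p: float) -> float:
    # odds(y|x) = P(y|x) / (1 - P(y|x))
    return math.log(p) - math.log(1.0 - p)

def orpo_preference_loss(p_chosen: float, p_rejected: float) -> float:
    """Odds-ratio term: -log sigmoid(log_odds(chosen) - log_odds(rejected))."""
    return -math.log(sigmoid(log_odds(p_chosen) - log_odds(p_rejected)))
```

When the model already prefers the chosen answer the term shrinks toward zero; at indifference (equal probabilities) it equals log 2 ≈ 0.693. The full ORPO objective adds this term, scaled by a weight λ, to the NLL loss on the chosen answer.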
#### Preprocessing [optional]

[More Information Needed]

#### Training Hyperparameters

- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
- Learning rate: 2e-5
- Batch size: 4
- Gradient accumulation steps: 4
- Training steps: 500
- Warmup steps: 20
- LoRA rank: 16
- LoRA alpha: 32
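Assuming training used Hugging Face's `trl` and `peft` libraries (a common setup for ORPO + LoRA, not confirmed by this card), the hyperparameters above would map onto configuration objects roughly like this; the `output_dir` name is hypothetical:

```python
from peft import LoraConfig
from trl import ORPOConfig

# Hypothetical mapping of the listed hyperparameters onto peft/trl configs.
peft_config = LoraConfig(
    r=16,            # LoRA rank
    lora_alpha=32,   # LoRA alpha
    task_type="CAUSAL_LM",
)

training_args = ORPOConfig(
    output_dir="llama-3.2-1b-orpo",  # assumed name
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    max_steps=500,
    warmup_steps=20,
)
```

With gradient accumulation, the effective batch size is 4 × 4 = 16 sequences per optimizer step, so 500 steps see roughly 8,000 examples, about 0.18 epochs of the 44,245-example mix.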
#### Speeds, Sizes, Times [optional]

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

The model was evaluated on HellaSwag.

Results:

| Tasks     | Version | Filter | n-shot | Metric   |   |  Value |   | Stderr |
|-----------|--------:|--------|-------:|----------|---|-------:|---|-------:|
| hellaswag |       1 | none   |      0 | acc      | ↑ | 0.4516 | ± | 0.0050 |
|           |         | none   |      0 | acc_norm | ↑ | 0.6139 | ± | 0.0049 |
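The reported standard errors are consistent with a simple binomial approximation over HellaSwag's 10,042-example validation split (the split size is an assumption about the harness configuration, not stated by the card):

```python
import math

def binomial_se(p: float, n: int) -> float:
    """Standard error of a proportion under the normal approximation."""
    return math.sqrt(p * (1.0 - p) / n)

N_HELLASWAG_VAL = 10_042  # assumed evaluation split size

se_acc = binomial_se(0.4516, N_HELLASWAG_VAL)       # ~0.0050
se_acc_norm = binomial_se(0.6139, N_HELLASWAG_VAL)  # ~0.0049
```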
147 |
|
148 |
### Testing Data, Factors & Metrics
|
149 |
|
|
|
233 |
|
234 |
## Model Card Authors [optional]
|
235 |
|
236 |
+
Ruth Shacterman
|
237 |
|
238 |
## Model Card Contact
|
239 |
|