Update README.md
---
library_name: transformers
datasets:
- mlabonne/orpo-dpo-mix-40k
language:
- en
base_model:
- meta-llama/Llama-3.2-1B-Instruct
---

# Model Card for Model ID

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Course assignment:** https://uplimit.com/course/open-source-llms/session/session_clu1q3j6f016d128r2zxe3uyj/assignment/assignment_clyvnyyjh019h199337oef4ur
- **Training notebook:** https://uplimit.com/ugc-assets/course/course_clmz6fh2a00aa12bqdtjv6ygs/assets/1728565337395-85hdx93s03d0v9bd8j1nnxfjylyty2/uplimitopensourcellmsoctoberweekone.ipynb
- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

Hands-on learning: fine-tuning LLMs as part of a course assignment.

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

An introduction to fine-tuning LLMs; intended for learning and experimentation.

[More Information Needed]

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

This model should not yet be used in real-world applications.

[More Information Needed]

[More Information Needed]

## Training Details

- **Hardware:** A100 GPU
- **Framework:** PyTorch

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

The model was trained on the `mlabonne/orpo-dpo-mix-40k` dataset.

This dataset is designed for ORPO (Odds Ratio Preference Optimization) or DPO (Direct Preference Optimization) training of language models.

* It contains 44,245 examples in the training split.
* Each example includes a prompt, a chosen answer, and a rejected answer.
* It combines several high-quality DPO datasets.

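Preference data of this kind pairs each prompt with a preferred and a dispreferred completion. The sketch below shows the shape of one such example; the record itself is hypothetical, and the field names follow the common prompt/chosen/rejected convention (the actual dataset stores chat-formatted message lists):

```python
# Hypothetical record illustrating the prompt/chosen/rejected structure
# used by DPO/ORPO-style preference datasets.
example = {
    "prompt": "Name one prime number between 10 and 20.",
    "chosen": "11 is a prime number between 10 and 20.",
    "rejected": "15 is a prime number between 10 and 20.",
}

def has_preference_fields(record: dict) -> bool:
    """True if the record carries the fields ORPO/DPO training expects."""
    return {"prompt", "chosen", "rejected"}.issubset(record)
```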
### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

This model was fine-tuned using the ORPO (Odds Ratio Preference Optimization) technique on the meta-llama/Llama-3.2-1B-Instruct base model.

- **Base model:** meta-llama/Llama-3.2-1B-Instruct
- **Training technique:** ORPO (Odds Ratio Preference Optimization)
- **Efficient fine-tuning method:** LoRA (Low-Rank Adaptation)
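ORPO augments the usual supervised loss on the chosen answer with an odds-ratio preference term. A minimal sketch of that term, assuming `p_chosen` and `p_rejected` are length-normalized sequence probabilities the model assigns to the two completions:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def log_odds(p: float) -> float:
    # odds(y|x) = P(y|x) / (1 - P(y|x))
    return math.log(p) - math.log(1.0 - p)

def orpo_preference_loss(p_chosen: float, p_rejected: float) -> float:
    """Odds-ratio term: -log sigmoid(log_odds(chosen) - log_odds(rejected))."""
    return -math.log(sigmoid(log_odds(p_chosen) - log_odds(p_rejected)))
```

When the model already prefers the chosen answer the term shrinks toward zero; at indifference (equal probabilities) it equals log 2 ≈ 0.693. The full ORPO objective adds this term, scaled by a weight λ, to the NLL loss on the chosen answer.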
#### Preprocessing [optional]

[More Information Needed]

#### Training Hyperparameters

- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
- Learning rate: 2e-5
- Batch size: 4
- Gradient accumulation steps: 4
- Training steps: 500
- Warmup steps: 20
- LoRA rank: 16
- LoRA alpha: 32
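Assuming training used Hugging Face's `trl` and `peft` libraries (a common setup for ORPO + LoRA, not confirmed by this card), the hyperparameters above would map onto configuration objects roughly like this; the `output_dir` name is hypothetical:

```python
from peft import LoraConfig
from trl import ORPOConfig

# Hypothetical mapping of the listed hyperparameters onto peft/trl configs.
peft_config = LoraConfig(
    r=16,            # LoRA rank
    lora_alpha=32,   # LoRA alpha
    task_type="CAUSAL_LM",
)

training_args = ORPOConfig(
    output_dir="llama-3.2-1b-orpo",  # assumed name
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    max_steps=500,
    warmup_steps=20,
)
```

With gradient accumulation, the effective batch size is 4 × 4 = 16 sequences per optimizer step, so 500 steps see roughly 8,000 examples, about 0.18 epochs of the 44,245-example mix.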
#### Speeds, Sizes, Times [optional]

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

The model was evaluated on HellaSwag.

Results:

| Tasks     | Version | Filter | n-shot | Metric   |   |  Value |   | Stderr |
|-----------|--------:|--------|-------:|----------|---|-------:|---|-------:|
| hellaswag |       1 | none   |      0 | acc      | ↑ | 0.4516 | ± | 0.0050 |
|           |         | none   |      0 | acc_norm | ↑ | 0.6139 | ± | 0.0049 |
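The reported standard errors are consistent with a simple binomial approximation over HellaSwag's 10,042-example validation split (the split size is an assumption about the harness configuration, not stated by the card):

```python
import math

def binomial_se(p: float, n: int) -> float:
    """Standard error of a proportion under the normal approximation."""
    return math.sqrt(p * (1.0 - p) / n)

N_HELLASWAG_VAL = 10_042  # assumed evaluation split size

se_acc = binomial_se(0.4516, N_HELLASWAG_VAL)       # ~0.0050
se_acc_norm = binomial_se(0.6139, N_HELLASWAG_VAL)  # ~0.0049
```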
147 |
|
148 |
### Testing Data, Factors & Metrics
|
149 |
|
|
|
233 |
|
234 |
## Model Card Authors [optional]
|
235 |
|
236 |
+
Ruth Shacterman
|
237 |
|
238 |
## Model Card Contact
|
239 |
|