|
---
library_name: transformers
datasets:
- mlabonne/orpo-dpo-mix-40k
language:
- en
base_model:
- meta-llama/Llama-3.2-1B-Instruct
---
|
|
|
# Model Card for ORPO-Tuned Llama-3.2-1B-Instruct

<!-- Provide a quick summary of what the model is/does. -->

ORPO-Tuned Llama-3.2-1B-Instruct
|
|
|
|
|
## Model Details

- This model is a fine-tuned version of the meta-llama/Llama-3.2-1B-Instruct base model, adapted using the ORPO (Odds Ratio Preference Optimization) technique.

- Base Model: It builds upon meta-llama/Llama-3.2-1B-Instruct, a 1-billion-parameter instruction-following language model.

- Fine-Tuning Technique: The model was fine-tuned using ORPO, which combines supervised fine-tuning and preference optimization in a single objective.

- Training Data: It was trained on the mlabonne/orpo-dpo-mix-40k dataset, containing 44,245 examples of prompts, chosen answers, and rejected answers.

- Purpose: The model is designed to generate responses that are better aligned with human preferences while maintaining the general knowledge and capabilities of the base Llama 3.2 model.

- Efficient Fine-Tuning: LoRA (Low-Rank Adaptation) was used for efficient adaptation, allowing for faster training and smaller storage requirements.

- Capabilities: The model should follow instructions and generate responses that are more in line with human preferences than those of the base model.

- Evaluation: The model's performance was evaluated on the HellaSwag benchmark (see Evaluation below).
|
|
|
|
|
### Model Description

<!-- Provide a longer summary of what this model is. -->

This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.
|
|
|
|
|
### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- Course assignment: https://uplimit.com/course/open-source-llms/session/session_clu1q3j6f016d128r2zxe3uyj/assignment/assignment_clyvnyyjh019h199337oef4ur

- Training notebook: https://uplimit.com/ugc-assets/course/course_clmz6fh2a00aa12bqdtjv6ygs/assets/1728565337395-85hdx93s03d0v9bd8j1nnxfjylyty2/uplimitopensourcellmsoctoberweekone.ipynb
|
|
|
|
|
## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

Hands-on learning: fine-tuning LLMs.

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

Educational use as part of an introductory course on fine-tuning LLMs.
|
|
|
### Downstream Use [optional]

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

This model is designed for tasks requiring improved alignment with human preferences, such as:

- Chatbots

- Question-answering systems

- General text generation with enhanced preference alignment
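
The card does not state the published Hub path for this fine-tune, so the repository ID below is a placeholder; a minimal inference sketch with 🤗 Transformers might look like this:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo ID -- substitute the actual Hub path of this fine-tuned model.
model_id = "your-username/Llama-3.2-1B-Instruct-ORPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Build a chat-formatted prompt and generate a response.
messages = [{"role": "user", "content": "Explain preference optimization in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```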
|
|
|
### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

This model should not yet be deployed in real-world applications; further fine-tuning and evaluation are required.
|
|
|
|
|
## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

- Performance may vary on tasks outside the training distribution

- May inherit biases present in the base model and training data

- Limited to 1B parameters, which may impact performance on complex tasks
|
|
|
### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

- Users should be aware of potential biases in model outputs

- Not suitable for critical decision-making without human oversight

- May generate plausible-sounding but incorrect information
|
|
|
|
|
## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

For training data, the model used mlabonne/orpo-dpo-mix-40k (a loading sketch follows the bullets below).

This dataset is designed for ORPO (Odds Ratio Preference Optimization) or DPO (Direct Preference Optimization) training of language models.

* It contains 44,245 examples in the training split.

* Includes prompts, chosen answers, and rejected answers for each sample.

* Combines various high-quality DPO datasets.
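
A minimal sketch for loading and inspecting the dataset with the 🤗 datasets library (column names are read from the dataset itself rather than assumed here):

```python
from datasets import load_dataset

# Load the preference dataset used for fine-tuning (44,245 examples in the train split).
dataset = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train")

print(dataset)            # column names and number of rows
print(dataset[0].keys())  # each example carries a prompt plus chosen and rejected answers
```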
|
|
|
|
### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

This model was fine-tuned using the ORPO (Odds Ratio Preference Optimization) technique on the meta-llama/Llama-3.2-1B-Instruct base model.

- Base Model: meta-llama/Llama-3.2-1B-Instruct

- Training Technique: ORPO (Odds Ratio Preference Optimization)

- Efficient Fine-Tuning Method: LoRA (Low-Rank Adaptation), as sketched below
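
A hedged sketch of the base-model and LoRA setup described above, using 🤗 Transformers and PEFT; the dropout value and target module names are assumptions for Llama-style models and may differ from the actual training notebook:

```python
import torch
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_id = "meta-llama/Llama-3.2-1B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype=torch.bfloat16)

# LoRA adapter matching the rank/alpha reported in the hyperparameters below.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,  # assumption: dropout is not stated in this card
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption for Llama attention layers
)
```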
|
|
|
#### Preprocessing [optional]

[More Information Needed]
|
|
|
|
|
#### Training Hyperparameters

- Learning Rate: 2e-5

- Batch Size: 4

- Gradient Accumulation Steps: 4

- Training Steps: 500

- Warmup Steps: 20

- LoRA Rank: 16

- LoRA Alpha: 32
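
Continuing the sketch above, the hyperparameters listed here map onto TRL's ORPOConfig and ORPOTrainer roughly as follows; argument names vary slightly across TRL versions, so treat this as an illustration rather than the exact training script:

```python
from trl import ORPOConfig, ORPOTrainer

orpo_args = ORPOConfig(
    output_dir="./llama-3.2-1b-instruct-orpo",  # assumption: output path not stated in the card
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    max_steps=500,
    warmup_steps=20,
    logging_steps=10,  # assumption: logging cadence not stated in the card
)

trainer = ORPOTrainer(
    model=model,                 # base model from the Training Procedure sketch
    args=orpo_args,
    train_dataset=dataset,       # train split loaded in the Training Data sketch
    processing_class=tokenizer,  # called `tokenizer=` in older TRL releases
    peft_config=peft_config,     # LoRA rank 16 / alpha 32 from the earlier sketch
)
trainer.train()
```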
|
|
|
|
|
#### Speeds, Sizes, Times [optional]

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

[More Information Needed]
|
|
|
## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

The model was evaluated on the HellaSwag benchmark in a zero-shot setting.

Results:

| Task      | Version | Filter | n-shot | Metric   | Value  | Stderr  |
|-----------|---------|--------|--------|----------|--------|---------|
| hellaswag | 1       | none   | 0      | acc      | 0.4516 | ±0.0050 |
| hellaswag | 1       | none   | 0      | acc_norm | 0.6139 | ±0.0049 |
|
|
|
Interpretation:

- Performance Level: The model achieves a raw accuracy of 45.16% and a normalized accuracy of 61.39% on the HellaSwag task.

- Confidence: The small standard errors (about 0.5% for both metrics) indicate that these results are fairly precise.

- Improvement over Random: HellaSwag presents 4 candidate endings per question, so a random baseline would achieve 25% accuracy; this model performs well above that.

- Normalized vs. Raw Accuracy: The higher normalized accuracy (61.39% vs. 45.16%) comes from acc_norm scoring each candidate ending by length-normalized log-likelihood, which gives a fairer comparison between answers of different lengths.

- Room for Improvement: While the performance is well above random, there is still significant room for improvement to reach human-level performance (which is typically above 95% on HellaSwag).
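
The card does not record the exact evaluation command, but zero-shot HellaSwag numbers in this format typically come from EleutherAI's lm-evaluation-harness; a hedged sketch of reproducing the run via its Python API, with a placeholder model path:

```python
import lm_eval

# Zero-shot HellaSwag evaluation; the pretrained path below is a placeholder.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=your-username/Llama-3.2-1B-Instruct-ORPO",
    tasks=["hellaswag"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"]["hellaswag"])  # reports acc and acc_norm with standard errors
```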
|
|
|
|
|
|
|
#### Summary

- Base Model: meta-llama/Llama-3.2-1B-Instruct

- Model Type: Causal Language Model

- Language: English
|
|
|
**Intended Use**

This model is designed for tasks requiring improved alignment with human preferences, such as:

- Chatbots

- Question-answering systems

- General text generation with enhanced preference alignment
|
|
|
**Training Data**

- Dataset: mlabonne/orpo-dpo-mix-40k

- Size: 44,245 examples

- Content: Prompts, chosen answers, and rejected answers
|
|
|
|
|
**Task: HellaSwag**

- This is a benchmark task designed to evaluate a model's commonsense reasoning and ability to complete scenarios logically.

- No specific filtering was applied to the test set.

- The evaluation was done in a zero-shot setting, where the model didn't receive any examples before making predictions.
|
|
|
|
- Metrics: acc (accuracy) = 0.4516 (45.16%), stderr ±0.0050; acc_norm (normalized accuracy) = 0.6139 (61.39%), stderr ±0.0049
|
|
|
|
|
## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

- **Hardware Type:** A100

- **Hours used:** Not recorded

- **Cloud Provider:** Google Colab

- **Compute Region:** Sacramento, CA, US

- **Framework:** PyTorch
|
|
|
## Technical Specifications [optional]

Hardware: A100 GPU
|
|
|
|
|
## Model Card Author

Ruth Shacterman

## Model Card Contact

[More Information Needed]