Commit 6ab9caa by devshaheen (1 parent: bafd56b): Create README.md
README.md (new file):
---
license: mit
datasets:
- mlabonne/guanaco-llama2-1k
language:
- en
base_model:
- NousResearch/Llama-2-7b-chat-hf
pipeline_tag: text-generation
library_name: transformers
finetuned_model: true
model_type: causal-lm
finetuned_task: instruction-following
training_args:
  num_train_epochs: 1
  batch_size: 4
  learning_rate: 2e-4
  weight_decay: 0.001
  gradient_accumulation_steps: 1
  gradient_checkpointing: true
  max_grad_norm: 0.3
  logging_steps: 25
  warmup_ratio: 0.03
  optim: paged_adamw_32bit
  lr_scheduler_type: cosine
metrics:
- accuracy
- loss
description: |
  This is a fine-tuned version of the Llama-2-7B-Chat model, trained on the `mlabonne/guanaco-llama2-1k` dataset for instruction following. It is intended for text generation and related NLP tasks such as question answering and summarization (a usage sketch follows this metadata block).
  Fine-tuning uses QLoRA with 4-bit quantization for memory efficiency and better GPU utilization, together with gradient accumulation and gradient checkpointing.
tags:
- instruction-following
- text-generation
- fine-tuned
- llama2
- causal-language-model
- QLoRA
- 4-bit-quantization
- low-memory
- training-optimized
trainer_info: |
  The model was trained for 1 epoch with a per-device batch size of 4, a learning rate of 2e-4, the paged AdamW (32-bit) optimizer, a cosine learning-rate schedule, a warmup ratio of 0.03, and a weight decay of 0.001. Gradient accumulation and gradient checkpointing reduce memory usage, and the base model was loaded with 4-bit NF4 quantization for further memory savings.
  The training script uses the `SFTTrainer` class for supervised fine-tuning, with hyperparameters chosen for instruction-following tasks (a training sketch follows this metadata block).
model_compatibility:
- GPU: yes, with support for 4-bit and bf16 (check compatibility with your hardware)
- Suitable for tasks: text generation, question answering, summarization, and instruction-based tasks
---
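
The following is a minimal, untested usage sketch with the `transformers` library. The repository id is a placeholder (replace it with this model's actual Hub id), and the prompt follows the Llama-2 `[INST]` chat template used by the `mlabonne/guanaco-llama2-1k` dataset.

```python
# Minimal inference sketch (not taken from this repo's training code).
# Assumes a CUDA GPU; MODEL_ID is a placeholder for this repository's Hub id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

MODEL_ID = "devshaheen/<this-repo-id>"  # placeholder: replace with the actual repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # or pass a 4-bit BitsAndBytesConfig for lower memory
    device_map="auto",
)

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Llama-2 chat prompt template, as used by mlabonne/guanaco-llama2-1k
prompt = "<s>[INST] Summarize what QLoRA fine-tuning does. [/INST]"
print(generator(prompt, max_new_tokens=128)[0]["generated_text"])
```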
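
The `trainer_info` field above summarizes the fine-tuning recipe (QLoRA, 4-bit NF4 quantization, `SFTTrainer`, paged AdamW). The sketch below reconstructs that recipe from the listed hyperparameters; it is not the original training script, the LoRA settings and sequence length are illustrative assumptions, and `SFTTrainer` argument names differ between `trl` versions (this follows the older trl 0.x API).

```python
# Reconstruction of the QLoRA + 4-bit NF4 setup described in trainer_info.
# LoRA rank/alpha/dropout and max_seq_length are assumptions, not from the card.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from trl import SFTTrainer

base_model = "NousResearch/Llama-2-7b-chat-hf"
dataset = load_dataset("mlabonne/guanaco-llama2-1k", split="train")

# Load the base model in 4-bit NF4 for memory efficiency
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token

# LoRA adapter configuration (values here are illustrative assumptions)
peft_config = LoraConfig(
    r=64, lora_alpha=16, lora_dropout=0.1, bias="none", task_type="CAUSAL_LM"
)

# Hyperparameters taken from training_args in the front matter above
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=1,
    gradient_checkpointing=True,
    learning_rate=2e-4,
    weight_decay=0.001,
    max_grad_norm=0.3,
    warmup_ratio=0.03,
    logging_steps=25,
    optim="paged_adamw_32bit",
    lr_scheduler_type="cosine",
)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    tokenizer=tokenizer,
    args=training_args,
    dataset_text_field="text",  # guanaco-llama2-1k stores formatted prompts in a "text" column
    max_seq_length=512,
)
trainer.train()
```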