---
license: llama3.1
datasets:
- agentlans/crash-course
base_model:
- agentlans/Llama3.1-SuperDeepFuse
---

# Llama3.1-SuperDeepFuse-CrashCourse12K

Llama3.1-SuperDeepFuse-CrashCourse12K is an 8B-parameter language model based on [Llama3.1-SuperDeepFuse](https://huggingface.co/agentlans/Llama3.1-SuperDeepFuse) and further fine-tuned on [agentlans/crash-course](https://huggingface.co/datasets/agentlans/crash-course).

## Model Details

- **Base Model**: Llama3.1-SuperDeepFuse (8B parameters)
- **Fine-tuning Dataset**: 12,000 samples from agentlans/crash-course (drawn from 10 high-quality instruct datasets)
- **Model Type**: Instruction-tuned language model
- **Language(s)**: Multilingual
- **License**: Follows standard Llama 3.1 usage terms
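
The model can be loaded with the standard `transformers` text-generation workflow. The snippet below is an illustrative sketch only: the repository id is assumed from the model name above, and the prompt and generation settings are examples rather than recommended values.

```python
# Minimal inference sketch (repository id assumed from the model name above).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "agentlans/Llama3.1-SuperDeepFuse-CrashCourse12K"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 training precision below
    device_map="auto",
)

# Llama 3.1 instruct models expect a chat template; apply it before generating.
messages = [{"role": "user", "content": "Explain LoRA fine-tuning in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```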

## Training Procedure

### Fine-tuning

- **Method**: LoRA (Low-Rank Adaptation)
- **Optimizer**: AdamW
- **Learning Rate**: 5e-5
- **Batch Size**: 2 per device
- **Gradient Accumulation Steps**: 8
- **Training Epochs**: 1
- **Max Sequence Length**: 2048
- **LoRA Configuration**:
  - Rank: 8
  - Alpha: 16
  - Dropout: 0.5
  - Target: all layers
- **Quantization**: 4-bit (bitsandbytes)
- **Precision**: BF16
- **Other Techniques**: NEFTune (noise alpha: 5), RS-LoRA

## Performance and Limitations

This model potentially offers:

- Enhanced multi-task reasoning
- Improved performance in mathematics and coding tasks
- Better instruction-following abilities

However:

- Performance may be limited compared to larger model variants
- It can produce misleading or incorrect outputs
- Outputs should be independently verified for critical applications

## Additional Information

- For the original model, see [agentlans/Llama3.1-SuperDeepFuse](https://huggingface.co/agentlans/Llama3.1-SuperDeepFuse).
- For the base Llama 3.1 model, including training data and model architecture, refer to the original [Llama 3.1](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) model card.