Add pipeline tag, library name and link to Github repository (#1)
- Add pipeline tag, library name and link to Github repository (862ceaa2156002b03fa3e37fa966d791ef399813)
Co-authored-by: Niels Rogge <[email protected]>
README.md CHANGED

```diff
@@ -1,15 +1,19 @@
 ---
-license: llama3.2
-datasets:
-- BAAI/Infinity-Instruct
 base_model:
 - meta-llama/Llama-3.2-1B-Instruct
+datasets:
+- BAAI/Infinity-Instruct
+license: llama3.2
+pipeline_tag: text-generation
+library_name: transformers
 ---

 ## Model Overview

 This weight is a fine-tuned version of **[Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)** using the **[LLM-Neo](https://arxiv.org/abs/2411.06839)** method. Usage is identical to the original Llama-3.2-1B-Instruct model.

+The official implementation can be found here: https://github.com/yang3121099/LLM-Neo
+
 ## Training Details

 The training process employs the **LLM-Neo** method. The dataset is derived from a mixed sample of **[BAAI/Infinity-Instruct](https://huggingface.co/datasets/BAAI/Infinity-Instruct)**, specifically the `0625` and `7M` subsets, with a total of 10k instruction samples. The KD (knowledge distillation) model used is **[Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)**, with the following hyperparameters:
@@ -69,10 +73,7 @@ The mathematical evaluation highlights significant improvements of the current m

 ---

-
-
 ### Summary

 - **Strengths**: The current model demonstrates notable improvements over **Llama-3.2-1B-Instruct** across multiple benchmark tasks, particularly in reasoning and mathematical problem-solving.
-- **Future Directions**: Further optimization in logical reasoning tasks (e.g., **TabMWP**) and continued enhancements in general language and mathematical adaptability.
-
+- **Future Directions**: Further optimization in logical reasoning tasks (e.g., **TabMWP**) and continued enhancements in general language and mathematical adaptability.
```
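Since the updated metadata declares `pipeline_tag: text-generation` and `library_name: transformers`, and the card states that usage is identical to Llama-3.2-1B-Instruct, loading the checkpoint follows the base model's standard chat pattern. A minimal sketch (the repository id below is a placeholder, not the actual model id):

```python
import torch
from transformers import pipeline

# Placeholder repository id; replace with this model's actual Hub id.
model_id = "your-namespace/Llama-3.2-1B-Instruct-Neo"

pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Briefly explain knowledge distillation."},
]

outputs = pipe(messages, max_new_tokens=128)
# The pipeline returns the full chat; the last message is the model's reply.
print(outputs[0]["generated_text"][-1]["content"])
```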
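The Training Details section names Llama-3.1-8B-Instruct as the KD teacher. LLM-Neo combines parameter-efficient fine-tuning with knowledge distillation; purely as an illustration of the distillation component (not the official LLM-Neo implementation, with hypothetical temperature `T` and mixing weight `alpha`), a token-level KD objective typically looks like this:

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Illustrative KD objective: cross-entropy on labels plus forward KL
    against the teacher. Assumes logits and labels are already aligned
    (shifted) for next-token prediction; padding positions use -100."""
    # Standard next-token cross-entropy on the ground-truth labels.
    ce = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
    )
    # Forward KL between temperature-softened teacher and student distributions.
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * ce + (1.0 - alpha) * kl
```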