yang31210999 and nielsr (HF staff) committed
Commit d258d01 · verified · 1 Parent(s): c347fbc

Add pipeline tag, library name and link to GitHub repository (#1)


- Add pipeline tag, library name and link to GitHub repository (862ceaa2156002b03fa3e37fa966d791ef399813)


Co-authored-by: Niels Rogge <[email protected]>

Files changed (1)
  1. README.md +8 -7
README.md CHANGED

@@ -1,15 +1,19 @@
 ---
-license: llama3.2
-datasets:
-- BAAI/Infinity-Instruct
 base_model:
 - meta-llama/Llama-3.2-1B-Instruct
+datasets:
+- BAAI/Infinity-Instruct
+license: llama3.2
+pipeline_tag: text-generation
+library_name: transformers
 ---
 
 ## Model Overview
 
 This weight is a fine-tuned version of **[Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)** using the **[LLM-Neo](https://arxiv.org/abs/2411.06839)** method. Usage is identical to the original Llama-3.2-1B-Instruct model.
 
+The official implementation can be found here: https://github.com/yang3121099/LLM-Neo
+
 ## Training Details
 
 The training process employs the **LLM-Neo** method. The dataset is derived from a mixed sample of **[BAAI/Infinity-Instruct](https://huggingface.co/datasets/BAAI/Infinity-Instruct)**, specifically the `0625` and `7M` subsets, with a total of 10k instruction samples. The KD (knowledge distillation) model used is **[Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)**, with the following hyperparameters:
@@ -69,10 +73,7 @@ The mathematical evaluation highlights significant improvements of the current m
 
 ---
 
-
-
 ### Summary
 
 - **Strengths**: The current model demonstrates notable improvements over **Llama-3.2-1B-Instruct** across multiple benchmark tasks, particularly in reasoning and mathematical problem-solving.
-- **Future Directions**: Further optimization in logical reasoning tasks (e.g., **TabMWP**) and continued enhancements in general language and mathematical adaptability.
-
+- **Future Directions**: Further optimization in logical reasoning tasks (e.g., **TabMWP**) and continued enhancements in general language and mathematical adaptability.
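
The metadata added in this commit (`library_name: transformers`, `pipeline_tag: text-generation`) means the checkpoint is meant to load through the standard `transformers` text-generation pipeline. Below is a minimal usage sketch, assuming usage identical to Llama-3.2-1B-Instruct as the README states; the repo id is a placeholder (the actual model id is not shown on this page), and the prompt and generation settings are purely illustrative.

```python
# Minimal sketch: load the checkpoint like any Llama-3.2-1B-Instruct
# fine-tune, per the README. The repo id below is a placeholder,
# not the actual model id for this repository.
from transformers import pipeline

model_id = "your-namespace/llm-neo-llama-3.2-1b-instruct"  # placeholder

pipe = pipeline("text-generation", model=model_id)

messages = [
    {"role": "user", "content": "Briefly explain knowledge distillation."},
]

# For chat-style input, `generated_text` holds the conversation with the
# assistant's reply appended as the final message.
result = pipe(messages, max_new_tokens=128)
print(result[0]["generated_text"][-1]["content"])
```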