jacobfulano committed
Commit f6e53b7 • Parent(s): a3d5e7e
Upload README.md with huggingface_hub

README.md CHANGED
@@ -32,7 +32,7 @@ We trained [Llama-2-7B](https://huggingface.co/meta-llama/Llama-2-7b-hf) using f
 | Setting | Dataset | HuggingFace Collection |
 | --------| ------| ------ |
 | Continued Pretraining - Code | [StarCoder-Python](https://huggingface.co/datasets/bigcode/starcoderdata) | [LoRA-TMLR-2024/continued-pretraining-code-starcoder-python](https://huggingface.co/collections/LoRA-TMLR-2024/continued-pretraining-code-starcoder-python-66f22ce3b26f416f21f58142) |
-| Continued Pretraining - Math | [OpenWebMath](https://huggingface.co/datasets/open-web-math/open-web-math) |
+| Continued Pretraining - Math | [OpenWebMath](https://huggingface.co/datasets/open-web-math/open-web-math) | [LoRA-TMLR-2024/continued-pretraining-math-openwebmath](https://huggingface.co/collections/LoRA-TMLR-2024/continued-pretraining-math-openwebmath-66f31d12f55fb27de05b2e3f) |
 | Instruction Finetuning - Code | [Magicoder-Evol-Instruct-110K](https://huggingface.co/datasets/ise-uiuc/Magicoder-Evol-Instruct-110K) | [LoRA-TMLR-2024/instruction-finetuning-code-magicoder-evol-instruct-110k](https://huggingface.co/collections/LoRA-TMLR-2024/instruction-finetuning-code-magicoder-evol-instruct-110k-66f224a800152f31e4942a3b) |
 | Instruction Finetuning - Math | [MetaMathQA](https://huggingface.co/datasets/meta-math/MetaMathQA) | [LoRA-TMLR-2024/instruction-finetuning-math-metamathqa](https://huggingface.co/collections/LoRA-TMLR-2024/instruction-finetuning-math-metamathqa-66f31cc40fda6b6b938d33e2) |
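A minimal sketch of loading one of the models from the collections above with the `transformers` library; the repository id is a placeholder, since the concrete model names are listed inside the linked collections.

```python
# Minimal sketch: load one of the finetuned Llama-2-7B variants with transformers.
# The repo id below is a placeholder; pick an actual model id from the linked collection.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "LoRA-TMLR-2024/<model-from-collection>"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```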
@@ -65,7 +65,7 @@ with LoRA.
 ## Uses
 
 <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-These are research
+These are research artifacts that are intended for research purposes only.
 
 
 ## Training Details
@@ -111,6 +111,8 @@ subset and sub-sampled it to 20B tokens.
 | gradient_clipping | norm (threshold=1) |
 | num_gpus | 32 |
 
+We trained models for 0.25B, 0.5B, 1B, 2B, 4B, 8B, 16B and 20B tokens. These checkpoints can be found for each LoRA and full finetuning setting in the HuggingFace model branches.
+
 ## Math CPT (OpenWebMath)
 
 [OpenWebMath](https://huggingface.co/datasets/open-web-math/open-web-math) (Paster et al., 2023) - This dataset contains 14.7B tokens derived from mathematical web pages from Common Crawl, correctly formatted to preserve mathematical content such as LaTeX equations. To match with the StarCoder-Python dataset, we trained on up to 20B tokens, repeating tokens beyond the first 14.7B. An analysis of this dataset shows that it contains a considerable amount of full English sentences.
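Since the intermediate checkpoints mentioned above are published as branches of each model repository, a specific token-budget checkpoint can be selected with the `revision` argument of `from_pretrained`. The repo id and branch name below are placeholders; the actual branch names should be visible in each repository's "Files and versions" tab.

```python
# Sketch: load an intermediate checkpoint (e.g. the 4B-token point) by pointing
# `revision` at the corresponding branch of the model repository.
# Repo id and branch name are placeholders.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "LoRA-TMLR-2024/<model-name>",       # placeholder repo id
    revision="<branch-for-4B-tokens>",   # placeholder branch name
)
```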
@@ -128,6 +130,9 @@ subset and sub-sampled it to 20B tokens.
 | gradient_clipping | norm (threshold=1) |
 | num_gpus | 32 |
 
+
+We trained models for 0.25B, 0.5B, 1B, 2B, 4B, 8B, 16B and 20B tokens. These checkpoints can be found for each LoRA and full finetuning setting in the HuggingFace model branches.
+
 ## Code IFT (Magicoder-Evol-Instruct-110K)
 
 [Magicoder-Evol-Instruct-110K](https://huggingface.co/datasets/ise-uiuc/Magicoder-Evol-Instruct-110K) (Wei et al., 2023) This dataset contains 72.97M tokens
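Both training corpora referenced in this card are public Hugging Face datasets. A small sketch of pulling them with the `datasets` library follows; OpenWebMath is streamed because of its size, and the field names shown are assumptions to verify against the dataset cards.

```python
# Sketch: inspect the CPT and IFT datasets referenced above with `datasets`.
from datasets import load_dataset

# Continued pretraining (math): ~14.7B tokens of mathematical web text.
# Streamed to avoid downloading the full corpus up front.
openwebmath = load_dataset("open-web-math/open-web-math", split="train", streaming=True)
print(next(iter(openwebmath))["text"][:200])  # "text" field assumed per the dataset card

# Instruction finetuning (code): ~72.97M tokens of instruction/response pairs.
magicoder = load_dataset("ise-uiuc/Magicoder-Evol-Instruct-110K", split="train")
print(magicoder[0])
```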