PEFT
Safetensors
English
jacobfulano committed
Commit
f6e53b7
1 Parent(s): a3d5e7e

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +7 -2
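
The commit message says the file was uploaded with `huggingface_hub`. For context, here is a minimal sketch of how such an upload is typically done with that library; the repo ID below is a placeholder, not a detail taken from this commit:

```python
# Sketch only: upload a local README.md to a model repo on the Hugging Face Hub.
# "your-org/your-model" is a placeholder repo ID, not the repo behind this commit.
from huggingface_hub import upload_file

upload_file(
    path_or_fileobj="README.md",        # local file to upload
    path_in_repo="README.md",           # destination path inside the repo
    repo_id="your-org/your-model",      # placeholder repo ID
    repo_type="model",
    commit_message="Upload README.md with huggingface_hub",
)
```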
README.md CHANGED
@@ -32,7 +32,7 @@ We trained [Llama-2-7B](https://huggingface.co/meta-llama/Llama-2-7b-hf) using f
  | Setting | Dataset | HuggingFace Collection |
  | --------| ------| ------ |
  | Continued Pretraining - Code | [StarCoder-Python](https://huggingface.co/datasets/bigcode/starcoderdata) | [LoRA-TMLR-2024/continued-pretraining-code-starcoder-python](https://huggingface.co/collections/LoRA-TMLR-2024/continued-pretraining-code-starcoder-python-66f22ce3b26f416f21f58142) |
- | Continued Pretraining - Math | [OpenWebMath](https://huggingface.co/datasets/open-web-math/open-web-math) | TBD |
+ | Continued Pretraining - Math | [OpenWebMath](https://huggingface.co/datasets/open-web-math/open-web-math) | [LoRA-TMLR-2024/continued-pretraining-math-openwebmath](https://huggingface.co/collections/LoRA-TMLR-2024/continued-pretraining-math-openwebmath-66f31d12f55fb27de05b2e3f) |
  | Instruction Finetuning - Code | [Magicoder-Evol-Instruct-110K](https://huggingface.co/datasets/ise-uiuc/Magicoder-Evol-Instruct-110K)| [LoRA-TMLR-2024/instruction-finetuning-code-magicoder-evol-instruct-110k](https://huggingface.co/collections/LoRA-TMLR-2024/instruction-finetuning-code-magicoder-evol-instruct-110k-66f224a800152f31e4942a3b) |
  | Instruction Finetuning - Math | [MetaMathQA](https://huggingface.co/datasets/meta-math/MetaMathQA) | [LoRA-TMLR-2024/instruction-finetuning-math-metamathqa](https://huggingface.co/collections/LoRA-TMLR-2024/instruction-finetuning-math-metamathqa-66f31cc40fda6b6b938d33e2) |

@@ -65,7 +65,7 @@ with LoRA.
  ## Uses

  <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
- These are research artefacts that are intended for research purposes only.
+ These are research artifacts that are intended for research purposes only.


  ## Training Details
@@ -111,6 +111,8 @@ subset and sub-sampled it to 20B tokens.
  | gradient_clipping | norm (threshold=1) |
  | num_gpus | 32 |

+ We trained models for 0.25B, 0.5B, 1B, 2B, 4B, 8B, 16B and 20B tokens. These checkpoints can be found for each LoRA and full finetuning setting in the HuggingFace model branches.
+
  ## Math CPT (OpenWebMath)

  [OpenWebMath](https://huggingface.co/datasets/open-web-math/open-web-math) (Paster et al., 2023) - This dataset contains 14.7B tokens derived from mathematical web pages from Common Crawl, correctly formatted to preserve mathematical content such as LaTeX equations. To match with the StarCoder-Python dataset, we trained on up to 20B tokens, repeating tokens beyond the first 14.7B. An analysis of this dataset shows that it contains a considerable amount of full English sentences.
@@ -128,6 +130,9 @@ subset and sub-sampled it to 20B tokens.
  | gradient_clipping | norm (threshold=1) |
  | num_gpus | 32 |

+
+ We trained models for 0.25B, 0.5B, 1B, 2B, 4B, 8B, 16B and 20B tokens. These checkpoints can be found for each LoRA and full finetuning setting in the HuggingFace model branches.
+
  ## Code IFT (Magicoder-Evol-Instruct-110K)

  [Magicoder-Evol-Instruct-110K](https://huggingface.co/datasets/ise-uiuc/Magicoder-Evol-Instruct-110K) (Wei et al., 2023) This dataset contains 72.97M tokens
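
The added README text notes that intermediate checkpoints (0.25B-20B tokens) live in branches of each model repo. As a rough sketch of how one of the LoRA adapters from these collections might be loaded with PEFT: the adapter repo ID and branch name below are placeholders (the real names are listed in the collections linked above), and `revision` is assumed to be forwarded by `peft` to the Hub download.

```python
# Sketch only: load a LoRA adapter on top of the Llama-2-7B base model.
# The adapter repo ID and branch name are placeholders, not taken from this page.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-2-7b-hf"
base = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

model = PeftModel.from_pretrained(
    base,
    "LoRA-TMLR-2024/your-adapter-repo",  # placeholder adapter repo ID
    revision="main",                     # or an intermediate-checkpoint branch, if present
)
```

Full-finetuning checkpoints would instead be loaded directly with `AutoModelForCausalLM.from_pretrained(repo_id, revision=branch)`.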