jacobfulano committed
Commit f6e53b7 • Parent(s): a3d5e7e
Upload README.md with huggingface_hub

README.md CHANGED
@@ -32,7 +32,7 @@ We trained [Llama-2-7B](https://huggingface.co/meta-llama/Llama-2-7b-hf) using f
 | Setting | Dataset | HuggingFace Collection |
 | --------| ------| ------ |
 | Continued Pretraining - Code | [StarCoder-Python](https://huggingface.co/datasets/bigcode/starcoderdata) | [LoRA-TMLR-2024/continued-pretraining-code-starcoder-python](https://huggingface.co/collections/LoRA-TMLR-2024/continued-pretraining-code-starcoder-python-66f22ce3b26f416f21f58142) |
-| Continued Pretraining - Math | [OpenWebMath](https://huggingface.co/datasets/open-web-math/open-web-math) |
+| Continued Pretraining - Math | [OpenWebMath](https://huggingface.co/datasets/open-web-math/open-web-math) | [LoRA-TMLR-2024/continued-pretraining-math-openwebmath](https://huggingface.co/collections/LoRA-TMLR-2024/continued-pretraining-math-openwebmath-66f31d12f55fb27de05b2e3f) |
 | Instruction Finetuning - Code | [Magicoder-Evol-Instruct-110K](https://huggingface.co/datasets/ise-uiuc/Magicoder-Evol-Instruct-110K) | [LoRA-TMLR-2024/instruction-finetuning-code-magicoder-evol-instruct-110k](https://huggingface.co/collections/LoRA-TMLR-2024/instruction-finetuning-code-magicoder-evol-instruct-110k-66f224a800152f31e4942a3b) |
 | Instruction Finetuning - Math | [MetaMathQA](https://huggingface.co/datasets/meta-math/MetaMathQA) | [LoRA-TMLR-2024/instruction-finetuning-math-metamathqa](https://huggingface.co/collections/LoRA-TMLR-2024/instruction-finetuning-math-metamathqa-66f31cc40fda6b6b938d33e2) |
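A minimal sketch of loading one of the models from the collections above with the `transformers` library; the repository id is a placeholder, since the concrete model names are listed inside the linked collections.

```python
# Minimal sketch: load one of the finetuned Llama-2-7B variants with transformers.
# The repo id below is a placeholder; pick an actual model id from the linked collection.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "LoRA-TMLR-2024/<model-from-collection>"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```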
@@ -65,7 +65,7 @@ with LoRA.
 ## Uses
 
 <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-These are research
+These are research artifacts that are intended for research purposes only.
 
 
 ## Training Details
@@ -111,6 +111,8 @@ subset and sub-sampled it to 20B tokens.
 | gradient_clipping | norm (threshold=1) |
 | num_gpus | 32 |
 
+We trained models for 0.25B, 0.5B, 1B, 2B, 4B, 8B, 16B and 20B tokens. These checkpoints can be found for each LoRA and full finetuning setting in the HuggingFace model branches.
+
 ## Math CPT (OpenWebMath)
 
 [OpenWebMath](https://huggingface.co/datasets/open-web-math/open-web-math) (Paster et al., 2023) - This dataset contains 14.7B tokens derived from mathematical web pages from Common Crawl, correctly formatted to preserve mathematical content such as LaTeX equations. To match with the StarCoder-Python dataset, we trained on up to 20B tokens, repeating tokens beyond the first 14.7B. An analysis of this dataset shows that it contains a considerable amount of full English sentences.
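Since the intermediate checkpoints mentioned above are published as branches of each model repository, a specific token-budget checkpoint can be selected with the `revision` argument of `from_pretrained`. The repo id and branch name below are placeholders; the actual branch names should be visible in each repository's "Files and versions" tab.

```python
# Sketch: load an intermediate checkpoint (e.g. the 4B-token point) by pointing
# `revision` at the corresponding branch of the model repository.
# Repo id and branch name are placeholders.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "LoRA-TMLR-2024/<model-name>",       # placeholder repo id
    revision="<branch-for-4B-tokens>",   # placeholder branch name
)
```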
@@ -128,6 +130,9 @@ subset and sub-sampled it to 20B tokens.
 | gradient_clipping | norm (threshold=1) |
 | num_gpus | 32 |
 
+
+We trained models for 0.25B, 0.5B, 1B, 2B, 4B, 8B, 16B and 20B tokens. These checkpoints can be found for each LoRA and full finetuning setting in the HuggingFace model branches.
+
 ## Code IFT (Magicoder-Evol-Instruct-110K)
 
 [Magicoder-Evol-Instruct-110K](https://huggingface.co/datasets/ise-uiuc/Magicoder-Evol-Instruct-110K) (Wei et al., 2023) This dataset contains 72.97M tokens
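Both training corpora referenced in this card are public Hugging Face datasets. A small sketch of pulling them with the `datasets` library follows; OpenWebMath is streamed because of its size, and the field names shown are assumptions to verify against the dataset cards.

```python
# Sketch: inspect the CPT and IFT datasets referenced above with `datasets`.
from datasets import load_dataset

# Continued pretraining (math): ~14.7B tokens of mathematical web text.
# Streamed to avoid downloading the full corpus up front.
openwebmath = load_dataset("open-web-math/open-web-math", split="train", streaming=True)
print(next(iter(openwebmath))["text"][:200])  # "text" field assumed per the dataset card

# Instruction finetuning (code): ~72.97M tokens of instruction/response pairs.
magicoder = load_dataset("ise-uiuc/Magicoder-Evol-Instruct-110K", split="train")
print(magicoder[0])
```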