Achal Dave committed · Commit 87d9f45 · Parent: 23b1485 · Rearrange

README.md CHANGED
DCLM-1B is a 1.4 billion parameter language model trained on the DCLM-Baseline dataset, which was curated as part of the DataComp for Language Models (DCLM) benchmark. This model is designed to showcase the effectiveness of systematic data curation techniques for improving language model performance.
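
For orientation, here is a minimal loading sketch using the Hugging Face transformers API. The repo id `TRI-ML/DCLM-1B` and the need for `trust_remote_code=True` are assumptions rather than details confirmed by this card; check the published model card for the exact loader.

```python
# Minimal sketch of loading and sampling from DCLM-1B with transformers.
# The repo id is a hypothetical placeholder, and some DCLM releases ship
# a custom architecture that requires trust_remote_code=True; both are
# assumptions, not confirmed by this card.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "TRI-ML/DCLM-1B"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

prompt = "Careful data curation matters because"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```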

## Evaluation

Here are the evaluation results for DCLM-1B on various tasks (using the [llm-foundry](https://github.com/mosaicml/llm-foundry) eval suite), compared to recently released small models on key benchmarks.
As described in the paper, Core accuracy is the average of centered accuracy on 22 tasks (including HellaSwag and ARC-E), and Extended is centered accuracy averaged over 53 tasks.

| Model                             | Params | Tokens | Open dataset? | Core     | MMLU     | Extended  |
|-----------------------------------|--------|--------|---------------|----------|----------|-----------|
| **Open weights, closed datasets** |        |        |               |          |          |           |
| Qwen2-1.5B                        | 1.5B   | ?      | ❌            | 42.1     | **56.4** | **32.4**  |
| Gemma-2B                          | 2.5B   | 3T     | ❌            | **43.3** | 40.8     | 26.6      |
| **Open weights, open datasets**   |        |        |               |          |          |           |
| OLMo-1B                           | 1.2B   | 3T     | ✅            | 29.7     | 26.0     | 16.1      |
| SmolLM                            | 1.7B   | 1T     | ✅            | 36.3     | 30.0     | 21.2      |
| DCLM-1B                           | 1.4B   | 4.3T   | ✅            | **45.2** | **47.5** | **28.1**  |
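
To make the metric concrete, the sketch below illustrates centering, assuming the definition described in the DCLM paper: raw accuracy is rescaled so that random-chance performance maps to 0 and perfect accuracy maps to 1, and Core is the mean of the centered scores. The task names, accuracies, and chance baselines are illustrative placeholders, not actual DCLM-1B results.

```python
# Sketch of "centered" accuracy: raw accuracy rescaled so that random
# guessing maps to 0 and perfect accuracy maps to 1. Task names, scores,
# and chance baselines below are illustrative placeholders, not actual
# DCLM-1B results.

def centered_accuracy(acc: float, chance: float) -> float:
    """Rescale accuracy so the random-guessing baseline maps to 0."""
    return (acc - chance) / (1.0 - chance)

# (raw accuracy, random-guessing baseline) per task; 4-way multiple
# choice tasks such as HellaSwag and ARC-E have a 0.25 chance baseline.
results = {
    "hellaswag": (0.70, 0.25),
    "arc_easy": (0.75, 0.25),
    "boolq": (0.78, 0.50),  # binary task, so chance is 0.5
}

core = sum(centered_accuracy(a, c) for a, c in results.values()) / len(results)
print(f"Core (mean centered accuracy): {core:.3f}")
```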

## Model Details

| Size | Training Tokens | Layers | Hidden Size | Attention Heads | Context Length |

We train our 1.4B model for 4.3T tokens on DCLM-Baseline, combined with the StarCoder and ProofPile2 datasets. We will update our paper soon with more training details.
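
The card does not say how these corpora are combined; below is a rough sketch, for illustration only, of building such a mixture by weighted interleaving with the Hugging Face `datasets` library. The repo ids, column names, and mixing weights are assumptions, not the actual training recipe.

```python
# Rough sketch of a streaming mixture of DCLM-Baseline with StarCoder
# and ProofPile2 via weighted interleaving. The repo ids, column names,
# and probabilities are illustrative assumptions, not the DCLM-1B recipe.
from datasets import interleave_datasets, load_dataset

dclm = load_dataset("mlfoundations/dclm-baseline-1.0", split="train", streaming=True)
starcoder = load_dataset("bigcode/starcoderdata", split="train", streaming=True)
proofpile = load_dataset("EleutherAI/proof-pile-2", split="train", streaming=True)

# Normalize each stream to a single "text" column so the schemas line up
# (StarCoder storing code under "content" is an assumption about the repo).
dclm = dclm.select_columns(["text"])
starcoder = starcoder.rename_column("content", "text").select_columns(["text"])
proofpile = proofpile.select_columns(["text"])

# Web text dominates, with smaller slices of code and math (made-up weights).
mixture = interleave_datasets(
    [dclm, starcoder, proofpile],
    probabilities=[0.85, 0.10, 0.05],
    seed=0,
)

for example in mixture.take(3):
    print(example["text"][:80])
```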

### Detailed evaluation

| Task | Score |
|------------------------------------------|---------|