Achal Dave committed · Commit 87d9f45 · Parent: 23b1485 · Rearrange

README.md CHANGED
DCLM-1B is a 1.4 billion parameter language model trained on the DCLM-Baseline dataset, which was curated as part of the DataComp for Language Models (DCLM) benchmark. This model is designed to showcase the effectiveness of systematic data curation techniques for improving language model performance.
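
For orientation, here is a minimal loading sketch using the Hugging Face transformers API. The repo id `TRI-ML/DCLM-1B` and the need for `trust_remote_code=True` are assumptions rather than details confirmed by this card; check the published model card for the exact loader.

```python
# Minimal sketch of loading and sampling from DCLM-1B with transformers.
# The repo id is a hypothetical placeholder, and some DCLM releases ship
# a custom architecture that requires trust_remote_code=True; both are
# assumptions, not confirmed by this card.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "TRI-ML/DCLM-1B"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

prompt = "Careful data curation matters because"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```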

## Evaluation

Here are the evaluation results for DCLM-1B on various tasks (using the [llm-foundry](https://github.com/mosaicml/llm-foundry) eval suite), compared to recently released small models on key benchmarks.
As described in the paper, Core accuracy is the average of centered accuracy on 22 tasks (including HellaSwag and ARC-E), and Extended is centered accuracy averaged over 53 tasks.

| Model                             | Params | Tokens | Open dataset? | Core     | MMLU     | Extended  |
|-----------------------------------|--------|--------|---------------|----------|----------|-----------|
| **Open weights, closed datasets** |        |        |               |          |          |           |
| Qwen2-1.5B                        | 1.5B   | ?      | ❌            | 42.1     | **56.4** | **32.4**  |
| Gemma-2B                          | 2.5B   | 3T     | ❌            | **43.3** | 40.8     | 26.6      |
| **Open weights, open datasets**   |        |        |               |          |          |           |
| OLMo-1B                           | 1.2B   | 3T     | ✅            | 29.7     | 26.0     | 16.1      |
| SmolLM                            | 1.7B   | 1T     | ✅            | 36.3     | 30.0     | 21.2      |
| DCLM-1B                           | 1.4B   | 4.3T   | ✅            | **45.2** | **47.5** | **28.1**  |
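
To make the metric concrete, the sketch below illustrates centering, assuming the definition described in the DCLM paper: raw accuracy is rescaled so that random-chance performance maps to 0 and perfect accuracy maps to 1, and Core is the mean of the centered scores. The task names, accuracies, and chance baselines are illustrative placeholders, not actual DCLM-1B results.

```python
# Sketch of "centered" accuracy: raw accuracy rescaled so that random
# guessing maps to 0 and perfect accuracy maps to 1. Task names, scores,
# and chance baselines below are illustrative placeholders, not actual
# DCLM-1B results.

def centered_accuracy(acc: float, chance: float) -> float:
    """Rescale accuracy so the random-guessing baseline maps to 0."""
    return (acc - chance) / (1.0 - chance)

# (raw accuracy, random-guessing baseline) per task; 4-way multiple
# choice tasks such as HellaSwag and ARC-E have a 0.25 chance baseline.
results = {
    "hellaswag": (0.70, 0.25),
    "arc_easy": (0.75, 0.25),
    "boolq": (0.78, 0.50),  # binary task, so chance is 0.5
}

core = sum(centered_accuracy(a, c) for a, c in results.values()) / len(results)
print(f"Core (mean centered accuracy): {core:.3f}")
```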

## Model Details

| Size | Training Tokens | Layers | Hidden Size | Attention Heads | Context Length |

We train our 1.4B model for 4.3T tokens on DCLM-Baseline, combined with the StarCoder and ProofPile2 datasets. We will update our paper soon with more training details.
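
The card does not say how these corpora are combined; below is a rough sketch, for illustration only, of building such a mixture by weighted interleaving with the Hugging Face `datasets` library. The repo ids, column names, and mixing weights are assumptions, not the actual training recipe.

```python
# Rough sketch of a streaming mixture of DCLM-Baseline with StarCoder
# and ProofPile2 via weighted interleaving. The repo ids, column names,
# and probabilities are illustrative assumptions, not the DCLM-1B recipe.
from datasets import interleave_datasets, load_dataset

dclm = load_dataset("mlfoundations/dclm-baseline-1.0", split="train", streaming=True)
starcoder = load_dataset("bigcode/starcoderdata", split="train", streaming=True)
proofpile = load_dataset("EleutherAI/proof-pile-2", split="train", streaming=True)

# Normalize each stream to a single "text" column so the schemas line up
# (StarCoder storing code under "content" is an assumption about the repo).
dclm = dclm.select_columns(["text"])
starcoder = starcoder.rename_column("content", "text").select_columns(["text"])
proofpile = proofpile.select_columns(["text"])

# Web text dominates, with smaller slices of code and math (made-up weights).
mixture = interleave_datasets(
    [dclm, starcoder, proofpile],
    probabilities=[0.85, 0.10, 0.05],
    seed=0,
)

for example in mixture.take(3):
    print(example["text"][:80])
```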

### Detailed evaluation

| Task | Score |
|------------------------------------------|---------|