Achal Dave committed on
Commit
87d9f45
·
1 Parent(s): 23b1485
Files changed (1)
  1. README.md +19 -16
README.md CHANGED
@@ -11,6 +11,24 @@ license: apache-2.0

DCLM-1B is a 1.4 billion parameter language model trained on the DCLM-Baseline dataset, which was curated as part of the DataComp for Language Models (DCLM) benchmark. This model is designed to showcase the effectiveness of systematic data curation techniques for improving language model performance.

+ ## Evaluation
+
+ Here are the evaluation results for DCLM-1B on various tasks (using the [llm-foundry](https://github.com/mosaicml/llm-foundry) eval suite), compared to recently released small models on key benchmarks.
+ As described in the paper, Core accuracy is the average of centered accuracy on
+ 22 tasks (including HellaSwag and ARC-E), and Extended accuracy is centered accuracy averaged
+ over 53 tasks. All models in the table are evaluated with llm-foundry.
+
+
+ | Model | Params | Tokens | Open dataset? | Core | MMLU | Extended |
+ |-----------------------------------|--------|--------|---------------|----------|----------|-----------|
+ | **Open weights, closed datasets** | | | | | | |
+ | Qwen2-1.5B | 1.5B | ? | ❌ | 42.1 | **56.4** | **32.4** |
+ | Gemma-2B | 2.5B | 3T | ❌ | **43.3** | 40.8 | 26.6 |
+ | **Open weights, open datasets** | | | | | | |
+ | OLMo-1B | 1.2B | 3T | ✅ | 29.7 | 26.0 | 16.1 |
+ | SmolLM | 1.7B | 1T | ✅ | 36.3 | 30.0 | 21.2 |
+ | DCLM-1B | 1.4B | 4.3T | ✅ | **45.2** | **47.5** | **28.1** |
+
## Model Details

| Size | Training Tokens | Layers | Hidden Size | Attention Heads | Context Length |
@@ -53,23 +71,8 @@ We train our 1.4B model for 4.3T tokens on DCLM-Baseline, combined with the
StarCoder and ProofPile2 datasets.
We will update our paper soon with more training details.

- ## Evaluation

- Here are the evaluation results for DCLM-1B on various tasks (using [llm-foundry](https://github.com/mosaicml/llm-foundry) eval suite), compared to recently released small models on key benchmarks.
- As described in the paper, Core accuracy is the average of centered accuracy on
- 22 tasks (including HellaSwag and ARC-E), Extended is centered accuracy averaged
- over 53 tasks. We evaluate the models using llm-foundry.
-
-
- | Model | Params | Tokens | Open dataset? | Core | MMLU | Extended |
- |-----------------------------------|--------|--------|---------------|----------|----------|-----------|
- | **Open weights, closed datasets** | | | | | | |
- | Qwen2-1.5B | 1.5B | ? | ❌ | 42.1 | **56.4** | **32.4** |
- | Gemma-2B | 2.5B | 3T | ❌ | **43.3** | 40.8 | 26.6 |
- | **Open weights, open datasets** | | | | | | |
- | OLMo-1B | 1.2B | 3T | ✅ | 29.7 | 26.0 | 16.1 |
- | SmolLM | 1.7B | 1T | ✅ | 36.3 | 30.0 | 21.2 |
- | DCLM-1B | 1.4B | 4.3T | ✅ | **45.2** | **47.5** | **28.1** |
+ ### Detailed evaluation

| Task | Score |
|------------------------------------------|---------|
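The Core and Extended columns in the Evaluation section moved by this commit are averages of centered accuracy over 22 and 53 tasks respectively. As a rough illustration of that aggregation, the sketch below assumes the usual DCLM definition of centered accuracy, (raw accuracy - random-guess baseline) / (1 - random-guess baseline); the task names, accuracies, and baselines are illustrative placeholders, not DCLM-1B's actual per-task results.

```python
# Minimal sketch: aggregate per-task accuracies into a Core-style score.
# Assumes centered accuracy = (acc - baseline) / (1 - baseline), where
# `baseline` is the random-guessing accuracy for the task. Values below
# are illustrative only.

def centered_accuracy(acc: float, baseline: float) -> float:
    return (acc - baseline) / (1.0 - baseline)

# (raw accuracy, random-guess baseline) per task -- placeholder numbers.
task_results = {
    "hellaswag": (0.65, 0.25),  # 4-way multiple choice
    "arc_easy": (0.70, 0.25),   # 4-way multiple choice
    "boolq": (0.72, 0.50),      # yes/no
}

core = sum(centered_accuracy(a, b) for a, b in task_results.values()) / len(task_results)
print(f"Core-style score: {100 * core:.1f}")
```

Under this definition a random-guessing model scores roughly 0 and a perfect model scores 100, which is why the averaged scores are described as "centered".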
 
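For completeness, here is a hypothetical quickstart sketch for the model this card describes. It assumes the checkpoint resolves through Hugging Face transformers' AutoModelForCausalLM; DCLM checkpoints are trained with OpenLM, so installing the open_lm package and importing its Hugging Face integration may also be required, and the repository id in the snippet is an assumed placeholder rather than something confirmed by this commit.

```python
# Hypothetical quickstart (assumptions flagged): the open_lm package and its
# Hugging Face integration may need to be installed/imported first so that
# AutoModelForCausalLM can resolve the DCLM architecture.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "TRI-ML/DCLM-1B"  # assumed hub path; replace with the card's actual repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

prompt = "Machine learning is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```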