Achal Dave committed · Commit 2fbae44 · Parent(s): 62f139c

Update

README.md CHANGED
@@ -13,13 +13,13 @@ DCLM-1B is a 1.4 billion parameter language model trained on the DCLM-Baseline d
 
 ## Evaluation
 
-
+We evaluate DCLM-1B using the [llm-foundry](https://github.com/mosaicml/llm-foundry) eval suite, and compare to recently released small models on key benchmarks.
 As described in the paper, Core accuracy is the average of centered accuracy on
 22 tasks (including HellaSwag and ARC-E); Extended is centered accuracy averaged
 over 53 tasks.
 
 
-| Model | Params | Tokens | Open dataset? | Core | MMLU
+| Model | Params | Tokens | Open dataset? | Core | MMLU 5-shot | Extended |
 |-----------------------------------|--------|--------|---------------|----------|----------|-----------|
 | **Open weights, closed datasets** | | | | | | |
 | Qwen2-1.5B | 1.5B | ? | ❌ | 42.1 | **56.4** | **32.4** |
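
For readers unfamiliar with the metric the diff refers to, here is a minimal sketch of how a centered-accuracy average like Core could be computed. It assumes centered accuracy means rescaling raw accuracy so that random-chance performance maps to 0 and perfect performance maps to 1 (the reading suggested by the DCLM paper); the task names and baseline values below are illustrative placeholders, not the actual 22-task suite.

```python
def centered_accuracy(raw_acc: float, random_baseline: float) -> float:
    """Rescale raw accuracy so that chance-level performance scores 0."""
    return (raw_acc - random_baseline) / (1.0 - random_baseline)

# Hypothetical per-task results: (raw accuracy, random-guess baseline).
# The real Core metric averages over 22 tasks; three are shown for brevity.
results = {
    "hellaswag": (0.72, 0.25),  # 4-way multiple choice -> 0.25 baseline
    "arc_easy":  (0.68, 0.25),
    "boolq":     (0.71, 0.50),  # binary task -> 0.50 baseline
}

core = sum(centered_accuracy(acc, b) for acc, b in results.values()) / len(results)
print(f"Core (centered-accuracy average over {len(results)} tasks): {core:.3f}")
```

The rescaling keeps a hard binary task and an easy multiple-choice task on a comparable footing, which is why an average of centered accuracies is a more meaningful aggregate than an average of raw accuracies.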