UNSAFE/Mixtress-135M

text generation

Vectorrent committed on Oct 13, 2024

Commit 322cf7f · verified · 1 parent: 7368704

Upload README.md

Files changed (1)

README.md +6 -6

@@ -23,8 +23,6 @@ datasets:
 Mixtress 135M is a transformer model based upon the [Mixtral](https://huggingface.co/docs/transformers/en/model_doc/mixtral) architecture. It is the culmination of approximately 20 weeks of [Kaggle](https://kaggle.com) free hours, and 67 twelve-hour training runs.
-The results are laughably bad. The model has massively overfit to the training data, and it saw far less tokens than other models of comparable size. But at least I can say we saw it through to completion!
 ## Training data
 Mixtress was trained on a curated sampling of data from the following datasets:
@@ -68,10 +66,12 @@ All evaluations were done using the [Pythia evaluation harness](https://github.c
 ### Scores
-| Model and Size    | ARC-easy   | ARC-challenge | HellaSwag  | OpenBookQA | PiQA       |
-| ----------------- | ---------- | ------------- | ---------- | ---------- | ---------- |
-| gpt-neo-125m      | 22.95      | N/A           | 30.26      | N/A        | N/A        |
-| **Mixtress-135M** | **0.2921** | **0.2457**    | **0.2699** | **0.2180** | **0.5267** |
+| Model and Size            | ARC-easy   | ARC-challenge | HellaSwag  | OpenBookQA | PiQA       | TinyMMLU   | TriviaQA | Winogrande |
+| ------------------------- | ---------- | ------------- | ---------- | ---------- | ---------- | ---------- | -------- | ---------- |
+| EleutherAI/gpt-neo-125m   | 22.95%     | N/A           | 30.26%     | N/A        | N/A        | N/A        | N/A      | N/A        |
+| HuggingFaceTB/SmolLM-135M | 43.99%     | N/A           | 42.30%     | N/A        | 69.60%     | 30.23%     | 4.11%    | 52.70%     |
+| OpenAI/GPT2-137M          | 31.09%     | N/A           | 29.76%     | N/A        | 62.51%     | 26.29%     | 0.49%    | 49.72%     |
+| **UNSAFE/Mixtress-135M**  | **29.21%** | **24.57%**    | **26.99%** | **21.80**  | **52.67%** | **31.71%** | **N/A**  | **50.91%** |
 ## Join Us