clarify
README.md
CHANGED
@@ -127,7 +127,7 @@ on the evaluation set of 10k linear equations from the DeepMind *algebra__linear
 
 Additionally, we construct a partial correctness dataset available at the following model card: [MarioBarbeque/CyberSolve-LinAlg-1.2-correctness-benchmark](https://huggingface.co/datasets/MarioBarbeque/CyberSolve-LinAlg-1.2-correctness-benchmark).
 This dataset was created with the goal of analyzing both the token-to-token and decoded-sequence-to-decoded-sequence partial correctness of CyberSolve's predictions in detail beyond just its ability to get answers flat out right or wrong. Similar partial correctness benchmark datasets were created for the
-initial [FLAN-T5 model](https://huggingface.co/datasets/MarioBarbeque/FLAN-T5-DeepMind-LinAlg-1D-benchmark), the
+initial [FLAN-T5 model](https://huggingface.co/datasets/MarioBarbeque/FLAN-T5-DeepMind-LinAlg-1D-benchmark), the [zeroth-generation downsampled training](https://huggingface.co/datasets/MarioBarbeque/CyberSolve-DeepMind-LinAlg-1D-downsample-benchmark-v2) of CyberSolve, and
 the [1.1 version](https://huggingface.co/datasets/MarioBarbeque/CyberSolve-LinAlg-1.1-correctness-benchmark) of the model. *We have yet to complete partial correctness analysis of the various model versions and their predictions, but we look forward to better understanding the mathematical
 reasoning capabilities of models and publishing our results when complete!*
 