MarioBarbeque committed on
Commit 05ad944 · verified · 1 Parent(s): 98737b9
Files changed (1):
  1. README.md +1 -1
README.md CHANGED
@@ -127,7 +127,7 @@ on the evaluation set of 10k linear equations from the DeepMind *algebra__linear

 Additionally, we construct a partial correctness dataset available at the following model card: [MarioBarbeque/CyberSolve-LinAlg-1.2-correctness-benchmark](https://huggingface.co/datasets/MarioBarbeque/CyberSolve-LinAlg-1.2-correctness-benchmark).
 This dataset was created to analyze both the token-to-token and the decoded-sequence-to-decoded-sequence partial correctness of CyberSolve's predictions in detail, beyond just its ability to get answers flat-out right or wrong. Similar partial correctness benchmark datasets were created for the
-initial [FLAN-T5 model](https://huggingface.co/datasets/MarioBarbeque/FLAN-T5-DeepMind-LinAlg-1D-benchmark), the preliminary, [zeroth-generation downsampled training](https://huggingface.co/datasets/MarioBarbeque/CyberSolve-DeepMind-LinAlg-1D-downsample-benchmark-v2) of CyberSolve, and
+initial [FLAN-T5 model](https://huggingface.co/datasets/MarioBarbeque/FLAN-T5-DeepMind-LinAlg-1D-benchmark), the [zeroth-generation downsampled training](https://huggingface.co/datasets/MarioBarbeque/CyberSolve-DeepMind-LinAlg-1D-downsample-benchmark-v2) of CyberSolve, and
 the [1.1 version](https://huggingface.co/datasets/MarioBarbeque/CyberSolve-LinAlg-1.1-correctness-benchmark) of the model. *We have yet to complete the partial correctness analysis of the various model versions and their predictions, but we look forward to better understanding the mathematical
 reasoning capabilities of these models and publishing our results when complete!*
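The two notions of partial correctness the README describes (token-to-token and decoded-sequence-to-decoded-sequence) could be sketched roughly as below. This is a hypothetical illustration, not the repository's actual evaluation code; the function names and the example token IDs are invented for demonstration:

```python
def token_accuracy(pred_ids, label_ids):
    """Token-to-token partial correctness: fraction of reference positions
    where the predicted token ID matches the reference token ID."""
    matches = sum(p == l for p, l in zip(pred_ids, label_ids))
    return matches / max(len(label_ids), 1)

def sequence_exact_match(pred_text, label_text):
    """Decoded-sequence comparison: flat-out right or wrong,
    after stripping surrounding whitespace."""
    return pred_text.strip() == label_text.strip()

# Hypothetical token IDs for a prediction that differs from the
# reference only in its final token: 2 of 3 positions match.
print(token_accuracy([101, 118, 124], [101, 118, 125]))  # 0.666...

# At the decoded-sequence level, "-3" vs. "-3 " counts as correct.
print(sequence_exact_match("-3", "-3 "))  # True
```

A per-example score like `token_accuracy` is what lets a benchmark distinguish near-miss predictions from ones that are wrong everywhere, which an exact-match metric alone cannot do.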