fix typo
README.md
CHANGED
@@ -122,7 +122,7 @@ The model was trained locally on a single-node with multiple Nvidia A100 GPUs us
 
 ## Evaluation / Metrics
 
-We evaluate our text-to-text linear equation solver by using the `exact_match` metric
+We evaluate our text-to-text linear equation solver by using the `exact_match` metric to compare the model's decoded predicted tokens with their numeric labels. *CyberSolve LinAlg 1.2* scores a **90.75** exact match score
 on the evaluation set of 10k linear equations from the DeepMind *algebra__linear_1d* split. This is a non-trivial improvement from the exact match score of **86.56** attained by *CyberSolve LinAlg 1.1*.
 
 Additionally, we construct a partial correctness dataset available at the following model card: [MarioBarbeque/CyberSolve-LinAlg-1.2-correctness-benchmark](https://huggingface.co/datasets/MarioBarbeque/CyberSolve-LinAlg-1.2-correctness-benchmark).
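
For reference, a minimal sketch of how the exact-match evaluation described in the updated line might be computed. It assumes the Hugging Face `evaluate` library (not named in the diff) and uses toy prediction/label strings in place of the model's actual decoded outputs; all variable names here are illustrative, not part of the CyberSolve codebase.

```python
# Illustrative sketch: exact match between decoded predictions and numeric labels.
import evaluate

exact_match = evaluate.load("exact_match")

# In practice these strings would come from tokenizer.batch_decode(...) applied to
# the model's generated token IDs and to the label IDs of the 10k-equation eval split.
decoded_predictions = ["-5", "12", "3"]
decoded_labels = ["-5", "12", "4"]

score = exact_match.compute(
    predictions=decoded_predictions,
    references=decoded_labels,
)
print(score["exact_match"])  # fraction of exact string matches (2/3 here)
```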