MarioBarbeque committed · Commit 98737b9 · verified · 1 Parent(s): 9b6013a
Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -122,7 +122,7 @@ The model was trained locally on a single-node with multiple Nvidia A100 GPUs us
 
 ## Evaluation / Metrics
 
-We evaluate our text-to-text linear equation solver by using the `exact_match` metric by comparing the model's decoded predicted tokens with their numeric labels. *CyberSolve LinAlg 1.2* scores a **90.75** exact match score
+We evaluate our text-to-text linear equation solver by using the `exact_match` metric to compare the model's decoded predicted tokens with their numeric labels. *CyberSolve LinAlg 1.2* scores a **90.75** exact match score
 on the evaluation set of 10k linear equations from the DeepMind *algebra__linear_1d* split. This is a non-trivial improvement from the exact match score of **86.56** attained by *CyberSolve LinAlg 1.1*.
 
 Additionally, we construct a partial correctness dataset available at the following model card: [MarioBarbeque/CyberSolve-LinAlg-1.2-correctness-benchmark](https://huggingface.co/datasets/MarioBarbeque/CyberSolve-LinAlg-1.2-correctness-benchmark).
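
Note for readers of this change: the evaluation sentence describes comparing decoded predictions with their numeric labels under an exact match criterion. A minimal sketch of that idea follows. It is not the project's evaluation code. The function name `exact_match_score`, the whitespace stripping, the percentage scale, and the sample strings are all assumptions made for illustration.

```python
def exact_match_score(predictions, references):
    """Return the percentage of predictions equal to their reference.

    Assumption: both inputs are lists of already-decoded strings, and
    surrounding whitespace is ignored before comparison.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    # Count positions where the stripped strings are identical.
    matches = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return 100.0 * matches / len(predictions)


# Hypothetical decoded model outputs and labels (not real evaluation data).
preds = ["4", "-3", "12", " 7"]
labels = ["4", "-3", "10", "7"]
score = exact_match_score(preds, labels)
print(score)
```

Under this string-level comparison, a prediction such as `4.0` would not match a label of `4`, so how numeric answers are normalized before scoring affects the reported number.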