fix typo
README.md
CHANGED
@@ -122,7 +122,7 @@ The model was trained locally on a single-node with multiple Nvidia A100 GPUs us
 
 ## Evaluation / Metrics
 
-We evaluate our text-to-text linear equation solver by using the `exact_match` metric
+We evaluate our text-to-text linear equation solver by using the `exact_match` metric to compare the model's decoded predicted tokens with their numeric labels. *CyberSolve LinAlg 1.2* scores a **90.75** exact match score
 on the evaluation set of 10k linear equations from the DeepMind *algebra__linear_1d* split. This is a non-trivial improvement from the exact match score of **86.56** attained by *CyberSolve LinAlg 1.1*.
 
 Additionally, we construct a partial correctness dataset available at the following model card: [MarioBarbeque/CyberSolve-LinAlg-1.2-correctness-benchmark](https://huggingface.co/datasets/MarioBarbeque/CyberSolve-LinAlg-1.2-correctness-benchmark).
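
For reference, a minimal sketch of how the exact-match evaluation described in the updated line might be computed. It assumes the Hugging Face `evaluate` library (not named in the diff) and uses toy prediction/label strings in place of the model's actual decoded outputs; all variable names here are illustrative, not part of the CyberSolve codebase.

```python
# Illustrative sketch: exact match between decoded predictions and numeric labels.
import evaluate

exact_match = evaluate.load("exact_match")

# In practice these strings would come from tokenizer.batch_decode(...) applied to
# the model's generated token IDs and to the label IDs of the 10k-equation eval split.
decoded_predictions = ["-5", "12", "3"]
decoded_labels = ["-5", "12", "4"]

score = exact_match.compute(
    predictions=decoded_predictions,
    references=decoded_labels,
)
print(score["exact_match"])  # fraction of exact string matches (2/3 here)
```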