For more details please check [our tech report](https://nlp.uniroma1.it/minerva/).

## Model Evaluation

For Minerva's evaluation, we used ITA-Bench, a new evaluation suite for testing the capabilities of Italian-speaking models. ITA-Bench is a collection of 18 benchmarks that assess the performance of language models on a variety of tasks, including scientific knowledge, commonsense reasoning, and mathematical problem-solving.

<div style={{ display: 'flex', justifyContent: 'space-around' }}>
  <img src="https://huggingface.co/sapienzanlp/Minerva-7B-base-v1.0/resolve/main/Minerva%20LLMs%20Results%20Base%20Models.png" alt="Results on base models" style={{ width: '45%' }}></img>
  <img src="https://huggingface.co/sapienzanlp/Minerva-7B-base-v1.0/resolve/main/Minerva%20LLMs%20Results%20All%20Base%20Models.png" alt="Results on all base models" style={{ width: '45%' }}></img>
</div>

<!-- **Italian** Data: -->
<!-- | Task | Accuracy |