Update README.md
added evaluation images
README.md CHANGED
@@ -130,11 +130,14 @@ Minerva-7B-base-v1.0 was trained using [llm-foundry 0.8.0](https://github.com/ri
 
 ## Model Evaluation
 
-
+For Minerva's evaluation process, we utilized ITA-Bench, a new evaluation suite to test the capabilities of Italian-speaking models.
+ITA-Bench is a collection of 18 benchmarks that assess the performance of language models on various tasks, including scientific knowledge,
+commonsense reasoning, and mathematical problem-solving.
 
-
-
-
+<div style={{ display: 'flex', justifyContent: 'space-around' }}>
+  <img src="https://huggingface.co/sapienzanlp/Minerva-7B-base-v1.0/resolve/main/Minerva%20LLMs%20Results%20Base%20Models.png" alt="Results on base models" style={{ width: '45%' }}></img>
+  <img src="https://huggingface.co/sapienzanlp/Minerva-7B-base-v1.0/resolve/main/Minerva%20LLMs%20Results%20All%20Base%20Models.png" alt="Results on all base models" style={{ width: '45%' }}></img>
+</div>
 
 <!-- **Italian** Data: -->
 <!-- | Task | Accuracy |