edobobo committed on
Commit 454ab56
1 Parent(s): 8f68678

Update README.md

Files changed (1)
  1. README.md +7 -4
README.md CHANGED
@@ -166,11 +166,14 @@ For more details please check [our tech report](https://nlp.uniroma1.it/minerva/
 
 ## Model Evaluation
 
-We assessed our model using the [LM-Evaluation-Harness](https://github.com/EleutherAI/lm-evaluation-harness) library, which serves as a comprehensive framework for testing generative language models across a wide range of evaluation tasks.
+For Minerva's evaluation process, we used ITA-Bench, a new evaluation suite for testing the capabilities of Italian-speaking models.
+ITA-Bench is a collection of 18 benchmarks that assess the performance of language models on various tasks, including scientific knowledge,
+commonsense reasoning, and mathematical problem-solving.
 
-All the reported benchmark data was already present in the LM-Evaluation-Harness suite.
-
-_Scores will be available at later stage._
+<div style="display: flex; justify-content: space-around;">
+<img src="https://huggingface.co/sapienzanlp/Minerva-7B-base-v1.0/resolve/main/Minerva%20LLMs%20Results%20Base%20Models.png" alt="Results on base models" style="width: 45%;" />
+<img src="https://huggingface.co/sapienzanlp/Minerva-7B-base-v1.0/resolve/main/Minerva%20LLMs%20Results%20All%20Base%20Models.png" alt="Results on all base models" style="width: 45%;" />
+</div>
 
 <!-- **Italian** Data: -->
 <!-- | Task | Accuracy |