edobobo committed on
Commit 5d3cd6b · verified · 1 Parent(s): 722eeb1

Update README.md


added evaluation images

Files changed (1):
  README.md +7 -4
README.md CHANGED
@@ -130,11 +130,14 @@ Minerva-7B-base-v1.0 was trained using [llm-foundry 0.8.0](https://github.com/ri
 
 ## Model Evaluation
 
-We assessed our model using the [LM-Evaluation-Harness](https://github.com/EleutherAI/lm-evaluation-harness) library, which serves as a comprehensive framework for testing generative language models across a wide range of evaluation tasks.
+For Minerva's evaluation process, we utilized ITA-Bench, a new evaluation suite to test the capabilities of Italian-speaking models.
+ITA-Bench is a collection of 18 benchmarks that assess the performance of language models on various tasks, including scientific knowledge,
+commonsense reasoning, and mathematical problem-solving.
 
-All the reported benchmark data was already present in the LM-Evaluation-Harness suite.
-
-_Scores will be available at later stage._
+<div style={{ display: 'flex', justifyContent: 'space-around' }}>
+  <img src="https://huggingface.co/sapienzanlp/Minerva-7B-base-v1.0/resolve/main/Minerva%20LLMs%20Results%20Base%20Models.png" alt="Results on base models" style={{ width: '45%' }}></img>
+  <img src="https://huggingface.co/sapienzanlp/Minerva-7B-base-v1.0/resolve/main/Minerva%20LLMs%20Results%20All%20Base%20Models.png" alt="Results on base models" style={{ width: '45%' }}></img>
+</div>
 
 <!-- **Italian** Data: -->
 <!-- | Task | Accuracy |