avemio-digital committed on
Commit
4e066ee
verified
1 Parent(s): 68ee678

Update README.md

Files changed (1)
  1. README.md +12 -0
README.md CHANGED
@@ -147,6 +147,18 @@ Four evaluation metrics were employed across all subsets: language quality, over
  | relevant_context | 71.3 | 69.1 | **65.5** | 89.5 |
  | summarizations | 73.8 | 81.6 | **80.3** | 86.9 |
 
+
+ ## Hard Benchmark Eval
+
+ <img src="https://avemio.digital/wp-content/uploads/2025/01/GRAG-NEMO-ORPO.png" alt="GRAG Logo" width="400" style="margin-left:auto; margin-right:auto; display:block"/>
+
+ | Metric | [Vanilla-Mistral-Nemo-Instruct-2407](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407) | **[GRAG-NEMO-ORPO](https://huggingface.co/avemio/GRAG-NEMO-12B-ORPO-HESSIAN-AI)** | GPT-3.5-TURBO | GPT-4o | GPT-4o-mini |
+ |-------------------------|-----------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------|----------------|---------|-------------|
+ | **OVERALL SCORES (weighted):** | | | | | |
+ | hard_reasoning_de | 43.6 | **49.7** | 37.9 | 62.9 | 58.4 |
+ | hard_reasoning_en | 54.2 | **55.6** | 48.3 | 61.7 | 62.9 |
+
+
  ## Model Details
 
  ### Data