Update README.md
README.md
@@ -147,6 +147,18 @@ Four evaluation metrics were employed across all subsets: language quality, over
 | relevant_context | 71.3 | 69.1 | **65.5** | 89.5 |
 | summarizations | 73.8 | 81.6 | **80.3** | 86.9 |
 
+
+## Hard Benchmark Eval
+
+<img src="https://avemio.digital/wp-content/uploads/2025/01/GRAG-NEMO-ORPO.png" alt="GRAG Logo" width="400" style="margin-left: auto; margin-right: auto; display: block;"/>
+
+| Metric | [Vanilla-Mistral-Nemo-Instruct-2407](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407) | **[GRAG-NEMO-ORPO](https://huggingface.co/avemio/GRAG-NEMO-12B-ORPO-HESSIAN-AI)** | GPT-3.5-TURBO | GPT-4o | GPT-4o-mini |
+|-------------------------|-----------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------|----------------|---------|-------------|
+| **OVERALL SCORES (weighted):** | | | | | |
+| hard_reasoning_de | 43.6 | **49.7** | 37.9 | 62.9 | 58.4 |
+| hard_reasoning_en | 54.2 | **55.6** | 48.3 | 61.7 | 62.9 |
+
+
 ## Model Details
 
 ### Data