Commit 12fd976 by sagorsarker
Parent(s): e47b00a

Update README.md

README.md CHANGED
@@ -113,6 +113,7 @@ We evaluated the models on the following datasets:
 #### Evaluation on English Benchmark datasets
 - **gemma-2-2b** outperforms **titulm-gemma-2-2b-v1.0** across all tasks in both 0-shot and 5-shot settings, achieving the highest scores in **MMLU**, **BoolQ**, **Commonsense QA**, **OpenBook QA**, and **PIQA**, with a peak 5-shot score of **0.80** in **PIQA**.
 - **titulm-gemma-2-2b-v1.0** shows competitive performance but lags behind **gemma-2-2b**, particularly in **Commonsense QA** and **BoolQ**, with the highest score being **0.77** in **PIQA**.
+- It is expected as we have trained our model only on Bangla text.
 
 | Model | Shots | MMLU | BoolQ | Commonsense QA | OpenBook QA | PIQA |
 |--------------------------------------|--------|--------------|------------|--------------------|-----------------|-----------|