---
license: mit
---

This model is finetuned from HuggingFaceH4/zephyr-7b-gemma-v0.1 on 8 Indian languages.

To improve its reasoning and maths skills, we first SFT-tune Gemma on Microsoft's Orca datasets. We utilize the Orca maths Hindi dataset, GenVRadmin/Aryabhatta-Orca-Maths-Hindi, and the original Orca maths dataset, microsoft/orca-math-word-problems-200k.

This pushes the MATHS score from 24.3 in Gemma-7B to 25.5 in Zephyr-Gemma and 31.6 in GemmaOrca.
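The SFT stage pairs each Orca maths word problem with its worked solution as one supervised instruction/response example. A minimal sketch of that data preparation, assuming the `question`/`answer` schema used by microsoft/orca-math-word-problems-200k; the ChatML-style template and helper below are illustrative, not the authors' exact code (in practice one would use the tokenizer's `apply_chat_template`):

```python
# Sketch: render an Orca-style {question, answer} record as one SFT training
# string. The ChatML-style tags are an illustrative assumption, not the
# model's confirmed template.

def format_orca_example(record: dict) -> str:
    """Render one word-problem record as a single supervised training string."""
    return (
        "<|im_start|>user\n" + record["question"] + "<|im_end|>\n"
        "<|im_start|>assistant\n" + record["answer"] + "<|im_end|>\n"
    )

sample = {
    "question": "A shop sells 12 mangoes for 60 rupees. What is the price of one mango?",
    "answer": "60 / 12 = 5, so one mango costs 5 rupees.",
}
print(format_orca_example(sample))
```

Strings produced this way can be fed directly to a standard SFT trainer as the training text field.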

The model is then finetuned on GenVR's Samvaad datasets (GenVRadmin/Samvaad-Indic-Positive, GenVRadmin/Samvaad-Tamil-Mixtral, and a subset of GenVRadmin/Samvaad-Mixed-Language-3).

It is then finetuned on various open-source datasets, including:

- Telugu-LLM-Labs/yahma_alpaca_cleaned_telugu_filtered_and_romanized, Telugu-LLM-Labs/teknium_GPTeacher_general_instruct_telugu_filtered_and_romanized
- abhinand/tamil-alpaca
- Tensoic/airoboros-3.2_kn, Tensoic/gpt-teacher_kn
- Tensoic/Alpaca-Gujarati
- Open-Orca/OpenOrca
- pankajmathur/alpaca_orca

The model achieves the following scores on benchmarks:

| Model | AGIEval | GPT4All | TruthfulQA | BigBench | Average ⬇️ |
|---|---|---|---|---|---|
| AryaBhatta-GemmaOrca | 39.9 | 74.26 | 58.85 | 43.35 | 54.09 |
| zephyr-7b-beta | 37.52 | 71.77 | 55.26 | 39.77 | 51.08 |
| zephyr-7b-gemma-v0.1 | 34.22 | 66.37 | 52.19 | 37.10 | 47.47 |
| mlabonne/Gemmalpaca-7B | 21.6 | 40.87 | 44.85 | 30.49 | 34.45 |
| google/gemma-7b-it | 21.33 | 40.84 | 41.70 | 30.25 | 33.53 |
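The Average column above is the simple mean of the four benchmark scores; a quick check of the reported numbers:

```python
# Recompute the Average column as the mean of AGIEval, GPT4All,
# TruthfulQA, and BigBench scores from the table above.
rows = {
    "AryaBhatta-GemmaOrca": (39.9, 74.26, 58.85, 43.35),
    "zephyr-7b-beta": (37.52, 71.77, 55.26, 39.77),
    "zephyr-7b-gemma-v0.1": (34.22, 66.37, 52.19, 37.10),
    "mlabonne/Gemmalpaca-7B": (21.6, 40.87, 44.85, 30.49),
    "google/gemma-7b-it": (21.33, 40.84, 41.70, 30.25),
}
for name, scores in rows.items():
    print(f"{name}: {sum(scores) / len(scores):.2f}")
```

All five recomputed means match the Average column to two decimal places.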