Commit 7f5a41e by GenVRadmin · verified · 1 Parent(s): 71272f2

Update README.md

README.md CHANGED (+28 -0)
@@ -1,3 +1,31 @@
  ---
  license: mit
  ---
+
+ This model is fine-tuned from HuggingFaceH4/zephyr-7b-gemma-v0.1 on 8 Indian languages.
+ To improve its reasoning and maths skills, we first apply supervised fine-tuning (SFT) to the Gemma model on Microsoft's Orca datasets.
+
+ We use the Orca maths Hindi dataset: GenVRadmin/Aryabhatta-Orca-Maths-Hindi
+ and the original Orca maths dataset: microsoft/orca-math-word-problems-200k
+
+ This raises the MATHS score from 24.3 (Gemma-7B) and 25.5 (Zephyr-Gemma) to 31.6 (GemmaOrca).
+
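To put the MATHS scores quoted above in perspective, the short sketch below computes the absolute and relative gain of GemmaOrca over each baseline. Only the three scores come from the README; the dictionary and the `gain_over` helper are illustrative names, not part of any released code.

```python
# MATHS scores as quoted in the README text.
maths_scores = {"Gemma-7B": 24.3, "Zephyr-Gemma": 25.5, "GemmaOrca": 31.6}


def gain_over(baseline: str, model: str = "GemmaOrca") -> tuple[float, float]:
    """Return (absolute points, relative percent) gained by `model` over `baseline`."""
    base, new = maths_scores[baseline], maths_scores[model]
    absolute = round(new - base, 1)
    relative = round(100 * (new - base) / base, 1)
    return absolute, relative


for name in ("Gemma-7B", "Zephyr-Gemma"):
    points, percent = gain_over(name)
    print(f"GemmaOrca vs {name}: +{points} points ({percent}% relative)")
```

Most of the improvement appears after the Orca maths SFT stage rather than in the step from Gemma-7B to Zephyr-Gemma.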
+ The model is then fine-tuned on GenVR's Samvaad datasets (GenVRadmin/Samvaad-Indic-Positive, GenVRadmin/Samvaad-Tamil-Mixtral, and a subset of GenVRadmin/Samvaad-Mixed-Language-3).
+
+ It is then fine-tuned on various open-source datasets, including:
+
+ - Telugu-LLM-Labs/yahma_alpaca_cleaned_telugu_filtered_and_romanized
+ - Telugu-LLM-Labs/teknium_GPTeacher_general_instruct_telugu_filtered_and_romanized
+ - abhinand/tamil-alpaca
+ - Tensoic/airoboros-3.2_kn
+ - Tensoic/gpt-teacher_kn
+ - Tensoic/Alpaca-Gujarati
+ - Open-Orca/OpenOrca
+ - pankajmathur/alpaca_orca
+
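The three fine-tuning stages described in this README can be written down as an ordered plan, which makes the sequence easier to audit. This is a minimal sketch, not the authors' training code: the `Stage` class, the stage names, and the `describe` helper are hypothetical, and only the dataset identifiers come from the README. The README lists the open-source datasets as examples, so the last stage may not be exhaustive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str                  # hypothetical label, not from the README
    datasets: tuple[str, ...]  # dataset identifiers as listed in the README


STAGES = (
    Stage("orca_maths_sft", (
        "GenVRadmin/Aryabhatta-Orca-Maths-Hindi",
        "microsoft/orca-math-word-problems-200k",
    )),
    Stage("samvaad", (
        "GenVRadmin/Samvaad-Indic-Positive",
        "GenVRadmin/Samvaad-Tamil-Mixtral",
        # The README says only a subset is used; which subset is not specified.
        "GenVRadmin/Samvaad-Mixed-Language-3",
    )),
    Stage("open_source_mix", (
        "Telugu-LLM-Labs/yahma_alpaca_cleaned_telugu_filtered_and_romanized",
        "Telugu-LLM-Labs/teknium_GPTeacher_general_instruct_telugu_filtered_and_romanized",
        "abhinand/tamil-alpaca",
        "Tensoic/airoboros-3.2_kn",
        "Tensoic/gpt-teacher_kn",
        "Tensoic/Alpaca-Gujarati",
        "Open-Orca/OpenOrca",
        "pankajmathur/alpaca_orca",
    )),
)


def describe(stages: tuple[Stage, ...]) -> list[str]:
    """Summarize each stage in training order."""
    return [
        f"{i}. {s.name}: {len(s.datasets)} dataset(s)"
        for i, s in enumerate(stages, start=1)
    ]


for line in describe(STAGES):
    print(line)
```

Keeping the stages in a tuple preserves their order, which matters here because each stage starts from the checkpoint produced by the previous one.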
+ The model achieves the following scores on benchmarks:
+
+ | Model | AGIEval | GPT4All | TruthfulQA | BigBench | Average ⬇️ |
+ |---|---|---|---|---|---|
+ | AryaBhatta-GemmaOrca | 39.9 | 74.26 | 58.85 | 43.35 | 54.09 |
+ | zephyr-7b-beta | 37.52 | 71.77 | 55.26 | 39.77 | 51.08 |
+ | zephyr-7b-gemma-v0.1 | 34.22 | 66.37 | 52.19 | 37.10 | 47.47 |
+ | mlabonne/Gemmalpaca-7B | 21.6 | 40.87 | 44.85 | 30.49 | 34.45 |
+ | google/gemma-7b-it | 21.33 | 40.84 | 41.70 | 30.25 | 33.53 |
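
The Average column in the benchmark results appears to be the unweighted mean of the four benchmark scores. The sketch below recomputes it from the reported numbers under that assumption; the `scores` dictionary and the `average` helper are illustrative names, and the values are copied from the README.

```python
# Scores per model, in the order: AGIEval, GPT4All, TruthfulQA, BigBench.
scores = {
    "AryaBhatta-GemmaOrca": (39.9, 74.26, 58.85, 43.35),
    "zephyr-7b-beta": (37.52, 71.77, 55.26, 39.77),
    "zephyr-7b-gemma-v0.1": (34.22, 66.37, 52.19, 37.10),
    "mlabonne/Gemmalpaca-7B": (21.6, 40.87, 44.85, 30.49),
    "google/gemma-7b-it": (21.33, 40.84, 41.70, 30.25),
}


def average(values: tuple[float, ...]) -> float:
    """Unweighted mean rounded to two decimals (assumed to match the Average column)."""
    return round(sum(values) / len(values), 2)


# Rank models by recomputed average, best first.
ranking = sorted(scores, key=lambda model: average(scores[model]), reverse=True)
for model in ranking:
    print(f"{model}: {average(scores[model])}")
```

The recomputed means agree with the reported averages for all five models, which supports the unweighted-mean reading.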