---
license: mit
---

This model is fine-tuned from HuggingFaceH4/zephyr-7b-gemma-v0.1 on 9 Indian languages (Hindi, Tamil, Punjabi, Bengali, Gujarati, Oriya, Telugu, Kannada, Malayalam) plus English.
To improve reasoning and maths skills, we first run supervised fine-tuning (SFT) of the Gemma model on Microsoft's Orca datasets.

We use the Orca maths Hindi dataset, GenVRadmin/Aryabhatta-Orca-Maths-Hindi,
and the original Orca maths dataset, microsoft/orca-math-word-problems-200k.

This raises the MATHS score from 24.3 for Gemma-7B to 25.5 for Zephyr-Gemma and 31.6 for GemmaOrca.
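
To put these figures in perspective, the gains can be expressed in absolute points and as relative improvements. The sketch below only redoes arithmetic on the three scores quoted above; the names `maths_scores` and `gain` are illustrative and not taken from any training or evaluation code.

```python
# MATHS scores quoted in the text above.
maths_scores = {
    "Gemma-7B": 24.3,
    "Zephyr-Gemma": 25.5,
    "GemmaOrca": 31.6,
}


def gain(baseline: str, model: str) -> tuple:
    """Return (absolute gain in points, relative gain in percent)."""
    base, new = maths_scores[baseline], maths_scores[model]
    absolute = new - base
    return round(absolute, 1), round(100 * absolute / base, 1)


for baseline in ("Gemma-7B", "Zephyr-Gemma"):
    points, percent = gain(baseline, "GemmaOrca")
    print(f"GemmaOrca vs {baseline}: +{points} points ({percent}% relative)")
```

Most of the improvement comes from the Orca maths stage: the step from Zephyr-Gemma to GemmaOrca is several times larger than the step from Gemma-7B to Zephyr-Gemma.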

The model is then fine-tuned on GenVR's Samvaad datasets (GenVRadmin/Samvaad-Indic-Positive, GenVRadmin/Samvaad-Tamil-Mixtral, and a subset of GenVRadmin/Samvaad-Mixed-Language-3).

It is then fine-tuned on various open-source datasets, including:

- Telugu-LLM-Labs/yahma_alpaca_cleaned_telugu_filtered_and_romanized, Telugu-LLM-Labs/teknium_GPTeacher_general_instruct_telugu_filtered_and_romanized
- abhinand/tamil-alpaca
- Tensoic/airoboros-3.2_kn, Tensoic/gpt-teacher_kn
- Tensoic/Alpaca-Gujarati
- HydraIndicLM/bengali_alpaca_dolly_67k
- Open-Orca/OpenOrca
- pankajmathur/alpaca_orca
- OdiaGenAI/Odia_Alpaca_instructions_52k, OdiaGenAI/gpt-teacher-roleplay-odia-3k
- GenVRadmin/Samvaad-Punjabi-Mini
- pankajmathur/WizardLM_Orca
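
The three fine-tuning stages described above can be summarised as an ordered list of dataset groups. This is a minimal sketch for orientation, not the actual training script: the stage names and the helpers `datasets_for` and `load_stage` are hypothetical, the `split="train"` default is an assumption that may not hold for every repository, and the unspecified subset of GenVRadmin/Samvaad-Mixed-Language-3 is not reproduced.

```python
# Ordered training stages, as described in this card.
# Stage names are hypothetical labels chosen for this sketch.
TRAINING_STAGES = [
    ("orca_maths_sft", [
        "GenVRadmin/Aryabhatta-Orca-Maths-Hindi",
        "microsoft/orca-math-word-problems-200k",
    ]),
    ("samvaad", [
        "GenVRadmin/Samvaad-Indic-Positive",
        "GenVRadmin/Samvaad-Tamil-Mixtral",
        "GenVRadmin/Samvaad-Mixed-Language-3",  # the card uses only a subset
    ]),
    ("open_source_instruct", [
        "Telugu-LLM-Labs/yahma_alpaca_cleaned_telugu_filtered_and_romanized",
        "Telugu-LLM-Labs/teknium_GPTeacher_general_instruct_telugu_filtered_and_romanized",
        "abhinand/tamil-alpaca",
        "Tensoic/airoboros-3.2_kn",
        "Tensoic/gpt-teacher_kn",
        "Tensoic/Alpaca-Gujarati",
        "HydraIndicLM/bengali_alpaca_dolly_67k",
        "Open-Orca/OpenOrca",
        "pankajmathur/alpaca_orca",
        "OdiaGenAI/Odia_Alpaca_instructions_52k",
        "OdiaGenAI/gpt-teacher-roleplay-odia-3k",
        "GenVRadmin/Samvaad-Punjabi-Mini",
        "pankajmathur/WizardLM_Orca",
    ]),
]


def datasets_for(stage_name):
    """Return the dataset repo ids for one stage, in listed order."""
    for name, repo_ids in TRAINING_STAGES:
        if name == stage_name:
            return list(repo_ids)
    raise KeyError(f"unknown stage: {stage_name}")


def load_stage(stage_name, split="train"):
    """Load every dataset of a stage from the Hugging Face Hub.

    Assumes each repo exposes the given split and needs no config name.
    """
    from datasets import load_dataset  # third-party, imported lazily

    return {rid: load_dataset(rid, split=split) for rid in datasets_for(stage_name)}
```

Calling `load_stage("orca_maths_sft")` would download the two Orca maths datasets; the other stages follow the same pattern.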

The model achieves the following scores on benchmarks:

| Model | AGIEval | GPT4All | TruthfulQA | BigBench | Average ⬇️ |
|---|---|---|---|---|---|
| AryaBhatta-GemmaOrca | 39.9 | 74.26 | 58.85 | 43.35 | 54.09 |
| zephyr-7b-beta | 37.52 | 71.77 | 55.26 | 39.77 | 51.08 |
| zephyr-7b-gemma-v0.1 | 34.22 | 66.37 | 52.19 | 37.10 | 47.47 |
| mlabonne/Gemmalpaca-7B | 21.6 | 40.87 | 44.85 | 30.49 | 34.45 |
| google/gemma-7b-it | 21.33 | 40.84 | 41.70 | 30.25 | 33.53 |
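
The Average column appears to be the unweighted mean of the four benchmark columns. The sketch below recomputes it from the values in the table as a consistency check; the names `scores` and `mean_of_benchmarks` are illustrative only.

```python
# (AGIEval, GPT4All, TruthfulQA, BigBench, reported Average) from the table above.
scores = {
    "AryaBhatta-GemmaOrca": (39.9, 74.26, 58.85, 43.35, 54.09),
    "zephyr-7b-beta": (37.52, 71.77, 55.26, 39.77, 51.08),
    "zephyr-7b-gemma-v0.1": (34.22, 66.37, 52.19, 37.10, 47.47),
    "mlabonne/Gemmalpaca-7B": (21.6, 40.87, 44.85, 30.49, 34.45),
    "google/gemma-7b-it": (21.33, 40.84, 41.70, 30.25, 33.53),
}


def mean_of_benchmarks(row):
    """Unweighted mean of the first four columns of a table row."""
    return sum(row[:4]) / 4


for model, row in scores.items():
    computed = mean_of_benchmarks(row)
    reported = row[4]
    # Reported averages are given to two decimals, so allow rounding slack.
    status = "ok" if abs(computed - reported) < 0.01 else "mismatch"
    print(f"{model}: computed {computed:.2f}, reported {reported:.2f} ({status})")
```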