Commit d748c03 · verified · 1 Parent(s): 91b70bf
Committed by GenVRadmin

Update README.md

  1. README.md +20 -18
README.md CHANGED
@@ -5,8 +5,8 @@ license: mit
  This model is finetuned from HuggingFaceH4/zephyr-7b-gemma-v0.1 and is finetuned on 9 Indian languages (Hindi, Tamil, Punjabi, Bengali, Gujarati, Oriya, Telugu, Kannada, Malayalam) plus English.
  To improve the resoning and maths skills, we first SFT tune the gemma on Microsoft's Orca datasets.
 
- We utilize Orca maths Hindi dataset: GenVRadmin/Aryabhatta-Orca-Maths-Hindi
- And original Orca maths dataset: microsoft/orca-math-word-problems-200k
+ We utilize Orca maths Hindi dataset: GenVRadmin/Aryabhatta-Orca-Maths-Hindi \
+ And original Orca maths dataset: microsoft/orca-math-word-problems-200k \
 
  This pushes the MATHS score from 24.3 in Gemma-7B to 25.5 in Zephyr-Gemma and 31.6 in GemmaOrca.
 
@@ -15,22 +15,24 @@ The model is then finetuned on GenVR's Samvaad datasets (GenVRadmin/Samvaad-Indi
  This is then finetuned on various open sourced datasets like:
 
  Telugu-LLM-Labs/yahma_alpaca_cleaned_telugu_filtered_and_romanized \
- Telugu-LLM-Labs/teknium_GPTeacher_general_instruct_telugu_filtered_and_romanized
- abhinand/tamil-alpaca
- Tensoic/airoboros-3.2_kn, Tensoic/gpt-teacher_kn
- Tensoic/Alpaca-Gujarati
- HydraIndicLM/bengali_alpaca_dolly_67k
- Open-Orca/OpenOrca
- pankajmathur/alpaca_orca
- OdiaGenAI/Odia_Alpaca_instructions_52k, OdiaGenAI/gpt-teacher-roleplay-odia-3k
- GenVRadmin/Samvaad-Punjabi-Mini
- pankajmathur/WizardLM_Orca
+ Telugu-LLM-Labs/teknium_GPTeacher_general_instruct_telugu_filtered_and_romanized \
+ abhinand/tamil-alpaca \
+ Tensoic/airoboros-3.2_kn \
+ Tensoic/gpt-teacher_kn \
+ Tensoic/Alpaca-Gujarati \
+ HydraIndicLM/bengali_alpaca_dolly_67k \
+ Open-Orca/OpenOrca \
+ pankajmathur/alpaca_orca \
+ OdiaGenAI/Odia_Alpaca_instructions_52k \
+ OdiaGenAI/gpt-teacher-roleplay-odia-3k \
+ GenVRadmin/Samvaad-Punjabi-Mini \
+ pankajmathur/WizardLM_Orca \
 
  The model achieves following scores on benchmarks:
 
- Model AGIEval GPT4All TruthfulQA BigBench Average ⬇️
- AryaBhatta-GemmaOrca 39.9 74.26 58.85 43.35 54.09
- zephyr-7b-beta 37.52 71.77 55.26 39.77 51.08
- zephyr-7b-gemma-v0.1 34.22 66.37 52.19 37.10 47.47
- mlabonne/Gemmalpaca-7B 21.6 40.87 44.85 30.49 34.45
- google/gemma-7b-it 21.33 40.84 41.70 30.25 33.53
+ Model AGIEval GPT4All TruthfulQA BigBench Average ⬇️ \
+ AryaBhatta-GemmaOrca 39.9 74.26 58.85 43.35 54.09 \
+ zephyr-7b-beta 37.52 71.77 55.26 39.77 51.08 \
+ zephyr-7b-gemma-v0.1 34.22 66.37 52.19 37.10 47.47 \
+ mlabonne/Gemmalpaca-7B 21.6 40.87 44.85 30.49 34.45 \
+ google/gemma-7b-it 21.33 40.84 41.70 30.25 33.53 \
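
The README does not say how the Average column in the benchmark table is computed. A minimal sketch, assuming it is the unweighted mean of the four benchmark columns (AGIEval, GPT4All, TruthfulQA, BigBench), can check that assumption against the reported figures. The names `rows`, `mean_of_benchmarks`, and `check_averages` are hypothetical helpers introduced here for illustration; the scores are copied from the table as-is.

```python
# Scores copied from the benchmark table in the README.
# Column order: AGIEval, GPT4All, TruthfulQA, BigBench, reported Average.
rows = {
    "AryaBhatta-GemmaOrca": (39.9, 74.26, 58.85, 43.35, 54.09),
    "zephyr-7b-beta": (37.52, 71.77, 55.26, 39.77, 51.08),
    "zephyr-7b-gemma-v0.1": (34.22, 66.37, 52.19, 37.10, 47.47),
    "mlabonne/Gemmalpaca-7B": (21.6, 40.87, 44.85, 30.49, 34.45),
    "google/gemma-7b-it": (21.33, 40.84, 41.70, 30.25, 33.53),
}


def mean_of_benchmarks(scores):
    """Unweighted arithmetic mean of the per-benchmark scores."""
    return sum(scores) / len(scores)


def check_averages(table, tolerance=0.005):
    """Return models whose reported average differs from the computed mean.

    ASSUMPTION: the Average column is the plain mean of the four benchmark
    columns, rounded to two decimals. The tolerance absorbs that rounding.
    """
    mismatches = {}
    for model, (*scores, reported) in table.items():
        computed = mean_of_benchmarks(scores)
        if abs(computed - reported) > tolerance:
            mismatches[model] = (computed, reported)
    return mismatches


mismatches = check_averages(rows)

# Rank models by the reported average, highest first, as the table does.
ranking = sorted(rows, key=lambda model: rows[model][-1], reverse=True)

for model in ranking:
    *scores, reported = rows[model]
    print(f"{model}: computed {mean_of_benchmarks(scores):.2f}, reported {reported:.2f}")
```

Under that assumption every reported average agrees with the computed mean to within rounding, and the table is already ordered from highest to lowest average.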