Kyara: Knowledge Yielding Adaptive Retrieval Augmentation for LLM Fine-tuning

Kyara (Knowledge Yielding Adaptive Retrieval Augmentation) is an experimental project aimed at improving language models through knowledge retrieval processes. The project seeks to enhance the model’s ability to adapt knowledge and improve language comprehension, particularly in underrepresented languages such as Traditional Chinese. Because Traditional Chinese data is scarce compared with the vast English corpora used for model training, Kyara addresses this gap by expanding the limited corpus available for the language.

This release is a preview version of the Kyara-2.5 series. Compared to Kyara-1.5, this iteration incorporates a significantly larger volume of high-quality STEM content and challenging reasoning datasets. Additionally, it employs online reinforcement techniques for preference optimization to refine the model’s performance.

Benchmark

All evaluations are conducted in a zero-shot setting.

Metric	Kyara-9b-it	Gemma-2-9b-it
TMMLUPlus	60.74	54.77
- STEM	69.54	58.11
- Humanities	52.64	48.71
- Other	57.10	51.43
- Social-Science	63.69	60.84
MMLU-Redux	73.04	72.82
GSM8K	90.37	87.41
MATH-L5	31.35	19.42
CRUX	49.25	46.00
MT-Bench	8.81	8.53
MT-Bench-TW	8.36	7.80
Chatbot-Arena-Hard	43.90	33.60
AlignBench	7.25	6.88
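
To make the table easier to read at a glance, the following minimal sketch computes the absolute and relative gain of Kyara-9b-it over Gemma-2-9b-it for each row. The numbers are copied from the table above; the names `scores`, `tmmlu_categories`, and `gains_over_baseline` are illustrative and not part of the project. It also checks an observation that is inferred from the figures rather than stated in the source: the TMMLUPlus headline score appears to match the unweighted mean of the four category scores.

```python
# (Kyara-9b-it, Gemma-2-9b-it) pairs copied from the benchmark table.
# Note: the benchmarks use different scales (e.g. MT-Bench is out of 10),
# so absolute gains are only comparable within a single row.
scores = {
    "TMMLUPlus": (60.74, 54.77),
    "MMLU-Redux": (73.04, 72.82),
    "GSM8K": (90.37, 87.41),
    "MATH-L5": (31.35, 19.42),
    "CRUX": (49.25, 46.00),
    "MT-Bench": (8.81, 8.53),
    "MT-Bench-TW": (8.36, 7.80),
    "Chatbot-Arena-Hard": (43.90, 33.60),
    "AlignBench": (7.25, 6.88),
}

# The four TMMLU+ category rows from the same table.
tmmlu_categories = {
    "STEM": (69.54, 58.11),
    "Humanities": (52.64, 48.71),
    "Other": (57.10, 51.43),
    "Social-Science": (63.69, 60.84),
}


def gains_over_baseline(table):
    """Return {metric: (absolute gain, relative gain in %)} for Kyara vs. Gemma."""
    return {
        name: (round(kyara - base, 2), round(100 * (kyara - base) / base, 1))
        for name, (kyara, base) in table.items()
    }


gains = gains_over_baseline(scores)
for name, (abs_gain, rel_gain) in gains.items():
    print(f"{name:<20} {abs_gain:+6.2f}  ({rel_gain:+.1f}%)")

# Assumption being checked: TMMLUPlus is the unweighted mean of its categories.
kyara_mean = sum(k for k, _ in tmmlu_categories.values()) / len(tmmlu_categories)
gemma_mean = sum(g for _, g in tmmlu_categories.values()) / len(tmmlu_categories)
print(f"Mean of categories: Kyara {kyara_mean:.4f}, Gemma {gemma_mean:.4f}")
```

Both category means land within 0.01 of the reported TMMLUPlus values (60.74 and 54.77), which is consistent with an unweighted average, although the source does not describe the aggregation method.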

Details of TMMLU+

STEM

sub_category	score
advance_chemistry	0.650407
basic_medical_science	0.681342
computer_science	0.839080
engineering_math	0.611650
junior_chemistry	0.708134
junior_math_exam	0.720000
junior_science_exam	0.755869
organic_chemistry	0.678899
pharmacy	0.452685
physics	0.742268
secondary_physics	0.660714
statistics_and_machine_learning	0.794643
tve_mathematics	0.766667
tve_natural_sciences	0.674528
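
As a quick sanity check on how the sub-category scores relate to the STEM row of the benchmark table, the sketch below averages the 14 values listed above. It assumes the category score is an unweighted (macro) average of its sub-categories, which is an inference from the numbers and not something the source states; `stem_scores` and `macro_average` are illustrative names.

```python
# Sub-category scores copied from the STEM table above (fractions in [0, 1]).
stem_scores = {
    "advance_chemistry": 0.650407,
    "basic_medical_science": 0.681342,
    "computer_science": 0.839080,
    "engineering_math": 0.611650,
    "junior_chemistry": 0.708134,
    "junior_math_exam": 0.720000,
    "junior_science_exam": 0.755869,
    "organic_chemistry": 0.678899,
    "pharmacy": 0.452685,
    "physics": 0.742268,
    "secondary_physics": 0.660714,
    "statistics_and_machine_learning": 0.794643,
    "tve_mathematics": 0.766667,
    "tve_natural_sciences": 0.674528,
}


def macro_average(scores):
    """Unweighted mean over sub-categories, ignoring their question counts."""
    return sum(scores.values()) / len(scores)


stem_macro = 100 * macro_average(stem_scores)  # convert to a percentage
print(f"STEM macro-average: {stem_macro:.3f}")

# Weakest and strongest sub-categories, for a quick look at the spread.
weakest = min(stem_scores, key=stem_scores.get)
strongest = max(stem_scores, key=stem_scores.get)
print(f"weakest: {weakest}, strongest: {strongest}")
```

The result agrees with the reported STEM score of 69.54 to within about 0.01; the small gap is plausibly due to rounding or truncation of the published figures. A weighted average by question count would need per-subject sizes, which are not given here.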

Humanities

sub_category	score
administrative_law	0.454762
anti_money_laundering	0.738806
general_principles_of_law	0.509434
introduction_to_law	0.523207
jce_humanities	0.577778
taxation	0.322667
trust_practice	0.558603

Social Science

sub_category	score
chinese_language_and_literature	0.457286
clinical_psychology	0.664000
economics	0.702290
education	0.653226
education_(profession_level)	0.458848
educational_psychology	0.670455
geography_of_taiwan	0.618490
human_behavior	0.711974
junior_chinese_exam	0.765714
macroeconomics	0.649635
national_protection	0.687204
occupational_therapy_for_psychological_disorders	0.699816
physical_education	0.569832
politic_science	0.658291
taiwanese_hokkien	0.294574
three_principles_of_people	0.697842
ttqav2	0.761062
tve_chinese_language	0.745342

Others

sub_category	score
accounting	0.350785
agriculture	0.476821
auditing	0.516364
business_management	0.661871
culinary_skills	0.636986
dentistry	0.581454
finance_banking	0.592593
financial_analysis	0.722513
fire_science	0.483871
insurance_studies	0.497368
junior_social_studies	0.785714
logic_reasoning	0.589928
management_accounting	0.530233
marketing_management	0.784946
mechanical	0.711864
music	0.521583
nautical_science	0.441016
official_document_management	0.513514
optometry	0.441304
pharmacology	0.639515
real_estate	0.500000
technical	0.604478
trade	0.410359
traditional_chinese_medicine_clinical_medicine	0.456835
tve_design	0.735417
veterinary_pathology	0.519435
veterinary_pharmacology	0.711111

zake7749/gemma-2-9b-it-chinese-kyara