INFO: 2024-07-12 14:38:26,855: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/rutie', 'darumeru/ruworldtree', 'darumeru/rwsd', 'darumeru/use', 'russiannlp/rucola_custom']
INFO: 2024-07-12 14:38:26,857: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271]
INFO: 2024-07-12 14:38:26,857: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 14:38:26,962: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu']
INFO: 2024-07-12 14:38:26,967: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271]
INFO: 2024-07-12 14:38:26,967: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 14:38:28,342: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu']
INFO: 2024-07-12 14:38:28,344: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645]
INFO: 2024-07-12 14:38:28,344: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-12 14:38:30,929: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu']
INFO: 2024-07-12 14:38:30,929: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645]
INFO: 2024-07-12 14:38:30,930: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-12 14:38:31,797: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-07-12 14:38:31,797: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271]
INFO: 2024-07-12 14:38:31,797: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 14:38:32,851: llmtf.base.darumeru/MultiQ: Loading Dataset: 5.99s
INFO: 2024-07-12 14:38:33,901: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive']
INFO: 2024-07-12 14:38:33,901: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645]
INFO: 2024-07-12 14:38:33,901: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-12 14:38:36,027: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_sent_en', 'darumeru/cp_para_ru', 'darumeru/cp_para_en']
INFO: 2024-07-12 14:38:36,028: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271]
INFO: 2024-07-12 14:38:36,028: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 14:38:38,261: llmtf.base.daru/treewayabstractive: Loading Dataset: 6.46s
INFO: 2024-07-12 14:38:38,807: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 2.78s
INFO: 2024-07-12 14:38:45,516: llmtf.base.darumeru/ruMMLU: Loading Dataset: 18.55s
INFO: 2024-07-12 14:38:46,639: llmtf.base.daru/treewayextractive: Loading Dataset: 12.74s
INFO: 2024-07-12 14:40:53,988: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 143.06s
INFO: 2024-07-12 14:41:01,199: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 152.85s
INFO: 2024-07-12 14:46:53,288: llmtf.base.darumeru/MultiQ: Processing Dataset: 500.44s
INFO: 2024-07-12 14:46:53,291: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ:
INFO: 2024-07-12 14:46:53,296: llmtf.base.darumeru/MultiQ: {'f1': 0.386977461015027, 'em': 0.29923518164435947}
INFO: 2024-07-12 14:46:53,303: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271]
INFO: 2024-07-12 14:46:53,303: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 14:46:55,951: llmtf.base.darumeru/PARus: Loading Dataset: 2.65s
INFO: 2024-07-12 14:47:04,040: llmtf.base.darumeru/PARus: Processing Dataset: 8.09s
INFO: 2024-07-12 14:47:04,041: llmtf.base.darumeru/PARus: Results for darumeru/PARus:
INFO: 2024-07-12 14:47:04,068: llmtf.base.darumeru/PARus: {'acc': 0.85}
INFO: 2024-07-12 14:47:04,070: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271]
INFO: 2024-07-12 14:47:04,070: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 14:47:06,132: llmtf.base.darumeru/RCB: Loading Dataset: 2.06s
INFO: 2024-07-12 14:47:17,495: llmtf.base.darumeru/RCB: Processing Dataset: 11.36s
INFO: 2024-07-12 14:47:17,497: llmtf.base.darumeru/RCB: Results for darumeru/RCB:
INFO: 2024-07-12 14:47:17,534: llmtf.base.darumeru/RCB: {'acc': 0.55, 'f1_macro': 0.4666949497457971}
INFO: 2024-07-12 14:47:17,536: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271]
INFO: 2024-07-12 14:47:17,537: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 14:47:21,666: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 4.13s
INFO: 2024-07-12 14:48:08,393: llmtf.base.darumeru/ruMMLU: Processing Dataset: 562.88s
INFO: 2024-07-12 14:48:08,395: llmtf.base.darumeru/ruMMLU: Results for darumeru/ruMMLU:
INFO: 2024-07-12 14:48:08,403: llmtf.base.darumeru/ruMMLU: {'acc': 0.582360570687419}
INFO: 2024-07-12 14:48:08,448: llmtf.base.evaluator: Ended eval
INFO: 2024-07-12 14:48:08,455: llmtf.base.evaluator:
mean   darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/ruMMLU
0.571  0.343            0.850           0.508         0.582
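The summary row above is consistent with a two-step average. Each task's reported metrics are averaged first: MultiQ gives (0.387 + 0.299) / 2 ≈ 0.343 and RCB gives (0.55 + 0.467) / 2 ≈ 0.508. The task scores are then averaged into `mean`. The Python sketch below parses the result lines of this log and reproduces the row under that assumption. The regex, `parse_results` and `task_score` are hypothetical helpers written for illustration. They are not part of llmtf's API.

```python
import ast
import re
from statistics import mean

# Result lines copied verbatim from the log above.
LOG = """\
INFO: 2024-07-12 14:46:53,296: llmtf.base.darumeru/MultiQ: {'f1': 0.386977461015027, 'em': 0.29923518164435947}
INFO: 2024-07-12 14:47:04,068: llmtf.base.darumeru/PARus: {'acc': 0.85}
INFO: 2024-07-12 14:47:17,534: llmtf.base.darumeru/RCB: {'acc': 0.55, 'f1_macro': 0.4666949497457971}
INFO: 2024-07-12 14:48:08,403: llmtf.base.darumeru/ruMMLU: {'acc': 0.582360570687419}
"""

# Matches "llmtf.base.<task>: {<metrics dict>}" at the end of a log line.
RESULT_RE = re.compile(r"llmtf\.base\.(?P<task>\S+): (?P<metrics>\{.*\})$")


def parse_results(text: str) -> dict[str, dict[str, float]]:
    """Collect the per-task metric dicts printed by the evaluator (hypothetical helper)."""
    results = {}
    for line in text.splitlines():
        m = RESULT_RE.search(line)
        if m:
            # The dict is printed as a Python literal, so literal_eval can read it safely.
            results[m.group("task")] = ast.literal_eval(m.group("metrics"))
    return results


def task_score(metrics: dict[str, float]) -> float:
    # Assumption: a multi-metric task is summarised by the unweighted mean of its metrics.
    return mean(metrics.values())


results = parse_results(LOG)
scores = {task: task_score(m) for task, m in results.items()}
overall = mean(scores.values())
print({task: round(s, 3) for task, s in scores.items()}, round(overall, 3))
```

The metric dicts are printed as Python literals, so `ast.literal_eval` reads them without executing code. Configuration and timing lines never end in a `{...}` dict, so the same function can be run over the whole log.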
INFO: 2024-07-12 14:48:49,352: llmtf.base.darumeru/cp_sent_ru: Processing Dataset: 610.54s
INFO: 2024-07-12 14:48:49,355: llmtf.base.darumeru/cp_sent_ru: Results for darumeru/cp_sent_ru:
INFO: 2024-07-12 14:48:49,359: llmtf.base.darumeru/cp_sent_ru: {'symbol_per_token': 2.538512282980733, 'len': 0.9931441625372947, 'lcs': 0.9360287808828226}
INFO: 2024-07-12 14:48:49,361: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271]
INFO: 2024-07-12 14:48:49,361: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 14:48:52,201: llmtf.base.darumeru/cp_sent_en: Loading Dataset: 2.84s
INFO: 2024-07-12 14:48:57,178: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 95.51s
INFO: 2024-07-12 14:48:57,180: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA:
INFO: 2024-07-12 14:48:57,193: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.820446735395189, 'f1_macro': 0.8204618323622268}
INFO: 2024-07-12 14:48:57,202: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271]
INFO: 2024-07-12 14:48:57,203: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 14:49:04,588: llmtf.base.darumeru/ruTiE: Loading Dataset: 7.38s
INFO: 2024-07-12 14:51:29,740: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 635.72s
INFO: 2024-07-12 14:51:29,741: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU:
INFO: 2024-07-12 14:51:29,782: llmtf.base.nlpcoreteam/enMMLU: metric
subject
abstract_algebra 0.560000
anatomy 0.614815
astronomy 0.776316
business_ethics 0.750000
clinical_knowledge 0.758491
college_biology 0.777778
college_chemistry 0.460000
college_computer_science 0.640000
college_mathematics 0.370000
college_medicine 0.676301
college_physics 0.421569
computer_security 0.720000
conceptual_physics 0.719149
econometrics 0.587719
electrical_engineering 0.724138
elementary_mathematics 0.624339
formal_logic 0.531746
global_facts 0.460000
high_school_biology 0.822581
high_school_chemistry 0.605911
high_school_computer_science 0.790000
high_school_european_history 0.793939
high_school_geography 0.888889
high_school_government_and_politics 0.922280
high_school_macroeconomics 0.733333
high_school_mathematics 0.500000
high_school_microeconomics 0.819328
high_school_physics 0.443709
high_school_psychology 0.864220
high_school_statistics 0.643519
high_school_us_history 0.867647
high_school_world_history 0.827004
human_aging 0.744395
human_sexuality 0.793893
international_law 0.859504
jurisprudence 0.833333
logical_fallacies 0.791411
machine_learning 0.473214
management 0.805825
marketing 0.910256
medical_genetics 0.820000
miscellaneous 0.855683
moral_disputes 0.774566
moral_scenarios 0.424581
nutrition 0.784314
philosophy 0.739550
prehistory 0.765432
professional_accounting 0.556738
professional_law 0.516297
professional_medicine 0.709559
professional_psychology 0.750000
public_relations 0.727273
security_studies 0.726531
sociology 0.875622
us_foreign_policy 0.830000
virology 0.512048
world_religions 0.824561
INFO: 2024-07-12 14:51:29,791: llmtf.base.nlpcoreteam/enMMLU: metric
subject
STEM 0.615123
humanities 0.734583
other (business, health, misc.) 0.711316
social sciences 0.793257
INFO: 2024-07-12 14:51:29,799: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.7135698102802155}
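The enMMLU breakdown above is consistent with unweighted (macro) averaging at both levels. Each category is the plain mean of its subjects' accuracies. `acc` is the plain mean of the four categories, so large subjects such as professional_law carry no extra weight. The small Python check below assumes that the STEM group follows the original MMLU subject-to-category mapping. This is an assumption, because the log does not list the grouping.

```python
from statistics import mean

# STEM subject accuracies copied from the nlpcoreteam/enMMLU table above.
# Assumption: these 18 subjects form the STEM group (original MMLU mapping).
EN_STEM_SUBJECTS = {
    "abstract_algebra": 0.560000,
    "astronomy": 0.776316,
    "college_biology": 0.777778,
    "college_chemistry": 0.460000,
    "college_computer_science": 0.640000,
    "college_mathematics": 0.370000,
    "college_physics": 0.421569,
    "computer_security": 0.720000,
    "conceptual_physics": 0.719149,
    "electrical_engineering": 0.724138,
    "elementary_mathematics": 0.624339,
    "high_school_biology": 0.822581,
    "high_school_chemistry": 0.605911,
    "high_school_computer_science": 0.790000,
    "high_school_mathematics": 0.500000,
    "high_school_physics": 0.443709,
    "high_school_statistics": 0.643519,
    "machine_learning": 0.473214,
}

# Category scores as printed in the second enMMLU table.
EN_CATEGORIES = {
    "STEM": 0.615123,
    "humanities": 0.734583,
    "other (business, health, misc.)": 0.711316,
    "social sciences": 0.793257,
}

# Macro averages: every subject (or category) counts once, regardless of its size.
stem_macro = mean(EN_STEM_SUBJECTS.values())
overall_macro = mean(EN_CATEGORIES.values())

print(f"STEM from subjects:  {stem_macro:.8f} (log: 0.615123)")
print(f"acc from categories: {overall_macro:.8f} (log: 0.7135698102802155)")
```

Both checks agree with the log to within the rounding of the six-digit table values. The same category-level average applied to the nlpcoreteam/ruMMLU categories later in this log also matches that run's reported `acc` of 0.6100658: (0.547210 + 0.620858 + 0.612733 + 0.659462) / 4 ≈ 0.610066.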
INFO: 2024-07-12 14:51:29,838: llmtf.base.evaluator: Ended eval
INFO: 2024-07-12 14:51:29,846: llmtf.base.evaluator:
mean   darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/cp_sent_ru  darumeru/ruMMLU  darumeru/ruOpenBookQA  nlpcoreteam/enMMLU
0.687  0.343            0.850           0.508         0.993                0.582            0.820                  0.714
INFO: 2024-07-12 14:53:26,124: llmtf.base.darumeru/ruTiE: Processing Dataset: 261.53s
INFO: 2024-07-12 14:53:26,131: llmtf.base.darumeru/ruTiE: Results for darumeru/ruTiE:
INFO: 2024-07-12 14:53:26,196: llmtf.base.darumeru/ruTiE: {'acc': 0.6348837209302326}
INFO: 2024-07-12 14:53:26,205: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271]
INFO: 2024-07-12 14:53:26,205: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 14:53:28,360: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 2.15s
INFO: 2024-07-12 14:53:33,270: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 4.91s
INFO: 2024-07-12 14:53:33,272: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree:
INFO: 2024-07-12 14:53:33,292: llmtf.base.darumeru/ruWorldTree: {'acc': 0.9047619047619048, 'f1_macro': 0.9020443349753694}
INFO: 2024-07-12 14:53:33,293: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271]
INFO: 2024-07-12 14:53:33,293: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 14:53:35,854: llmtf.base.darumeru/RWSD: Loading Dataset: 2.56s
INFO: 2024-07-12 14:53:46,261: llmtf.base.darumeru/RWSD: Processing Dataset: 10.41s
INFO: 2024-07-12 14:53:46,293: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD:
INFO: 2024-07-12 14:53:46,297: llmtf.base.darumeru/RWSD: {'acc': 0.5931372549019608}
INFO: 2024-07-12 14:53:46,299: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271]
INFO: 2024-07-12 14:53:46,299: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 14:53:50,748: llmtf.base.darumeru/USE: Loading Dataset: 4.45s
INFO: 2024-07-12 14:54:21,956: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 800.74s
INFO: 2024-07-12 14:54:21,957: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU:
INFO: 2024-07-12 14:54:21,996: llmtf.base.nlpcoreteam/ruMMLU: metric
subject
abstract_algebra 0.470000
anatomy 0.570370
astronomy 0.657895
business_ethics 0.700000
clinical_knowledge 0.652830
college_biology 0.638889
college_chemistry 0.440000
college_computer_science 0.590000
college_mathematics 0.460000
college_medicine 0.618497
college_physics 0.372549
computer_security 0.640000
conceptual_physics 0.625532
econometrics 0.429825
electrical_engineering 0.537931
elementary_mathematics 0.529101
formal_logic 0.476190
global_facts 0.410000
high_school_biology 0.748387
high_school_chemistry 0.532020
high_school_computer_science 0.700000
high_school_european_history 0.757576
high_school_geography 0.717172
high_school_government_and_politics 0.709845
high_school_macroeconomics 0.612821
high_school_mathematics 0.451852
high_school_microeconomics 0.676471
high_school_physics 0.456954
high_school_psychology 0.774312
high_school_statistics 0.587963
high_school_us_history 0.740196
high_school_world_history 0.767932
human_aging 0.618834
human_sexuality 0.656489
international_law 0.735537
jurisprudence 0.657407
logical_fallacies 0.570552
machine_learning 0.410714
management 0.699029
marketing 0.807692
medical_genetics 0.670000
miscellaneous 0.701149
moral_disputes 0.676301
moral_scenarios 0.220112
nutrition 0.679739
philosophy 0.688103
prehistory 0.626543
professional_accounting 0.365248
professional_law 0.417862
professional_medicine 0.602941
professional_psychology 0.570261
public_relations 0.618182
security_studies 0.726531
sociology 0.671642
us_foreign_policy 0.750000
virology 0.481928
world_religions 0.736842
INFO: 2024-07-12 14:54:22,004: llmtf.base.nlpcoreteam/ruMMLU: metric
subject
STEM 0.547210
humanities 0.620858
other (business, health, misc.) 0.612733
social sciences 0.659462
INFO: 2024-07-12 14:54:22,012: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.6100658446710188}
INFO: 2024-07-12 14:54:22,056: llmtf.base.evaluator: Ended eval
INFO: 2024-07-12 14:54:22,065: llmtf.base.evaluator:
mean   darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/cp_sent_ru  darumeru/ruMMLU  darumeru/ruOpenBookQA  darumeru/ruTiE  darumeru/ruWorldTree  nlpcoreteam/enMMLU  nlpcoreteam/ruMMLU
0.687  0.343            0.850           0.508         0.593          0.993                0.582            0.820                  0.635           0.903                 0.714               0.610
INFO: 2024-07-12 14:56:31,835: llmtf.base.darumeru/cp_sent_en: Processing Dataset: 459.63s
INFO: 2024-07-12 14:56:31,837: llmtf.base.darumeru/cp_sent_en: Results for darumeru/cp_sent_en:
INFO: 2024-07-12 14:56:31,868: llmtf.base.darumeru/cp_sent_en: {'symbol_per_token': 4.291582292949926, 'len': 0.9984711650118888, 'lcs': 0.9897423394080185}
INFO: 2024-07-12 14:56:31,870: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271]
INFO: 2024-07-12 14:56:31,870: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 14:56:34,191: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 2.32s
INFO: 2024-07-12 14:58:14,794: llmtf.base.darumeru/USE: Processing Dataset: 264.03s
INFO: 2024-07-12 14:58:14,798: llmtf.base.darumeru/USE: Results for darumeru/USE:
INFO: 2024-07-12 14:58:14,803: llmtf.base.darumeru/USE: {'grade_norm': 0.15784313725490196}
INFO: 2024-07-12 14:58:14,807: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645]
INFO: 2024-07-12 14:58:14,807: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-12 14:58:21,363: llmtf.base.russiannlp/rucola_custom: Loading Dataset: 6.55s
INFO: 2024-07-12 15:00:15,879: llmtf.base.russiannlp/rucola_custom: Processing Dataset: 114.52s
INFO: 2024-07-12 15:00:15,881: llmtf.base.russiannlp/rucola_custom: Results for russiannlp/rucola_custom:
INFO: 2024-07-12 15:00:15,893: llmtf.base.russiannlp/rucola_custom: {'acc': 0.7341227125941873, 'mcc': 0.3720880731254111}
INFO: 2024-07-12 15:00:15,899: llmtf.base.evaluator: Ended eval
INFO: 2024-07-12 15:00:15,925: llmtf.base.evaluator:
mean   darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/USE  darumeru/cp_sent_en  darumeru/cp_sent_ru  darumeru/ruMMLU  darumeru/ruOpenBookQA  darumeru/ruTiE  darumeru/ruWorldTree  nlpcoreteam/enMMLU  nlpcoreteam/ruMMLU  russiannlp/rucola_custom
0.662  0.343            0.850           0.508         0.593          0.158         0.998                0.993                0.582            0.820                  0.635           0.903                 0.714               0.610               0.553
INFO: 2024-07-12 15:03:26,998: llmtf.base.daru/treewayextractive: Processing Dataset: 1480.36s
INFO: 2024-07-12 15:03:27,003: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive:
INFO: 2024-07-12 15:03:27,254: llmtf.base.daru/treewayextractive: {'r-prec': 0.3917012265512266}
INFO: 2024-07-12 15:03:27,764: llmtf.base.evaluator: Ended eval
INFO: 2024-07-12 15:03:27,788: llmtf.base.evaluator:
mean   daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/USE  darumeru/cp_sent_en  darumeru/cp_sent_ru  darumeru/ruMMLU  darumeru/ruOpenBookQA  darumeru/ruTiE  darumeru/ruWorldTree  nlpcoreteam/enMMLU  nlpcoreteam/ruMMLU  russiannlp/rucola_custom
0.644  0.392                   0.343            0.850           0.508         0.593          0.158         0.998                0.993                0.582            0.820                  0.635           0.903                 0.714               0.610               0.553
INFO: 2024-07-12 15:09:56,745: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 802.55s
INFO: 2024-07-12 15:09:56,748: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru:
INFO: 2024-07-12 15:09:56,767: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 2.654942872308677, 'len': 0.9887005042024545, 'lcs': 0.8325056701679933}
INFO: 2024-07-12 15:09:56,769: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271]
INFO: 2024-07-12 15:09:56,769: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 15:09:59,376: llmtf.base.darumeru/cp_para_en: Loading Dataset: 2.61s
INFO: 2024-07-12 15:19:53,024: llmtf.base.darumeru/cp_para_en: Processing Dataset: 593.65s
INFO: 2024-07-12 15:19:53,044: llmtf.base.darumeru/cp_para_en: Results for darumeru/cp_para_en:
INFO: 2024-07-12 15:19:53,048: llmtf.base.darumeru/cp_para_en: {'symbol_per_token': 4.345346412423869, 'len': 0.998698675596975, 'lcs': 0.9617108678980784}
INFO: 2024-07-12 15:19:53,049: llmtf.base.evaluator: Ended eval
INFO: 2024-07-12 15:19:53,069: llmtf.base.evaluator:
mean   daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/USE  darumeru/cp_para_en  darumeru/cp_para_ru  darumeru/cp_sent_en  darumeru/cp_sent_ru  darumeru/ruMMLU  darumeru/ruOpenBookQA  darumeru/ruTiE  darumeru/ruWorldTree  nlpcoreteam/enMMLU  nlpcoreteam/ruMMLU  russiannlp/rucola_custom
0.673  0.392                   0.343            0.850           0.508         0.593          0.158         0.962                0.833                0.998                0.993                0.582            0.820                  0.635           0.903                 0.714               0.610               0.553
INFO: 2024-07-12 15:22:59,702: llmtf.base.daru/treewayabstractive: Processing Dataset: 2661.42s
INFO: 2024-07-12 15:22:59,722: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive:
INFO: 2024-07-12 15:22:59,742: llmtf.base.daru/treewayabstractive: {'rouge1': 0.3366772629962508, 'rouge2': 0.11808146329188003}
INFO: 2024-07-12 15:22:59,745: llmtf.base.evaluator: Ended eval
INFO: 2024-07-12 15:22:59,771: llmtf.base.evaluator:
mean   daru/treewayabstractive  daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/USE  darumeru/cp_para_en  darumeru/cp_para_ru  darumeru/cp_sent_en  darumeru/cp_sent_ru  darumeru/ruMMLU  darumeru/ruOpenBookQA  darumeru/ruTiE  darumeru/ruWorldTree  nlpcoreteam/enMMLU  nlpcoreteam/ruMMLU  russiannlp/rucola_custom
0.649  0.227                    0.392                   0.343            0.850           0.508         0.593          0.158         0.962                0.833                0.998                0.993                0.582            0.820                  0.635           0.903                 0.714               0.610               0.553
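In this final table, most entries equal the mean of the task's reported metrics, but the four copy tasks do not. cp_sent_ru shows 0.993, which is its `len`, and cp_para_ru shows 0.833, which is its `lcs`. `symbol_per_token` never enters the table. The Python sketch below recomputes every column and the overall 0.649 from the metric dicts in this log. The per-task selection rule is inferred from the printed numbers, not taken from llmtf's source, so treat `SINGLE_METRIC` and `task_score` as hypothetical.

```python
from statistics import mean

# Metric dicts as reported in this log (values copied verbatim).
RESULTS = {
    "daru/treewayabstractive": {"rouge1": 0.3366772629962508, "rouge2": 0.11808146329188003},
    "daru/treewayextractive": {"r-prec": 0.3917012265512266},
    "darumeru/MultiQ": {"f1": 0.386977461015027, "em": 0.29923518164435947},
    "darumeru/PARus": {"acc": 0.85},
    "darumeru/RCB": {"acc": 0.55, "f1_macro": 0.4666949497457971},
    "darumeru/RWSD": {"acc": 0.5931372549019608},
    "darumeru/USE": {"grade_norm": 0.15784313725490196},
    "darumeru/cp_para_en": {"symbol_per_token": 4.345346412423869, "len": 0.998698675596975, "lcs": 0.9617108678980784},
    "darumeru/cp_para_ru": {"symbol_per_token": 2.654942872308677, "len": 0.9887005042024545, "lcs": 0.8325056701679933},
    "darumeru/cp_sent_en": {"symbol_per_token": 4.291582292949926, "len": 0.9984711650118888, "lcs": 0.9897423394080185},
    "darumeru/cp_sent_ru": {"symbol_per_token": 2.538512282980733, "len": 0.9931441625372947, "lcs": 0.9360287808828226},
    "darumeru/ruMMLU": {"acc": 0.582360570687419},
    "darumeru/ruOpenBookQA": {"acc": 0.820446735395189, "f1_macro": 0.8204618323622268},
    "darumeru/ruTiE": {"acc": 0.6348837209302326},
    "darumeru/ruWorldTree": {"acc": 0.9047619047619048, "f1_macro": 0.9020443349753694},
    "nlpcoreteam/enMMLU": {"acc": 0.7135698102802155},
    "nlpcoreteam/ruMMLU": {"acc": 0.6100658446710188},
    "russiannlp/rucola_custom": {"acc": 0.7341227125941873, "mcc": 0.3720880731254111},
}

# Inferred from the printed table, not from llmtf's source (hypothetical):
# copy tasks are summarised by one metric, 'len' for sentences and 'lcs' for paragraphs.
SINGLE_METRIC = {
    "darumeru/cp_sent_en": "len",
    "darumeru/cp_sent_ru": "len",
    "darumeru/cp_para_en": "lcs",
    "darumeru/cp_para_ru": "lcs",
}


def task_score(task: str, metrics: dict[str, float]) -> float:
    if task in SINGLE_METRIC:
        return metrics[SINGLE_METRIC[task]]
    # Every other task: unweighted mean of its reported metrics.
    return mean(metrics.values())


scores = {task: task_score(task, m) for task, m in RESULTS.items()}
overall = mean(scores.values())

for task in sorted(scores):
    print(f"{task:26} {scores[task]:.3f}")
print(f"{'mean':26} {overall:.3f}")
```

Under these assumptions, all 18 task columns round to the printed values. The unweighted mean over tasks is about 0.6486, which the log prints as 0.649.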