|
INFO: 2024-07-12 14:38:26,855: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/rutie', 'darumeru/ruworldtree', 'darumeru/rwsd', 'darumeru/use', 'russiannlp/rucola_custom'] |
|
INFO: 2024-07-12 14:38:26,857: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271] |
|
INFO: 2024-07-12 14:38:26,857: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-07-12 14:38:26,962: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu'] |
|
INFO: 2024-07-12 14:38:26,967: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271] |
|
INFO: 2024-07-12 14:38:26,967: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-07-12 14:38:28,342: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu'] |
|
INFO: 2024-07-12 14:38:28,344: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645] |
|
INFO: 2024-07-12 14:38:28,344: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
|
INFO: 2024-07-12 14:38:30,929: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu'] |
|
INFO: 2024-07-12 14:38:30,929: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645] |
|
INFO: 2024-07-12 14:38:30,930: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
|
INFO: 2024-07-12 14:38:31,797: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive'] |
|
INFO: 2024-07-12 14:38:31,797: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271] |
|
INFO: 2024-07-12 14:38:31,797: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-07-12 14:38:32,851: llmtf.base.darumeru/MultiQ: Loading Dataset: 5.99s |
|
INFO: 2024-07-12 14:38:33,901: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive'] |
|
INFO: 2024-07-12 14:38:33,901: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645] |
|
INFO: 2024-07-12 14:38:33,901: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
|
INFO: 2024-07-12 14:38:36,027: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_sent_en', 'darumeru/cp_para_ru', 'darumeru/cp_para_en'] |
|
INFO: 2024-07-12 14:38:36,028: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271] |
|
INFO: 2024-07-12 14:38:36,028: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-07-12 14:38:38,261: llmtf.base.daru/treewayabstractive: Loading Dataset: 6.46s |
|
INFO: 2024-07-12 14:38:38,807: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 2.78s |
|
INFO: 2024-07-12 14:38:45,516: llmtf.base.darumeru/ruMMLU: Loading Dataset: 18.55s |
|
INFO: 2024-07-12 14:38:46,639: llmtf.base.daru/treewayextractive: Loading Dataset: 12.74s |
|
INFO: 2024-07-12 14:40:53,988: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 143.06s |
|
INFO: 2024-07-12 14:41:01,199: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 152.85s |
|
INFO: 2024-07-12 14:46:53,288: llmtf.base.darumeru/MultiQ: Processing Dataset: 500.44s |
|
INFO: 2024-07-12 14:46:53,291: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ: |
|
INFO: 2024-07-12 14:46:53,296: llmtf.base.darumeru/MultiQ: {'f1': 0.386977461015027, 'em': 0.29923518164435947} |
|
INFO: 2024-07-12 14:46:53,303: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271] |
|
INFO: 2024-07-12 14:46:53,303: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-07-12 14:46:55,951: llmtf.base.darumeru/PARus: Loading Dataset: 2.65s |
|
INFO: 2024-07-12 14:47:04,040: llmtf.base.darumeru/PARus: Processing Dataset: 8.09s |
|
INFO: 2024-07-12 14:47:04,041: llmtf.base.darumeru/PARus: Results for darumeru/PARus: |
|
INFO: 2024-07-12 14:47:04,068: llmtf.base.darumeru/PARus: {'acc': 0.85} |
|
INFO: 2024-07-12 14:47:04,070: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271] |
|
INFO: 2024-07-12 14:47:04,070: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-07-12 14:47:06,132: llmtf.base.darumeru/RCB: Loading Dataset: 2.06s |
|
INFO: 2024-07-12 14:47:17,495: llmtf.base.darumeru/RCB: Processing Dataset: 11.36s |
|
INFO: 2024-07-12 14:47:17,497: llmtf.base.darumeru/RCB: Results for darumeru/RCB: |
|
INFO: 2024-07-12 14:47:17,534: llmtf.base.darumeru/RCB: {'acc': 0.55, 'f1_macro': 0.4666949497457971} |
|
INFO: 2024-07-12 14:47:17,536: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271] |
|
INFO: 2024-07-12 14:47:17,537: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-07-12 14:47:21,666: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 4.13s |
|
INFO: 2024-07-12 14:48:08,393: llmtf.base.darumeru/ruMMLU: Processing Dataset: 562.88s |
|
INFO: 2024-07-12 14:48:08,395: llmtf.base.darumeru/ruMMLU: Results for darumeru/ruMMLU: |
|
INFO: 2024-07-12 14:48:08,403: llmtf.base.darumeru/ruMMLU: {'acc': 0.582360570687419} |
|
INFO: 2024-07-12 14:48:08,448: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-07-12 14:48:08,455: llmtf.base.evaluator: |
|
mean 0.571 |
|
darumeru/MultiQ 0.343 |
|
darumeru/PARus 0.850 |
|
darumeru/RCB 0.508 |
|
darumeru/ruMMLU 0.582 |
|
INFO: 2024-07-12 14:48:49,352: llmtf.base.darumeru/cp_sent_ru: Processing Dataset: 610.54s |
|
INFO: 2024-07-12 14:48:49,355: llmtf.base.darumeru/cp_sent_ru: Results for darumeru/cp_sent_ru: |
|
INFO: 2024-07-12 14:48:49,359: llmtf.base.darumeru/cp_sent_ru: {'symbol_per_token': 2.538512282980733, 'len': 0.9931441625372947, 'lcs': 0.9360287808828226} |
|
INFO: 2024-07-12 14:48:49,361: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271] |
|
INFO: 2024-07-12 14:48:49,361: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-07-12 14:48:52,201: llmtf.base.darumeru/cp_sent_en: Loading Dataset: 2.84s |
|
INFO: 2024-07-12 14:48:57,178: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 95.51s |
|
INFO: 2024-07-12 14:48:57,180: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA: |
|
INFO: 2024-07-12 14:48:57,193: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.820446735395189, 'f1_macro': 0.8204618323622268} |
|
INFO: 2024-07-12 14:48:57,202: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271] |
|
INFO: 2024-07-12 14:48:57,203: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-07-12 14:49:04,588: llmtf.base.darumeru/ruTiE: Loading Dataset: 7.38s |
|
INFO: 2024-07-12 14:51:29,740: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 635.72s |
|
INFO: 2024-07-12 14:51:29,741: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU: |
|
INFO: 2024-07-12 14:51:29,782: llmtf.base.nlpcoreteam/enMMLU: |
|
subject metric |
|
abstract_algebra 0.560000 |
|
anatomy 0.614815 |
|
astronomy 0.776316 |
|
business_ethics 0.750000 |
|
clinical_knowledge 0.758491 |
|
college_biology 0.777778 |
|
college_chemistry 0.460000 |
|
college_computer_science 0.640000 |
|
college_mathematics 0.370000 |
|
college_medicine 0.676301 |
|
college_physics 0.421569 |
|
computer_security 0.720000 |
|
conceptual_physics 0.719149 |
|
econometrics 0.587719 |
|
electrical_engineering 0.724138 |
|
elementary_mathematics 0.624339 |
|
formal_logic 0.531746 |
|
global_facts 0.460000 |
|
high_school_biology 0.822581 |
|
high_school_chemistry 0.605911 |
|
high_school_computer_science 0.790000 |
|
high_school_european_history 0.793939 |
|
high_school_geography 0.888889 |
|
high_school_government_and_politics 0.922280 |
|
high_school_macroeconomics 0.733333 |
|
high_school_mathematics 0.500000 |
|
high_school_microeconomics 0.819328 |
|
high_school_physics 0.443709 |
|
high_school_psychology 0.864220 |
|
high_school_statistics 0.643519 |
|
high_school_us_history 0.867647 |
|
high_school_world_history 0.827004 |
|
human_aging 0.744395 |
|
human_sexuality 0.793893 |
|
international_law 0.859504 |
|
jurisprudence 0.833333 |
|
logical_fallacies 0.791411 |
|
machine_learning 0.473214 |
|
management 0.805825 |
|
marketing 0.910256 |
|
medical_genetics 0.820000 |
|
miscellaneous 0.855683 |
|
moral_disputes 0.774566 |
|
moral_scenarios 0.424581 |
|
nutrition 0.784314 |
|
philosophy 0.739550 |
|
prehistory 0.765432 |
|
professional_accounting 0.556738 |
|
professional_law 0.516297 |
|
professional_medicine 0.709559 |
|
professional_psychology 0.750000 |
|
public_relations 0.727273 |
|
security_studies 0.726531 |
|
sociology 0.875622 |
|
us_foreign_policy 0.830000 |
|
virology 0.512048 |
|
world_religions 0.824561 |
|
INFO: 2024-07-12 14:51:29,791: llmtf.base.nlpcoreteam/enMMLU: |
|
subject metric |
|
STEM 0.615123 |
|
humanities 0.734583 |
|
other (business, health, misc.) 0.711316 |
|
social sciences 0.793257 |
|
INFO: 2024-07-12 14:51:29,799: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.7135698102802155} |
|
INFO: 2024-07-12 14:51:29,838: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-07-12 14:51:29,846: llmtf.base.evaluator: |
|
mean 0.687 |
|
darumeru/MultiQ 0.343 |
|
darumeru/PARus 0.850 |
|
darumeru/RCB 0.508 |
|
darumeru/cp_sent_ru 0.993 |
|
darumeru/ruMMLU 0.582 |
|
darumeru/ruOpenBookQA 0.820 |
|
nlpcoreteam/enMMLU 0.714 |
|
INFO: 2024-07-12 14:53:26,124: llmtf.base.darumeru/ruTiE: Processing Dataset: 261.53s |
|
INFO: 2024-07-12 14:53:26,131: llmtf.base.darumeru/ruTiE: Results for darumeru/ruTiE: |
|
INFO: 2024-07-12 14:53:26,196: llmtf.base.darumeru/ruTiE: {'acc': 0.6348837209302326} |
|
INFO: 2024-07-12 14:53:26,205: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271] |
|
INFO: 2024-07-12 14:53:26,205: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-07-12 14:53:28,360: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 2.15s |
|
INFO: 2024-07-12 14:53:33,270: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 4.91s |
|
INFO: 2024-07-12 14:53:33,272: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree: |
|
INFO: 2024-07-12 14:53:33,292: llmtf.base.darumeru/ruWorldTree: {'acc': 0.9047619047619048, 'f1_macro': 0.9020443349753694} |
|
INFO: 2024-07-12 14:53:33,293: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271] |
|
INFO: 2024-07-12 14:53:33,293: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-07-12 14:53:35,854: llmtf.base.darumeru/RWSD: Loading Dataset: 2.56s |
|
INFO: 2024-07-12 14:53:46,261: llmtf.base.darumeru/RWSD: Processing Dataset: 10.41s |
|
INFO: 2024-07-12 14:53:46,293: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD: |
|
INFO: 2024-07-12 14:53:46,297: llmtf.base.darumeru/RWSD: {'acc': 0.5931372549019608} |
|
INFO: 2024-07-12 14:53:46,299: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271] |
|
INFO: 2024-07-12 14:53:46,299: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-07-12 14:53:50,748: llmtf.base.darumeru/USE: Loading Dataset: 4.45s |
|
INFO: 2024-07-12 14:54:21,956: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 800.74s |
|
INFO: 2024-07-12 14:54:21,957: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU: |
|
INFO: 2024-07-12 14:54:21,996: llmtf.base.nlpcoreteam/ruMMLU: |
|
subject metric |
|
abstract_algebra 0.470000 |
|
anatomy 0.570370 |
|
astronomy 0.657895 |
|
business_ethics 0.700000 |
|
clinical_knowledge 0.652830 |
|
college_biology 0.638889 |
|
college_chemistry 0.440000 |
|
college_computer_science 0.590000 |
|
college_mathematics 0.460000 |
|
college_medicine 0.618497 |
|
college_physics 0.372549 |
|
computer_security 0.640000 |
|
conceptual_physics 0.625532 |
|
econometrics 0.429825 |
|
electrical_engineering 0.537931 |
|
elementary_mathematics 0.529101 |
|
formal_logic 0.476190 |
|
global_facts 0.410000 |
|
high_school_biology 0.748387 |
|
high_school_chemistry 0.532020 |
|
high_school_computer_science 0.700000 |
|
high_school_european_history 0.757576 |
|
high_school_geography 0.717172 |
|
high_school_government_and_politics 0.709845 |
|
high_school_macroeconomics 0.612821 |
|
high_school_mathematics 0.451852 |
|
high_school_microeconomics 0.676471 |
|
high_school_physics 0.456954 |
|
high_school_psychology 0.774312 |
|
high_school_statistics 0.587963 |
|
high_school_us_history 0.740196 |
|
high_school_world_history 0.767932 |
|
human_aging 0.618834 |
|
human_sexuality 0.656489 |
|
international_law 0.735537 |
|
jurisprudence 0.657407 |
|
logical_fallacies 0.570552 |
|
machine_learning 0.410714 |
|
management 0.699029 |
|
marketing 0.807692 |
|
medical_genetics 0.670000 |
|
miscellaneous 0.701149 |
|
moral_disputes 0.676301 |
|
moral_scenarios 0.220112 |
|
nutrition 0.679739 |
|
philosophy 0.688103 |
|
prehistory 0.626543 |
|
professional_accounting 0.365248 |
|
professional_law 0.417862 |
|
professional_medicine 0.602941 |
|
professional_psychology 0.570261 |
|
public_relations 0.618182 |
|
security_studies 0.726531 |
|
sociology 0.671642 |
|
us_foreign_policy 0.750000 |
|
virology 0.481928 |
|
world_religions 0.736842 |
|
INFO: 2024-07-12 14:54:22,004: llmtf.base.nlpcoreteam/ruMMLU: |
|
subject metric |
|
STEM 0.547210 |
|
humanities 0.620858 |
|
other (business, health, misc.) 0.612733 |
|
social sciences 0.659462 |
|
INFO: 2024-07-12 14:54:22,012: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.6100658446710188} |
|
INFO: 2024-07-12 14:54:22,056: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-07-12 14:54:22,065: llmtf.base.evaluator: |
|
mean 0.687 |
|
darumeru/MultiQ 0.343 |
|
darumeru/PARus 0.850 |
|
darumeru/RCB 0.508 |
|
darumeru/RWSD 0.593 |
|
darumeru/cp_sent_ru 0.993 |
|
darumeru/ruMMLU 0.582 |
|
darumeru/ruOpenBookQA 0.820 |
|
darumeru/ruTiE 0.635 |
|
darumeru/ruWorldTree 0.903 |
|
nlpcoreteam/enMMLU 0.714 |
|
nlpcoreteam/ruMMLU 0.610 |
|
INFO: 2024-07-12 14:56:31,835: llmtf.base.darumeru/cp_sent_en: Processing Dataset: 459.63s |
|
INFO: 2024-07-12 14:56:31,837: llmtf.base.darumeru/cp_sent_en: Results for darumeru/cp_sent_en: |
|
INFO: 2024-07-12 14:56:31,868: llmtf.base.darumeru/cp_sent_en: {'symbol_per_token': 4.291582292949926, 'len': 0.9984711650118888, 'lcs': 0.9897423394080185} |
|
INFO: 2024-07-12 14:56:31,870: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271] |
|
INFO: 2024-07-12 14:56:31,870: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-07-12 14:56:34,191: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 2.32s |
|
INFO: 2024-07-12 14:58:14,794: llmtf.base.darumeru/USE: Processing Dataset: 264.03s |
|
INFO: 2024-07-12 14:58:14,798: llmtf.base.darumeru/USE: Results for darumeru/USE: |
|
INFO: 2024-07-12 14:58:14,803: llmtf.base.darumeru/USE: {'grade_norm': 0.15784313725490196} |
|
INFO: 2024-07-12 14:58:14,807: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645] |
|
INFO: 2024-07-12 14:58:14,807: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
|
INFO: 2024-07-12 14:58:21,363: llmtf.base.russiannlp/rucola_custom: Loading Dataset: 6.55s |
|
INFO: 2024-07-12 15:00:15,879: llmtf.base.russiannlp/rucola_custom: Processing Dataset: 114.52s |
|
INFO: 2024-07-12 15:00:15,881: llmtf.base.russiannlp/rucola_custom: Results for russiannlp/rucola_custom: |
|
INFO: 2024-07-12 15:00:15,893: llmtf.base.russiannlp/rucola_custom: {'acc': 0.7341227125941873, 'mcc': 0.3720880731254111} |
|
INFO: 2024-07-12 15:00:15,899: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-07-12 15:00:15,925: llmtf.base.evaluator: |
|
mean 0.662 |
|
darumeru/MultiQ 0.343 |
|
darumeru/PARus 0.850 |
|
darumeru/RCB 0.508 |
|
darumeru/RWSD 0.593 |
|
darumeru/USE 0.158 |
|
darumeru/cp_sent_en 0.998 |
|
darumeru/cp_sent_ru 0.993 |
|
darumeru/ruMMLU 0.582 |
|
darumeru/ruOpenBookQA 0.820 |
|
darumeru/ruTiE 0.635 |
|
darumeru/ruWorldTree 0.903 |
|
nlpcoreteam/enMMLU 0.714 |
|
nlpcoreteam/ruMMLU 0.610 |
|
russiannlp/rucola_custom 0.553 |
|
INFO: 2024-07-12 15:03:26,998: llmtf.base.daru/treewayextractive: Processing Dataset: 1480.36s |
|
INFO: 2024-07-12 15:03:27,003: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive: |
|
INFO: 2024-07-12 15:03:27,254: llmtf.base.daru/treewayextractive: {'r-prec': 0.3917012265512266} |
|
INFO: 2024-07-12 15:03:27,764: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-07-12 15:03:27,788: llmtf.base.evaluator: |
|
mean 0.644 |
|
daru/treewayextractive 0.392 |
|
darumeru/MultiQ 0.343 |
|
darumeru/PARus 0.850 |
|
darumeru/RCB 0.508 |
|
darumeru/RWSD 0.593 |
|
darumeru/USE 0.158 |
|
darumeru/cp_sent_en 0.998 |
|
darumeru/cp_sent_ru 0.993 |
|
darumeru/ruMMLU 0.582 |
|
darumeru/ruOpenBookQA 0.820 |
|
darumeru/ruTiE 0.635 |
|
darumeru/ruWorldTree 0.903 |
|
nlpcoreteam/enMMLU 0.714 |
|
nlpcoreteam/ruMMLU 0.610 |
|
russiannlp/rucola_custom 0.553 |
|
INFO: 2024-07-12 15:09:56,745: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 802.55s |
|
INFO: 2024-07-12 15:09:56,748: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru: |
|
INFO: 2024-07-12 15:09:56,767: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 2.654942872308677, 'len': 0.9887005042024545, 'lcs': 0.8325056701679933} |
|
INFO: 2024-07-12 15:09:56,769: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271] |
|
INFO: 2024-07-12 15:09:56,769: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-07-12 15:09:59,376: llmtf.base.darumeru/cp_para_en: Loading Dataset: 2.61s |
|
INFO: 2024-07-12 15:19:53,024: llmtf.base.darumeru/cp_para_en: Processing Dataset: 593.65s |
|
INFO: 2024-07-12 15:19:53,044: llmtf.base.darumeru/cp_para_en: Results for darumeru/cp_para_en: |
|
INFO: 2024-07-12 15:19:53,048: llmtf.base.darumeru/cp_para_en: {'symbol_per_token': 4.345346412423869, 'len': 0.998698675596975, 'lcs': 0.9617108678980784} |
|
INFO: 2024-07-12 15:19:53,049: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-07-12 15:19:53,069: llmtf.base.evaluator: |
|
mean 0.673 |
|
daru/treewayextractive 0.392 |
|
darumeru/MultiQ 0.343 |
|
darumeru/PARus 0.850 |
|
darumeru/RCB 0.508 |
|
darumeru/RWSD 0.593 |
|
darumeru/USE 0.158 |
|
darumeru/cp_para_en 0.962 |
|
darumeru/cp_para_ru 0.833 |
|
darumeru/cp_sent_en 0.998 |
|
darumeru/cp_sent_ru 0.993 |
|
darumeru/ruMMLU 0.582 |
|
darumeru/ruOpenBookQA 0.820 |
|
darumeru/ruTiE 0.635 |
|
darumeru/ruWorldTree 0.903 |
|
nlpcoreteam/enMMLU 0.714 |
|
nlpcoreteam/ruMMLU 0.610 |
|
russiannlp/rucola_custom 0.553 |
|
INFO: 2024-07-12 15:22:59,702: llmtf.base.daru/treewayabstractive: Processing Dataset: 2661.42s |
|
INFO: 2024-07-12 15:22:59,722: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive: |
|
INFO: 2024-07-12 15:22:59,742: llmtf.base.daru/treewayabstractive: {'rouge1': 0.3366772629962508, 'rouge2': 0.11808146329188003} |
|
INFO: 2024-07-12 15:22:59,745: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-07-12 15:22:59,771: llmtf.base.evaluator: |
|
mean 0.649 |
|
daru/treewayabstractive 0.227 |
|
daru/treewayextractive 0.392 |
|
darumeru/MultiQ 0.343 |
|
darumeru/PARus 0.850 |
|
darumeru/RCB 0.508 |
|
darumeru/RWSD 0.593 |
|
darumeru/USE 0.158 |
|
darumeru/cp_para_en 0.962 |
|
darumeru/cp_para_ru 0.833 |
|
darumeru/cp_sent_en 0.998 |
|
darumeru/cp_sent_ru 0.993 |
|
darumeru/ruMMLU 0.582 |
|
darumeru/ruOpenBookQA 0.820 |
|
darumeru/ruTiE 0.635 |
|
darumeru/ruWorldTree 0.903 |
|
nlpcoreteam/enMMLU 0.714 |
|
nlpcoreteam/ruMMLU 0.610 |
|
russiannlp/rucola_custom 0.553 |
|
|