|
INFO: 2024-07-12 13:32:20,716: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/rutie', 'darumeru/ruworldtree', 'darumeru/rwsd', 'darumeru/use', 'russiannlp/rucola_custom'] |
|
INFO: 2024-07-12 13:32:20,717: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271] |
|
INFO: 2024-07-12 13:32:20,717: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-07-12 13:32:21,381: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu'] |
|
INFO: 2024-07-12 13:32:21,381: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271] |
|
INFO: 2024-07-12 13:32:21,381: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-07-12 13:32:23,188: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu'] |
|
INFO: 2024-07-12 13:32:23,189: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645] |
|
INFO: 2024-07-12 13:32:23,189: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
|
INFO: 2024-07-12 13:32:24,860: llmtf.base.darumeru/MultiQ: Loading Dataset: 4.14s |
|
INFO: 2024-07-12 13:32:24,879: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu'] |
|
INFO: 2024-07-12 13:32:24,879: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645] |
|
INFO: 2024-07-12 13:32:24,879: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
|
INFO: 2024-07-12 13:32:26,606: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive'] |
|
INFO: 2024-07-12 13:32:26,607: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271] |
|
INFO: 2024-07-12 13:32:26,607: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-07-12 13:32:28,649: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive'] |
|
INFO: 2024-07-12 13:32:28,649: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645] |
|
INFO: 2024-07-12 13:32:28,649: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
|
INFO: 2024-07-12 13:32:30,170: llmtf.base.darumeru/ruMMLU: Loading Dataset: 8.79s |
|
INFO: 2024-07-12 13:32:31,039: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_sent_en', 'darumeru/cp_para_ru', 'darumeru/cp_para_en'] |
|
INFO: 2024-07-12 13:32:31,040: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271] |
|
INFO: 2024-07-12 13:32:31,040: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-07-12 13:32:31,528: llmtf.base.daru/treewayabstractive: Loading Dataset: 4.92s |
|
INFO: 2024-07-12 13:32:33,602: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 2.56s |
|
INFO: 2024-07-12 13:32:41,690: llmtf.base.daru/treewayextractive: Loading Dataset: 13.04s |
|
INFO: 2024-07-12 13:34:36,781: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 131.90s |
|
INFO: 2024-07-12 13:34:38,322: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 135.13s |
|
INFO: 2024-07-12 13:39:29,078: llmtf.base.darumeru/ruMMLU: Processing Dataset: 418.89s |
|
INFO: 2024-07-12 13:39:29,081: llmtf.base.darumeru/ruMMLU: Results for darumeru/ruMMLU: |
|
INFO: 2024-07-12 13:39:29,091: llmtf.base.darumeru/ruMMLU: {'acc': 0.5755761747979646} |
|
INFO: 2024-07-12 13:39:29,136: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-07-12 13:39:29,141: llmtf.base.evaluator: |
|
mean darumeru/ruMMLU |
|
0.576 0.576 |
|
INFO: 2024-07-12 13:42:34,523: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 477.74s |
|
INFO: 2024-07-12 13:42:34,527: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU: |
|
INFO: 2024-07-12 13:42:34,569: llmtf.base.nlpcoreteam/enMMLU: metric |
|
subject |
|
abstract_algebra 0.490000 |
|
anatomy 0.585185 |
|
astronomy 0.743421 |
|
business_ethics 0.750000 |
|
clinical_knowledge 0.762264 |
|
college_biology 0.784722 |
|
college_chemistry 0.460000 |
|
college_computer_science 0.630000 |
|
college_mathematics 0.420000 |
|
college_medicine 0.664740 |
|
college_physics 0.392157 |
|
computer_security 0.690000 |
|
conceptual_physics 0.697872 |
|
econometrics 0.570175 |
|
electrical_engineering 0.689655 |
|
elementary_mathematics 0.611111 |
|
formal_logic 0.507937 |
|
global_facts 0.440000 |
|
high_school_biology 0.809677 |
|
high_school_chemistry 0.615764 |
|
high_school_computer_science 0.780000 |
|
high_school_european_history 0.812121 |
|
high_school_geography 0.848485 |
|
high_school_government_and_politics 0.911917 |
|
high_school_macroeconomics 0.715385 |
|
high_school_mathematics 0.488889 |
|
high_school_microeconomics 0.802521 |
|
high_school_physics 0.456954 |
|
high_school_psychology 0.858716 |
|
high_school_statistics 0.675926 |
|
high_school_us_history 0.828431 |
|
high_school_world_history 0.848101 |
|
human_aging 0.686099 |
|
human_sexuality 0.755725 |
|
international_law 0.801653 |
|
jurisprudence 0.833333 |
|
logical_fallacies 0.815951 |
|
machine_learning 0.455357 |
|
management 0.776699 |
|
marketing 0.893162 |
|
medical_genetics 0.790000 |
|
miscellaneous 0.845466 |
|
moral_disputes 0.739884 |
|
moral_scenarios 0.422346 |
|
nutrition 0.774510 |
|
philosophy 0.742765 |
|
prehistory 0.746914 |
|
professional_accounting 0.570922 |
|
professional_law 0.505215 |
|
professional_medicine 0.687500 |
|
professional_psychology 0.722222 |
|
public_relations 0.727273 |
|
security_studies 0.722449 |
|
sociology 0.870647 |
|
us_foreign_policy 0.870000 |
|
virology 0.542169 |
|
world_religions 0.801170 |
|
INFO: 2024-07-12 13:42:34,577: llmtf.base.nlpcoreteam/enMMLU: metric |
|
subject |
|
STEM 0.605084 |
|
humanities 0.723525 |
|
other (business, health, misc.) 0.697765 |
|
social sciences 0.781293 |
|
INFO: 2024-07-12 13:42:34,584: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.7019166707055704} |
|
INFO: 2024-07-12 13:42:34,616: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-07-12 13:42:34,636: llmtf.base.evaluator: |
|
mean darumeru/ruMMLU nlpcoreteam/enMMLU |
|
0.639 0.576 0.702 |
|
INFO: 2024-07-12 13:42:36,791: llmtf.base.darumeru/cp_sent_ru: Processing Dataset: 603.17s |
|
INFO: 2024-07-12 13:42:36,793: llmtf.base.darumeru/cp_sent_ru: Results for darumeru/cp_sent_ru: |
|
INFO: 2024-07-12 13:42:36,798: llmtf.base.darumeru/cp_sent_ru: {'symbol_per_token': 2.531882961858218, 'len': 0.9931991516400629, 'lcs': 0.9651252623970072} |
|
INFO: 2024-07-12 13:42:36,799: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271] |
|
INFO: 2024-07-12 13:42:36,799: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-07-12 13:42:38,975: llmtf.base.darumeru/cp_sent_en: Loading Dataset: 2.17s |
|
INFO: 2024-07-12 13:43:48,104: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 549.78s |
|
INFO: 2024-07-12 13:43:48,105: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU: |
|
INFO: 2024-07-12 13:43:48,145: llmtf.base.nlpcoreteam/ruMMLU: metric |
|
subject |
|
abstract_algebra 0.430000 |
|
anatomy 0.562963 |
|
astronomy 0.677632 |
|
business_ethics 0.660000 |
|
clinical_knowledge 0.600000 |
|
college_biology 0.597222 |
|
college_chemistry 0.390000 |
|
college_computer_science 0.540000 |
|
college_mathematics 0.470000 |
|
college_medicine 0.554913 |
|
college_physics 0.372549 |
|
computer_security 0.660000 |
|
conceptual_physics 0.574468 |
|
econometrics 0.464912 |
|
electrical_engineering 0.558621 |
|
elementary_mathematics 0.576720 |
|
formal_logic 0.476190 |
|
global_facts 0.420000 |
|
high_school_biology 0.729032 |
|
high_school_chemistry 0.512315 |
|
high_school_computer_science 0.750000 |
|
high_school_european_history 0.745455 |
|
high_school_geography 0.717172 |
|
high_school_government_and_politics 0.663212 |
|
high_school_macroeconomics 0.615385 |
|
high_school_mathematics 0.455556 |
|
high_school_microeconomics 0.642857 |
|
high_school_physics 0.410596 |
|
high_school_psychology 0.759633 |
|
high_school_statistics 0.560185 |
|
high_school_us_history 0.715686 |
|
high_school_world_history 0.767932 |
|
human_aging 0.609865 |
|
human_sexuality 0.633588 |
|
international_law 0.710744 |
|
jurisprudence 0.611111 |
|
logical_fallacies 0.527607 |
|
machine_learning 0.348214 |
|
management 0.718447 |
|
marketing 0.769231 |
|
medical_genetics 0.610000 |
|
miscellaneous 0.688378 |
|
moral_disputes 0.638728 |
|
moral_scenarios 0.244693 |
|
nutrition 0.669935 |
|
philosophy 0.639871 |
|
prehistory 0.608025 |
|
professional_accounting 0.390071 |
|
professional_law 0.410039 |
|
professional_medicine 0.610294 |
|
professional_psychology 0.542484 |
|
public_relations 0.618182 |
|
security_studies 0.653061 |
|
sociology 0.741294 |
|
us_foreign_policy 0.740000 |
|
virology 0.506024 |
|
world_religions 0.719298 |
|
INFO: 2024-07-12 13:43:48,152: llmtf.base.nlpcoreteam/ruMMLU: metric |
|
subject |
|
STEM 0.534062 |
|
humanities 0.601183 |
|
other (business, health, misc.) 0.597866 |
|
social sciences 0.649315 |
|
INFO: 2024-07-12 13:43:48,160: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.5956063721324766} |
|
INFO: 2024-07-12 13:43:48,192: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-07-12 13:43:48,197: llmtf.base.evaluator: |
|
mean darumeru/cp_sent_ru darumeru/ruMMLU nlpcoreteam/enMMLU nlpcoreteam/ruMMLU |
|
0.717 0.993 0.576 0.702 0.596 |
|
INFO: 2024-07-12 13:44:07,504: llmtf.base.darumeru/MultiQ: Processing Dataset: 702.63s |
|
INFO: 2024-07-12 13:44:07,507: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ: |
|
INFO: 2024-07-12 13:44:07,511: llmtf.base.darumeru/MultiQ: {'f1': 0.23082049162131107, 'em': 0.13288718929254303} |
|
INFO: 2024-07-12 13:44:07,516: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271] |
|
INFO: 2024-07-12 13:44:07,516: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-07-12 13:44:09,506: llmtf.base.darumeru/PARus: Loading Dataset: 1.99s |
|
INFO: 2024-07-12 13:44:15,843: llmtf.base.darumeru/PARus: Processing Dataset: 6.34s |
|
INFO: 2024-07-12 13:44:15,844: llmtf.base.darumeru/PARus: Results for darumeru/PARus: |
|
INFO: 2024-07-12 13:44:15,857: llmtf.base.darumeru/PARus: {'acc': 0.79} |
|
INFO: 2024-07-12 13:44:15,858: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271] |
|
INFO: 2024-07-12 13:44:15,858: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-07-12 13:44:17,890: llmtf.base.darumeru/RCB: Loading Dataset: 2.03s |
|
INFO: 2024-07-12 13:44:26,535: llmtf.base.darumeru/RCB: Processing Dataset: 8.64s |
|
INFO: 2024-07-12 13:44:26,550: llmtf.base.darumeru/RCB: Results for darumeru/RCB: |
|
INFO: 2024-07-12 13:44:26,556: llmtf.base.darumeru/RCB: {'acc': 0.5181818181818182, 'f1_macro': 0.4329109928238468} |
|
INFO: 2024-07-12 13:44:26,557: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271] |
|
INFO: 2024-07-12 13:44:26,557: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-07-12 13:44:29,227: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 2.67s |
|
INFO: 2024-07-12 13:45:43,981: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 74.75s |
|
INFO: 2024-07-12 13:45:43,996: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA: |
|
INFO: 2024-07-12 13:45:44,008: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.8178694158075601, 'f1_macro': 0.8177749093697874} |
|
INFO: 2024-07-12 13:45:44,015: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271] |
|
INFO: 2024-07-12 13:45:44,015: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-07-12 13:45:51,418: llmtf.base.darumeru/ruTiE: Loading Dataset: 7.40s |
|
INFO: 2024-07-12 13:50:12,621: llmtf.base.darumeru/ruTiE: Processing Dataset: 261.20s |
|
INFO: 2024-07-12 13:50:12,624: llmtf.base.darumeru/ruTiE: Results for darumeru/ruTiE: |
|
INFO: 2024-07-12 13:50:12,657: llmtf.base.darumeru/ruTiE: {'acc': 0.6348837209302326} |
|
INFO: 2024-07-12 13:50:12,660: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271] |
|
INFO: 2024-07-12 13:50:12,661: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-07-12 13:50:15,001: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 2.34s |
|
INFO: 2024-07-12 13:50:18,607: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 3.60s |
|
INFO: 2024-07-12 13:50:18,609: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree: |
|
INFO: 2024-07-12 13:50:18,614: llmtf.base.darumeru/ruWorldTree: {'acc': 0.9047619047619048, 'f1_macro': 0.9043845651108326} |
|
INFO: 2024-07-12 13:50:18,615: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271] |
|
INFO: 2024-07-12 13:50:18,615: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-07-12 13:50:20,678: llmtf.base.darumeru/RWSD: Loading Dataset: 2.06s |
|
INFO: 2024-07-12 13:50:28,821: llmtf.base.darumeru/RWSD: Processing Dataset: 8.14s |
|
INFO: 2024-07-12 13:50:28,823: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD: |
|
INFO: 2024-07-12 13:50:28,827: llmtf.base.darumeru/RWSD: {'acc': 0.6029411764705882} |
|
INFO: 2024-07-12 13:50:28,828: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271] |
|
INFO: 2024-07-12 13:50:28,828: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-07-12 13:50:31,849: llmtf.base.darumeru/USE: Loading Dataset: 3.02s |
|
INFO: 2024-07-12 13:51:35,478: llmtf.base.darumeru/cp_sent_en: Processing Dataset: 536.50s |
|
INFO: 2024-07-12 13:51:35,480: llmtf.base.darumeru/cp_sent_en: Results for darumeru/cp_sent_en: |
|
INFO: 2024-07-12 13:51:35,484: llmtf.base.darumeru/cp_sent_en: {'symbol_per_token': 4.31579408268585, 'len': 0.8895181009295375, 'lcs': 0.8825764242763656} |
|
INFO: 2024-07-12 13:51:35,486: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271] |
|
INFO: 2024-07-12 13:51:35,486: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-07-12 13:51:37,358: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 1.87s |
|
INFO: 2024-07-12 13:55:22,257: llmtf.base.darumeru/USE: Processing Dataset: 290.41s |
|
INFO: 2024-07-12 13:55:22,261: llmtf.base.darumeru/USE: Results for darumeru/USE: |
|
INFO: 2024-07-12 13:55:22,266: llmtf.base.darumeru/USE: {'grade_norm': 0.153921568627451} |
|
INFO: 2024-07-12 13:55:22,269: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645] |
|
INFO: 2024-07-12 13:55:22,269: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
|
INFO: 2024-07-12 13:55:26,569: llmtf.base.russiannlp/rucola_custom: Loading Dataset: 4.30s |
|
INFO: 2024-07-12 13:56:58,317: llmtf.base.russiannlp/rucola_custom: Processing Dataset: 91.75s |
|
INFO: 2024-07-12 13:56:58,322: llmtf.base.russiannlp/rucola_custom: Results for russiannlp/rucola_custom: |
|
INFO: 2024-07-12 13:56:58,334: llmtf.base.russiannlp/rucola_custom: {'acc': 0.7072120559741658, 'mcc': 0.23883340451741628} |
|
INFO: 2024-07-12 13:56:58,338: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-07-12 13:56:58,365: llmtf.base.evaluator: |
|
mean darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom |
|
0.628 0.182 0.790 0.476 0.603 0.154 0.890 0.993 0.576 0.818 0.635 0.905 0.702 0.596 0.473 |
|
INFO: 2024-07-12 13:58:57,546: llmtf.base.daru/treewayextractive: Processing Dataset: 1575.85s |
|
INFO: 2024-07-12 13:58:57,550: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive: |
|
INFO: 2024-07-12 13:58:57,798: llmtf.base.daru/treewayextractive: {'r-prec': 0.3917012265512266} |
|
INFO: 2024-07-12 13:58:58,347: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-07-12 13:58:58,375: llmtf.base.evaluator: |
|
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom |
|
0.612 0.392 0.182 0.790 0.476 0.603 0.154 0.890 0.993 0.576 0.818 0.635 0.905 0.702 0.596 0.473 |
|
INFO: 2024-07-12 14:04:59,106: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 801.75s |
|
INFO: 2024-07-12 14:04:59,122: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru: |
|
INFO: 2024-07-12 14:04:59,126: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 2.654953066881009, 'len': 0.9876165527469303, 'lcs': 0.8268038045788856} |
|
INFO: 2024-07-12 14:04:59,127: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271] |
|
INFO: 2024-07-12 14:04:59,127: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-07-12 14:05:01,461: llmtf.base.darumeru/cp_para_en: Loading Dataset: 2.33s |
|
INFO: 2024-07-12 14:14:59,808: llmtf.base.darumeru/cp_para_en: Processing Dataset: 598.35s |
|
INFO: 2024-07-12 14:14:59,811: llmtf.base.darumeru/cp_para_en: Results for darumeru/cp_para_en: |
|
INFO: 2024-07-12 14:14:59,832: llmtf.base.darumeru/cp_para_en: {'symbol_per_token': 4.353034074577473, 'len': 0.9802575114789726, 'lcs': 0.9058441645642358} |
|
INFO: 2024-07-12 14:14:59,833: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-07-12 14:14:59,844: llmtf.base.evaluator: |
|
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom |
|
0.642 0.392 0.182 0.790 0.476 0.603 0.154 0.906 0.827 0.890 0.993 0.576 0.818 0.635 0.905 0.702 0.596 0.473 |
|
INFO: 2024-07-12 14:37:23,690: llmtf.base.daru/treewayabstractive: Processing Dataset: 3892.16s |
|
INFO: 2024-07-12 14:37:23,708: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive: |
|
INFO: 2024-07-12 14:37:23,718: llmtf.base.daru/treewayabstractive: {'rouge1': 0.2813082731366044, 'rouge2': 0.10129515826302848} |
|
INFO: 2024-07-12 14:37:23,720: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-07-12 14:37:23,733: llmtf.base.evaluator: |
|
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom |
|
0.617 0.191 0.392 0.182 0.790 0.476 0.603 0.154 0.906 0.827 0.890 0.993 0.576 0.818 0.635 0.905 0.702 0.596 0.473 |
|
|