open_pl_llm_leaderboard

djstrong committed on Feb 14

Commit

02311bf

1 Parent(s): a2eeaa1

try

This view is limited to 50 files because it contains too many changes. See raw diff

Files changed (50)

eval-results/.gitattributes +55 -0
eval-results/.idea/open_pl_llm_leaderboard_results.iml +8 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polemo2_in_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-32-59.263304.json +118 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polemo2_in_multiple_choice_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-15-19.394508.json +105 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polemo2_out_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-27-04.852309.json +118 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polemo2_out_multiple_choice_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-12-58.277105.json +105 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_8tags_multiple_choice_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-35-19.801525.json +106 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_8tags_regex_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T17-37-36.637222.json +111 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_belebele_mc_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-13-12.455988.json +107 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_belebele_regex_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-36-48.898331.json +109 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_cbd_multiple_choice_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-16-56.871408.json +111 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_cbd_regex_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-44-52.956955.json +119 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_dyk_multiple_choice_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-13-01.731913.json +107 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_dyk_regex_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-14-18.998768.json +118 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_eq_bench_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-57-26.656191.json +131 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_eq_bench_first_turn_1723381723/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-35-21.291842.json +107 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_klej_ner_multiple_choice_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-21-30.374052.json +105 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_klej_ner_regex_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T16-15-22.967823.json +112 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_polqa_closed_book_1723381723/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-22-25.200287.json +106 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_polqa_open_book_1723381723/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-58-53.946082.json +106 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_polqa_reranking_multiple_choice_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-23-40.740746.json +101 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_poquad_open_book_1723381723/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T16-06-52.670471.json +104 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_ppc_multiple_choice_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-13-02.213869.json +101 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_ppc_regex_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-44-08.270834.json +111 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_psc_multiple_choice_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-11-55.449227.json +107 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_psc_regex_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-50-13.495944.json +118 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polemo2_in_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-25-19.492688.json +118 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polemo2_in_multiple_choice_1723381711/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-24-50.869505.json +105 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polemo2_out_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-22-20.849828.json +118 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polemo2_out_multiple_choice_1723381711/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-19-39.147509.json +105 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_8tags_multiple_choice_1723381711/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-53-51.017953.json +106 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_8tags_regex_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T17-01-59.819478.json +111 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_belebele_mc_1723381711/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-15-54.278792.json +107 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_belebele_regex_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-36-23.654679.json +109 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_cbd_multiple_choice_1723381711/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-21-52.993123.json +111 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_cbd_regex_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-33-25.750066.json +119 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_dyk_multiple_choice_1723381711/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-14-08.007826.json +107 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_dyk_regex_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-54-27.674557.json +118 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_eq_bench_first_turn_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-18-22.563512.json +107 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_klej_ner_multiple_choice_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-35-52.497622.json +105 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_klej_ner_regex_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-57-18.271570.json +112 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_pes_1723381722/results_2024-08-27T17-50-52.063138.json +0 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_polqa_closed_book_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-16-23.588300.json +106 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_polqa_open_book_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-47-00.491423.json +106 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_polqa_reranking_multiple_choice_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-50-02.859037.json +101 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_poquad_open_book_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T17-09-42.653951.json +104 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_ppc_multiple_choice_1723381711/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-13-23.877449.json +101 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_ppc_regex_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-30-24.424865.json +111 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_psc_multiple_choice_1723381711/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-18-48.485190.json +107 -0
eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_psc_regex_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-47-28.998766.json +118 -0

eval-results/.gitattributes ADDED

	@@ -0,0 +1,55 @@

+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.lz4 filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+# Audio files - uncompressed
+*.pcm filter=lfs diff=lfs merge=lfs -text
+*.sam filter=lfs diff=lfs merge=lfs -text
+*.raw filter=lfs diff=lfs merge=lfs -text
+# Audio files - compressed
+*.aac filter=lfs diff=lfs merge=lfs -text
+*.flac filter=lfs diff=lfs merge=lfs -text
+*.mp3 filter=lfs diff=lfs merge=lfs -text
+*.ogg filter=lfs diff=lfs merge=lfs -text
+*.wav filter=lfs diff=lfs merge=lfs -text
+# Image files - uncompressed
+*.bmp filter=lfs diff=lfs merge=lfs -text
+*.gif filter=lfs diff=lfs merge=lfs -text
+*.png filter=lfs diff=lfs merge=lfs -text
+*.tiff filter=lfs diff=lfs merge=lfs -text
+# Image files - compressed
+*.jpg filter=lfs diff=lfs merge=lfs -text
+*.jpeg filter=lfs diff=lfs merge=lfs -text
+*.webp filter=lfs diff=lfs merge=lfs -text

eval-results/.idea/open_pl_llm_leaderboard_results.iml ADDED

	@@ -0,0 +1,8 @@

+<?xml version="1.0" encoding="UTF-8"?>
+<module type="PYTHON_MODULE" version="4">
+  <component name="NewModuleRootManager">
+    <content url="file://$MODULE_DIR$" />
+    <orderEntry type="inheritedJdk" />
+    <orderEntry type="sourceFolder" forTests="false" />
+  </component>
+</module>

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polemo2_in_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-32-59.263304.json ADDED

	@@ -0,0 +1,118 @@

+{
+  "results": {
+    "polemo2_in": {
+      "exact_match,score-first": 0.7797783933518005,
+      "exact_match_stderr,score-first": 0.01543291377156506,
+      "alias": "polemo2_in"
+    }
+  },
+  "group_subtasks": {
+    "polemo2_in": []
+  },
+  "configs": {
+    "polemo2_in": {
+      "task": "polemo2_in",
+      "group": [
+        "polemo2"
+      ],
+      "dataset_path": "allegro/klej-polemo2-in",
+      "training_split": "train",
+      "validation_split": "validation",
+      "test_split": "test",
+      "doc_to_text": "Opinia: \"{{sentence}}\"\nOkreśl sentyment podanej opinii. Możliwe odpowiedzi:\nA - Neutralny\nB - Negatywny\nC - Pozytywny\nD - Niejednoznaczny\nPrawidłowa odpowiedź:",
+      "doc_to_target": "{{{'__label__meta_zero': 'A', '__label__meta_minus_m': 'B', '__label__meta_plus_m': 'C', '__label__meta_amb': 'D'}.get(target)}}",
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "num_fewshot": 0,
+      "metric_list": [
+        {
+          "metric": "exact_match",
+          "aggregation": "mean",
+          "higher_is_better": true,
+          "hf_evaluate": true
+        }
+      ],
+      "output_type": "generate_until",
+      "generation_kwargs": {
+        "until": [
+          ".",
+          ","
+        ],
+        "do_sample": false,
+        "temperature": 0.0,
+        "max_gen_toks": 50
+      },
+      "repeats": 1,
+      "filter_list": [
+        {
+          "name": "score-first",
+          "filter": [
+            {
+              "function": "regex",
+              "regex_pattern": "(\\b[ABCD]\\b)"
+            },
+            {
+              "function": "take_first"
+            }
+          ]
+        }
+      ],
+      "should_decontaminate": true,
+      "doc_to_decontamination_query": "{{sentence}}",
+      "metadata": {
+        "version": 1.0
+      }
+    }
+  },
+  "versions": {
+    "polemo2_in": 1.0
+  },
+  "n-shot": {
+    "polemo2_in": 0
+  },
+  "higher_is_better": {
+    "polemo2_in": {
+      "exact_match": true
+    }
+  },
+  "n-samples": {
+    "polemo2_in": {
+      "original": 722,
+      "effective": 722
+    }
+  },
+  "config": {
+    "model": "hf",
+    "model_args": "pretrained=speakleash/Bielik-11B-v2.1-Instruct,dtype=bfloat16,trust_remote_code=True",
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": "sqlite_caches/plgchriso/models/bielik_11B-v2_dpo/dpo5-001_e2_-0_polemo2_in/",
+    "limit": null,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": null,
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": "2132286",
+  "date": 1723381747.6734786,
+  "pretty_env_info": "PyTorch version: 2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Rocky Linux 9.3 (Blue Onyx) (x86_64)\nGCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)\nClang version: Could not collect\nCMake version: Could not collect\nLibc version: glibc-2.34\n\nPython version: 3.10.4 (main, Dec 14 2022, 11:01:42) [GCC 11.3.0] (64-bit runtime)\nPython platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34\nIs CUDA available: True\nCUDA runtime version: 12.1.105\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Could not collect\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                       x86_64\nCPU op-mode(s):                     32-bit, 64-bit\nAddress sizes:                      43 bits physical, 48 bits virtual\nByte Order:                         Little Endian\nCPU(s):                             128\nOn-line CPU(s) list:                0-127\nVendor ID:                          AuthenticAMD\nModel name:                         AMD EPYC 7742 64-Core Processor\nCPU family:                         23\nModel:                              49\nThread(s) per core:                 1\nCore(s) per socket:                 64\nSocket(s):                          2\nStepping:                           0\nFrequency boost:                    enabled\nCPU max MHz:                        2250.0000\nCPU min MHz:                        1500.0000\nBogoMIPS:                           4499.98\nFlags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es\nVirtualization:                     AMD-V\nL1d cache:                          4 MiB (128 instances)\nL1i cache:                          4 MiB (128 instances)\nL2 cache:                           64 MiB (128 instances)\nL3 cache:                           512 MiB (32 instances)\nNUMA node(s):                       4\nNUMA node0 CPU(s):                  0-31\nNUMA node1 CPU(s):                  32-63\nNUMA node2 CPU(s):                  64-95\nNUMA node3 CPU(s):                  96-127\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit:        Not affected\nVulnerability L1tf:                 Not affected\nVulnerability Mds:                  Not affected\nVulnerability Meltdown:             Not affected\nVulnerability Mmio stale data:      Not affected\nVulnerability Retbleed:             Mitigation; untrained return thunk; SMT disabled\nVulnerability Spec rstack overflow: Mitigation; SMT disabled\nVulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds:                Not affected\nVulnerability Tsx async abort:      Not 
affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] Could not collect",
+  "transformers_version": "4.43.1",
+  "upper_git_hash": "2132286315025b3abd7a22b7309f7052be200287",
+  "task_hashes": {
+    "polemo2_in": "287c7460415884286befac7ba8422a32230ec65846799595a6fee727f2d037a5"
+  },
+  "model_source": "hf",
+  "model_name": "speakleash/Bielik-11B-v2.1-Instruct",
+  "model_name_sanitized": "__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "chat_template": null,
+  "chat_template_sha": null,
+  "start_time": 2341192.242787643,
+  "end_time": 2342631.868033792,
+  "total_evaluation_time_seconds": "1439.6252461490221"
+}
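
Note on the `score-first` filter in the file above: the config chains a `regex` step with pattern `(\b[ABCD]\b)` and a `take_first` step, then scores with `exact_match`. The sketch below illustrates that idea in plain Python. It is not the harness's own code: the helper names `score_first` and `exact_match`, the `[invalid]` fallback string, and the sample generations are all assumptions made for illustration.

```python
import re

# Pattern copied from "regex_pattern" in the config (JSON-unescaped).
PATTERN = re.compile(r"(\b[ABCD]\b)")


def score_first(generation: str, fallback: str = "[invalid]") -> str:
    """Return the first standalone A/B/C/D letter, or a fallback marker.

    Hypothetical helper: mimics a regex filter followed by take_first.
    """
    match = PATTERN.search(generation)
    return match.group(1) if match else fallback


def exact_match(predictions, targets) -> float:
    """Mean of per-example 0/1 exact-match hits."""
    hits = [int(p == t) for p, t in zip(predictions, targets)]
    return sum(hits) / len(hits)


# Made-up generations, not taken from the evaluation run.
generations = [" C", " B - Negatywny", "Pozytywny"]
targets = ["C", "B", "C"]

predictions = [score_first(g) for g in generations]
score = exact_match(predictions, targets)
print(predictions, score)
```

Under this reading, a generation that spells out the label ("Pozytywny") without a standalone letter would count as a miss, which may explain part of the gap between the regex and multiple-choice variants of the same task.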

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polemo2_in_multiple_choice_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-15-19.394508.json ADDED

	@@ -0,0 +1,105 @@

+{
+  "results": {
+    "polemo2_in_multiple_choice": {
+      "acc,none": 0.7714681440443213,
+      "acc_stderr,none": 0.015637406997304655,
+      "acc_norm,none": 0.7742382271468145,
+      "acc_norm_stderr,none": 0.015570224561219015,
+      "alias": "polemo2_in_multiple_choice"
+    }
+  },
+  "group_subtasks": {
+    "polemo2_in_multiple_choice": []
+  },
+  "configs": {
+    "polemo2_in_multiple_choice": {
+      "task": "polemo2_in_multiple_choice",
+      "group": [
+        "polemo2_mc"
+      ],
+      "dataset_path": "allegro/klej-polemo2-in",
+      "training_split": "train",
+      "validation_split": "validation",
+      "test_split": "test",
+      "doc_to_text": "Opinia: \"{{sentence}}\"\nOkreśl sentyment podanej opinii: Neutralny, Negatywny, Pozytywny, Niejednoznaczny.\nSentyment:",
+      "doc_to_target": "{{['__label__meta_zero', '__label__meta_minus_m', '__label__meta_plus_m', '__label__meta_amb'].index(target)}}",
+      "doc_to_choice": [
+        "Neutralny",
+        "Negatywny",
+        "Pozytywny",
+        "Niejednoznaczny"
+      ],
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "num_fewshot": 0,
+      "metric_list": [
+        {
+          "metric": "acc",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "acc_norm",
+          "aggregation": "mean",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "multiple_choice",
+      "repeats": 1,
+      "should_decontaminate": true,
+      "doc_to_decontamination_query": "{{sentence}}"
+    }
+  },
+  "versions": {
+    "polemo2_in_multiple_choice": "Yaml"
+  },
+  "n-shot": {
+    "polemo2_in_multiple_choice": 0
+  },
+  "higher_is_better": {
+    "polemo2_in_multiple_choice": {
+      "acc": true,
+      "acc_norm": true
+    }
+  },
+  "n-samples": {
+    "polemo2_in_multiple_choice": {
+      "original": 722,
+      "effective": 722
+    }
+  },
+  "config": {
+    "model": "hf",
+    "model_args": "pretrained=speakleash/Bielik-11B-v2.1-Instruct,dtype=bfloat16,trust_remote_code=True",
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": "sqlite_caches/plgchriso/models/bielik_11B-v2_dpo/dpo5-001_e2_-0_polemo2_in_multiple_choice/",
+    "limit": null,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": null,
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": "2132286",
+  "date": 1723381748.8968034,
+  "pretty_env_info": "PyTorch version: 2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Rocky Linux 9.3 (Blue Onyx) (x86_64)\nGCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)\nClang version: Could not collect\nCMake version: Could not collect\nLibc version: glibc-2.34\n\nPython version: 3.10.4 (main, Dec 14 2022, 11:01:42) [GCC 11.3.0] (64-bit runtime)\nPython platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34\nIs CUDA available: True\nCUDA runtime version: 12.1.105\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Could not collect\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                       x86_64\nCPU op-mode(s):                     32-bit, 64-bit\nAddress sizes:                      43 bits physical, 48 bits virtual\nByte Order:                         Little Endian\nCPU(s):                             128\nOn-line CPU(s) list:                0-127\nVendor ID:                          AuthenticAMD\nModel name:                         AMD EPYC 7742 64-Core Processor\nCPU family:                         23\nModel:                              49\nThread(s) per core:                 1\nCore(s) per socket:                 64\nSocket(s):                          2\nStepping:                           0\nFrequency boost:                    enabled\nCPU max MHz:                        2250.0000\nCPU min MHz:                        1500.0000\nBogoMIPS:                           4500.00\nFlags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es\nVirtualization:                     AMD-V\nL1d cache:                          4 MiB (128 instances)\nL1i cache:                          4 MiB (128 instances)\nL2 cache:                           64 MiB (128 instances)\nL3 cache:                           512 MiB (32 instances)\nNUMA node(s):                       4\nNUMA node0 CPU(s):                  0-31\nNUMA node1 CPU(s):                  32-63\nNUMA node2 CPU(s):                  64-95\nNUMA node3 CPU(s):                  96-127\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit:        Not affected\nVulnerability L1tf:                 Not affected\nVulnerability Mds:                  Not affected\nVulnerability Meltdown:             Not affected\nVulnerability Mmio stale data:      Not affected\nVulnerability Retbleed:             Mitigation; untrained return thunk; SMT disabled\nVulnerability Spec rstack overflow: Mitigation; SMT disabled\nVulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds:                Not affected\nVulnerability Tsx async abort:      Not 
affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] Could not collect",
+  "transformers_version": "4.43.1",
+  "upper_git_hash": "2132286315025b3abd7a22b7309f7052be200287",
+  "task_hashes": {
+    "polemo2_in_multiple_choice": "6cade7fdeb7a53de3a966bebb7fe941479487faada4badf2831b62d7bb426916"
+  },
+  "model_source": "hf",
+  "model_name": "speakleash/Bielik-11B-v2.1-Instruct",
+  "model_name_sanitized": "__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "chat_template": null,
+  "chat_template_sha": null,
+  "start_time": 2669732.988216114,
+  "end_time": 2670111.950610943,
+  "total_evaluation_time_seconds": "378.96239482890815"
+}
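
Note on `acc_stderr,none` in the file above: the reported value appears consistent with the standard error of a mean over 0/1 outcomes, using the sample standard deviation (n - 1 in the denominator). The sketch below recomputes it from `acc,none` and the `effective` sample count. The function name `mean_stderr` is chosen here for illustration, and the formula is an assumption checked only against the numbers in this file.

```python
import math


def mean_stderr(p: float, n: int) -> float:
    """Standard error of the mean of n binary outcomes with mean p.

    Uses the sample variance n * p * (1 - p) / (n - 1), divided by n.
    """
    return math.sqrt(p * (1.0 - p) / (n - 1))


acc = 0.7714681440443213          # "acc,none" from the results block
n_samples = 722                   # "effective" from "n-samples"
reported = 0.015637406997304655   # "acc_stderr,none" from the results block

stderr = mean_stderr(acc, n_samples)
print(stderr, abs(stderr - reported))
```

The same formula applied to `acc_norm,none` should land close to `acc_norm_stderr,none` as well, since both metrics use `mean` aggregation over the same 722 examples.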

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polemo2_out_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-27-04.852309.json ADDED

	@@ -0,0 +1,118 @@

+{
+  "results": {
+    "polemo2_out": {
+      "exact_match,score-first": 0.7530364372469636,
+      "exact_match_stderr,score-first": 0.0194223142525205,
+      "alias": "polemo2_out"
+    }
+  },
+  "group_subtasks": {
+    "polemo2_out": []
+  },
+  "configs": {
+    "polemo2_out": {
+      "task": "polemo2_out",
+      "group": [
+        "polemo2"
+      ],
+      "dataset_path": "allegro/klej-polemo2-out",
+      "training_split": "train",
+      "validation_split": "validation",
+      "test_split": "test",
+      "doc_to_text": "Opinia: \"{{sentence}}\"\nOkreśl sentyment podanej opinii. Możliwe odpowiedzi:\nA - Neutralny\nB - Negatywny\nC - Pozytywny\nD - Niejednoznaczny\nPrawidłowa odpowiedź:",
+      "doc_to_target": "{{{'__label__meta_zero': 'A', '__label__meta_minus_m': 'B', '__label__meta_plus_m': 'C', '__label__meta_amb': 'D'}.get(target)}}",
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "num_fewshot": 0,
+      "metric_list": [
+        {
+          "metric": "exact_match",
+          "aggregation": "mean",
+          "higher_is_better": true,
+          "hf_evaluate": true
+        }
+      ],
+      "output_type": "generate_until",
+      "generation_kwargs": {
+        "until": [
+          ".",
+          ","
+        ],
+        "do_sample": false,
+        "temperature": 0.0,
+        "max_gen_toks": 50
+      },
+      "repeats": 1,
+      "filter_list": [
+        {
+          "name": "score-first",
+          "filter": [
+            {
+              "function": "regex",
+              "regex_pattern": "(\\b[ABCD]\\b)"
+            },
+            {
+              "function": "take_first"
+            }
+          ]
+        }
+      ],
+      "should_decontaminate": true,
+      "doc_to_decontamination_query": "{{sentence}}",
+      "metadata": {
+        "version": 1.0
+      }
+    }
+  },
+  "versions": {
+    "polemo2_out": 1.0
+  },
+  "n-shot": {
+    "polemo2_out": 0
+  },
+  "higher_is_better": {
+    "polemo2_out": {
+      "exact_match": true
+    }
+  },
+  "n-samples": {
+    "polemo2_out": {
+      "original": 494,
+      "effective": 494
+    }
+  },
+  "config": {
+    "model": "hf",
+    "model_args": "pretrained=speakleash/Bielik-11B-v2.1-Instruct,dtype=bfloat16,trust_remote_code=True",
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": "sqlite_caches/plgchriso/models/bielik_11B-v2_dpo/dpo5-001_e2_-0_polemo2_out/",
+    "limit": null,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": null,
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": "2132286",
+  "date": 1723381748.629631,
+  "pretty_env_info": "PyTorch version: 2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Rocky Linux 9.3 (Blue Onyx) (x86_64)\nGCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)\nClang version: Could not collect\nCMake version: Could not collect\nLibc version: glibc-2.34\n\nPython version: 3.10.4 (main, Dec 14 2022, 11:01:42) [GCC 11.3.0] (64-bit runtime)\nPython platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34\nIs CUDA available: True\nCUDA runtime version: 12.1.105\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Could not collect\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                       x86_64\nCPU op-mode(s):                     32-bit, 64-bit\nAddress sizes:                      43 bits physical, 48 bits virtual\nByte Order:                         Little Endian\nCPU(s):                             128\nOn-line CPU(s) list:                0-127\nVendor ID:                          AuthenticAMD\nModel name:                         AMD EPYC 7742 64-Core Processor\nCPU family:                         23\nModel:                              49\nThread(s) per core:                 1\nCore(s) per socket:                 64\nSocket(s):                          2\nStepping:                           0\nFrequency boost:                    enabled\nCPU max MHz:                        2250.0000\nCPU min MHz:                        1500.0000\nBogoMIPS:                           4500.00\nFlags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es\nVirtualization:                     AMD-V\nL1d cache:                          4 MiB (128 instances)\nL1i cache:                          4 MiB (128 instances)\nL2 cache:                           64 MiB (128 instances)\nL3 cache:                           512 MiB (32 instances)\nNUMA node(s):                       4\nNUMA node0 CPU(s):                  0-31\nNUMA node1 CPU(s):                  32-63\nNUMA node2 CPU(s):                  64-95\nNUMA node3 CPU(s):                  96-127\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit:        Not affected\nVulnerability L1tf:                 Not affected\nVulnerability Mds:                  Not affected\nVulnerability Meltdown:             Not affected\nVulnerability Mmio stale data:      Not affected\nVulnerability Retbleed:             Mitigation; untrained return thunk; SMT disabled\nVulnerability Spec rstack overflow: Mitigation; SMT disabled\nVulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds:                Not affected\nVulnerability Tsx async abort:      Not 
affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] Could not collect",
+  "transformers_version": "4.43.1",
+  "upper_git_hash": "2132286315025b3abd7a22b7309f7052be200287",
+  "task_hashes": {
+    "polemo2_out": "bf931755699911cb191ed108ec01aa6c9695552185da1ccb8f6c40c22db028b6"
+  },
+  "model_source": "hf",
+  "model_name": "speakleash/Bielik-11B-v2.1-Instruct",
+  "model_name_sanitized": "__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "chat_template": null,
+  "chat_template_sha": null,
+  "start_time": 2577223.678777486,
+  "end_time": 2578308.421439366,
+  "total_evaluation_time_seconds": "1084.7426618798636"
+}
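
The `filter_list` in the `polemo2_out` config above chains a `regex` step with `take_first`. What follows is a minimal sketch of that post-processing, not the harness's own code: the pattern and label mapping are copied from the config, while the function names and the `"[invalid]"` fallback value are assumptions made for illustration.

```python
import re

# Pattern copied from "regex_pattern" in the polemo2_out config.
ANSWER_PATTERN = re.compile(r"(\b[ABCD]\b)")

# Mapping copied from "doc_to_target" in the same config.
LABEL_TO_LETTER = {
    "__label__meta_zero": "A",
    "__label__meta_minus_m": "B",
    "__label__meta_plus_m": "C",
    "__label__meta_amb": "D",
}


def score_first(generation: str, fallback: str = "[invalid]") -> str:
    """Return the first standalone answer letter found in the generation.

    The fallback string is a hypothetical placeholder for "no match".
    """
    match = ANSWER_PATTERN.search(generation)
    return match.group(1) if match else fallback


def exact_match(generation: str, target_label: str) -> float:
    """Score 1.0 when the extracted letter equals the gold letter."""
    return float(score_first(generation) == LABEL_TO_LETTER[target_label])
```

Because the pattern requires word boundaries on both sides, a letter inside a longer word (for example the `B` in `Brak`) is not counted as an answer.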

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polemo2_out_multiple_choice_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-12-58.277105.json ADDED

	@@ -0,0 +1,105 @@

+{
+  "results": {
+    "polemo2_out_multiple_choice": {
+      "acc,none": 0.742914979757085,
+      "acc_stderr,none": 0.019682691432000205,
+      "acc_norm,none": 0.7672064777327935,
+      "acc_norm_stderr,none": 0.019033476340855917,
+      "alias": "polemo2_out_multiple_choice"
+    }
+  },
+  "group_subtasks": {
+    "polemo2_out_multiple_choice": []
+  },
+  "configs": {
+    "polemo2_out_multiple_choice": {
+      "task": "polemo2_out_multiple_choice",
+      "group": [
+        "polemo2_mc"
+      ],
+      "dataset_path": "allegro/klej-polemo2-out",
+      "training_split": "train",
+      "validation_split": "validation",
+      "test_split": "test",
+      "doc_to_text": "Opinia: \"{{sentence}}\"\nOkreśl sentyment podanej opinii: Neutralny, Negatywny, Pozytywny, Niejednoznaczny.\nSentyment:",
+      "doc_to_target": "{{['__label__meta_zero', '__label__meta_minus_m', '__label__meta_plus_m', '__label__meta_amb'].index(target)}}",
+      "doc_to_choice": [
+        "Neutralny",
+        "Negatywny",
+        "Pozytywny",
+        "Niejednoznaczny"
+      ],
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "num_fewshot": 0,
+      "metric_list": [
+        {
+          "metric": "acc",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "acc_norm",
+          "aggregation": "mean",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "multiple_choice",
+      "repeats": 1,
+      "should_decontaminate": true,
+      "doc_to_decontamination_query": "{{sentence}}"
+    }
+  },
+  "versions": {
+    "polemo2_out_multiple_choice": "Yaml"
+  },
+  "n-shot": {
+    "polemo2_out_multiple_choice": 0
+  },
+  "higher_is_better": {
+    "polemo2_out_multiple_choice": {
+      "acc": true,
+      "acc_norm": true
+    }
+  },
+  "n-samples": {
+    "polemo2_out_multiple_choice": {
+      "original": 494,
+      "effective": 494
+    }
+  },
+  "config": {
+    "model": "hf",
+    "model_args": "pretrained=speakleash/Bielik-11B-v2.1-Instruct,dtype=bfloat16,trust_remote_code=True",
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": "sqlite_caches/plgchriso/models/bielik_11B-v2_dpo/dpo5-001_e2_-0_polemo2_out_multiple_choice/",
+    "limit": null,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": null,
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": "2132286",
+  "date": 1723381748.8963165,
+  "pretty_env_info": "PyTorch version: 2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Rocky Linux 9.3 (Blue Onyx) (x86_64)\nGCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)\nClang version: Could not collect\nCMake version: Could not collect\nLibc version: glibc-2.34\n\nPython version: 3.10.4 (main, Dec 14 2022, 11:01:42) [GCC 11.3.0] (64-bit runtime)\nPython platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34\nIs CUDA available: True\nCUDA runtime version: 12.1.105\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Could not collect\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                       x86_64\nCPU op-mode(s):                     32-bit, 64-bit\nAddress sizes:                      43 bits physical, 48 bits virtual\nByte Order:                         Little Endian\nCPU(s):                             128\nOn-line CPU(s) list:                0-127\nVendor ID:                          AuthenticAMD\nModel name:                         AMD EPYC 7742 64-Core Processor\nCPU family:                         23\nModel:                              49\nThread(s) per core:                 1\nCore(s) per socket:                 64\nSocket(s):                          2\nStepping:                           0\nFrequency boost:                    enabled\nCPU max MHz:                        2250.0000\nCPU min MHz:                        1500.0000\nBogoMIPS:                           4500.00\nFlags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es\nVirtualization:                     AMD-V\nL1d cache:                          4 MiB (128 instances)\nL1i cache:                          4 MiB (128 instances)\nL2 cache:                           64 MiB (128 instances)\nL3 cache:                           512 MiB (32 instances)\nNUMA node(s):                       4\nNUMA node0 CPU(s):                  0-31\nNUMA node1 CPU(s):                  32-63\nNUMA node2 CPU(s):                  64-95\nNUMA node3 CPU(s):                  96-127\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit:        Not affected\nVulnerability L1tf:                 Not affected\nVulnerability Mds:                  Not affected\nVulnerability Meltdown:             Not affected\nVulnerability Mmio stale data:      Not affected\nVulnerability Retbleed:             Mitigation; untrained return thunk; SMT disabled\nVulnerability Spec rstack overflow: Mitigation; SMT disabled\nVulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds:                Not affected\nVulnerability Tsx async abort:      Not 
affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] Could not collect",
+  "transformers_version": "4.43.1",
+  "upper_git_hash": "2132286315025b3abd7a22b7309f7052be200287",
+  "task_hashes": {
+    "polemo2_out_multiple_choice": "63ec4fc12bc668a566b3f91378159707f11e63ac52ded120b65fa3dd6a1b9979"
+  },
+  "model_source": "hf",
+  "model_name": "speakleash/Bielik-11B-v2.1-Instruct",
+  "model_name_sanitized": "__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "chat_template": null,
+  "chat_template_sha": null,
+  "start_time": 2669732.987871353,
+  "end_time": 2669970.833388602,
+  "total_evaluation_time_seconds": "237.8455172488466"
+}
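
The `acc_stderr,none` and `acc_norm_stderr,none` values in the `polemo2_out_multiple_choice` results above are consistent with the standard error of a mean over per-example 0/1 scores, computed with the sample (n - 1) standard deviation. The sketch below checks that arithmetic against the reported numbers. The function name is hypothetical, and the formula is inferred from the figures rather than taken from the harness source.

```python
import math


def mean_stderr_of_binary(acc: float, n: int) -> float:
    """Standard error of the mean of n binary scores with mean `acc`.

    With 0/1 scores the sample variance is n * acc * (1 - acc) / (n - 1),
    so the standard error reduces to sqrt(acc * (1 - acc) / (n - 1)).
    """
    return math.sqrt(acc * (1.0 - acc) / (n - 1))


# Values copied from the polemo2_out_multiple_choice results (n-samples: 494).
N_SAMPLES = 494
REPORTED = {
    "acc": (0.742914979757085, 0.019682691432000205),
    "acc_norm": (0.7672064777327935, 0.019033476340855917),
}

for metric, (value, reported_stderr) in REPORTED.items():
    computed = mean_stderr_of_binary(value, N_SAMPLES)
    print(f"{metric}: computed={computed:.6f} reported={reported_stderr:.6f}")
```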

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_8tags_multiple_choice_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-35-19.801525.json ADDED

	@@ -0,0 +1,106 @@

+{
+  "results": {
+    "polish_8tags_multiple_choice": {
+      "acc,none": 0.785224153705398,
+      "acc_stderr,none": 0.006211537927009462,
+      "acc_norm,none": 0.7829368709972553,
+      "acc_norm_stderr,none": 0.006235424129675317,
+      "alias": "polish_8tags_multiple_choice"
+    }
+  },
+  "group_subtasks": {
+    "polish_8tags_multiple_choice": []
+  },
+  "configs": {
+    "polish_8tags_multiple_choice": {
+      "task": "polish_8tags_multiple_choice",
+      "dataset_path": "sdadas/8tags",
+      "training_split": "train",
+      "test_split": "test",
+      "fewshot_split": "train",
+      "doc_to_text": "Tytuł: \"{{sentence}}\"\nDo podanego tytułu przyporządkuj jedną najlepiej pasującą kategorię z podanych: Film, Historia, Jedzenie, Medycyna, Motoryzacja, Praca, Sport, Technologie.\nKategoria:",
+      "doc_to_target": "{{label|int}}",
+      "doc_to_choice": [
+        "Film",
+        "Historia",
+        "Jedzenie",
+        "Medycyna",
+        "Motoryzacja",
+        "Praca",
+        "Sport",
+        "Technologie"
+      ],
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "num_fewshot": 0,
+      "metric_list": [
+        {
+          "metric": "acc",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "acc_norm",
+          "aggregation": "mean",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "multiple_choice",
+      "repeats": 1,
+      "should_decontaminate": true,
+      "doc_to_decontamination_query": "{{sentence}}"
+    }
+  },
+  "versions": {
+    "polish_8tags_multiple_choice": "Yaml"
+  },
+  "n-shot": {
+    "polish_8tags_multiple_choice": 0
+  },
+  "higher_is_better": {
+    "polish_8tags_multiple_choice": {
+      "acc": true,
+      "acc_norm": true
+    }
+  },
+  "n-samples": {
+    "polish_8tags_multiple_choice": {
+      "original": 4372,
+      "effective": 4372
+    }
+  },
+  "config": {
+    "model": "hf",
+    "model_args": "pretrained=speakleash/Bielik-11B-v2.1-Instruct,dtype=bfloat16,trust_remote_code=True",
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": "sqlite_caches/plgchriso/models/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_8tags_multiple_choice/",
+    "limit": null,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": null,
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": "2132286",
+  "date": 1723381748.8961906,
+  "pretty_env_info": "PyTorch version: 2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Rocky Linux 9.3 (Blue Onyx) (x86_64)\nGCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)\nClang version: Could not collect\nCMake version: Could not collect\nLibc version: glibc-2.34\n\nPython version: 3.10.4 (main, Dec 14 2022, 11:01:42) [GCC 11.3.0] (64-bit runtime)\nPython platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34\nIs CUDA available: True\nCUDA runtime version: 12.1.105\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Could not collect\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                       x86_64\nCPU op-mode(s):                     32-bit, 64-bit\nAddress sizes:                      43 bits physical, 48 bits virtual\nByte Order:                         Little Endian\nCPU(s):                             128\nOn-line CPU(s) list:                0-127\nVendor ID:                          AuthenticAMD\nModel name:                         AMD EPYC 7742 64-Core Processor\nCPU family:                         23\nModel:                              49\nThread(s) per core:                 1\nCore(s) per socket:                 64\nSocket(s):                          2\nStepping:                           0\nFrequency boost:                    enabled\nCPU max MHz:                        2250.0000\nCPU min MHz:                        1500.0000\nBogoMIPS:                           4500.00\nFlags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es\nVirtualization:                     AMD-V\nL1d cache:                          4 MiB (128 instances)\nL1i cache:                          4 MiB (128 instances)\nL2 cache:                           64 MiB (128 instances)\nL3 cache:                           512 MiB (32 instances)\nNUMA node(s):                       4\nNUMA node0 CPU(s):                  0-31\nNUMA node1 CPU(s):                  32-63\nNUMA node2 CPU(s):                  64-95\nNUMA node3 CPU(s):                  96-127\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit:        Not affected\nVulnerability L1tf:                 Not affected\nVulnerability Mds:                  Not affected\nVulnerability Meltdown:             Not affected\nVulnerability Mmio stale data:      Not affected\nVulnerability Retbleed:             Mitigation; untrained return thunk; SMT disabled\nVulnerability Spec rstack overflow: Mitigation; SMT disabled\nVulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds:                Not affected\nVulnerability Tsx async abort:      Not 
affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] Could not collect",
+  "transformers_version": "4.43.1",
+  "upper_git_hash": "2132286315025b3abd7a22b7309f7052be200287",
+  "task_hashes": {
+    "polish_8tags_multiple_choice": "97e61e52772af016579422421c750a76a73c5aa55b81bd957c03e5fe7ca43b9b"
+  },
+  "model_source": "hf",
+  "model_name": "speakleash/Bielik-11B-v2.1-Instruct",
+  "model_name_sanitized": "__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "chat_template": null,
+  "chat_template_sha": null,
+  "start_time": 2669732.988419174,
+  "end_time": 2671312.355425343,
+  "total_evaluation_time_seconds": "1579.3670061687008"
+}
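
The `polish_8tags_multiple_choice` results above report `acc` and `acc_norm` separately. The sketch below shows how the two can disagree on a single example. It assumes `acc` picks the choice with the highest log-likelihood and `acc_norm` divides each log-likelihood by the character length of the choice first. That normalization is an assumption about the harness, and the log-likelihood values are invented for illustration, not taken from this run.

```python
# Choices copied from "doc_to_choice" in the polish_8tags_multiple_choice config.
CHOICES = [
    "Film", "Historia", "Jedzenie", "Medycyna",
    "Motoryzacja", "Praca", "Sport", "Technologie",
]

# Invented per-choice log-likelihoods for one hypothetical example.
EXAMPLE_LOGLIKELIHOODS = [-4.0, -6.0, -20.0, -20.0, -20.0, -20.0, -20.0, -20.0]


def acc_and_acc_norm(loglikelihoods, choices, gold):
    """Return (acc, acc_norm) for one example.

    acc uses the raw argmax; acc_norm uses the argmax after dividing by
    the character length of each choice (assumed normalization).
    """
    indices = range(len(choices))
    best_raw = max(indices, key=lambda i: loglikelihoods[i])
    best_norm = max(indices, key=lambda i: loglikelihoods[i] / len(choices[i]))
    return float(best_raw == gold), float(best_norm == gold)


# Gold label 1 ("Historia"): the raw argmax prefers the shorter "Film",
# while the length-normalized argmax prefers "Historia".
print(acc_and_acc_norm(EXAMPLE_LOGLIKELIHOODS, CHOICES, gold=1))
```

Under this assumption the two metrics coincide whenever every choice has the same length, since dividing by a constant does not change the argmax.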

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_8tags_regex_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T17-37-36.637222.json ADDED

	@@ -0,0 +1,111 @@

+{
+  "results": {
+    "polish_8tags_regex": {
+      "exact_match,score-first": 0.7509149130832571,
+      "exact_match_stderr,score-first": 0.006541522277132546,
+      "alias": "polish_8tags_regex"
+    }
+  },
+  "group_subtasks": {
+    "polish_8tags_regex": []
+  },
+  "configs": {
+    "polish_8tags_regex": {
+      "task": "polish_8tags_regex",
+      "dataset_path": "sdadas/8tags",
+      "training_split": "train",
+      "validation_split": "validation",
+      "test_split": "test",
+      "doc_to_text": "Tytuł: \"{{sentence}}\"\nPytanie: jaka kategoria najlepiej pasuje do podanego tytułu?\nMożliwe odpowiedzi:\nA - film\nB - historia\nC - jedzenie\nD - medycyna\nE - motoryzacja\nF - praca\nG - sport\nH - technologie\nPrawidłowa odpowiedź:",
+      "doc_to_target": "{{{0: 'A', 1: 'B', 2: 'C', 3: 'D', 4: 'E', 5: 'F', 6: 'G', 7: 'H'}.get(label)}}",
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "num_fewshot": 0,
+      "metric_list": [
+        {
+          "metric": "exact_match",
+          "aggregation": "mean",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "generate_until",
+      "generation_kwargs": {
+        "until": [
+          ".",
+          ","
+        ],
+        "do_sample": false,
+        "temperature": 0.0,
+        "max_gen_toks": 50
+      },
+      "repeats": 1,
+      "filter_list": [
+        {
+          "name": "score-first",
+          "filter": [
+            {
+              "function": "regex",
+              "regex_pattern": "(\\b[ABCDEFGH]\\b)"
+            },
+            {
+              "function": "take_first"
+            }
+          ]
+        }
+      ],
+      "should_decontaminate": true,
+      "doc_to_decontamination_query": "{{sentence}}"
+    }
+  },
+  "versions": {
+    "polish_8tags_regex": "Yaml"
+  },
+  "n-shot": {
+    "polish_8tags_regex": 0
+  },
+  "higher_is_better": {
+    "polish_8tags_regex": {
+      "exact_match": true
+    }
+  },
+  "n-samples": {
+    "polish_8tags_regex": {
+      "original": 4372,
+      "effective": 4372
+    }
+  },
+  "config": {
+    "model": "hf",
+    "model_args": "pretrained=speakleash/Bielik-11B-v2.1-Instruct,dtype=bfloat16,trust_remote_code=True",
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": "sqlite_caches/plgchriso/models/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_8tags_regex/",
+    "limit": null,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": null,
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": "2132286",
+  "date": 1723381748.6299846,
+  "pretty_env_info": "PyTorch version: 2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Rocky Linux 9.3 (Blue Onyx) (x86_64)\nGCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)\nClang version: Could not collect\nCMake version: Could not collect\nLibc version: glibc-2.34\n\nPython version: 3.10.4 (main, Dec 14 2022, 11:01:42) [GCC 11.3.0] (64-bit runtime)\nPython platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34\nIs CUDA available: True\nCUDA runtime version: 12.1.105\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Could not collect\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                       x86_64\nCPU op-mode(s):                     32-bit, 64-bit\nAddress sizes:                      43 bits physical, 48 bits virtual\nByte Order:                         Little Endian\nCPU(s):                             128\nOn-line CPU(s) list:                0-127\nVendor ID:                          AuthenticAMD\nModel name:                         AMD EPYC 7742 64-Core Processor\nCPU family:                         23\nModel:                              49\nThread(s) per core:                 1\nCore(s) per socket:                 64\nSocket(s):                          2\nStepping:                           0\nFrequency boost:                    enabled\nCPU max MHz:                        2250.0000\nCPU min MHz:                        1500.0000\nBogoMIPS:                           4500.00\nFlags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es\nVirtualization:                     AMD-V\nL1d cache:                          4 MiB (128 instances)\nL1i cache:                          4 MiB (128 instances)\nL2 cache:                           64 MiB (128 instances)\nL3 cache:                           512 MiB (32 instances)\nNUMA node(s):                       4\nNUMA node0 CPU(s):                  0-31\nNUMA node1 CPU(s):                  32-63\nNUMA node2 CPU(s):                  64-95\nNUMA node3 CPU(s):                  96-127\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit:        Not affected\nVulnerability L1tf:                 Not affected\nVulnerability Mds:                  Not affected\nVulnerability Meltdown:             Not affected\nVulnerability Mmio stale data:      Not affected\nVulnerability Retbleed:             Mitigation; untrained return thunk; SMT disabled\nVulnerability Spec rstack overflow: Mitigation; SMT disabled\nVulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds:                Not affected\nVulnerability Tsx async abort:      Not 
affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] Could not collect",
+  "transformers_version": "4.43.1",
+  "upper_git_hash": "2132286315025b3abd7a22b7309f7052be200287",
+  "task_hashes": {
+    "polish_8tags_regex": "65692e40c28addb981c1eb0f272d45d3abf7b640c98f72a2acf9de48677c436e"
+  },
+  "model_source": "hf",
+  "model_name": "speakleash/Bielik-11B-v2.1-Instruct",
+  "model_name_sanitized": "__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "chat_template": null,
+  "chat_template_sha": null,
+  "start_time": 2577223.709303458,
+  "end_time": 2586140.204190591,
+  "total_evaluation_time_seconds": "8916.494887132663"
+}

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_belebele_mc_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-13-12.455988.json ADDED

	@@ -0,0 +1,107 @@

+{
+  "results": {
+    "polish_belebele_mc": {
+      "acc,none": 0.8755555555555555,
+      "acc_stderr,none": 0.011009047987347446,
+      "acc_norm,none": 0.8755555555555555,
+      "acc_norm_stderr,none": 0.011009047987347446,
+      "alias": "polish_belebele_mc"
+    }
+  },
+  "group_subtasks": {
+    "polish_belebele_mc": []
+  },
+  "configs": {
+    "polish_belebele_mc": {
+      "task": "polish_belebele_mc",
+      "dataset_path": "facebook/belebele",
+      "test_split": "pol_Latn",
+      "fewshot_split": "pol_Latn",
+      "doc_to_text": "Fragment: \"{{flores_passage}}\"\nPytanie: \"{{question}}\"\nMożliwe odpowiedzi:\nA - {{mc_answer1}}\nB - {{mc_answer2}}\nC - {{mc_answer3}}\nD - {{mc_answer4}}\nPrawidłowa odpowiedź:",
+      "doc_to_target": "{{['1', '2', '3', '4'].index(correct_answer_num)}}",
+      "doc_to_choice": [
+        "A",
+        "B",
+        "C",
+        "D"
+      ],
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "fewshot_config": {
+        "sampler": "first_n"
+      },
+      "num_fewshot": 0,
+      "metric_list": [
+        {
+          "metric": "acc",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "acc_norm",
+          "aggregation": "mean",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "multiple_choice",
+      "repeats": 1,
+      "should_decontaminate": true,
+      "doc_to_decontamination_query": "{{question}}",
+      "metadata": {
+        "version": 0.0
+      }
+    }
+  },
+  "versions": {
+    "polish_belebele_mc": 0.0
+  },
+  "n-shot": {
+    "polish_belebele_mc": 0
+  },
+  "higher_is_better": {
+    "polish_belebele_mc": {
+      "acc": true,
+      "acc_norm": true
+    }
+  },
+  "n-samples": {
+    "polish_belebele_mc": {
+      "original": 900,
+      "effective": 900
+    }
+  },
+  "config": {
+    "model": "hf",
+    "model_args": "pretrained=speakleash/Bielik-11B-v2.1-Instruct,dtype=bfloat16,trust_remote_code=True",
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": "sqlite_caches/plgchriso/models/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_belebele_mc/",
+    "limit": null,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": null,
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": "2132286",
+  "date": 1723381748.8964837,
+  "pretty_env_info": "PyTorch version: 2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Rocky Linux 9.3 (Blue Onyx) (x86_64)\nGCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)\nClang version: Could not collect\nCMake version: Could not collect\nLibc version: glibc-2.34\n\nPython version: 3.10.4 (main, Dec 14 2022, 11:01:42) [GCC 11.3.0] (64-bit runtime)\nPython platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34\nIs CUDA available: True\nCUDA runtime version: 12.1.105\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Could not collect\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                       x86_64\nCPU op-mode(s):                     32-bit, 64-bit\nAddress sizes:                      43 bits physical, 48 bits virtual\nByte Order:                         Little Endian\nCPU(s):                             128\nOn-line CPU(s) list:                0-127\nVendor ID:                          AuthenticAMD\nModel name:                         AMD EPYC 7742 64-Core Processor\nCPU family:                         23\nModel:                              49\nThread(s) per core:                 1\nCore(s) per socket:                 64\nSocket(s):                          2\nStepping:                           0\nFrequency boost:                    enabled\nCPU max MHz:                        2250.0000\nCPU min MHz:                        1500.0000\nBogoMIPS:                           4500.00\nFlags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es\nVirtualization:                     AMD-V\nL1d cache:                          4 MiB (128 instances)\nL1i cache:                          4 MiB (128 instances)\nL2 cache:                           64 MiB (128 instances)\nL3 cache:                           512 MiB (32 instances)\nNUMA node(s):                       4\nNUMA node0 CPU(s):                  0-31\nNUMA node1 CPU(s):                  32-63\nNUMA node2 CPU(s):                  64-95\nNUMA node3 CPU(s):                  96-127\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit:        Not affected\nVulnerability L1tf:                 Not affected\nVulnerability Mds:                  Not affected\nVulnerability Meltdown:             Not affected\nVulnerability Mmio stale data:      Not affected\nVulnerability Retbleed:             Mitigation; untrained return thunk; SMT disabled\nVulnerability Spec rstack overflow: Mitigation; SMT disabled\nVulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds:                Not affected\nVulnerability Tsx async abort:      Not 
affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] Could not collect",
+  "transformers_version": "4.43.1",
+  "upper_git_hash": "2132286315025b3abd7a22b7309f7052be200287",
+  "task_hashes": {
+    "polish_belebele_mc": "e575c2bfe123497ebf8be109e92bdcb84761ff1f7ebc06ee26942cfec0914841"
+  },
+  "model_source": "hf",
+  "model_name": "speakleash/Bielik-11B-v2.1-Instruct",
+  "model_name_sanitized": "__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "chat_template": null,
+  "chat_template_sha": null,
+  "start_time": 2669732.988173293,
+  "end_time": 2669985.011977567,
+  "total_evaluation_time_seconds": "252.023804273922"
+}

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_belebele_regex_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-36-48.898331.json ADDED

	@@ -0,0 +1,109 @@

+{
+  "results": {
+    "polish_belebele_regex": {
+      "exact_match,score-first": 0.8622222222222222,
+      "exact_match_stderr,score-first": 0.011495274539524291,
+      "alias": "polish_belebele_regex"
+    }
+  },
+  "group_subtasks": {
+    "polish_belebele_regex": []
+  },
+  "configs": {
+    "polish_belebele_regex": {
+      "task": "polish_belebele_regex",
+      "dataset_path": "facebook/belebele",
+      "test_split": "pol_Latn",
+      "doc_to_text": "Fragment: \"{{flores_passage}}\"\nPytanie: \"{{question}}\"\nMożliwe odpowiedzi:\nA - {{mc_answer1}}\nB - {{mc_answer2}}\nC - {{mc_answer3}}\nD - {{mc_answer4}}\nPrawidłowa odpowiedź:",
+      "doc_to_target": "{{{0: 'A', 1: 'B', 2: 'C', 3: 'D'}.get(correct_answer_num|int - 1)}}",
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "num_fewshot": 0,
+      "metric_list": [
+        {
+          "metric": "exact_match",
+          "aggregation": "mean",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "generate_until",
+      "generation_kwargs": {
+        "until": [
+          ".",
+          ","
+        ],
+        "do_sample": false,
+        "temperature": 0.0,
+        "max_gen_toks": 50
+      },
+      "repeats": 1,
+      "filter_list": [
+        {
+          "name": "score-first",
+          "filter": [
+            {
+              "function": "regex",
+              "regex_pattern": "(\\b[ABCD]\\b)"
+            },
+            {
+              "function": "take_first"
+            }
+          ]
+        }
+      ],
+      "should_decontaminate": true,
+      "doc_to_decontamination_query": "{{flores_passage}} {{question}} {{mc_answer1}} {{mc_answer2}} {{mc_answer3}} {{mc_answer4}}"
+    }
+  },
+  "versions": {
+    "polish_belebele_regex": "Yaml"
+  },
+  "n-shot": {
+    "polish_belebele_regex": 0
+  },
+  "higher_is_better": {
+    "polish_belebele_regex": {
+      "exact_match": true
+    }
+  },
+  "n-samples": {
+    "polish_belebele_regex": {
+      "original": 900,
+      "effective": 900
+    }
+  },
+  "config": {
+    "model": "hf",
+    "model_args": "pretrained=speakleash/Bielik-11B-v2.1-Instruct,dtype=bfloat16,trust_remote_code=True",
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": "sqlite_caches/plgchriso/models/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_belebele_regex/",
+    "limit": null,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": null,
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": "2132286",
+  "date": 1723381748.62987,
+  "pretty_env_info": "PyTorch version: 2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Rocky Linux 9.3 (Blue Onyx) (x86_64)\nGCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)\nClang version: Could not collect\nCMake version: Could not collect\nLibc version: glibc-2.34\n\nPython version: 3.10.4 (main, Dec 14 2022, 11:01:42) [GCC 11.3.0] (64-bit runtime)\nPython platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34\nIs CUDA available: True\nCUDA runtime version: 12.1.105\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Could not collect\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                       x86_64\nCPU op-mode(s):                     32-bit, 64-bit\nAddress sizes:                      43 bits physical, 48 bits virtual\nByte Order:                         Little Endian\nCPU(s):                             128\nOn-line CPU(s) list:                0-127\nVendor ID:                          AuthenticAMD\nModel name:                         AMD EPYC 7742 64-Core Processor\nCPU family:                         23\nModel:                              49\nThread(s) per core:                 1\nCore(s) per socket:                 64\nSocket(s):                          2\nStepping:                           0\nFrequency boost:                    enabled\nCPU max MHz:                        2250.0000\nCPU min MHz:                        1500.0000\nBogoMIPS:                           4500.00\nFlags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es\nVirtualization:                     AMD-V\nL1d cache:                          4 MiB (128 instances)\nL1i cache:                          4 MiB (128 instances)\nL2 cache:                           64 MiB (128 instances)\nL3 cache:                           512 MiB (32 instances)\nNUMA node(s):                       4\nNUMA node0 CPU(s):                  0-31\nNUMA node1 CPU(s):                  32-63\nNUMA node2 CPU(s):                  64-95\nNUMA node3 CPU(s):                  96-127\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit:        Not affected\nVulnerability L1tf:                 Not affected\nVulnerability Mds:                  Not affected\nVulnerability Meltdown:             Not affected\nVulnerability Mmio stale data:      Not affected\nVulnerability Retbleed:             Mitigation; untrained return thunk; SMT disabled\nVulnerability Spec rstack overflow: Mitigation; SMT disabled\nVulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds:                Not affected\nVulnerability Tsx async abort:      Not 
affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] Could not collect",
+  "transformers_version": "4.43.1",
+  "upper_git_hash": "2132286315025b3abd7a22b7309f7052be200287",
+  "task_hashes": {
+    "polish_belebele_regex": "27d3ad975a6f34d19e414caf684c5c66f47347a7e3f05c8420cd085a341dcbe7"
+  },
+  "model_source": "hf",
+  "model_name": "speakleash/Bielik-11B-v2.1-Instruct",
+  "model_name_sanitized": "__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "chat_template": null,
+  "chat_template_sha": null,
+  "start_time": 2577223.702730047,
+  "end_time": 2578892.467186104,
+  "total_evaluation_time_seconds": "1668.7644560569897"
+}
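Note on the `score-first` filter recorded above: the config chains a `regex` step with pattern `(\b[ABCD]\b)` and a `take_first` step, then scores the result with `exact_match`. The following is a minimal sketch of that idea in plain Python, not the harness's own implementation. The helper names and the `[invalid]` fallback string are assumptions made for illustration.

```python
import re

# Pattern copied from "regex_pattern" in the config above.
ANSWER_PATTERN = re.compile(r"(\b[ABCD]\b)")

# Assumption: placeholder returned when no answer letter is found.
FALLBACK = "[invalid]"


def extract_answer(generation: str) -> str:
    """Return the first standalone answer letter in the generated text."""
    match = ANSWER_PATTERN.search(generation)
    return match.group(1) if match else FALLBACK


def exact_match(generations, targets):
    """Mean of per-example 0/1 matches after answer extraction."""
    hits = [extract_answer(g) == t for g, t in zip(generations, targets)]
    return sum(hits) / len(hits)


if __name__ == "__main__":
    outputs = [" B - Warszawa", "Odpowiedź: C", "nie wiem"]
    print([extract_answer(o) for o in outputs])
    print(exact_match(outputs, ["B", "D", "A"]))
```

Because the word boundaries require a standalone letter, a capital inside a longer word (for example the `A` in `Ala`) is not picked up as an answer.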

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_cbd_multiple_choice_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-16-56.871408.json ADDED

	@@ -0,0 +1,111 @@

+{
+  "results": {
+    "polish_cbd_multiple_choice": {
+      "acc,none": 0.232,
+      "acc_stderr,none": 0.01335493745228157,
+      "f1,none": 0.19798644869999346,
+      "f1_stderr,none": "N/A",
+      "acc_norm,none": 0.254,
+      "acc_norm_stderr,none": 0.013772206565168544,
+      "alias": "polish_cbd_multiple_choice"
+    }
+  },
+  "group_subtasks": {
+    "polish_cbd_multiple_choice": []
+  },
+  "configs": {
+    "polish_cbd_multiple_choice": {
+      "task": "polish_cbd_multiple_choice",
+      "dataset_path": "ptaszynski/PolishCyberbullyingDataset",
+      "training_split": "train",
+      "test_split": "test",
+      "doc_to_text": "Wypowiedź: \"{{TEXT}}\"\nDo podanej wypowiedzi przyporządkuj jedną, najlepiej pasującą kategorię z podanych: nieszkodliwa, szyderstwo, obelga, insynuacja, groźba, molestowanie.\nKategoria:",
+      "doc_to_target": "{{{'szyderstwo': 1, 'obelga': 2, 'insynuacja': 3, 'grozba': 4, 'molestowanie': 5}.get(CATEGORIES, 0)}}",
+      "doc_to_choice": [
+        "nieszkodliwa",
+        "szyderstwo",
+        "obelga",
+        "insynuacja",
+        "groźba",
+        "molestowanie"
+      ],
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "num_fewshot": 0,
+      "metric_list": [
+        {
+          "metric": "acc",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "acc_norm",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "def f1(predictions, references):\n    _prediction = predictions[0]\n    _reference = references[0]\n    string_label = [\"A\", \"B\", \"C\", \"D\", \"E\", \"F\"]\n    reference = string_label.index(_reference)\n    prediction = (\n        string_label.index(_prediction)\n        if _prediction in string_label\n        else 0\n    )\n\n    return (prediction, reference)\n",
+          "aggregation": "def agg_f1_macro(items):\n    predictions, references = zip(*items)\n    references, predictions = np.asarray(references), np.asarray(predictions)\n\n    return sklearn.metrics.f1_score(references, predictions, average='macro')\n",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "multiple_choice",
+      "repeats": 1,
+      "should_decontaminate": true,
+      "doc_to_decontamination_query": "{{TEXT}}"
+    }
+  },
+  "versions": {
+    "polish_cbd_multiple_choice": "Yaml"
+  },
+  "n-shot": {
+    "polish_cbd_multiple_choice": 0
+  },
+  "higher_is_better": {
+    "polish_cbd_multiple_choice": {
+      "acc": true,
+      "acc_norm": true,
+      "f1": true
+    }
+  },
+  "n-samples": {
+    "polish_cbd_multiple_choice": {
+      "original": 1000,
+      "effective": 1000
+    }
+  },
+  "config": {
+    "model": "hf",
+    "model_args": "pretrained=speakleash/Bielik-11B-v2.1-Instruct,dtype=bfloat16,trust_remote_code=True",
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": "sqlite_caches/plgchriso/models/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_cbd_multiple_choice/",
+    "limit": null,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": null,
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": "2132286",
+  "date": 1723381748.896205,
+  "pretty_env_info": "PyTorch version: 2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Rocky Linux 9.3 (Blue Onyx) (x86_64)\nGCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)\nClang version: Could not collect\nCMake version: Could not collect\nLibc version: glibc-2.34\n\nPython version: 3.10.4 (main, Dec 14 2022, 11:01:42) [GCC 11.3.0] (64-bit runtime)\nPython platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34\nIs CUDA available: True\nCUDA runtime version: 12.1.105\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Could not collect\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                       x86_64\nCPU op-mode(s):                     32-bit, 64-bit\nAddress sizes:                      43 bits physical, 48 bits virtual\nByte Order:                         Little Endian\nCPU(s):                             128\nOn-line CPU(s) list:                0-127\nVendor ID:                          AuthenticAMD\nModel name:                         AMD EPYC 7742 64-Core Processor\nCPU family:                         23\nModel:                              49\nThread(s) per core:                 1\nCore(s) per socket:                 64\nSocket(s):                          2\nStepping:                           0\nFrequency boost:                    enabled\nCPU max MHz:                        2250.0000\nCPU min MHz:                        1500.0000\nBogoMIPS:                           4500.00\nFlags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es\nVirtualization:                     AMD-V\nL1d cache:                          4 MiB (128 instances)\nL1i cache:                          4 MiB (128 instances)\nL2 cache:                           64 MiB (128 instances)\nL3 cache:                           512 MiB (32 instances)\nNUMA node(s):                       4\nNUMA node0 CPU(s):                  0-31\nNUMA node1 CPU(s):                  32-63\nNUMA node2 CPU(s):                  64-95\nNUMA node3 CPU(s):                  96-127\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit:        Not affected\nVulnerability L1tf:                 Not affected\nVulnerability Mds:                  Not affected\nVulnerability Meltdown:             Not affected\nVulnerability Mmio stale data:      Not affected\nVulnerability Retbleed:             Mitigation; untrained return thunk; SMT disabled\nVulnerability Spec rstack overflow: Mitigation; SMT disabled\nVulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds:                Not affected\nVulnerability Tsx async abort:      Not 
affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] Could not collect",
+  "transformers_version": "4.43.1",
+  "upper_git_hash": "2132286315025b3abd7a22b7309f7052be200287",
+  "task_hashes": {
+    "polish_cbd_multiple_choice": "56be7d38fd3346cc3ebad202ec8c0365fc6a9b7c3b60b7c527ec0cf16db2c0df"
+  },
+  "model_source": "hf",
+  "model_name": "speakleash/Bielik-11B-v2.1-Instruct",
+  "model_name_sanitized": "__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "chat_template": null,
+  "chat_template_sha": null,
+  "start_time": 2669732.987932883,
+  "end_time": 2670209.427766158,
+  "total_evaluation_time_seconds": "476.4398332745768"
+}
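Note on the reported standard errors: the `acc_stderr,none` value above appears consistent with the standard error of the mean of 0/1 scores, using the sample variance (divisor `n - 1`). The sketch below reproduces the number from the reported accuracy and sample count. It is a sanity check under that assumption, not a statement of how the harness computes it, and the function name is hypothetical.

```python
import math


def mean_stderr(p: float, n: int) -> float:
    """Standard error of the mean for n binary (0/1) scores with mean p.

    Sample variance of 0/1 scores is n / (n - 1) * p * (1 - p);
    dividing by n and taking the square root gives the formula below.
    """
    return math.sqrt(p * (1.0 - p) / (n - 1))


if __name__ == "__main__":
    # Values taken from the polish_cbd_multiple_choice results above.
    acc, n_samples = 0.232, 1000
    print(mean_stderr(acc, n_samples))  # compare with "acc_stderr,none"
```

The same formula applied to `acc_norm,none` (0.254) gives a value that can be compared with `acc_norm_stderr,none` in the same way.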

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_cbd_regex_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-44-52.956955.json ADDED

	@@ -0,0 +1,119 @@

+{
+  "results": {
+    "polish_cbd_regex": {
+      "exact_match,score-first": 0.389,
+      "exact_match_stderr,score-first": 0.015424555647308496,
+      "f1,score-first": 0.24472868820966764,
+      "f1_stderr,score-first": "N/A",
+      "alias": "polish_cbd_regex"
+    }
+  },
+  "group_subtasks": {
+    "polish_cbd_regex": []
+  },
+  "configs": {
+    "polish_cbd_regex": {
+      "task": "polish_cbd_regex",
+      "dataset_path": "ptaszynski/PolishCyberbullyingDataset",
+      "training_split": "train",
+      "test_split": "test",
+      "doc_to_text": "Wypowiedź: \"{{TEXT}}\"\nPytanie: Jaka kategoria najlepiej pasuje do podanej wypowiedzi?\nMożliwe odpowiedzi:\nA - nieszkodliwa\nB - szyderstwo\nC - obelga\nD - insynuacja\nE - groźba\nF - molestowanie\nPrawidłowa odpowiedź:",
+      "doc_to_target": "{{{'szyderstwo': 'B', 'obelga': 'C', 'insynuacja': 'D', 'grozba': 'E', 'molestowanie': 'F'}.get(CATEGORIES, 'A')}}",
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "num_fewshot": 0,
+      "metric_list": [
+        {
+          "metric": "exact_match",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "def f1(predictions, references):\n    _prediction = predictions[0]\n    _reference = references[0]\n    string_label = [\"A\", \"B\", \"C\", \"D\", \"E\", \"F\"]\n    reference = string_label.index(_reference)\n    prediction = (\n        string_label.index(_prediction)\n        if _prediction in string_label\n        else 0\n    )\n\n    return (prediction, reference)\n",
+          "aggregation": "def agg_f1_macro(items):\n    predictions, references = zip(*items)\n    references, predictions = np.asarray(references), np.asarray(predictions)\n\n    return sklearn.metrics.f1_score(references, predictions, average='macro')\n",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "generate_until",
+      "generation_kwargs": {
+        "until": [
+          ".",
+          ",",
+          ";"
+        ],
+        "do_sample": false,
+        "temperature": 0.0,
+        "max_gen_toks": 50
+      },
+      "repeats": 1,
+      "filter_list": [
+        {
+          "name": "score-first",
+          "filter": [
+            {
+              "function": "regex",
+              "regex_pattern": "(\\b[ABCDEF]\\b)"
+            },
+            {
+              "function": "take_first"
+            }
+          ]
+        }
+      ],
+      "should_decontaminate": true,
+      "doc_to_decontamination_query": "{{TEXT}}"
+    }
+  },
+  "versions": {
+    "polish_cbd_regex": "Yaml"
+  },
+  "n-shot": {
+    "polish_cbd_regex": 0
+  },
+  "higher_is_better": {
+    "polish_cbd_regex": {
+      "exact_match": true,
+      "f1": true
+    }
+  },
+  "n-samples": {
+    "polish_cbd_regex": {
+      "original": 1000,
+      "effective": 1000
+    }
+  },
+  "config": {
+    "model": "hf",
+    "model_args": "pretrained=speakleash/Bielik-11B-v2.1-Instruct,dtype=bfloat16,trust_remote_code=True",
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": "sqlite_caches/plgchriso/models/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_cbd_regex/",
+    "limit": null,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": null,
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": "2132286",
+  "date": 1723381748.6293602,
+  "pretty_env_info": "PyTorch version: 2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Rocky Linux 9.3 (Blue Onyx) (x86_64)\nGCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)\nClang version: Could not collect\nCMake version: Could not collect\nLibc version: glibc-2.34\n\nPython version: 3.10.4 (main, Dec 14 2022, 11:01:42) [GCC 11.3.0] (64-bit runtime)\nPython platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34\nIs CUDA available: True\nCUDA runtime version: 12.1.105\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Could not collect\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                       x86_64\nCPU op-mode(s):                     32-bit, 64-bit\nAddress sizes:                      43 bits physical, 48 bits virtual\nByte Order:                         Little Endian\nCPU(s):                             128\nOn-line CPU(s) list:                0-127\nVendor ID:                          AuthenticAMD\nModel name:                         AMD EPYC 7742 64-Core Processor\nCPU family:                         23\nModel:                              49\nThread(s) per core:                 1\nCore(s) per socket:                 64\nSocket(s):                          2\nStepping:                           0\nFrequency boost:                    enabled\nCPU max MHz:                        2250.0000\nCPU min MHz:                        1500.0000\nBogoMIPS:                           4500.00\nFlags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es\nVirtualization:                     AMD-V\nL1d cache:                          4 MiB (128 instances)\nL1i cache:                          4 MiB (128 instances)\nL2 cache:                           64 MiB (128 instances)\nL3 cache:                           512 MiB (32 instances)\nNUMA node(s):                       4\nNUMA node0 CPU(s):                  0-31\nNUMA node1 CPU(s):                  32-63\nNUMA node2 CPU(s):                  64-95\nNUMA node3 CPU(s):                  96-127\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit:        Not affected\nVulnerability L1tf:                 Not affected\nVulnerability Mds:                  Not affected\nVulnerability Meltdown:             Not affected\nVulnerability Mmio stale data:      Not affected\nVulnerability Retbleed:             Mitigation; untrained return thunk; SMT disabled\nVulnerability Spec rstack overflow: Mitigation; SMT disabled\nVulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds:                Not affected\nVulnerability Tsx async abort:      Not 
affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] Could not collect",
+  "transformers_version": "4.43.1",
+  "upper_git_hash": "2132286315025b3abd7a22b7309f7052be200287",
+  "task_hashes": {
+    "polish_cbd_regex": "d924b7270ebed050a040882627c2c9edeabe16833fce05d56d9afd2bdd04ab67"
+  },
+  "model_source": "hf",
+  "model_name": "speakleash/Bielik-11B-v2.1-Instruct",
+  "model_name_sanitized": "__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "chat_template": null,
+  "chat_template_sha": null,
+  "start_time": 2577223.672784959,
+  "end_time": 2579376.526070405,
+  "total_evaluation_time_seconds": "2152.8532854458317"
+}
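Note on the custom `f1` metric recorded above: the per-example function maps the predicted and reference letters to indices (an unrecognised prediction falls back to index 0, i.e. `A`), and the aggregation passes the collected pairs to `sklearn.metrics.f1_score` with `average='macro'`. The sketch below mirrors that logic without scikit-learn so it runs standalone. `to_pair` and `macro_f1` are hypothetical names, and the macro average here is taken over the labels that occur in either predictions or references.

```python
LABELS = ["A", "B", "C", "D", "E", "F"]


def to_pair(prediction: str, reference: str):
    """Map letters to indices; unrecognised predictions fall back to 0."""
    ref = LABELS.index(reference)
    pred = LABELS.index(prediction) if prediction in LABELS else 0
    return pred, ref


def macro_f1(pairs):
    """Unweighted mean of per-class F1 over labels seen in the pairs."""
    labels = sorted({value for pair in pairs for value in pair})
    scores = []
    for label in labels:
        tp = sum(1 for p, r in pairs if p == label and r == label)
        fp = sum(1 for p, r in pairs if p == label and r != label)
        fn = sum(1 for p, r in pairs if p != label and r == label)
        # Every label occurs at least once, so the denominator is positive.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    items = [to_pair("A", "A"), to_pair("B", "B"),
             to_pair("[invalid]", "B"), to_pair("C", "C")]
    print(macro_f1(items))
```

Because each class carries equal weight in a macro average, rare categories pull the score down when they are missed, which may explain why `f1` sits well below `exact_match` in the results above.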

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_dyk_multiple_choice_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-13-01.731913.json ADDED

	@@ -0,0 +1,107 @@

+{
+  "results": {
+    "polish_dyk_multiple_choice": {
+      "acc,none": 0.8571428571428571,
+      "acc_stderr,none": 0.010913926579250558,
+      "f1,none": 0.6508313539192399,
+      "f1_stderr,none": "N/A",
+      "acc_norm,none": 0.8571428571428571,
+      "acc_norm_stderr,none": 0.010913926579250558,
+      "alias": "polish_dyk_multiple_choice"
+    }
+  },
+  "group_subtasks": {
+    "polish_dyk_multiple_choice": []
+  },
+  "configs": {
+    "polish_dyk_multiple_choice": {
+      "task": "polish_dyk_multiple_choice",
+      "dataset_path": "allegro/klej-dyk",
+      "training_split": "train",
+      "test_split": "test",
+      "doc_to_text": "Pytanie: \"{{question}}\"\nSugerowana odpowiedź: \"{{answer}}\"\nPytanie: Czy sugerowana odpowiedź na zadane pytanie jest poprawna?\nOdpowiedz krótko \"Tak\" lub \"Nie\". Prawidłowa odpowiedź:",
+      "doc_to_target": "{{target|int}}",
+      "doc_to_choice": [
+        "Nie",
+        "Tak"
+      ],
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "num_fewshot": 0,
+      "metric_list": [
+        {
+          "metric": "acc",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "acc_norm",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "def f1(predictions, references):\n    _prediction = predictions[0]\n    _reference = references[0]\n    string_label = [\"B\", \"C\"]\n    reference = string_label.index(_reference)\n    prediction = (\n        string_label.index(_prediction)\n        if _prediction in string_label\n        else 0\n    )\n\n    return (prediction, reference)\n",
+          "aggregation": "def agg_f1(items):\n    predictions, references = zip(*items)\n    references, predictions = np.asarray(references), np.asarray(predictions)\n\n    return sklearn.metrics.f1_score(references, predictions)\n",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "multiple_choice",
+      "repeats": 1,
+      "should_decontaminate": true,
+      "doc_to_decontamination_query": "{{question}} {{answer}}"
+    }
+  },
+  "versions": {
+    "polish_dyk_multiple_choice": "Yaml"
+  },
+  "n-shot": {
+    "polish_dyk_multiple_choice": 0
+  },
+  "higher_is_better": {
+    "polish_dyk_multiple_choice": {
+      "acc": true,
+      "acc_norm": true,
+      "f1": true
+    }
+  },
+  "n-samples": {
+    "polish_dyk_multiple_choice": {
+      "original": 1029,
+      "effective": 1029
+    }
+  },
+  "config": {
+    "model": "hf",
+    "model_args": "pretrained=speakleash/Bielik-11B-v2.1-Instruct,dtype=bfloat16,trust_remote_code=True",
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": "sqlite_caches/plgchriso/models/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_dyk_multiple_choice/",
+    "limit": null,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": null,
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": "2132286",
+  "date": 1723381748.8968518,
+  "pretty_env_info": "PyTorch version: 2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Rocky Linux 9.3 (Blue Onyx) (x86_64)\nGCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)\nClang version: Could not collect\nCMake version: Could not collect\nLibc version: glibc-2.34\n\nPython version: 3.10.4 (main, Dec 14 2022, 11:01:42) [GCC 11.3.0] (64-bit runtime)\nPython platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34\nIs CUDA available: True\nCUDA runtime version: 12.1.105\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Could not collect\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                       x86_64\nCPU op-mode(s):                     32-bit, 64-bit\nAddress sizes:                      43 bits physical, 48 bits virtual\nByte Order:                         Little Endian\nCPU(s):                             128\nOn-line CPU(s) list:                0-127\nVendor ID:                          AuthenticAMD\nModel name:                         AMD EPYC 7742 64-Core Processor\nCPU family:                         23\nModel:                              49\nThread(s) per core:                 1\nCore(s) per socket:                 64\nSocket(s):                          2\nStepping:                           0\nFrequency boost:                    enabled\nCPU max MHz:                        2250.0000\nCPU min MHz:                        1500.0000\nBogoMIPS:                           4500.00\nFlags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es\nVirtualization:                     AMD-V\nL1d cache:                          4 MiB (128 instances)\nL1i cache:                          4 MiB (128 instances)\nL2 cache:                           64 MiB (128 instances)\nL3 cache:                           512 MiB (32 instances)\nNUMA node(s):                       4\nNUMA node0 CPU(s):                  0-31\nNUMA node1 CPU(s):                  32-63\nNUMA node2 CPU(s):                  64-95\nNUMA node3 CPU(s):                  96-127\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit:        Not affected\nVulnerability L1tf:                 Not affected\nVulnerability Mds:                  Not affected\nVulnerability Meltdown:             Not affected\nVulnerability Mmio stale data:      Not affected\nVulnerability Retbleed:             Mitigation; untrained return thunk; SMT disabled\nVulnerability Spec rstack overflow: Mitigation; SMT disabled\nVulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds:                Not affected\nVulnerability Tsx async abort:      Not 
affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] Could not collect",
+  "transformers_version": "4.43.1",
+  "upper_git_hash": "2132286315025b3abd7a22b7309f7052be200287",
+  "task_hashes": {
+    "polish_dyk_multiple_choice": "614bab79a1ec3b666218bb65089e147f8ed82ac0a4d10ab14a57ffcc73379688"
+  },
+  "model_source": "hf",
+  "model_name": "speakleash/Bielik-11B-v2.1-Instruct",
+  "model_name_sanitized": "__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "chat_template": null,
+  "chat_template_sha": null,
+  "start_time": 2669732.988483474,
+  "end_time": 2669974.28807118,
+  "total_evaluation_time_seconds": "241.29958770610392"
+}

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_dyk_regex_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-14-18.998768.json ADDED

	@@ -0,0 +1,118 @@

+{
+  "results": {
+    "polish_dyk_regex": {
+      "exact_match,score-first": 0.8532555879494655,
+      "exact_match_stderr,score-first": 0.01103630767704879,
+      "f1,score-first": 0.6591422121896162,
+      "f1_stderr,score-first": "N/A",
+      "alias": "polish_dyk_regex"
+    }
+  },
+  "group_subtasks": {
+    "polish_dyk_regex": []
+  },
+  "configs": {
+    "polish_dyk_regex": {
+      "task": "polish_dyk_regex",
+      "dataset_path": "allegro/klej-dyk",
+      "training_split": "train",
+      "test_split": "test",
+      "doc_to_text": "Pytanie: \"{{question}}\"\nSugerowana odpowiedź: \"{{answer}}\"\nCzy sugerowana odpowiedź na zadane pytanie jest poprawna? Możliwe opcje:\nA - brakuje sugerowanej odpowiedzi\nB - nie, sugerowana odpowiedź nie jest poprawna\nC - tak, sugerowana odpowiedź jest poprawna\nD - brakuje pytania\nPrawidłowa opcja:",
+      "doc_to_target": "{{{0: 'A', 1: 'B', 2: 'C', 3: 'D'}.get(target|int + 1)}}",
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "num_fewshot": 0,
+      "metric_list": [
+        {
+          "metric": "exact_match",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "def f1(predictions, references):\n    _prediction = predictions[0]\n    _reference = references[0]\n    string_label = [\"B\", \"C\"]\n    reference = string_label.index(_reference)\n    prediction = (\n        string_label.index(_prediction)\n        if _prediction in string_label\n        else 0\n    )\n\n    return (prediction, reference)\n",
+          "aggregation": "def agg_f1(items):\n    predictions, references = zip(*items)\n    references, predictions = np.asarray(references), np.asarray(predictions)\n\n    return sklearn.metrics.f1_score(references, predictions)\n",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "generate_until",
+      "generation_kwargs": {
+        "until": [
+          ".",
+          ","
+        ],
+        "do_sample": false,
+        "temperature": 0.0,
+        "max_gen_toks": 50
+      },
+      "repeats": 1,
+      "filter_list": [
+        {
+          "name": "score-first",
+          "filter": [
+            {
+              "function": "regex",
+              "regex_pattern": "(\\b[ABCD]\\b)"
+            },
+            {
+              "function": "take_first"
+            }
+          ]
+        }
+      ],
+      "should_decontaminate": true,
+      "doc_to_decontamination_query": "{{question}} {{answer}}"
+    }
+  },
+  "versions": {
+    "polish_dyk_regex": "Yaml"
+  },
+  "n-shot": {
+    "polish_dyk_regex": 0
+  },
+  "higher_is_better": {
+    "polish_dyk_regex": {
+      "exact_match": true,
+      "f1": true
+    }
+  },
+  "n-samples": {
+    "polish_dyk_regex": {
+      "original": 1029,
+      "effective": 1029
+    }
+  },
+  "config": {
+    "model": "hf",
+    "model_args": "pretrained=speakleash/Bielik-11B-v2.1-Instruct,dtype=bfloat16,trust_remote_code=True",
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": "sqlite_caches/plgchriso/models/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_dyk_regex/",
+    "limit": null,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": null,
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": "2132286",
+  "date": 1723381748.6296444,
+  "pretty_env_info": "PyTorch version: 2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Rocky Linux 9.3 (Blue Onyx) (x86_64)\nGCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)\nClang version: Could not collect\nCMake version: Could not collect\nLibc version: glibc-2.34\n\nPython version: 3.10.4 (main, Dec 14 2022, 11:01:42) [GCC 11.3.0] (64-bit runtime)\nPython platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34\nIs CUDA available: True\nCUDA runtime version: 12.1.105\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Could not collect\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                       x86_64\nCPU op-mode(s):                     32-bit, 64-bit\nAddress sizes:                      43 bits physical, 48 bits virtual\nByte Order:                         Little Endian\nCPU(s):                             128\nOn-line CPU(s) list:                0-127\nVendor ID:                          AuthenticAMD\nModel name:                         AMD EPYC 7742 64-Core Processor\nCPU family:                         23\nModel:                              49\nThread(s) per core:                 1\nCore(s) per socket:                 64\nSocket(s):                          2\nStepping:                           0\nFrequency boost:                    enabled\nCPU max MHz:                        2250.0000\nCPU min MHz:                        1500.0000\nBogoMIPS:                           4500.00\nFlags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es\nVirtualization:                     AMD-V\nL1d cache:                          4 MiB (128 instances)\nL1i cache:                          4 MiB (128 instances)\nL2 cache:                           64 MiB (128 instances)\nL3 cache:                           512 MiB (32 instances)\nNUMA node(s):                       4\nNUMA node0 CPU(s):                  0-31\nNUMA node1 CPU(s):                  32-63\nNUMA node2 CPU(s):                  64-95\nNUMA node3 CPU(s):                  96-127\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit:        Not affected\nVulnerability L1tf:                 Not affected\nVulnerability Mds:                  Not affected\nVulnerability Meltdown:             Not affected\nVulnerability Mmio stale data:      Not affected\nVulnerability Retbleed:             Mitigation; untrained return thunk; SMT disabled\nVulnerability Spec rstack overflow: Mitigation; SMT disabled\nVulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds:                Not affected\nVulnerability Tsx async abort:      Not 
affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] Could not collect",
+  "transformers_version": "4.43.1",
+  "upper_git_hash": "2132286315025b3abd7a22b7309f7052be200287",
+  "task_hashes": {
+    "polish_dyk_regex": "2649d9ae76c76684ce97aa5028f8024f8eda160a26db26b547c0abf175fb2de1"
+  },
+  "model_source": "hf",
+  "model_name": "speakleash/Bielik-11B-v2.1-Instruct",
+  "model_name_sanitized": "__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "chat_template": null,
+  "chat_template_sha": null,
+  "start_time": 2577223.673779933,
+  "end_time": 2577542.567465127,
+  "total_evaluation_time_seconds": "318.89368519419804"
+}
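
The `polish_dyk_regex` config above chains a `regex` filter with `take_first`, maps the extracted letter onto the two F1 labels, and reports a standard error for `exact_match`. The following is a minimal sketch of those three steps, not the harness's own implementation. The helper names and the `"[invalid]"` fallback are assumptions made for illustration; only the regex pattern, the `["B", "C"]` label order, and the numbers in the final check are taken from this file.

```python
import math
import re

# Pattern copied from "regex_pattern" in the task config.
ANSWER_PATTERN = re.compile(r"(\b[ABCD]\b)")
# Label order copied from the embedded f1 metric.
F1_LABELS = ["B", "C"]


def score_first(generation: str, fallback: str = "[invalid]") -> str:
    """Return the first standalone A/B/C/D letter found in a generation.

    The fallback value is a hypothetical placeholder for "no letter found".
    """
    matches = ANSWER_PATTERN.findall(generation)
    return matches[0] if matches else fallback


def f1_item(prediction: str, reference: str) -> tuple[int, int]:
    """Map letters to 0/1 ids as the embedded f1 metric does.

    Any prediction other than "B" or "C" is counted as class 0.
    The reference must be "B" or "C", otherwise list.index raises ValueError.
    """
    reference_id = F1_LABELS.index(reference)
    prediction_id = F1_LABELS.index(prediction) if prediction in F1_LABELS else 0
    return prediction_id, reference_id


def binary_mean_stderr(mean: float, n: int) -> float:
    """Sample standard error of the mean of n outcomes that are each 0 or 1."""
    return math.sqrt(mean * (1.0 - mean) / (n - 1))


if __name__ == "__main__":
    print(score_first(" C"))
    print(f1_item(score_first("Odpowiedź: B"), "C"))
    # Consistency check against the reported exact_match and its stderr.
    print(binary_mean_stderr(0.8532555879494655, 1029))
```

With the reported `exact_match` and 1029 samples, the last function gives a value that agrees with `exact_match_stderr,score-first` to within rounding, which suggests the reported figure is the ordinary sample standard error of a mean of 0/1 scores.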

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_eq_bench_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-57-26.656191.json ADDED

	@@ -0,0 +1,131 @@

+{
+  "results": {
+    "polish_eq_bench": {
+      "first_eqbench,none": 48.33189616455903,
+      "first_eqbench_stderr,none": 2.573616088487117,
+      "first_percent_parseable,none": 100.0,
+      "first_percent_parseable_stderr,none": 0.0,
+      "revised_eqbench,none": 63.019918840007705,
+      "revised_eqbench_stderr,none": 2.3758111655038587,
+      "revised_percent_parseable,none": 99.41520467836257,
+      "revised_percent_parseable_stderr,none": 0.5847953216374274,
+      "average_eqbench,none": 55.67590750228339,
+      "average_eqbench_stderr,none": 2.1636830548973527,
+      "alias": "polish_eq_bench"
+    }
+  },
+  "group_subtasks": {
+    "polish_eq_bench": []
+  },
+  "configs": {
+    "polish_eq_bench": {
+      "task": "polish_eq_bench",
+      "dataset_path": "speakleash/EQ-Bench-PL",
+      "validation_split": "validation",
+      "doc_to_text": "{{prompt}}\nPierwsze oceny:\n",
+      "doc_to_target": "reference_answer_fullscale",
+      "process_results": "def score(docs, results):\n    first_pass_answers, revised_answers = parse(results[0])\n    reference = eval(docs[\"reference_answer\"])\n    reference_fullscale = eval(docs[\"reference_answer_fullscale\"])\n    first_pass_score = calculate_score(reference, first_pass_answers)\n    revised_pass_score = calculate_score(reference_fullscale, revised_answers)\n    scores= {'first_'+k: v for k, v in first_pass_score.items()}\n    scores.update({'revised_'+k: v for k, v in revised_pass_score.items()})\n    #add average score\n    scores['average_eqbench'] = (scores['first_eqbench'] + scores['revised_eqbench']) / 2\n    return scores\n",
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "num_fewshot": 0,
+      "metric_list": [
+        {
+          "metric": "first_eqbench",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "first_percent_parseable",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "revised_eqbench",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "revised_percent_parseable",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "average_eqbench",
+          "aggregation": "mean",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "generate_until",
+      "generation_kwargs": {
+        "max_gen_toks": 512,
+        "do_sample": false,
+        "temperature": 0.0,
+        "until": [
+          "</s>",
+          "[Koniec odpowiedzi]",
+          "Masz za zadanie"
+        ]
+      },
+      "repeats": 1,
+      "should_decontaminate": false,
+      "metadata": {
+        "version": 2.4
+      }
+    }
+  },
+  "versions": {
+    "polish_eq_bench": 2.4
+  },
+  "n-shot": {
+    "polish_eq_bench": 0
+  },
+  "higher_is_better": {
+    "polish_eq_bench": {
+      "first_eqbench": true,
+      "first_percent_parseable": true,
+      "revised_eqbench": true,
+      "revised_percent_parseable": true,
+      "average_eqbench": true
+    }
+  },
+  "n-samples": {
+    "polish_eq_bench": {
+      "original": 171,
+      "effective": 171
+    }
+  },
+  "config": {
+    "model": "hf",
+    "model_args": "pretrained=speakleash/Bielik-11B-v2.1-Instruct,dtype=bfloat16,trust_remote_code=True",
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": "sqlite_caches/plgchriso/models/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_eq_bench/",
+    "limit": null,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": null,
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": "2132286",
+  "date": 1723381748.6044407,
+  "pretty_env_info": "PyTorch version: 2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Rocky Linux 9.3 (Blue Onyx) (x86_64)\nGCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)\nClang version: Could not collect\nCMake version: Could not collect\nLibc version: glibc-2.34\n\nPython version: 3.10.4 (main, Dec 14 2022, 11:01:42) [GCC 11.3.0] (64-bit runtime)\nPython platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34\nIs CUDA available: True\nCUDA runtime version: 12.1.105\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Could not collect\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                       x86_64\nCPU op-mode(s):                     32-bit, 64-bit\nAddress sizes:                      43 bits physical, 48 bits virtual\nByte Order:                         Little Endian\nCPU(s):                             128\nOn-line CPU(s) list:                0-127\nVendor ID:                          AuthenticAMD\nModel name:                         AMD EPYC 7742 64-Core Processor\nCPU family:                         23\nModel:                              49\nThread(s) per core:                 1\nCore(s) per socket:                 64\nSocket(s):                          2\nStepping:                           0\nFrequency boost:                    enabled\nCPU max MHz:                        2250.0000\nCPU min MHz:                        1500.0000\nBogoMIPS:                           4500.15\nFlags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es\nVirtualization:                     AMD-V\nL1d cache:                          4 MiB (128 instances)\nL1i cache:                          4 MiB (128 instances)\nL2 cache:                           64 MiB (128 instances)\nL3 cache:                           512 MiB (32 instances)\nNUMA node(s):                       4\nNUMA node0 CPU(s):                  0-31\nNUMA node1 CPU(s):                  32-63\nNUMA node2 CPU(s):                  64-95\nNUMA node3 CPU(s):                  96-127\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit:        Not affected\nVulnerability L1tf:                 Not affected\nVulnerability Mds:                  Not affected\nVulnerability Meltdown:             Not affected\nVulnerability Mmio stale data:      Not affected\nVulnerability Retbleed:             Mitigation; untrained return thunk; SMT disabled\nVulnerability Spec rstack overflow: Mitigation; SMT disabled\nVulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds:                Not affected\nVulnerability Tsx async abort:      Not 
affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] Could not collect",
+  "transformers_version": "4.43.1",
+  "upper_git_hash": "2132286315025b3abd7a22b7309f7052be200287",
+  "task_hashes": {
+    "polish_eq_bench": "18b3ee14b53fb2aaee4430e37609e64896598b0efa26dc7ecf4e483eece3a6b3"
+  },
+  "model_source": "hf",
+  "model_name": "speakleash/Bielik-11B-v2.1-Instruct",
+  "model_name_sanitized": "__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "chat_template": null,
+  "chat_template_sha": null,
+  "start_time": 780407.632073815,
+  "end_time": 783314.157407221,
+  "total_evaluation_time_seconds": "2906.5253334060544"
+}
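
The `process_results` function in this file defines `average_eqbench` per document as the mean of `first_eqbench` and `revised_eqbench`. Because each of the three metrics is then aggregated with a plain mean, the reported aggregate average should equal the midpoint of the two reported aggregates, provided every document contributes both scores. The short check below is a sketch that only re-does this arithmetic on the figures from the `results` block; the function name is hypothetical.

```python
def average_eqbench(first_eqbench: float, revised_eqbench: float) -> float:
    """Midpoint of the first-pass and revised EQ-Bench scores."""
    return (first_eqbench + revised_eqbench) / 2


# Aggregate values copied from the "results" block of this file.
FIRST = 48.33189616455903
REVISED = 63.019918840007705
REPORTED_AVERAGE = 55.67590750228339

if __name__ == "__main__":
    recomputed = average_eqbench(FIRST, REVISED)
    print(recomputed, abs(recomputed - REPORTED_AVERAGE))
```

The recomputed value matches `average_eqbench,none` up to floating-point rounding.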

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_eq_bench_first_turn_1723381723/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-35-21.291842.json ADDED

	@@ -0,0 +1,107 @@

+{
+  "results": {
+    "polish_eq_bench_first_turn": {
+      "first_eqbench,none": 46.996619012784315,
+      "first_eqbench_stderr,none": 2.655038142048486,
+      "first_percent_parseable,none": 100.0,
+      "first_percent_parseable_stderr,none": 0.0,
+      "alias": "polish_eq_bench_first_turn"
+    }
+  },
+  "group_subtasks": {
+    "polish_eq_bench_first_turn": []
+  },
+  "configs": {
+    "polish_eq_bench_first_turn": {
+      "task": "polish_eq_bench_first_turn",
+      "dataset_path": "speakleash/EQ-Bench-PL-first-turn",
+      "validation_split": "validation",
+      "doc_to_text": "{{prompt}}\nOceny:\n",
+      "doc_to_target": "def doc_to_target(doc):\n    reference = eval(doc[\"reference_answer\"])\n\n    target = \"\"\n    for i in range(1, 5):\n        emotion = reference[f\"emotion{i}\"]\n        emotion_score = reference[f\"emotion{i}_score\"]\n        target += f\"{emotion}: {emotion_score}\\n\"\n    target += \"\\n\"\n\n    return target\n",
+      "process_results": "def score_first(docs, results):\n    first_pass_answers = dict(list(re.findall(r'(\\w+(?: \\w+)*):\\s+(\\d+)', results[0]))[:4])\n    reference = eval(docs[\"reference_answer\"])\n    first_pass_score = calculate_score(reference, first_pass_answers)\n    scores= {'first_'+k: v for k, v in first_pass_score.items()}\n    return scores\n",
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "num_fewshot": 0,
+      "metric_list": [
+        {
+          "metric": "first_eqbench",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "first_percent_parseable",
+          "aggregation": "mean",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "generate_until",
+      "generation_kwargs": {
+        "max_gen_toks": 512,
+        "do_sample": false,
+        "temperature": 0.0,
+        "until": [
+          "</s>",
+          "[Koniec odpowiedzi]",
+          "Masz za zadanie"
+        ]
+      },
+      "repeats": 1,
+      "should_decontaminate": false,
+      "metadata": {
+        "version": 2.4
+      }
+    }
+  },
+  "versions": {
+    "polish_eq_bench_first_turn": 2.4
+  },
+  "n-shot": {
+    "polish_eq_bench_first_turn": 0
+  },
+  "higher_is_better": {
+    "polish_eq_bench_first_turn": {
+      "first_eqbench": true,
+      "first_percent_parseable": true
+    }
+  },
+  "n-samples": {
+    "polish_eq_bench_first_turn": {
+      "original": 171,
+      "effective": 171
+    }
+  },
+  "config": {
+    "model": "hf",
+    "model_args": "pretrained=speakleash/Bielik-11B-v2.1-Instruct,dtype=bfloat16,trust_remote_code=True",
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": "sqlite_caches/plgchriso/models/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_eq_bench_first_turn/",
+    "limit": null,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": null,
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": "2132286",
+  "date": 1723381747.7569976,
+  "pretty_env_info": "PyTorch version: 2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Rocky Linux 9.3 (Blue Onyx) (x86_64)\nGCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)\nClang version: Could not collect\nCMake version: Could not collect\nLibc version: glibc-2.34\n\nPython version: 3.10.4 (main, Dec 14 2022, 11:01:42) [GCC 11.3.0] (64-bit runtime)\nPython platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34\nIs CUDA available: True\nCUDA runtime version: 12.1.105\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Could not collect\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                       x86_64\nCPU op-mode(s):                     32-bit, 64-bit\nAddress sizes:                      43 bits physical, 48 bits virtual\nByte Order:                         Little Endian\nCPU(s):                             128\nOn-line CPU(s) list:                0-127\nVendor ID:                          AuthenticAMD\nModel name:                         AMD EPYC 7742 64-Core Processor\nCPU family:                         23\nModel:                              49\nThread(s) per core:                 1\nCore(s) per socket:                 64\nSocket(s):                          2\nStepping:                           0\nFrequency boost:                    enabled\nCPU max MHz:                        2250.0000\nCPU min MHz:                        1500.0000\nBogoMIPS:                           4500.15\nFlags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es\nVirtualization:                     AMD-V\nL1d cache:                          4 MiB (128 instances)\nL1i cache:                          4 MiB (128 instances)\nL2 cache:                           64 MiB (128 instances)\nL3 cache:                           512 MiB (32 instances)\nNUMA node(s):                       4\nNUMA node0 CPU(s):                  0-31\nNUMA node1 CPU(s):                  32-63\nNUMA node2 CPU(s):                  64-95\nNUMA node3 CPU(s):                  96-127\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit:        Not affected\nVulnerability L1tf:                 Not affected\nVulnerability Mds:                  Not affected\nVulnerability Meltdown:             Not affected\nVulnerability Mmio stale data:      Not affected\nVulnerability Retbleed:             Mitigation; untrained return thunk; SMT disabled\nVulnerability Spec rstack overflow: Mitigation; SMT disabled\nVulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds:                Not affected\nVulnerability Tsx async abort:      Not 
affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] Could not collect",
+  "transformers_version": "4.43.1",
+  "upper_git_hash": "2132286315025b3abd7a22b7309f7052be200287",
+  "task_hashes": {
+    "polish_eq_bench_first_turn": "0e253a32b5915f6d9cff628bdffb1f234618238d116e0a34217ec48916ba0a49"
+  },
+  "model_source": "hf",
+  "model_name": "speakleash/Bielik-11B-v2.1-Instruct",
+  "model_name_sanitized": "__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "chat_template": null,
+  "chat_template_sha": null,
+  "start_time": 2270424.400063744,
+  "end_time": 2272005.840581135,
+  "total_evaluation_time_seconds": "1581.4405173910782"
+}
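
The `score_first` function in this file's `process_results` extracts up to four `emotion: score` pairs from the generation with a single regular expression. The sketch below isolates that parsing step so its behaviour is easier to see. The regex is copied from the config; the function name, the `max_pairs` parameter, and the sample generation (including the emotion names) are made up for illustration, and the `calculate_score` helper referenced in the config is not reproduced here.

```python
import re

# Pattern copied from the task's process_results: one or more
# space-separated words, a colon, whitespace, then an integer score.
PAIR_PATTERN = re.compile(r"(\w+(?: \w+)*):\s+(\d+)")


def parse_first_pass(generation: str, max_pairs: int = 4) -> dict[str, str]:
    """Return the first max_pairs (emotion, score) pairs as a dict.

    Scores stay as strings, exactly as re.findall returns them.
    """
    return dict(PAIR_PATTERN.findall(generation)[:max_pairs])


if __name__ == "__main__":
    # Hypothetical model output, not taken from the dataset.
    sample = "Złość: 7\nSmutek: 3\nUlga: 0\nPoczucie winy: 5\nWstyd: 2\n"
    print(parse_first_pass(sample))
```

Two properties follow from the pattern: multi-word emotion names are captured as one key, and any pair after the fourth is silently dropped.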

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_klej_ner_multiple_choice_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-21-30.374052.json ADDED

	@@ -0,0 +1,105 @@

+{
+  "results": {
+    "polish_klej_ner_multiple_choice": {
+      "acc,none": 0.46987366375121475,
+      "acc_stderr,none": 0.011004317088597403,
+      "acc_norm,none": 0.5092322643343051,
+      "acc_norm_stderr,none": 0.011022467118497213,
+      "alias": "polish_klej_ner_multiple_choice"
+    }
+  },
+  "group_subtasks": {
+    "polish_klej_ner_multiple_choice": []
+  },
+  "configs": {
+    "polish_klej_ner_multiple_choice": {
+      "task": "polish_klej_ner_multiple_choice",
+      "dataset_path": "allegro/klej-nkjp-ner",
+      "training_split": "train",
+      "validation_split": "validation",
+      "test_split": "test",
+      "fewshot_split": "train",
+      "doc_to_text": "Zdanie: \"{{sentence}}\"\nJakiego rodzaju jest nazwana jednostka, jeżeli występuje w podanym zdaniu?\nMożliwe odpowiedzi: Brak nazwanej jednostki, Nazwa miejsca, Nazwa osoby, Nazwa organizacji, Czas, Nazwa geograficzna.\nRodzaj:",
+      "doc_to_target": "{{{'noEntity': 0, 'placeName': 1, 'persName': 2, 'orgName': 3, 'time': 4, 'geogName': 5}.get(target)}}",
+      "doc_to_choice": [
+        "Brak nazwanej jednostki",
+        "Nazwa miejsca",
+        "Nazwa osoby",
+        "Nazwa organizacji",
+        "Czas",
+        "Nazwa geograficzna"
+      ],
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "num_fewshot": 0,
+      "metric_list": [
+        {
+          "metric": "acc",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "acc_norm",
+          "aggregation": "mean",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "multiple_choice",
+      "repeats": 1,
+      "should_decontaminate": true,
+      "doc_to_decontamination_query": "{{sentence}}"
+    }
+  },
+  "versions": {
+    "polish_klej_ner_multiple_choice": "Yaml"
+  },
+  "n-shot": {
+    "polish_klej_ner_multiple_choice": 0
+  },
+  "higher_is_better": {
+    "polish_klej_ner_multiple_choice": {
+      "acc": true,
+      "acc_norm": true
+    }
+  },
+  "n-samples": {
+    "polish_klej_ner_multiple_choice": {
+      "original": 2058,
+      "effective": 2058
+    }
+  },
+  "config": {
+    "model": "hf",
+    "model_args": "pretrained=speakleash/Bielik-11B-v2.1-Instruct,dtype=bfloat16,trust_remote_code=True",
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": "sqlite_caches/plgchriso/models/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_klej_ner_multiple_choice/",
+    "limit": null,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": null,
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": "2132286",
+  "date": 1723381747.6734633,
+  "pretty_env_info": "PyTorch version: 2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Rocky Linux 9.3 (Blue Onyx) (x86_64)\nGCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)\nClang version: Could not collect\nCMake version: Could not collect\nLibc version: glibc-2.34\n\nPython version: 3.10.4 (main, Dec 14 2022, 11:01:42) [GCC 11.3.0] (64-bit runtime)\nPython platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34\nIs CUDA available: True\nCUDA runtime version: 12.1.105\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Could not collect\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                       x86_64\nCPU op-mode(s):                     32-bit, 64-bit\nAddress sizes:                      43 bits physical, 48 bits virtual\nByte Order:                         Little Endian\nCPU(s):                             128\nOn-line CPU(s) list:                0-127\nVendor ID:                          AuthenticAMD\nModel name:                         AMD EPYC 7742 64-Core Processor\nCPU family:                         23\nModel:                              49\nThread(s) per core:                 1\nCore(s) per socket:                 64\nSocket(s):                          2\nStepping:                           0\nFrequency boost:                    enabled\nCPU max MHz:                        2250.0000\nCPU min MHz:                        1500.0000\nBogoMIPS:                           4499.98\nFlags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es\nVirtualization:                     AMD-V\nL1d cache:                          4 MiB (128 instances)\nL1i cache:                          4 MiB (128 instances)\nL2 cache:                           64 MiB (128 instances)\nL3 cache:                           512 MiB (32 instances)\nNUMA node(s):                       4\nNUMA node0 CPU(s):                  0-31\nNUMA node1 CPU(s):                  32-63\nNUMA node2 CPU(s):                  64-95\nNUMA node3 CPU(s):                  96-127\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit:        Not affected\nVulnerability L1tf:                 Not affected\nVulnerability Mds:                  Not affected\nVulnerability Meltdown:             Not affected\nVulnerability Mmio stale data:      Not affected\nVulnerability Retbleed:             Mitigation; untrained return thunk; SMT disabled\nVulnerability Spec rstack overflow: Mitigation; SMT disabled\nVulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds:                Not affected\nVulnerability Tsx async abort:      Not 
affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] Could not collect",
+  "transformers_version": "4.43.1",
+  "upper_git_hash": "2132286315025b3abd7a22b7309f7052be200287",
+  "task_hashes": {
+    "polish_klej_ner_multiple_choice": "09f6e903dc9fc050951f2c84685c285da57a8ddba1ff829ebb489df1ba737161"
+  },
+  "model_source": "hf",
+  "model_name": "speakleash/Bielik-11B-v2.1-Instruct",
+  "model_name_sanitized": "__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "chat_template": null,
+  "chat_template_sha": null,
+  "start_time": 2341192.242899954,
+  "end_time": 2341942.977311477,
+  "total_evaluation_time_seconds": "750.7344115232117"
+}

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_klej_ner_regex_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T16-15-22.967823.json ADDED

	@@ -0,0 +1,112 @@

+{
+  "results": {
+    "polish_klej_ner_regex": {
+      "exact_match,score-first": 0.5388726919339164,
+      "exact_match_stderr,score-first": 0.010990978618734456,
+      "alias": "polish_klej_ner_regex"
+    }
+  },
+  "group_subtasks": {
+    "polish_klej_ner_regex": []
+  },
+  "configs": {
+    "polish_klej_ner_regex": {
+      "task": "polish_klej_ner_regex",
+      "dataset_path": "allegro/klej-nkjp-ner",
+      "training_split": "train",
+      "validation_split": "validation",
+      "test_split": "test",
+      "doc_to_text": "Zdanie: \"{{sentence}}\"\nPytanie: Jakiego rodzaju jest nazwana jednostka, jeżeli występuje w podanym zdaniu?\nMożliwe odpowiedzi:\nA - Brak nazwanej jednostki\nB - Nazwa miejsca\nC - Nazwa osoby\nD - Nazwa organizacji\nE - Czas\nF - Nazwa geograficzna\nPrawidłowa odpowiedź:",
+      "doc_to_target": "{{{'noEntity': 'A', 'placeName': 'B', 'persName': 'C', 'orgName': 'D', 'time': 'E', 'geogName': 'F'}.get(target)}}",
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "num_fewshot": 0,
+      "metric_list": [
+        {
+          "metric": "exact_match",
+          "aggregation": "mean",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "generate_until",
+      "generation_kwargs": {
+        "until": [
+          ".",
+          ",",
+          ";"
+        ],
+        "do_sample": false,
+        "temperature": 0.0,
+        "max_gen_toks": 50
+      },
+      "repeats": 1,
+      "filter_list": [
+        {
+          "name": "score-first",
+          "filter": [
+            {
+              "function": "regex",
+              "regex_pattern": "(\\b[ABCDEF]\\b)"
+            },
+            {
+              "function": "take_first"
+            }
+          ]
+        }
+      ],
+      "should_decontaminate": true,
+      "doc_to_decontamination_query": "{{sentence}}"
+    }
+  },
+  "versions": {
+    "polish_klej_ner_regex": "Yaml"
+  },
+  "n-shot": {
+    "polish_klej_ner_regex": 0
+  },
+  "higher_is_better": {
+    "polish_klej_ner_regex": {
+      "exact_match": true
+    }
+  },
+  "n-samples": {
+    "polish_klej_ner_regex": {
+      "original": 2058,
+      "effective": 2058
+    }
+  },
+  "config": {
+    "model": "hf",
+    "model_args": "pretrained=speakleash/Bielik-11B-v2.1-Instruct,dtype=bfloat16,trust_remote_code=True",
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": "sqlite_caches/plgchriso/models/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_klej_ner_regex/",
+    "limit": null,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": null,
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": "2132286",
+  "date": 1723381748.6294396,
+  "pretty_env_info": "PyTorch version: 2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Rocky Linux 9.3 (Blue Onyx) (x86_64)\nGCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)\nClang version: Could not collect\nCMake version: Could not collect\nLibc version: glibc-2.34\n\nPython version: 3.10.4 (main, Dec 14 2022, 11:01:42) [GCC 11.3.0] (64-bit runtime)\nPython platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34\nIs CUDA available: True\nCUDA runtime version: 12.1.105\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Could not collect\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                       x86_64\nCPU op-mode(s):                     32-bit, 64-bit\nAddress sizes:                      43 bits physical, 48 bits virtual\nByte Order:                         Little Endian\nCPU(s):                             128\nOn-line CPU(s) list:                0-127\nVendor ID:                          AuthenticAMD\nModel name:                         AMD EPYC 7742 64-Core Processor\nCPU family:                         23\nModel:                              49\nThread(s) per core:                 1\nCore(s) per socket:                 64\nSocket(s):                          2\nStepping:                           0\nFrequency boost:                    enabled\nCPU max MHz:                        2250.0000\nCPU min MHz:                        1500.0000\nBogoMIPS:                           4500.00\nFlags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es\nVirtualization:                     AMD-V\nL1d cache:                          4 MiB (128 instances)\nL1i cache:                          4 MiB (128 instances)\nL2 cache:                           64 MiB (128 instances)\nL3 cache:                           512 MiB (32 instances)\nNUMA node(s):                       4\nNUMA node0 CPU(s):                  0-31\nNUMA node1 CPU(s):                  32-63\nNUMA node2 CPU(s):                  64-95\nNUMA node3 CPU(s):                  96-127\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit:        Not affected\nVulnerability L1tf:                 Not affected\nVulnerability Mds:                  Not affected\nVulnerability Meltdown:             Not affected\nVulnerability Mmio stale data:      Not affected\nVulnerability Retbleed:             Mitigation; untrained return thunk; SMT disabled\nVulnerability Spec rstack overflow: Mitigation; SMT disabled\nVulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds:                Not affected\nVulnerability Tsx async abort:      Not 
affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] Could not collect",
+  "transformers_version": "4.43.1",
+  "upper_git_hash": "2132286315025b3abd7a22b7309f7052be200287",
+  "task_hashes": {
+    "polish_klej_ner_regex": "73b98cc9f2e2b0a3c1be3efc063d3765b29cbbfcadaa6952ff0b16c2aeca4784"
+  },
+  "model_source": "hf",
+  "model_name": "speakleash/Bielik-11B-v2.1-Instruct",
+  "model_name_sanitized": "__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "chat_template": null,
+  "chat_template_sha": null,
+  "start_time": 2577223.695462814,
+  "end_time": 2581206.536121368,
+  "total_evaluation_time_seconds": "3982.840658553876"
+}
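
Note on the `score-first` filter recorded above: the config extracts a standalone letter A-F from the generated text with the pattern `(\b[ABCDEF]\b)` and scores it with `exact_match`. The sketch below illustrates that idea in plain Python. It is not the harness's own filter code; the function names and the `"[invalid]"` fallback are assumptions made for illustration.

```python
import re

# Pattern copied from the "regex_pattern" field of the config above.
ANSWER_PATTERN = re.compile(r"(\b[ABCDEF]\b)")


def score_first(generation: str, fallback: str = "[invalid]") -> str:
    """Return the first standalone letter A-F in the text, else the fallback.

    The fallback value is an assumption for this sketch.
    """
    match = ANSWER_PATTERN.search(generation)
    return match.group(1) if match else fallback


def exact_match_mean(generations, targets) -> float:
    """Mean of per-example exact matches between extracted letter and target."""
    hits = [score_first(g) == t for g, t in zip(generations, targets)]
    return sum(hits) / len(hits)


if __name__ == "__main__":
    outputs = [" C", "Odpowiedź: B - Nazwa miejsca", "nie wiem"]
    gold = ["C", "A", "A"]
    print([score_first(o) for o in outputs])
    print(exact_match_mean(outputs, gold))
```

Because the pattern requires word boundaries on both sides, capital letters inside longer words are not picked up; only an isolated option letter counts.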

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_polqa_closed_book_1723381723/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-22-25.200287.json ADDED

	@@ -0,0 +1,106 @@

+{
+  "results": {
+    "polish_polqa_closed_book": {
+      "exact_match,none": 0.09034267912772585,
+      "exact_match_stderr,none": 0.009242678703782942,
+      "levenshtein,none": 0.3904465212876428,
+      "levenshtein_stderr,none": "N/A",
+      "alias": "polish_polqa_closed_book"
+    }
+  },
+  "group_subtasks": {
+    "polish_polqa_closed_book": []
+  },
+  "configs": {
+    "polish_polqa_closed_book": {
+      "task": "polish_polqa_closed_book",
+      "dataset_path": "ipipan/polqa",
+      "training_split": "train",
+      "validation_split": "validation",
+      "test_split": "test",
+      "process_docs": "def process_docs_closed(dataset: datasets.Dataset):\n    def _helper(doc):\n      doc[\"answers\"] = ast.literal_eval(doc['answers'])\n      return doc\n\n    used = set()\n\n    return dataset.remove_columns(COLUMNS_TO_REMOVE).filter(lambda example: example[\"relevant\"] and example['question'] not in used and (used.add(example['question']) or True)).map(_helper)\n",
+      "doc_to_text": "Pytanie: {{question}} \n Prawidłowa odpowiedź:",
+      "doc_to_target": "answers",
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "num_fewshot": 0,
+      "metric_list": [
+        {
+          "metric": "exact_match",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "def levenshtein(predictions, references):\n    _prediction = predictions[0][0].lower()\n    prediction_number = get_number(_prediction)\n\n    _prediction = re.sub('\\.? ?(</s>)* ?$','',_prediction)\n\n    for reference in references:\n        reference_number = get_number(reference)\n\n        if reference_number is not None:\n            if reference_number == prediction_number:\n                return 1\n        else:\n            ld = distance(_prediction, reference.lower())\n            if ld<len(reference)/2:\n                return 1\n    return 0\n",
+          "aggregation": "def agg_levenshtein(items):\n    return sum(items)/len(items)\n",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "generate_until",
+      "generation_kwargs": {
+        "until": [
+          "\n",
+          "</s>"
+        ],
+        "do_sample": false,
+        "temperature": 0.0,
+        "max_gen_toks": 50
+      },
+      "repeats": 1,
+      "should_decontaminate": true,
+      "doc_to_decontamination_query": "{{question}}"
+    }
+  },
+  "versions": {
+    "polish_polqa_closed_book": "Yaml"
+  },
+  "n-shot": {
+    "polish_polqa_closed_book": 0
+  },
+  "higher_is_better": {
+    "polish_polqa_closed_book": {
+      "exact_match": true,
+      "levenshtein": true
+    }
+  },
+  "n-samples": {
+    "polish_polqa_closed_book": {
+      "original": 963,
+      "effective": 963
+    }
+  },
+  "config": {
+    "model": "hf",
+    "model_args": "pretrained=speakleash/Bielik-11B-v2.1-Instruct,dtype=bfloat16,trust_remote_code=True",
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": "sqlite_caches/plgchriso/models/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_polqa_closed_book/",
+    "limit": null,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": null,
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": "2132286",
+  "date": 1723381747.7568066,
+  "pretty_env_info": "PyTorch version: 2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Rocky Linux 9.3 (Blue Onyx) (x86_64)\nGCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)\nClang version: Could not collect\nCMake version: Could not collect\nLibc version: glibc-2.34\n\nPython version: 3.10.4 (main, Dec 14 2022, 11:01:42) [GCC 11.3.0] (64-bit runtime)\nPython platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34\nIs CUDA available: True\nCUDA runtime version: 12.1.105\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Could not collect\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                       x86_64\nCPU op-mode(s):                     32-bit, 64-bit\nAddress sizes:                      43 bits physical, 48 bits virtual\nByte Order:                         Little Endian\nCPU(s):                             128\nOn-line CPU(s) list:                0-127\nVendor ID:                          AuthenticAMD\nModel name:                         AMD EPYC 7742 64-Core Processor\nCPU family:                         23\nModel:                              49\nThread(s) per core:                 1\nCore(s) per socket:                 64\nSocket(s):                          2\nStepping:                           0\nFrequency boost:                    enabled\nCPU max MHz:                        2250.0000\nCPU min MHz:                        1500.0000\nBogoMIPS:                           4500.15\nFlags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es\nVirtualization:                     AMD-V\nL1d cache:                          4 MiB (128 instances)\nL1i cache:                          4 MiB (128 instances)\nL2 cache:                           64 MiB (128 instances)\nL3 cache:                           512 MiB (32 instances)\nNUMA node(s):                       4\nNUMA node0 CPU(s):                  0-31\nNUMA node1 CPU(s):                  32-63\nNUMA node2 CPU(s):                  64-95\nNUMA node3 CPU(s):                  96-127\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit:        Not affected\nVulnerability L1tf:                 Not affected\nVulnerability Mds:                  Not affected\nVulnerability Meltdown:             Not affected\nVulnerability Mmio stale data:      Not affected\nVulnerability Retbleed:             Mitigation; untrained return thunk; SMT disabled\nVulnerability Spec rstack overflow: Mitigation; SMT disabled\nVulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds:                Not affected\nVulnerability Tsx async abort:      Not 
affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] Could not collect",
+  "transformers_version": "4.43.1",
+  "upper_git_hash": "2132286315025b3abd7a22b7309f7052be200287",
+  "task_hashes": {
+    "polish_polqa_closed_book": "0c5507d60ba16e4142471afab656e1a5d591a0227302e05fdfbce5cc9f087079"
+  },
+  "model_source": "hf",
+  "model_name": "speakleash/Bielik-11B-v2.1-Instruct",
+  "model_name_sanitized": "__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "chat_template": null,
+  "chat_template_sha": null,
+  "start_time": 2270424.400038904,
+  "end_time": 2271229.748561826,
+  "total_evaluation_time_seconds": "805.3485229220241"
+}
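
Note on the `levenshtein` metric recorded above: a prediction counts as correct if its number equals the reference's number, or, for non-numeric references, if the edit distance is below half the reference length. The recorded source calls `distance` and `get_number`, neither of which is shown in this file. The sketch below substitutes a standard-library edit distance and a hypothetical `get_number` that returns the first integer in the text, so it approximates the logic rather than reproducing the original helpers.

```python
import re


def edit_distance(a: str, b: str) -> int:
    """Dynamic-programming Levenshtein distance (stand-in for `distance`)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def get_number(text: str):
    """Hypothetical helper: first integer in the text, or None if absent."""
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None


def lenient_match(prediction: str, references: list) -> int:
    """Return 1 if the prediction is close enough to any reference, else 0."""
    pred = prediction.lower()
    pred_number = get_number(pred)
    # Strip a trailing period and end-of-sequence markers, as the recorded metric does.
    pred = re.sub(r"\.? ?(</s>)* ?$", "", pred)
    for reference in references:
        ref_number = get_number(reference)
        if ref_number is not None:
            if ref_number == pred_number:
                return 1
        elif edit_distance(pred, reference.lower()) < len(reference) / 2:
            return 1
    return 0


if __name__ == "__main__":
    print(lenient_match("Warszawa.", ["Warszawa"]))
    print(lenient_match("w 1410 roku", ["1410"]))
    print(lenient_match("Kraków", ["Gdańsk"]))
```

The half-length threshold tolerates inflection and minor spelling differences, which would explain why the `levenshtein` score in the results is much higher than `exact_match`.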

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_polqa_open_book_1723381723/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-58-53.946082.json ADDED

	@@ -0,0 +1,106 @@

+{
+  "results": {
+    "polish_polqa_open_book": {
+      "exact_match,none": 0.23734817813765183,
+      "exact_match_stderr,none": 0.005526353270874367,
+      "levenshtein,none": 0.5875506072874493,
+      "levenshtein_stderr,none": "N/A",
+      "alias": "polish_polqa_open_book"
+    }
+  },
+  "group_subtasks": {
+    "polish_polqa_open_book": []
+  },
+  "configs": {
+    "polish_polqa_open_book": {
+      "task": "polish_polqa_open_book",
+      "dataset_path": "ipipan/polqa",
+      "training_split": "train",
+      "validation_split": "validation",
+      "test_split": "test",
+      "process_docs": "def process_docs_open(dataset: datasets.Dataset):\n    def _helper(doc):\n      doc[\"answers\"] = ast.literal_eval(doc['answers'])\n      return doc\n\n    used = set()\n\n    return dataset.remove_columns(COLUMNS_TO_REMOVE).filter(lambda example: example[\"relevant\"] and (example['passage_text'],example['question']) not in used and (used.add((example['passage_text'],example['question'])) or True)).map(_helper)\n",
+      "doc_to_text": "Kontekst: {{passage_text}} \n Pytanie: {{question}} \n Prawidłowa odpowiedź:",
+      "doc_to_target": "answers",
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "num_fewshot": 0,
+      "metric_list": [
+        {
+          "metric": "exact_match",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "def levenshtein(predictions, references):\n    _prediction = predictions[0][0].lower()\n    prediction_number = get_number(_prediction)\n\n    _prediction = re.sub('\\.? ?(</s>)* ?$','',_prediction)\n\n    for reference in references:\n        reference_number = get_number(reference)\n\n        if reference_number is not None:\n            if reference_number == prediction_number:\n                return 1\n        else:\n            ld = distance(_prediction, reference.lower())\n            if ld<len(reference)/2:\n                return 1\n    return 0\n",
+          "aggregation": "def agg_levenshtein(items):\n    return sum(items)/len(items)\n",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "generate_until",
+      "generation_kwargs": {
+        "until": [
+          "\n",
+          "</s>"
+        ],
+        "do_sample": false,
+        "temperature": 0.0,
+        "max_gen_toks": 50
+      },
+      "repeats": 1,
+      "should_decontaminate": true,
+      "doc_to_decontamination_query": "{{passage_text}} {{question}}"
+    }
+  },
+  "versions": {
+    "polish_polqa_open_book": "Yaml"
+  },
+  "n-shot": {
+    "polish_polqa_open_book": 0
+  },
+  "higher_is_better": {
+    "polish_polqa_open_book": {
+      "exact_match": true,
+      "levenshtein": true
+    }
+  },
+  "n-samples": {
+    "polish_polqa_open_book": {
+      "original": 5928,
+      "effective": 5928
+    }
+  },
+  "config": {
+    "model": "hf",
+    "model_args": "pretrained=speakleash/Bielik-11B-v2.1-Instruct,dtype=bfloat16,trust_remote_code=True",
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": "sqlite_caches/plgchriso/models/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_polqa_open_book/",
+    "limit": null,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": null,
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": "2132286",
+  "date": 1723381747.756693,
+  "pretty_env_info": "PyTorch version: 2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Rocky Linux 9.3 (Blue Onyx) (x86_64)\nGCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)\nClang version: Could not collect\nCMake version: Could not collect\nLibc version: glibc-2.34\n\nPython version: 3.10.4 (main, Dec 14 2022, 11:01:42) [GCC 11.3.0] (64-bit runtime)\nPython platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34\nIs CUDA available: True\nCUDA runtime version: 12.1.105\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Could not collect\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                       x86_64\nCPU op-mode(s):                     32-bit, 64-bit\nAddress sizes:                      43 bits physical, 48 bits virtual\nByte Order:                         Little Endian\nCPU(s):                             128\nOn-line CPU(s) list:                0-127\nVendor ID:                          AuthenticAMD\nModel name:                         AMD EPYC 7742 64-Core Processor\nCPU family:                         23\nModel:                              49\nThread(s) per core:                 1\nCore(s) per socket:                 64\nSocket(s):                          2\nStepping:                           0\nFrequency boost:                    enabled\nCPU max MHz:                        2250.0000\nCPU min MHz:                        1500.0000\nBogoMIPS:                           4500.15\nFlags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es\nVirtualization:                     AMD-V\nL1d cache:                          4 MiB (128 instances)\nL1i cache:                          4 MiB (128 instances)\nL2 cache:                           64 MiB (128 instances)\nL3 cache:                           512 MiB (32 instances)\nNUMA node(s):                       4\nNUMA node0 CPU(s):                  0-31\nNUMA node1 CPU(s):                  32-63\nNUMA node2 CPU(s):                  64-95\nNUMA node3 CPU(s):                  96-127\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit:        Not affected\nVulnerability L1tf:                 Not affected\nVulnerability Mds:                  Not affected\nVulnerability Meltdown:             Not affected\nVulnerability Mmio stale data:      Not affected\nVulnerability Retbleed:             Mitigation; untrained return thunk; SMT disabled\nVulnerability Spec rstack overflow: Mitigation; SMT disabled\nVulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds:                Not affected\nVulnerability Tsx async abort:      Not 
affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] Could not collect",
+  "transformers_version": "4.43.1",
+  "upper_git_hash": "2132286315025b3abd7a22b7309f7052be200287",
+  "task_hashes": {
+    "polish_polqa_open_book": "605bac12835fc2014ee7398cc41fe38316bab9148d464cc12f8034038e6dd744"
+  },
+  "model_source": "hf",
+  "model_name": "speakleash/Bielik-11B-v2.1-Instruct",
+  "model_name_sanitized": "__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "chat_template": null,
+  "chat_template_sha": null,
+  "start_time": 2270424.414197628,
+  "end_time": 2273418.492015983,
+  "total_evaluation_time_seconds": "2994.0778183550574"
+}
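
Note on `process_docs` recorded above: the filter keeps only relevant rows and drops repeats by tracking keys in a `used` set, relying on `used.add(...) or True` to record the key inside a lambda. The same first-occurrence logic can be written as an explicit loop. The sketch below works on plain dictionaries instead of a `datasets.Dataset`; the function name and the `key_fields` parameter are assumptions for illustration.

```python
def first_occurrences(rows, key_fields=("passage_text", "question")):
    """Keep relevant rows whose key tuple has not been seen before.

    A key is recorded only when its row is kept, matching the short-circuit
    order of the recorded lambda (relevance is checked before the set).
    """
    used = set()
    kept = []
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        if row["relevant"] and key not in used:
            used.add(key)
            kept.append(row)
    return kept


if __name__ == "__main__":
    rows = [
        {"passage_text": "p1", "question": "q1", "relevant": True},
        {"passage_text": "p1", "question": "q1", "relevant": True},
        {"passage_text": "p2", "question": "q1", "relevant": False},
        {"passage_text": "p2", "question": "q1", "relevant": True},
    ]
    # Open-book style key: (passage_text, question).
    print(len(first_occurrences(rows)))
    # Closed-book style key: question only.
    print(len(first_occurrences(rows, key_fields=("question",))))
```

Keying on the question alone, as the closed-book variant does, yields fewer rows than keying on the passage and question pair, which is consistent with the 963 and 5928 sample counts recorded for the two tasks.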

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_polqa_reranking_multiple_choice_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-23-40.740746.json ADDED

	@@ -0,0 +1,101 @@

+{
+  "results": {
+    "polish_polqa_reranking_multiple_choice": {
+      "acc,none": 0.8055708552993941,
+      "acc_stderr,none": 0.0035107018856493904,
+      "acc_norm,none": 0.8055708552993941,
+      "acc_norm_stderr,none": 0.0035107018856493904,
+      "alias": "polish_polqa_reranking_multiple_choice"
+    }
+  },
+  "group_subtasks": {
+    "polish_polqa_reranking_multiple_choice": []
+  },
+  "configs": {
+    "polish_polqa_reranking_multiple_choice": {
+      "task": "polish_polqa_reranking_multiple_choice",
+      "dataset_path": "ipipan/polqa",
+      "training_split": "train",
+      "validation_split": "validation",
+      "test_split": "test",
+      "process_docs": "def process_docs(dataset: datasets.Dataset):\n    def _helper(doc):\n      return doc\n\n    used = set()\n\n    return dataset.remove_columns(COLUMNS_TO_REMOVE).filter(lambda example: (example['passage_text'],example['question']) not in used and (used.add((example['passage_text'],example['question'])) or True)).map(_helper)\n",
+      "doc_to_text": "Kontekst: {{passage_text}} \n Pytanie: {{question}} \n Czy kontekst jest relewantny dla pytania? \n Odpowiedz krótko \"Tak\" lub \"Nie\". Prawidłowa odpowiedź:",
+      "doc_to_target": "{{relevant|int}}",
+      "doc_to_choice": [
+        "Nie",
+        "Tak"
+      ],
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "num_fewshot": 0,
+      "metric_list": [
+        {
+          "metric": "acc",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "acc_norm",
+          "aggregation": "mean",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "multiple_choice",
+      "repeats": 1,
+      "should_decontaminate": true,
+      "doc_to_decontamination_query": "{{passage_text}} {{question}}"
+    }
+  },
+  "versions": {
+    "polish_polqa_reranking_multiple_choice": "Yaml"
+  },
+  "n-shot": {
+    "polish_polqa_reranking_multiple_choice": 0
+  },
+  "higher_is_better": {
+    "polish_polqa_reranking_multiple_choice": {
+      "acc": true,
+      "acc_norm": true
+    }
+  },
+  "n-samples": {
+    "polish_polqa_reranking_multiple_choice": {
+      "original": 12709,
+      "effective": 12709
+    }
+  },
+  "config": {
+    "model": "hf",
+    "model_args": "pretrained=speakleash/Bielik-11B-v2.1-Instruct,dtype=bfloat16,trust_remote_code=True",
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": "sqlite_caches/plgchriso/models/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_polqa_reranking_multiple_choice/",
+    "limit": null,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": null,
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": "2132286",
+  "date": 1723381747.6733267,
+  "pretty_env_info": "PyTorch version: 2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Rocky Linux 9.3 (Blue Onyx) (x86_64)\nGCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)\nClang version: Could not collect\nCMake version: Could not collect\nLibc version: glibc-2.34\n\nPython version: 3.10.4 (main, Dec 14 2022, 11:01:42) [GCC 11.3.0] (64-bit runtime)\nPython platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34\nIs CUDA available: True\nCUDA runtime version: 12.1.105\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Could not collect\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                       x86_64\nCPU op-mode(s):                     32-bit, 64-bit\nAddress sizes:                      43 bits physical, 48 bits virtual\nByte Order:                         Little Endian\nCPU(s):                             128\nOn-line CPU(s) list:                0-127\nVendor ID:                          AuthenticAMD\nModel name:                         AMD EPYC 7742 64-Core Processor\nCPU family:                         23\nModel:                              49\nThread(s) per core:                 1\nCore(s) per socket:                 64\nSocket(s):                          2\nStepping:                           0\nFrequency boost:                    enabled\nCPU max MHz:                        2250.0000\nCPU min MHz:                        1500.0000\nBogoMIPS:                           4499.98\nFlags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es\nVirtualization:                     AMD-V\nL1d cache:                          4 MiB (128 instances)\nL1i cache:                          4 MiB (128 instances)\nL2 cache:                           64 MiB (128 instances)\nL3 cache:                           512 MiB (32 instances)\nNUMA node(s):                       4\nNUMA node0 CPU(s):                  0-31\nNUMA node1 CPU(s):                  32-63\nNUMA node2 CPU(s):                  64-95\nNUMA node3 CPU(s):                  96-127\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit:        Not affected\nVulnerability L1tf:                 Not affected\nVulnerability Mds:                  Not affected\nVulnerability Meltdown:             Not affected\nVulnerability Mmio stale data:      Not affected\nVulnerability Retbleed:             Mitigation; untrained return thunk; SMT disabled\nVulnerability Spec rstack overflow: Mitigation; SMT disabled\nVulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds:                Not affected\nVulnerability Tsx async abort:      Not 
affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] Could not collect",
+  "transformers_version": "4.43.1",
+  "upper_git_hash": "2132286315025b3abd7a22b7309f7052be200287",
+  "task_hashes": {
+    "polish_polqa_reranking_multiple_choice": "284e872060f899232535470a606d94a217d950995b140caeb313a8887ea3f0b4"
+  },
+  "model_source": "hf",
+  "model_name": "speakleash/Bielik-11B-v2.1-Instruct",
+  "model_name_sanitized": "__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "chat_template": null,
+  "chat_template_sha": null,
+  "start_time": 2341192.242885584,
+  "end_time": 2342073.337279379,
+  "total_evaluation_time_seconds": "881.0943937948905"
+}

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_poquad_open_book_1723381723/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T16-06-52.670471.json ADDED

	@@ -0,0 +1,104 @@

+{
+  "results": {
+    "polish_poquad_open_book": {
+      "exact_match,none": 0.0,
+      "exact_match_stderr,none": 0.0,
+      "levenshtein,none": 0.18771686328938236,
+      "levenshtein_stderr,none": "N/A",
+      "alias": "polish_poquad_open_book"
+    }
+  },
+  "group_subtasks": {
+    "polish_poquad_open_book": []
+  },
+  "configs": {
+    "polish_poquad_open_book": {
+      "task": "polish_poquad_open_book",
+      "dataset_path": "clarin-pl/poquad",
+      "training_split": "train",
+      "test_split": "validation",
+      "doc_to_text": "Tytuł: {{title}} \n Kontekst: {{context}} \n Pytanie: {{question}} \n Prawidłowa odpowiedź (krótki cytat z Kontekstu):",
+      "doc_to_target": "def doc_to_target(doc):\n    answer_list = doc[\"answers\"][\"text\"]\n    if len(answer_list) > 0:\n        answer = answer_list[0]\n    else:\n        answer = \"bez odpowiedzi\"\n    return \" \" + answer\n",
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "num_fewshot": 0,
+      "metric_list": [
+        {
+          "metric": "exact_match",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "def levenshtein(predictions, references):\n    _prediction = predictions[0].lower().lstrip()\n    prediction_number = get_number(_prediction)\n\n    _prediction = re.sub('.? ?(</s>)* ?$', '', _prediction)\n\n    for reference in references:\n        reference_number = get_number(reference)\n\n        if reference_number is not None:\n            if reference_number == prediction_number:\n                return 1\n        else:\n            ld = distance(_prediction, reference.lower().lstrip())\n            if ld < len(reference)/2:\n                return 1\n    return 0\n",
+          "aggregation": "def agg_levenshtein(items):\n    return sum(items)/len(items)\n",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "generate_until",
+      "generation_kwargs": {
+        "until": [
+          "\n",
+          "</s>"
+        ],
+        "do_sample": false,
+        "temperature": 0.0,
+        "max_gen_toks": 50
+      },
+      "repeats": 1,
+      "should_decontaminate": true,
+      "doc_to_decontamination_query": "{{context}} {{question}}"
+    }
+  },
+  "versions": {
+    "polish_poquad_open_book": "Yaml"
+  },
+  "n-shot": {
+    "polish_poquad_open_book": 0
+  },
+  "higher_is_better": {
+    "polish_poquad_open_book": {
+      "exact_match": true,
+      "levenshtein": true
+    }
+  },
+  "n-samples": {
+    "polish_poquad_open_book": {
+      "original": 5764,
+      "effective": 5764
+    }
+  },
+  "config": {
+    "model": "hf",
+    "model_args": "pretrained=speakleash/Bielik-11B-v2.1-Instruct,dtype=bfloat16,trust_remote_code=True",
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": "sqlite_caches/plgchriso/models/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_poquad_open_book/",
+    "limit": null,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": null,
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": "2132286",
+  "date": 1723381747.7568138,
+  "pretty_env_info": "PyTorch version: 2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Rocky Linux 9.3 (Blue Onyx) (x86_64)\nGCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)\nClang version: Could not collect\nCMake version: Could not collect\nLibc version: glibc-2.34\n\nPython version: 3.10.4 (main, Dec 14 2022, 11:01:42) [GCC 11.3.0] (64-bit runtime)\nPython platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34\nIs CUDA available: True\nCUDA runtime version: 12.1.105\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Could not collect\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                       x86_64\nCPU op-mode(s):                     32-bit, 64-bit\nAddress sizes:                      43 bits physical, 48 bits virtual\nByte Order:                         Little Endian\nCPU(s):                             128\nOn-line CPU(s) list:                0-127\nVendor ID:                          AuthenticAMD\nModel name:                         AMD EPYC 7742 64-Core Processor\nCPU family:                         23\nModel:                              49\nThread(s) per core:                 1\nCore(s) per socket:                 64\nSocket(s):                          2\nStepping:                           0\nFrequency boost:                    enabled\nCPU max MHz:                        2250.0000\nCPU min MHz:                        1500.0000\nBogoMIPS:                           4500.15\nFlags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es\nVirtualization:                     AMD-V\nL1d cache:                          4 MiB (128 instances)\nL1i cache:                          4 MiB (128 instances)\nL2 cache:                           64 MiB (128 instances)\nL3 cache:                           512 MiB (32 instances)\nNUMA node(s):                       4\nNUMA node0 CPU(s):                  0-31\nNUMA node1 CPU(s):                  32-63\nNUMA node2 CPU(s):                  64-95\nNUMA node3 CPU(s):                  96-127\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit:        Not affected\nVulnerability L1tf:                 Not affected\nVulnerability Mds:                  Not affected\nVulnerability Meltdown:             Not affected\nVulnerability Mmio stale data:      Not affected\nVulnerability Retbleed:             Mitigation; untrained return thunk; SMT disabled\nVulnerability Spec rstack overflow: Mitigation; SMT disabled\nVulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds:                Not affected\nVulnerability Tsx async abort:      Not 
affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] Could not collect",
+  "transformers_version": "4.43.1",
+  "upper_git_hash": "2132286315025b3abd7a22b7309f7052be200287",
+  "task_hashes": {
+    "polish_poquad_open_book": "19564a782a5615c456e7084c72f26ca5fb6bc601f54dc4bfc5dec174c4e06e50"
+  },
+  "model_source": "hf",
+  "model_name": "speakleash/Bielik-11B-v2.1-Instruct",
+  "model_name_sanitized": "__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "chat_template": null,
+  "chat_template_sha": null,
+  "start_time": 2270424.41309478,
+  "end_time": 2273897.216942648,
+  "total_evaluation_time_seconds": "3472.8038478679955"
+}
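
Note on the `levenshtein` metric recorded in this file: it scores a prediction as correct when its edit distance to a reference is less than half the reference's length. The sketch below illustrates only that threshold rule. It is not the harness's code: `edit_distance` and `is_close_enough` are hypothetical names, the numeric branch (`get_number`) and the trailing `</s>` cleanup from the recorded metric are left out, and the recorded metric calls an external `distance` function whose implementation is not shown in this diff.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance (hypothetical helper)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


def is_close_enough(prediction: str, references: list[str]) -> int:
    """Return 1 if the prediction is within half a reference's length, else 0."""
    normalized = prediction.lower().lstrip()
    for reference in references:
        # Same threshold as the recorded metric: ld < len(reference) / 2.
        if edit_distance(normalized, reference.lower().lstrip()) < len(reference) / 2:
            return 1
    return 0
```

Because the threshold is half the reference length, short inflectional differences (common in Polish) still count as a hit, which may be one reason this score can be non-zero while `exact_match` is 0.0.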

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_ppc_multiple_choice_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-13-02.213869.json ADDED

	@@ -0,0 +1,101 @@

+{
+  "results": {
+    "polish_ppc_multiple_choice": {
+      "acc,none": 0.74,
+      "acc_stderr,none": 0.013877773329774164,
+      "acc_norm,none": 0.74,
+      "acc_norm_stderr,none": 0.013877773329774164,
+      "alias": "polish_ppc_multiple_choice"
+    }
+  },
+  "group_subtasks": {
+    "polish_ppc_multiple_choice": []
+  },
+  "configs": {
+    "polish_ppc_multiple_choice": {
+      "task": "polish_ppc_multiple_choice",
+      "dataset_path": "sdadas/ppc",
+      "training_split": "train",
+      "validation_split": "validation",
+      "test_split": "test",
+      "doc_to_text": "Zdanie A: \"{{sentence_A}}\"\nZdanie B: \"{{sentence_B}}\"\nPytanie: jaka jest zależność między zdaniami A i B? Możliwe odpowiedzi:\nA - znaczą dokładnie to samo\nB - mają podobne znaczenie\nC - mają różne znaczenie\nPrawidłowa odpowiedź:",
+      "doc_to_target": "{{label|int - 1}}",
+      "doc_to_choice": [
+        "A",
+        "B",
+        "C"
+      ],
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "num_fewshot": 0,
+      "metric_list": [
+        {
+          "metric": "acc",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "acc_norm",
+          "aggregation": "mean",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "multiple_choice",
+      "repeats": 1,
+      "should_decontaminate": true,
+      "doc_to_decontamination_query": "{{sentence_A}} {{sentence_B}}"
+    }
+  },
+  "versions": {
+    "polish_ppc_multiple_choice": "Yaml"
+  },
+  "n-shot": {
+    "polish_ppc_multiple_choice": 0
+  },
+  "higher_is_better": {
+    "polish_ppc_multiple_choice": {
+      "acc": true,
+      "acc_norm": true
+    }
+  },
+  "n-samples": {
+    "polish_ppc_multiple_choice": {
+      "original": 1000,
+      "effective": 1000
+    }
+  },
+  "config": {
+    "model": "hf",
+    "model_args": "pretrained=speakleash/Bielik-11B-v2.1-Instruct,dtype=bfloat16,trust_remote_code=True",
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": "sqlite_caches/plgchriso/models/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_ppc_multiple_choice/",
+    "limit": null,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": null,
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": "2132286",
+  "date": 1723381748.8964221,
+  "pretty_env_info": "PyTorch version: 2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Rocky Linux 9.3 (Blue Onyx) (x86_64)\nGCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)\nClang version: Could not collect\nCMake version: Could not collect\nLibc version: glibc-2.34\n\nPython version: 3.10.4 (main, Dec 14 2022, 11:01:42) [GCC 11.3.0] (64-bit runtime)\nPython platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34\nIs CUDA available: True\nCUDA runtime version: 12.1.105\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Could not collect\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                       x86_64\nCPU op-mode(s):                     32-bit, 64-bit\nAddress sizes:                      43 bits physical, 48 bits virtual\nByte Order:                         Little Endian\nCPU(s):                             128\nOn-line CPU(s) list:                0-127\nVendor ID:                          AuthenticAMD\nModel name:                         AMD EPYC 7742 64-Core Processor\nCPU family:                         23\nModel:                              49\nThread(s) per core:                 1\nCore(s) per socket:                 64\nSocket(s):                          2\nStepping:                           0\nFrequency boost:                    enabled\nCPU max MHz:                        2250.0000\nCPU min MHz:                        1500.0000\nBogoMIPS:                           4500.00\nFlags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es\nVirtualization:                     AMD-V\nL1d cache:                          4 MiB (128 instances)\nL1i cache:                          4 MiB (128 instances)\nL2 cache:                           64 MiB (128 instances)\nL3 cache:                           512 MiB (32 instances)\nNUMA node(s):                       4\nNUMA node0 CPU(s):                  0-31\nNUMA node1 CPU(s):                  32-63\nNUMA node2 CPU(s):                  64-95\nNUMA node3 CPU(s):                  96-127\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit:        Not affected\nVulnerability L1tf:                 Not affected\nVulnerability Mds:                  Not affected\nVulnerability Meltdown:             Not affected\nVulnerability Mmio stale data:      Not affected\nVulnerability Retbleed:             Mitigation; untrained return thunk; SMT disabled\nVulnerability Spec rstack overflow: Mitigation; SMT disabled\nVulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds:                Not affected\nVulnerability Tsx async abort:      Not 
affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] Could not collect",
+  "transformers_version": "4.43.1",
+  "upper_git_hash": "2132286315025b3abd7a22b7309f7052be200287",
+  "task_hashes": {
+    "polish_ppc_multiple_choice": "8747fd84df6316b9938ebd5eafb0d8cedcf82a0c184659a0eea7878c4dacdafa"
+  },
+  "model_source": "hf",
+  "model_name": "speakleash/Bielik-11B-v2.1-Instruct",
+  "model_name_sanitized": "__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "chat_template": null,
+  "chat_template_sha": null,
+  "start_time": 2669732.988217134,
+  "end_time": 2669974.769807927,
+  "total_evaluation_time_seconds": "241.78159079281613"
+}
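
Note on `acc_stderr` in this file: the reported value appears consistent with the standard error of the mean of 0/1 scores, computed with the sample (n - 1) variance. The sketch below is a sanity check under that assumption, not the harness's own code; `mean_stderr_binary` is a hypothetical name.

```python
import math


def mean_stderr_binary(accuracy: float, n_samples: int) -> float:
    """Standard error of the mean for 0/1 scores, using the n - 1 sample variance."""
    # For binary scores the sample variance is n / (n - 1) * p * (1 - p);
    # dividing by n and taking the square root gives the expression below.
    return math.sqrt(accuracy * (1.0 - accuracy) / (n_samples - 1))


# Values taken from this results file: acc = 0.74 over 1000 samples.
print(mean_stderr_binary(0.74, 1000))  # roughly 0.01388
```

The same check can be applied to the other result files in this commit by substituting their accuracy and `n-samples` values.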

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_ppc_regex_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-44-08.270834.json ADDED

	@@ -0,0 +1,111 @@

+{
+  "results": {
+    "polish_ppc_regex": {
+      "exact_match,score-first": 0.703,
+      "exact_match_stderr,score-first": 0.014456832294801106,
+      "alias": "polish_ppc_regex"
+    }
+  },
+  "group_subtasks": {
+    "polish_ppc_regex": []
+  },
+  "configs": {
+    "polish_ppc_regex": {
+      "task": "polish_ppc_regex",
+      "dataset_path": "sdadas/ppc",
+      "training_split": "train",
+      "validation_split": "validation",
+      "test_split": "test",
+      "doc_to_text": "Zdanie A: \"{{sentence_A}}\"\nZdanie B: \"{{sentence_B}}\"\nPytanie: jaka jest zależność między zdaniami A i B? Możliwe odpowiedzi:\nA - wszystkie odpowiedzi poprawne\nB - znaczą dokładnie to samo\nC - mają podobne znaczenie\nD - mają różne znaczenie\nPrawidłowa odpowiedź:",
+      "doc_to_target": "{{{0: 'A', 1: 'B', 2: 'C', 3: 'D'}.get(label|int)}}",
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "num_fewshot": 0,
+      "metric_list": [
+        {
+          "metric": "exact_match",
+          "aggregation": "mean",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "generate_until",
+      "generation_kwargs": {
+        "until": [
+          ".",
+          ","
+        ],
+        "do_sample": false,
+        "temperature": 0.0,
+        "max_gen_toks": 50
+      },
+      "repeats": 1,
+      "filter_list": [
+        {
+          "name": "score-first",
+          "filter": [
+            {
+              "function": "regex",
+              "regex_pattern": "(\\b[ABCD]\\b)"
+            },
+            {
+              "function": "take_first"
+            }
+          ]
+        }
+      ],
+      "should_decontaminate": true,
+      "doc_to_decontamination_query": "{{sentence_A}} {{sentence_B}}"
+    }
+  },
+  "versions": {
+    "polish_ppc_regex": "Yaml"
+  },
+  "n-shot": {
+    "polish_ppc_regex": 0
+  },
+  "higher_is_better": {
+    "polish_ppc_regex": {
+      "exact_match": true
+    }
+  },
+  "n-samples": {
+    "polish_ppc_regex": {
+      "original": 1000,
+      "effective": 1000
+    }
+  },
+  "config": {
+    "model": "hf",
+    "model_args": "pretrained=speakleash/Bielik-11B-v2.1-Instruct,dtype=bfloat16,trust_remote_code=True",
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": "sqlite_caches/plgchriso/models/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_ppc_regex/",
+    "limit": null,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": null,
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": "2132286",
+  "date": 1723381748.6299686,
+  "pretty_env_info": "PyTorch version: 2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Rocky Linux 9.3 (Blue Onyx) (x86_64)\nGCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)\nClang version: Could not collect\nCMake version: Could not collect\nLibc version: glibc-2.34\n\nPython version: 3.10.4 (main, Dec 14 2022, 11:01:42) [GCC 11.3.0] (64-bit runtime)\nPython platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34\nIs CUDA available: True\nCUDA runtime version: 12.1.105\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Could not collect\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                       x86_64\nCPU op-mode(s):                     32-bit, 64-bit\nAddress sizes:                      43 bits physical, 48 bits virtual\nByte Order:                         Little Endian\nCPU(s):                             128\nOn-line CPU(s) list:                0-127\nVendor ID:                          AuthenticAMD\nModel name:                         AMD EPYC 7742 64-Core Processor\nCPU family:                         23\nModel:                              49\nThread(s) per core:                 1\nCore(s) per socket:                 64\nSocket(s):                          2\nStepping:                           0\nFrequency boost:                    enabled\nCPU max MHz:                        2250.0000\nCPU min MHz:                        1500.0000\nBogoMIPS:                           4500.00\nFlags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es\nVirtualization:                     AMD-V\nL1d cache:                          4 MiB (128 instances)\nL1i cache:                          4 MiB (128 instances)\nL2 cache:                           64 MiB (128 instances)\nL3 cache:                           512 MiB (32 instances)\nNUMA node(s):                       4\nNUMA node0 CPU(s):                  0-31\nNUMA node1 CPU(s):                  32-63\nNUMA node2 CPU(s):                  64-95\nNUMA node3 CPU(s):                  96-127\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit:        Not affected\nVulnerability L1tf:                 Not affected\nVulnerability Mds:                  Not affected\nVulnerability Meltdown:             Not affected\nVulnerability Mmio stale data:      Not affected\nVulnerability Retbleed:             Mitigation; untrained return thunk; SMT disabled\nVulnerability Spec rstack overflow: Mitigation; SMT disabled\nVulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds:                Not affected\nVulnerability Tsx async abort:      Not 
affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] Could not collect",
+  "transformers_version": "4.43.1",
+  "upper_git_hash": "2132286315025b3abd7a22b7309f7052be200287",
+  "task_hashes": {
+    "polish_ppc_regex": "55067cab325af5b68fb5e678581d4bfe4d45a600032d75381bf251f3aa9d8c91"
+  },
+  "model_source": "hf",
+  "model_name": "speakleash/Bielik-11B-v2.1-Instruct",
+  "model_name_sanitized": "__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "chat_template": null,
+  "chat_template_sha": null,
+  "start_time": 2577223.665428845,
+  "end_time": 2579331.839649978,
+  "total_evaluation_time_seconds": "2108.174221132882"
+}
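
Note on the `score-first` filter in this file: the `regex_pattern` value `(\b[ABCD]\b)` matches a standalone capital letter A to D in the generated text. The sketch below shows how such a pattern can pick the first answer letter from a generation. It is an illustration only: `first_choice` is a hypothetical helper, and the harness's actual filter classes and their fallback behavior are not reproduced here.

```python
import re

# Same pattern as the recorded "regex_pattern", written as a Python raw string.
CHOICE_PATTERN = re.compile(r"(\b[ABCD]\b)")


def first_choice(generation: str):
    """Return the first standalone A-D letter in the text, or None if absent."""
    match = CHOICE_PATTERN.search(generation)
    return match.group(1) if match else None


print(first_choice(" B - znaczą dokładnie to samo"))  # B
print(first_choice("brak odpowiedzi"))                # None
```

The word boundaries keep letters inside longer words from matching, so a generation must contain the letter on its own to be scored.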

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_psc_multiple_choice_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-11-55.449227.json ADDED

	@@ -0,0 +1,107 @@

+{
+  "results": {
+    "polish_psc_multiple_choice": {
+      "acc,none": 0.9656771799628943,
+      "acc_stderr,none": 0.005547529422575579,
+      "f1,none": 0.9428129829984544,
+      "f1_stderr,none": "N/A",
+      "acc_norm,none": 0.9656771799628943,
+      "acc_norm_stderr,none": 0.005547529422575579,
+      "alias": "polish_psc_multiple_choice"
+    }
+  },
+  "group_subtasks": {
+    "polish_psc_multiple_choice": []
+  },
+  "configs": {
+    "polish_psc_multiple_choice": {
+      "task": "polish_psc_multiple_choice",
+      "dataset_path": "allegro/klej-psc",
+      "training_split": "train",
+      "test_split": "test",
+      "doc_to_text": "Tekst: \"{{extract_text}}\"\nPodsumowanie: \"{{summary_text}}\"\nPytanie: Czy podsumowanie dla podanego tekstu jest poprawne?\nOdpowiedz krótko \"Tak\" lub \"Nie\". Prawidłowa odpowiedź:",
+      "doc_to_target": "{{label|int}}",
+      "doc_to_choice": [
+        "Nie",
+        "Tak"
+      ],
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "num_fewshot": 0,
+      "metric_list": [
+        {
+          "metric": "acc",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "acc_norm",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "def f1(predictions, references):\n    _prediction = predictions[0]\n    _reference = references[0]\n    string_label = [\"B\", \"C\"]\n    reference = string_label.index(_reference)\n    prediction = (\n        string_label.index(_prediction)\n        if _prediction in string_label\n        else 0\n    )\n\n    return (prediction, reference)\n",
+          "aggregation": "def agg_f1(items):\n    predictions, references = zip(*items)\n    references, predictions = np.asarray(references), np.asarray(predictions)\n\n    return sklearn.metrics.f1_score(references, predictions)\n",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "multiple_choice",
+      "repeats": 1,
+      "should_decontaminate": true,
+      "doc_to_decontamination_query": "{{extract_text}} {{summary_text}}"
+    }
+  },
+  "versions": {
+    "polish_psc_multiple_choice": "Yaml"
+  },
+  "n-shot": {
+    "polish_psc_multiple_choice": 0
+  },
+  "higher_is_better": {
+    "polish_psc_multiple_choice": {
+      "acc": true,
+      "acc_norm": true,
+      "f1": true
+    }
+  },
+  "n-samples": {
+    "polish_psc_multiple_choice": {
+      "original": 1078,
+      "effective": 1078
+    }
+  },
+  "config": {
+    "model": "hf",
+    "model_args": "pretrained=speakleash/Bielik-11B-v2.1-Instruct,dtype=bfloat16,trust_remote_code=True",
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": "sqlite_caches/plgchriso/models/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_psc_multiple_choice/",
+    "limit": null,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": null,
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": "2132286",
+  "date": 1723381748.8966577,
+  "pretty_env_info": "PyTorch version: 2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Rocky Linux 9.3 (Blue Onyx) (x86_64)\nGCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)\nClang version: Could not collect\nCMake version: Could not collect\nLibc version: glibc-2.34\n\nPython version: 3.10.4 (main, Dec 14 2022, 11:01:42) [GCC 11.3.0] (64-bit runtime)\nPython platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34\nIs CUDA available: True\nCUDA runtime version: 12.1.105\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Could not collect\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                       x86_64\nCPU op-mode(s):                     32-bit, 64-bit\nAddress sizes:                      43 bits physical, 48 bits virtual\nByte Order:                         Little Endian\nCPU(s):                             128\nOn-line CPU(s) list:                0-127\nVendor ID:                          AuthenticAMD\nModel name:                         AMD EPYC 7742 64-Core Processor\nCPU family:                         23\nModel:                              49\nThread(s) per core:                 1\nCore(s) per socket:                 64\nSocket(s):                          2\nStepping:                           0\nFrequency boost:                    enabled\nCPU max MHz:                        2250.0000\nCPU min MHz:                        1500.0000\nBogoMIPS:                           4500.00\nFlags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es\nVirtualization:                     AMD-V\nL1d cache:                          4 MiB (128 instances)\nL1i cache:                          4 MiB (128 instances)\nL2 cache:                           64 MiB (128 instances)\nL3 cache:                           512 MiB (32 instances)\nNUMA node(s):                       4\nNUMA node0 CPU(s):                  0-31\nNUMA node1 CPU(s):                  32-63\nNUMA node2 CPU(s):                  64-95\nNUMA node3 CPU(s):                  96-127\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit:        Not affected\nVulnerability L1tf:                 Not affected\nVulnerability Mds:                  Not affected\nVulnerability Meltdown:             Not affected\nVulnerability Mmio stale data:      Not affected\nVulnerability Retbleed:             Mitigation; untrained return thunk; SMT disabled\nVulnerability Spec rstack overflow: Mitigation; SMT disabled\nVulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds:                Not affected\nVulnerability Tsx async abort:      Not 
affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] Could not collect",
+  "transformers_version": "4.43.1",
+  "upper_git_hash": "2132286315025b3abd7a22b7309f7052be200287",
+  "task_hashes": {
+    "polish_psc_multiple_choice": "53dfd060110a8ece3c4bf785c368bbf87ac6ae87f1d76781f1ffac90beb47879"
+  },
+  "model_source": "hf",
+  "model_name": "speakleash/Bielik-11B-v2.1-Instruct",
+  "model_name_sanitized": "__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "chat_template": null,
+  "chat_template_sha": null,
+  "start_time": 2669732.988093483,
+  "end_time": 2669908.004738389,
+  "total_evaluation_time_seconds": "175.01664490625262"
+}

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_psc_regex_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-50-13.495944.json ADDED

	@@ -0,0 +1,118 @@

+{
+  "results": {
+    "polish_psc_regex": {
+      "exact_match,score-first": 0.7588126159554731,
+      "exact_match_stderr,score-first": 0.01303577072183474,
+      "f1,score-first": 0.8123167155425219,
+      "f1_stderr,score-first": "N/A",
+      "alias": "polish_psc_regex"
+    }
+  },
+  "group_subtasks": {
+    "polish_psc_regex": []
+  },
+  "configs": {
+    "polish_psc_regex": {
+      "task": "polish_psc_regex",
+      "dataset_path": "allegro/klej-psc",
+      "training_split": "train",
+      "test_split": "test",
+      "doc_to_text": "Fragment 1: \"{{extract_text}}\"\nFragment 2: \"{{summary_text}}\"\nPytanie: jaka jest zależność między fragmentami 1 i 2?\nMożliwe odpowiedzi:\nA - wszystkie odpowiedzi poprawne\nB - dotyczą tego samego artykułu\nC - dotyczą różnych artykułów\nD - brak poprawnej odpowiedzi\nPrawidłowa odpowiedź:",
+      "doc_to_target": "{{{0: 'A', 1: 'C', 2: 'B', 3: 'D'}.get(label|int + 1)}}",
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "num_fewshot": 0,
+      "metric_list": [
+        {
+          "metric": "exact_match",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "def f1(predictions, references):\n    _prediction = predictions[0]\n    _reference = references[0]\n    string_label = [\"B\", \"C\"]\n    reference = string_label.index(_reference)\n    prediction = (\n        string_label.index(_prediction)\n        if _prediction in string_label\n        else 0\n    )\n\n    return (prediction, reference)\n",
+          "aggregation": "def agg_f1(items):\n    predictions, references = zip(*items)\n    references, predictions = np.asarray(references), np.asarray(predictions)\n\n    return sklearn.metrics.f1_score(references, predictions)\n",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "generate_until",
+      "generation_kwargs": {
+        "until": [
+          ".",
+          ","
+        ],
+        "do_sample": false,
+        "temperature": 0.0,
+        "max_gen_toks": 50
+      },
+      "repeats": 1,
+      "filter_list": [
+        {
+          "name": "score-first",
+          "filter": [
+            {
+              "function": "regex",
+              "regex_pattern": "(\\b[ABCD]\\b)"
+            },
+            {
+              "function": "take_first"
+            }
+          ]
+        }
+      ],
+      "should_decontaminate": true,
+      "doc_to_decontamination_query": "{{extract_text}} {{summary_text}}"
+    }
+  },
+  "versions": {
+    "polish_psc_regex": "Yaml"
+  },
+  "n-shot": {
+    "polish_psc_regex": 0
+  },
+  "higher_is_better": {
+    "polish_psc_regex": {
+      "exact_match": true,
+      "f1": true
+    }
+  },
+  "n-samples": {
+    "polish_psc_regex": {
+      "original": 1078,
+      "effective": 1078
+    }
+  },
+  "config": {
+    "model": "hf",
+    "model_args": "pretrained=speakleash/Bielik-11B-v2.1-Instruct,dtype=bfloat16,trust_remote_code=True",
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": "sqlite_caches/plgchriso/models/bielik_11B-v2_dpo/dpo5-001_e2_-0_polish_psc_regex/",
+    "limit": null,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": null,
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": "2132286",
+  "date": 1723381748.6296773,
+  "pretty_env_info": "PyTorch version: 2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Rocky Linux 9.3 (Blue Onyx) (x86_64)\nGCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)\nClang version: Could not collect\nCMake version: Could not collect\nLibc version: glibc-2.34\n\nPython version: 3.10.4 (main, Dec 14 2022, 11:01:42) [GCC 11.3.0] (64-bit runtime)\nPython platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34\nIs CUDA available: True\nCUDA runtime version: 12.1.105\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Could not collect\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                       x86_64\nCPU op-mode(s):                     32-bit, 64-bit\nAddress sizes:                      43 bits physical, 48 bits virtual\nByte Order:                         Little Endian\nCPU(s):                             128\nOn-line CPU(s) list:                0-127\nVendor ID:                          AuthenticAMD\nModel name:                         AMD EPYC 7742 64-Core Processor\nCPU family:                         23\nModel:                              49\nThread(s) per core:                 1\nCore(s) per socket:                 64\nSocket(s):                          2\nStepping:                           0\nFrequency boost:                    enabled\nCPU max MHz:                        2250.0000\nCPU min MHz:                        1500.0000\nBogoMIPS:                           4500.00\nFlags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es\nVirtualization:                     AMD-V\nL1d cache:                          4 MiB (128 instances)\nL1i cache:                          4 MiB (128 instances)\nL2 cache:                           64 MiB (128 instances)\nL3 cache:                           512 MiB (32 instances)\nNUMA node(s):                       4\nNUMA node0 CPU(s):                  0-31\nNUMA node1 CPU(s):                  32-63\nNUMA node2 CPU(s):                  64-95\nNUMA node3 CPU(s):                  96-127\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit:        Not affected\nVulnerability L1tf:                 Not affected\nVulnerability Mds:                  Not affected\nVulnerability Meltdown:             Not affected\nVulnerability Mmio stale data:      Not affected\nVulnerability Retbleed:             Mitigation; untrained return thunk; SMT disabled\nVulnerability Spec rstack overflow: Mitigation; SMT disabled\nVulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds:                Not affected\nVulnerability Tsx async abort:      Not 
affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] Could not collect",
+  "transformers_version": "4.43.1",
+  "upper_git_hash": "2132286315025b3abd7a22b7309f7052be200287",
+  "task_hashes": {
+    "polish_psc_regex": "0065cab6bd75fa16d7b0b782973d6452c76ac6f272bfcb4e037049c1d2a420a5"
+  },
+  "model_source": "hf",
+  "model_name": "speakleash/Bielik-11B-v2.1-Instruct",
+  "model_name_sanitized": "__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "chat_template": null,
+  "chat_template_sha": null,
+  "start_time": 2577223.665712506,
+  "end_time": 2579697.064617748,
+  "total_evaluation_time_seconds": "2473.398905241862"
+}
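
Note on the `f1` metric in the `polish_psc_regex` config above: the embedded functions map "B"/"C" to 0/1 (any other extracted answer falls back to 0) and pass the pairs to `sklearn.metrics.f1_score`. The following is a minimal pure-Python sketch of that computation, assuming scikit-learn's default binary F1 with positive class 1 (here "C"). The helper names and the sample predictions are hypothetical and are not taken from the evaluation run.

```python
# Label order copied from the task config: index 0 = "B", index 1 = "C".
STRING_LABEL = ["B", "C"]


def to_pair(prediction, reference):
    """Mirror the config's f1(): map letters to indices, unknown predictions -> 0."""
    ref = STRING_LABEL.index(reference)
    pred = STRING_LABEL.index(prediction) if prediction in STRING_LABEL else 0
    return pred, ref


def binary_f1(pairs):
    """Binary F1 with positive class 1 (assumed to match sklearn's default)."""
    tp = sum(1 for p, r in pairs if p == 1 and r == 1)
    fp = sum(1 for p, r in pairs if p == 1 and r == 0)
    fn = sum(1 for p, r in pairs if p == 0 and r == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical model outputs and gold labels, for illustration only.
predictions = ["C", "B", "C", "A"]
references = ["C", "C", "B", "B"]
pairs = [to_pair(p, r) for p, r in zip(predictions, references)]
score = binary_f1(pairs)
```

Because an out-of-set answer such as "A" is silently counted as "B", the F1 value can differ noticeably from `exact_match` on the same outputs.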

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polemo2_in_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-25-19.492688.json ADDED

	@@ -0,0 +1,118 @@

+{
+  "results": {
+    "polemo2_in": {
+      "exact_match,score-first": 0.8559556786703602,
+      "exact_match_stderr,score-first": 0.01307693837899346,
+      "alias": "polemo2_in"
+    }
+  },
+  "group_subtasks": {
+    "polemo2_in": []
+  },
+  "configs": {
+    "polemo2_in": {
+      "task": "polemo2_in",
+      "group": [
+        "polemo2"
+      ],
+      "dataset_path": "allegro/klej-polemo2-in",
+      "training_split": "train",
+      "validation_split": "validation",
+      "test_split": "test",
+      "doc_to_text": "Opinia: \"{{sentence}}\"\nOkreśl sentyment podanej opinii. Możliwe odpowiedzi:\nA - Neutralny\nB - Negatywny\nC - Pozytywny\nD - Niejednoznaczny\nPrawidłowa odpowiedź:",
+      "doc_to_target": "{{{'__label__meta_zero': 'A', '__label__meta_minus_m': 'B', '__label__meta_plus_m': 'C', '__label__meta_amb': 'D'}.get(target)}}",
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "num_fewshot": 5,
+      "metric_list": [
+        {
+          "metric": "exact_match",
+          "aggregation": "mean",
+          "higher_is_better": true,
+          "hf_evaluate": true
+        }
+      ],
+      "output_type": "generate_until",
+      "generation_kwargs": {
+        "until": [
+          ".",
+          ","
+        ],
+        "do_sample": false,
+        "temperature": 0.0,
+        "max_gen_toks": 50
+      },
+      "repeats": 1,
+      "filter_list": [
+        {
+          "name": "score-first",
+          "filter": [
+            {
+              "function": "regex",
+              "regex_pattern": "(\\b[ABCD]\\b)"
+            },
+            {
+              "function": "take_first"
+            }
+          ]
+        }
+      ],
+      "should_decontaminate": true,
+      "doc_to_decontamination_query": "{{sentence}}",
+      "metadata": {
+        "version": 1.0
+      }
+    }
+  },
+  "versions": {
+    "polemo2_in": 1.0
+  },
+  "n-shot": {
+    "polemo2_in": 5
+  },
+  "higher_is_better": {
+    "polemo2_in": {
+      "exact_match": true
+    }
+  },
+  "n-samples": {
+    "polemo2_in": {
+      "original": 722,
+      "effective": 722
+    }
+  },
+  "config": {
+    "model": "hf",
+    "model_args": "pretrained=speakleash/Bielik-11B-v2.1-Instruct,dtype=bfloat16,trust_remote_code=True",
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": "sqlite_caches/plgchriso/models/bielik_11B-v2_dpo/dpo5-001_e2_-5_polemo2_in/",
+    "limit": null,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": null,
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": "2132286",
+  "date": 1723381748.8777437,
+  "pretty_env_info": "PyTorch version: 2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Rocky Linux 9.3 (Blue Onyx) (x86_64)\nGCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)\nClang version: Could not collect\nCMake version: Could not collect\nLibc version: glibc-2.34\n\nPython version: 3.10.4 (main, Dec 14 2022, 11:01:42) [GCC 11.3.0] (64-bit runtime)\nPython platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34\nIs CUDA available: True\nCUDA runtime version: 12.1.105\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Could not collect\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                       x86_64\nCPU op-mode(s):                     32-bit, 64-bit\nAddress sizes:                      43 bits physical, 48 bits virtual\nByte Order:                         Little Endian\nCPU(s):                             128\nOn-line CPU(s) list:                0-127\nVendor ID:                          AuthenticAMD\nModel name:                         AMD EPYC 7742 64-Core Processor\nCPU family:                         23\nModel:                              49\nThread(s) per core:                 1\nCore(s) per socket:                 64\nSocket(s):                          2\nStepping:                           0\nFrequency boost:                    enabled\nCPU max MHz:                        2250.0000\nCPU min MHz:                        1500.0000\nBogoMIPS:                           4500.00\nFlags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es\nVirtualization:                     AMD-V\nL1d cache:                          4 MiB (128 instances)\nL1i cache:                          4 MiB (128 instances)\nL2 cache:                           64 MiB (128 instances)\nL3 cache:                           512 MiB (32 instances)\nNUMA node(s):                       4\nNUMA node0 CPU(s):                  0-31\nNUMA node1 CPU(s):                  32-63\nNUMA node2 CPU(s):                  64-95\nNUMA node3 CPU(s):                  96-127\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit:        Not affected\nVulnerability L1tf:                 Not affected\nVulnerability Mds:                  Not affected\nVulnerability Meltdown:             Not affected\nVulnerability Mmio stale data:      Not affected\nVulnerability Retbleed:             Mitigation; untrained return thunk; SMT disabled\nVulnerability Spec rstack overflow: Mitigation; SMT disabled\nVulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds:                Not affected\nVulnerability Tsx async abort:      Not 
affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] Could not collect",
+  "transformers_version": "4.43.1",
+  "upper_git_hash": "2132286315025b3abd7a22b7309f7052be200287",
+  "task_hashes": {
+    "polemo2_in": "311cf476a99939086a838a34ed5ebef9530cbeea1609d0919757a7dd473b40d1"
+  },
+  "model_source": "hf",
+  "model_name": "speakleash/Bielik-11B-v2.1-Instruct",
+  "model_name_sanitized": "__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "chat_template": null,
+  "chat_template_sha": null,
+  "start_time": 2272911.893989815,
+  "end_time": 2273890.992245757,
+  "total_evaluation_time_seconds": "979.09825594211"
+}
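
Note on the `exact_match_stderr` value above: the reported number is consistent with the sample standard error of the mean of per-example 0/1 scores, i.e. sqrt(p(1-p)/(n-1)). The sketch below checks this against the values in this file (`exact_match` and `n-samples.effective`). The function name is hypothetical, and the formula is an inference from the numbers rather than something stated in the results file.

```python
import math


def mean_stderr(p, n):
    """Standard error of the mean for n binary scores with mean p.

    Uses the unbiased sample variance (divisor n - 1), which is what the
    reported values appear to follow.
    """
    return math.sqrt(p * (1.0 - p) / (n - 1))


# Values copied from the polemo2_in results above.
reported_mean = 0.8559556786703602
reported_stderr = 0.01307693837899346
n_samples = 722

recomputed = mean_stderr(reported_mean, n_samples)
```

The same check reproduces the `polish_psc_regex` figure (mean 0.7588126159554731 over 1078 samples) to a similar precision.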

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polemo2_in_multiple_choice_1723381711/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-24-50.869505.json ADDED

	@@ -0,0 +1,105 @@

+{
+  "results": {
+    "polemo2_in_multiple_choice": {
+      "acc,none": 0.871191135734072,
+      "acc_stderr,none": 0.012475615091746169,
+      "acc_norm,none": 0.8725761772853186,
+      "acc_norm_stderr,none": 0.012418220256560223,
+      "alias": "polemo2_in_multiple_choice"
+    }
+  },
+  "group_subtasks": {
+    "polemo2_in_multiple_choice": []
+  },
+  "configs": {
+    "polemo2_in_multiple_choice": {
+      "task": "polemo2_in_multiple_choice",
+      "group": [
+        "polemo2_mc"
+      ],
+      "dataset_path": "allegro/klej-polemo2-in",
+      "training_split": "train",
+      "validation_split": "validation",
+      "test_split": "test",
+      "doc_to_text": "Opinia: \"{{sentence}}\"\nOkreśl sentyment podanej opinii: Neutralny, Negatywny, Pozytywny, Niejednoznaczny.\nSentyment:",
+      "doc_to_target": "{{['__label__meta_zero', '__label__meta_minus_m', '__label__meta_plus_m', '__label__meta_amb'].index(target)}}",
+      "doc_to_choice": [
+        "Neutralny",
+        "Negatywny",
+        "Pozytywny",
+        "Niejednoznaczny"
+      ],
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "num_fewshot": 5,
+      "metric_list": [
+        {
+          "metric": "acc",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "acc_norm",
+          "aggregation": "mean",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "multiple_choice",
+      "repeats": 1,
+      "should_decontaminate": true,
+      "doc_to_decontamination_query": "{{sentence}}"
+    }
+  },
+  "versions": {
+    "polemo2_in_multiple_choice": "Yaml"
+  },
+  "n-shot": {
+    "polemo2_in_multiple_choice": 5
+  },
+  "higher_is_better": {
+    "polemo2_in_multiple_choice": {
+      "acc": true,
+      "acc_norm": true
+    }
+  },
+  "n-samples": {
+    "polemo2_in_multiple_choice": {
+      "original": 722,
+      "effective": 722
+    }
+  },
+  "config": {
+    "model": "hf",
+    "model_args": "pretrained=speakleash/Bielik-11B-v2.1-Instruct,dtype=bfloat16,trust_remote_code=True",
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": "sqlite_caches/plgchriso/models/bielik_11B-v2_dpo/dpo5-001_e2_-5_polemo2_in_multiple_choice/",
+    "limit": null,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": null,
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": "2132286",
+  "date": 1723381734.9804056,
+  "pretty_env_info": "PyTorch version: 2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Rocky Linux 9.3 (Blue Onyx) (x86_64)\nGCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)\nClang version: Could not collect\nCMake version: Could not collect\nLibc version: glibc-2.34\n\nPython version: 3.10.4 (main, Dec 14 2022, 11:01:42) [GCC 11.3.0] (64-bit runtime)\nPython platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34\nIs CUDA available: True\nCUDA runtime version: 12.1.105\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Could not collect\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                       x86_64\nCPU op-mode(s):                     32-bit, 64-bit\nAddress sizes:                      43 bits physical, 48 bits virtual\nByte Order:                         Little Endian\nCPU(s):                             128\nOn-line CPU(s) list:                0-127\nVendor ID:                          AuthenticAMD\nModel name:                         AMD EPYC 7742 64-Core Processor\nCPU family:                         23\nModel:                              49\nThread(s) per core:                 1\nCore(s) per socket:                 64\nSocket(s):                          2\nStepping:                           0\nFrequency boost:                    enabled\nCPU max MHz:                        2250.0000\nCPU min MHz:                        1500.0000\nBogoMIPS:                           4499.90\nFlags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es\nVirtualization:                     AMD-V\nL1d cache:                          4 MiB (128 instances)\nL1i cache:                          4 MiB (128 instances)\nL2 cache:                           64 MiB (128 instances)\nL3 cache:                           512 MiB (32 instances)\nNUMA node(s):                       4\nNUMA node0 CPU(s):                  0-31\nNUMA node1 CPU(s):                  32-63\nNUMA node2 CPU(s):                  64-95\nNUMA node3 CPU(s):                  96-127\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit:        Not affected\nVulnerability L1tf:                 Not affected\nVulnerability Mds:                  Not affected\nVulnerability Meltdown:             Not affected\nVulnerability Mmio stale data:      Not affected\nVulnerability Retbleed:             Mitigation; untrained return thunk; SMT disabled\nVulnerability Spec rstack overflow: Mitigation; SMT disabled\nVulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds:                Not affected\nVulnerability Tsx async abort:      Not 
affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] Could not collect",
+  "transformers_version": "4.43.1",
+  "upper_git_hash": "2132286315025b3abd7a22b7309f7052be200287",
+  "task_hashes": {
+    "polemo2_in_multiple_choice": "721bf5bd2111822d757513497aaacb13ff7172a1c79e8d903e554ae7db248670"
+  },
+  "model_source": "hf",
+  "model_name": "speakleash/Bielik-11B-v2.1-Instruct",
+  "model_name_sanitized": "__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "chat_template": null,
+  "chat_template_sha": null,
+  "start_time": 2270387.730317351,
+  "end_time": 2271351.381778121,
+  "total_evaluation_time_seconds": "963.6514607700519"
+}
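
Note on `acc` versus `acc_norm` above: for `multiple_choice` tasks the model scores each entry of `doc_to_choice` by log-likelihood. The sketch below assumes that `acc` takes the highest raw log-likelihood while `acc_norm` first divides each score by the length of the choice string. That normalization rule is an assumption about the harness and is not confirmed by this file. The log-likelihood values are hypothetical.

```python
# Choice strings copied from doc_to_choice in the config above.
CHOICES = ["Neutralny", "Negatywny", "Pozytywny", "Niejednoznaczny"]


def pick(loglikelihoods, normalize=False):
    """Return the index of the best choice, optionally length-normalized."""
    scores = [
        ll / len(choice) if normalize else ll
        for ll, choice in zip(loglikelihoods, CHOICES)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical log-likelihoods for one document, for illustration only.
lls = [-9.0, -8.0, -9.5, -12.0]
raw_choice = CHOICES[pick(lls)]
norm_choice = CHOICES[pick(lls, normalize=True)]
```

With these made-up scores the raw pick is "Negatywny", but the longer "Niejednoznaczny" wins after normalization, which is why the two metrics can differ slightly on the same run.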

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polemo2_out_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-22-20.849828.json ADDED

	@@ -0,0 +1,118 @@

+{
+  "results": {
+    "polemo2_out": {
+      "exact_match,score-first": 0.7550607287449392,
+      "exact_match_stderr,score-first": 0.01936853142177567,
+      "alias": "polemo2_out"
+    }
+  },
+  "group_subtasks": {
+    "polemo2_out": []
+  },
+  "configs": {
+    "polemo2_out": {
+      "task": "polemo2_out",
+      "group": [
+        "polemo2"
+      ],
+      "dataset_path": "allegro/klej-polemo2-out",
+      "training_split": "train",
+      "validation_split": "validation",
+      "test_split": "test",
+      "doc_to_text": "Opinia: \"{{sentence}}\"\nOkreśl sentyment podanej opinii. Możliwe odpowiedzi:\nA - Neutralny\nB - Negatywny\nC - Pozytywny\nD - Niejednoznaczny\nPrawidłowa odpowiedź:",
+      "doc_to_target": "{{{'__label__meta_zero': 'A', '__label__meta_minus_m': 'B', '__label__meta_plus_m': 'C', '__label__meta_amb': 'D'}.get(target)}}",
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "num_fewshot": 5,
+      "metric_list": [
+        {
+          "metric": "exact_match",
+          "aggregation": "mean",
+          "higher_is_better": true,
+          "hf_evaluate": true
+        }
+      ],
+      "output_type": "generate_until",
+      "generation_kwargs": {
+        "until": [
+          ".",
+          ","
+        ],
+        "do_sample": false,
+        "temperature": 0.0,
+        "max_gen_toks": 50
+      },
+      "repeats": 1,
+      "filter_list": [
+        {
+          "name": "score-first",
+          "filter": [
+            {
+              "function": "regex",
+              "regex_pattern": "(\\b[ABCD]\\b)"
+            },
+            {
+              "function": "take_first"
+            }
+          ]
+        }
+      ],
+      "should_decontaminate": true,
+      "doc_to_decontamination_query": "{{sentence}}",
+      "metadata": {
+        "version": 1.0
+      }
+    }
+  },
+  "versions": {
+    "polemo2_out": 1.0
+  },
+  "n-shot": {
+    "polemo2_out": 5
+  },
+  "higher_is_better": {
+    "polemo2_out": {
+      "exact_match": true
+    }
+  },
+  "n-samples": {
+    "polemo2_out": {
+      "original": 494,
+      "effective": 494
+    }
+  },
+  "config": {
+    "model": "hf",
+    "model_args": "pretrained=speakleash/Bielik-11B-v2.1-Instruct,dtype=bfloat16,trust_remote_code=True",
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": "sqlite_caches/plgchriso/models/bielik_11B-v2_dpo/dpo5-001_e2_-5_polemo2_out/",
+    "limit": null,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": null,
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": "2132286",
+  "date": 1723381748.8777425,
+  "pretty_env_info": "PyTorch version: 2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Rocky Linux 9.3 (Blue Onyx) (x86_64)\nGCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)\nClang version: Could not collect\nCMake version: Could not collect\nLibc version: glibc-2.34\n\nPython version: 3.10.4 (main, Dec 14 2022, 11:01:42) [GCC 11.3.0] (64-bit runtime)\nPython platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34\nIs CUDA available: True\nCUDA runtime version: 12.1.105\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Could not collect\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                       x86_64\nCPU op-mode(s):                     32-bit, 64-bit\nAddress sizes:                      43 bits physical, 48 bits virtual\nByte Order:                         Little Endian\nCPU(s):                             128\nOn-line CPU(s) list:                0-127\nVendor ID:                          AuthenticAMD\nModel name:                         AMD EPYC 7742 64-Core Processor\nCPU family:                         23\nModel:                              49\nThread(s) per core:                 1\nCore(s) per socket:                 64\nSocket(s):                          2\nStepping:                           0\nFrequency boost:                    enabled\nCPU max MHz:                        2250.0000\nCPU min MHz:                        1500.0000\nBogoMIPS:                           4500.00\nFlags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es\nVirtualization:                     AMD-V\nL1d cache:                          4 MiB (128 instances)\nL1i cache:                          4 MiB (128 instances)\nL2 cache:                           64 MiB (128 instances)\nL3 cache:                           512 MiB (32 instances)\nNUMA node(s):                       4\nNUMA node0 CPU(s):                  0-31\nNUMA node1 CPU(s):                  32-63\nNUMA node2 CPU(s):                  64-95\nNUMA node3 CPU(s):                  96-127\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit:        Not affected\nVulnerability L1tf:                 Not affected\nVulnerability Mds:                  Not affected\nVulnerability Meltdown:             Not affected\nVulnerability Mmio stale data:      Not affected\nVulnerability Retbleed:             Mitigation; untrained return thunk; SMT disabled\nVulnerability Spec rstack overflow: Mitigation; SMT disabled\nVulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds:                Not affected\nVulnerability Tsx async abort:      Not 
affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] Could not collect",
+  "transformers_version": "4.43.1",
+  "upper_git_hash": "2132286315025b3abd7a22b7309f7052be200287",
+  "task_hashes": {
+    "polemo2_out": "f4c38529c6c2d9871f34d315f5afa8b183cba25e628c029de45011230d53fac1"
+  },
+  "model_source": "hf",
+  "model_name": "speakleash/Bielik-11B-v2.1-Instruct",
+  "model_name_sanitized": "__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "chat_template": null,
+  "chat_template_sha": null,
+  "start_time": 2272911.893717436,
+  "end_time": 2273712.349220811,
+  "total_evaluation_time_seconds": "800.4555033748038"
+}

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polemo2_out_multiple_choice_1723381711/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-19-39.147509.json ADDED

	@@ -0,0 +1,105 @@

+{
+  "results": {
+    "polemo2_out_multiple_choice": {
+      "acc,none": 0.7753036437246964,
+      "acc_stderr,none": 0.018797949035330906,
+      "acc_norm,none": 0.7854251012145749,
+      "acc_norm_stderr,none": 0.01848921134882508,
+      "alias": "polemo2_out_multiple_choice"
+    }
+  },
+  "group_subtasks": {
+    "polemo2_out_multiple_choice": []
+  },
+  "configs": {
+    "polemo2_out_multiple_choice": {
+      "task": "polemo2_out_multiple_choice",
+      "group": [
+        "polemo2_mc"
+      ],
+      "dataset_path": "allegro/klej-polemo2-out",
+      "training_split": "train",
+      "validation_split": "validation",
+      "test_split": "test",
+      "doc_to_text": "Opinia: \"{{sentence}}\"\nOkreśl sentyment podanej opinii: Neutralny, Negatywny, Pozytywny, Niejednoznaczny.\nSentyment:",
+      "doc_to_target": "{{['__label__meta_zero', '__label__meta_minus_m', '__label__meta_plus_m', '__label__meta_amb'].index(target)}}",
+      "doc_to_choice": [
+        "Neutralny",
+        "Negatywny",
+        "Pozytywny",
+        "Niejednoznaczny"
+      ],
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "num_fewshot": 5,
+      "metric_list": [
+        {
+          "metric": "acc",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "acc_norm",
+          "aggregation": "mean",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "multiple_choice",
+      "repeats": 1,
+      "should_decontaminate": true,
+      "doc_to_decontamination_query": "{{sentence}}"
+    }
+  },
+  "versions": {
+    "polemo2_out_multiple_choice": "Yaml"
+  },
+  "n-shot": {
+    "polemo2_out_multiple_choice": 5
+  },
+  "higher_is_better": {
+    "polemo2_out_multiple_choice": {
+      "acc": true,
+      "acc_norm": true
+    }
+  },
+  "n-samples": {
+    "polemo2_out_multiple_choice": {
+      "original": 494,
+      "effective": 494
+    }
+  },
+  "config": {
+    "model": "hf",
+    "model_args": "pretrained=speakleash/Bielik-11B-v2.1-Instruct,dtype=bfloat16,trust_remote_code=True",
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": "sqlite_caches/plgchriso/models/bielik_11B-v2_dpo/dpo5-001_e2_-5_polemo2_out_multiple_choice/",
+    "limit": null,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": null,
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": "2132286",
+  "date": 1723381734.9805498,
+  "pretty_env_info": "PyTorch version: 2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Rocky Linux 9.3 (Blue Onyx) (x86_64)\nGCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)\nClang version: Could not collect\nCMake version: Could not collect\nLibc version: glibc-2.34\n\nPython version: 3.10.4 (main, Dec 14 2022, 11:01:42) [GCC 11.3.0] (64-bit runtime)\nPython platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34\nIs CUDA available: True\nCUDA runtime version: 12.1.105\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Could not collect\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                       x86_64\nCPU op-mode(s):                     32-bit, 64-bit\nAddress sizes:                      43 bits physical, 48 bits virtual\nByte Order:                         Little Endian\nCPU(s):                             128\nOn-line CPU(s) list:                0-127\nVendor ID:                          AuthenticAMD\nModel name:                         AMD EPYC 7742 64-Core Processor\nCPU family:                         23\nModel:                              49\nThread(s) per core:                 1\nCore(s) per socket:                 64\nSocket(s):                          2\nStepping:                           0\nFrequency boost:                    enabled\nCPU max MHz:                        2250.0000\nCPU min MHz:                        1500.0000\nBogoMIPS:                           4499.90\nFlags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es\nVirtualization:                     AMD-V\nL1d cache:                          4 MiB (128 instances)\nL1i cache:                          4 MiB (128 instances)\nL2 cache:                           64 MiB (128 instances)\nL3 cache:                           512 MiB (32 instances)\nNUMA node(s):                       4\nNUMA node0 CPU(s):                  0-31\nNUMA node1 CPU(s):                  32-63\nNUMA node2 CPU(s):                  64-95\nNUMA node3 CPU(s):                  96-127\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit:        Not affected\nVulnerability L1tf:                 Not affected\nVulnerability Mds:                  Not affected\nVulnerability Meltdown:             Not affected\nVulnerability Mmio stale data:      Not affected\nVulnerability Retbleed:             Mitigation; untrained return thunk; SMT disabled\nVulnerability Spec rstack overflow: Mitigation; SMT disabled\nVulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds:                Not affected\nVulnerability Tsx async abort:      Not 
affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] Could not collect",
+  "transformers_version": "4.43.1",
+  "upper_git_hash": "2132286315025b3abd7a22b7309f7052be200287",
+  "task_hashes": {
+    "polemo2_out_multiple_choice": "45b774f8cfb07b51343dc4aba756739ac8f3ad9410eae31ce9abcab2243c33c6"
+  },
+  "model_source": "hf",
+  "model_name": "speakleash/Bielik-11B-v2.1-Instruct",
+  "model_name_sanitized": "__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "chat_template": null,
+  "chat_template_sha": null,
+  "start_time": 2270387.730003106,
+  "end_time": 2271039.659552081,
+  "total_evaluation_time_seconds": "651.9295489750803"
+}
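
As a sanity check on the `polemo2_out_multiple_choice` figures above, the reported `acc_stderr,none` appears consistent with the closed-form standard error of a mean of 0/1 scores computed with the sample (n - 1) variance. The sketch below is only an illustration: the helper name `binary_mean_stderr` is hypothetical, and the formula is an assumption about how the value was produced, not something the result file states.

```python
import math


def binary_mean_stderr(acc: float, n: int) -> float:
    """Standard error of the mean of n binary (0/1) scores.

    Assumes the sample variance (divisor n - 1), which for 0/1 scores
    with mean `acc` equals acc * (1 - acc) * n / (n - 1).
    """
    sample_var = acc * (1.0 - acc) * n / (n - 1)
    return math.sqrt(sample_var / n)


# Values copied from the result file above.
acc = 0.7753036437246964
n_samples = 494
reported_stderr = 0.018797949035330906

recomputed = binary_mean_stderr(acc, n_samples)
print(f"recomputed={recomputed:.6f} reported={reported_stderr:.6f}")
```

If the two numbers diverge for some other result file, that may point to a different estimator (for example a bootstrap) rather than a data error.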

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_8tags_multiple_choice_1723381711/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-53-51.017953.json ADDED

	@@ -0,0 +1,106 @@

+{
+  "results": {
+    "polish_8tags_multiple_choice": {
+      "acc,none": 0.7936870997255261,
+      "acc_stderr,none": 0.006120648645628871,
+      "acc_norm,none": 0.7881976212259836,
+      "acc_norm_stderr,none": 0.0061800583187814695,
+      "alias": "polish_8tags_multiple_choice"
+    }
+  },
+  "group_subtasks": {
+    "polish_8tags_multiple_choice": []
+  },
+  "configs": {
+    "polish_8tags_multiple_choice": {
+      "task": "polish_8tags_multiple_choice",
+      "dataset_path": "sdadas/8tags",
+      "training_split": "train",
+      "test_split": "test",
+      "fewshot_split": "train",
+      "doc_to_text": "Tytuł: \"{{sentence}}\"\nDo podanego tytułu przyporządkuj jedną najlepiej pasującą kategorię z podanych: Film, Historia, Jedzenie, Medycyna, Motoryzacja, Praca, Sport, Technologie.\nKategoria:",
+      "doc_to_target": "{{label|int}}",
+      "doc_to_choice": [
+        "Film",
+        "Historia",
+        "Jedzenie",
+        "Medycyna",
+        "Motoryzacja",
+        "Praca",
+        "Sport",
+        "Technologie"
+      ],
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "num_fewshot": 5,
+      "metric_list": [
+        {
+          "metric": "acc",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "acc_norm",
+          "aggregation": "mean",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "multiple_choice",
+      "repeats": 1,
+      "should_decontaminate": true,
+      "doc_to_decontamination_query": "{{sentence}}"
+    }
+  },
+  "versions": {
+    "polish_8tags_multiple_choice": "Yaml"
+  },
+  "n-shot": {
+    "polish_8tags_multiple_choice": 5
+  },
+  "higher_is_better": {
+    "polish_8tags_multiple_choice": {
+      "acc": true,
+      "acc_norm": true
+    }
+  },
+  "n-samples": {
+    "polish_8tags_multiple_choice": {
+      "original": 4372,
+      "effective": 4372
+    }
+  },
+  "config": {
+    "model": "hf",
+    "model_args": "pretrained=speakleash/Bielik-11B-v2.1-Instruct,dtype=bfloat16,trust_remote_code=True",
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": "sqlite_caches/plgchriso/models/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_8tags_multiple_choice/",
+    "limit": null,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": null,
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": "2132286",
+  "date": 1723381736.832911,
+  "pretty_env_info": "PyTorch version: 2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Rocky Linux 9.3 (Blue Onyx) (x86_64)\nGCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)\nClang version: Could not collect\nCMake version: Could not collect\nLibc version: glibc-2.34\n\nPython version: 3.10.4 (main, Dec 14 2022, 11:01:42) [GCC 11.3.0] (64-bit runtime)\nPython platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34\nIs CUDA available: True\nCUDA runtime version: 12.1.105\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Could not collect\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                       x86_64\nCPU op-mode(s):                     32-bit, 64-bit\nAddress sizes:                      43 bits physical, 48 bits virtual\nByte Order:                         Little Endian\nCPU(s):                             128\nOn-line CPU(s) list:                0-127\nVendor ID:                          AuthenticAMD\nModel name:                         AMD EPYC 7742 64-Core Processor\nCPU family:                         23\nModel:                              49\nThread(s) per core:                 1\nCore(s) per socket:                 64\nSocket(s):                          2\nStepping:                           0\nFrequency boost:                    enabled\nCPU max MHz:                        2250.0000\nCPU min MHz:                        1500.0000\nBogoMIPS:                           4500.00\nFlags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es\nVirtualization:                     AMD-V\nL1d cache:                          4 MiB (128 instances)\nL1i cache:                          4 MiB (128 instances)\nL2 cache:                           64 MiB (128 instances)\nL3 cache:                           512 MiB (32 instances)\nNUMA node(s):                       4\nNUMA node0 CPU(s):                  0-31\nNUMA node1 CPU(s):                  32-63\nNUMA node2 CPU(s):                  64-95\nNUMA node3 CPU(s):                  96-127\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit:        Not affected\nVulnerability L1tf:                 Not affected\nVulnerability Mds:                  Not affected\nVulnerability Meltdown:             Not affected\nVulnerability Mmio stale data:      Not affected\nVulnerability Retbleed:             Mitigation; untrained return thunk; SMT disabled\nVulnerability Spec rstack overflow: Mitigation; SMT disabled\nVulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds:                Not affected\nVulnerability Tsx async abort:      Not 
affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] Could not collect",
+  "transformers_version": "4.43.1",
+  "upper_git_hash": "2132286315025b3abd7a22b7309f7052be200287",
+  "task_hashes": {
+    "polish_8tags_multiple_choice": "73f7a912bc6b67622aaf742339f1fd7d8c602e2bba1d366f9084ffdcd115da22"
+  },
+  "model_source": "hf",
+  "model_name": "speakleash/Bielik-11B-v2.1-Instruct",
+  "model_name_sanitized": "__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "chat_template": null,
+  "chat_template_sha": null,
+  "start_time": 779444.242893573,
+  "end_time": 782146.665026425,
+  "total_evaluation_time_seconds": "2702.4221328520216"
+}

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_8tags_regex_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T17-01-59.819478.json ADDED

	@@ -0,0 +1,111 @@

+{
+  "results": {
+    "polish_8tags_regex": {
+      "exact_match,score-first": 0.780192131747484,
+      "exact_match_stderr,score-first": 0.006263715115123265,
+      "alias": "polish_8tags_regex"
+    }
+  },
+  "group_subtasks": {
+    "polish_8tags_regex": []
+  },
+  "configs": {
+    "polish_8tags_regex": {
+      "task": "polish_8tags_regex",
+      "dataset_path": "sdadas/8tags",
+      "training_split": "train",
+      "validation_split": "validation",
+      "test_split": "test",
+      "doc_to_text": "Tytuł: \"{{sentence}}\"\nPytanie: jaka kategoria najlepiej pasuje do podanego tytułu?\nMożliwe odpowiedzi:\nA - film\nB - historia\nC - jedzenie\nD - medycyna\nE - motoryzacja\nF - praca\nG - sport\nH - technologie\nPrawidłowa odpowiedź:",
+      "doc_to_target": "{{{0: 'A', 1: 'B', 2: 'C', 3: 'D', 4: 'E', 5: 'F', 6: 'G', 7: 'H'}.get(label)}}",
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "num_fewshot": 5,
+      "metric_list": [
+        {
+          "metric": "exact_match",
+          "aggregation": "mean",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "generate_until",
+      "generation_kwargs": {
+        "until": [
+          ".",
+          ","
+        ],
+        "do_sample": false,
+        "temperature": 0.0,
+        "max_gen_toks": 50
+      },
+      "repeats": 1,
+      "filter_list": [
+        {
+          "name": "score-first",
+          "filter": [
+            {
+              "function": "regex",
+              "regex_pattern": "(\\b[ABCDEFGH]\\b)"
+            },
+            {
+              "function": "take_first"
+            }
+          ]
+        }
+      ],
+      "should_decontaminate": true,
+      "doc_to_decontamination_query": "{{sentence}}"
+    }
+  },
+  "versions": {
+    "polish_8tags_regex": "Yaml"
+  },
+  "n-shot": {
+    "polish_8tags_regex": 5
+  },
+  "higher_is_better": {
+    "polish_8tags_regex": {
+      "exact_match": true
+    }
+  },
+  "n-samples": {
+    "polish_8tags_regex": {
+      "original": 4372,
+      "effective": 4372
+    }
+  },
+  "config": {
+    "model": "hf",
+    "model_args": "pretrained=speakleash/Bielik-11B-v2.1-Instruct,dtype=bfloat16,trust_remote_code=True",
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": "sqlite_caches/plgchriso/models/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_8tags_regex/",
+    "limit": null,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": null,
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": "2132286",
+  "date": 1723381748.877596,
+  "pretty_env_info": "PyTorch version: 2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Rocky Linux 9.3 (Blue Onyx) (x86_64)\nGCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)\nClang version: Could not collect\nCMake version: Could not collect\nLibc version: glibc-2.34\n\nPython version: 3.10.4 (main, Dec 14 2022, 11:01:42) [GCC 11.3.0] (64-bit runtime)\nPython platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34\nIs CUDA available: True\nCUDA runtime version: 12.1.105\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Could not collect\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                       x86_64\nCPU op-mode(s):                     32-bit, 64-bit\nAddress sizes:                      43 bits physical, 48 bits virtual\nByte Order:                         Little Endian\nCPU(s):                             128\nOn-line CPU(s) list:                0-127\nVendor ID:                          AuthenticAMD\nModel name:                         AMD EPYC 7742 64-Core Processor\nCPU family:                         23\nModel:                              49\nThread(s) per core:                 1\nCore(s) per socket:                 64\nSocket(s):                          2\nStepping:                           0\nFrequency boost:                    enabled\nCPU max MHz:                        2250.0000\nCPU min MHz:                        1500.0000\nBogoMIPS:                           4500.00\nFlags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es\nVirtualization:                     AMD-V\nL1d cache:                          4 MiB (128 instances)\nL1i cache:                          4 MiB (128 instances)\nL2 cache:                           64 MiB (128 instances)\nL3 cache:                           512 MiB (32 instances)\nNUMA node(s):                       4\nNUMA node0 CPU(s):                  0-31\nNUMA node1 CPU(s):                  32-63\nNUMA node2 CPU(s):                  64-95\nNUMA node3 CPU(s):                  96-127\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit:        Not affected\nVulnerability L1tf:                 Not affected\nVulnerability Mds:                  Not affected\nVulnerability Meltdown:             Not affected\nVulnerability Mmio stale data:      Not affected\nVulnerability Retbleed:             Mitigation; untrained return thunk; SMT disabled\nVulnerability Spec rstack overflow: Mitigation; SMT disabled\nVulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds:                Not affected\nVulnerability Tsx async abort:      Not 
affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] Could not collect",
+  "transformers_version": "4.43.1",
+  "upper_git_hash": "2132286315025b3abd7a22b7309f7052be200287",
+  "task_hashes": {
+    "polish_8tags_regex": "db46138093af0d6032c98a8689c46f46e11c222dafe4ae0444f5c2f86b97dde9"
+  },
+  "model_source": "hf",
+  "model_name": "speakleash/Bielik-11B-v2.1-Instruct",
+  "model_name_sanitized": "__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "chat_template": null,
+  "chat_template_sha": null,
+  "start_time": 2272911.894059325,
+  "end_time": 2279691.316861562,
+  "total_evaluation_time_seconds": "6779.4228022368625"
+}
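
The `score-first` entry in `filter_list` above chains a `regex` step and a `take_first` step. A minimal Python sketch of that pipeline follows, assuming each model response is reduced to its first standalone letter A-H and that the first response is then kept. The function names and the `"[invalid]"` fallback are assumptions made for illustration and are not taken from the result file.

```python
import re

# Pattern copied from the config above (JSON escaping removed).
ANSWER_PATTERN = re.compile(r"(\b[ABCDEFGH]\b)")
FALLBACK = "[invalid]"  # assumed placeholder when nothing matches


def regex_filter(responses: list[str]) -> list[str]:
    """Reduce each response to its first standalone answer letter."""
    filtered = []
    for response in responses:
        matches = ANSWER_PATTERN.findall(response)
        filtered.append(matches[0] if matches else FALLBACK)
    return filtered


def take_first(filtered: list[str]) -> str:
    """Keep only the first (here: the only, since repeats is 1) response."""
    return filtered[0]


def score_first(responses: list[str]) -> str:
    return take_first(regex_filter(responses))


print(score_first([" C"]))
print(score_first([" Odpowiedź B"]))
print(score_first(["nie wiem"]))
```

Under these assumptions the extracted letter would then be compared with the `doc_to_target` letter for the `exact_match` metric.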

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_belebele_mc_1723381711/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-15-54.278792.json ADDED

	@@ -0,0 +1,107 @@

+{
+  "results": {
+    "polish_belebele_mc": {
+      "acc,none": 0.8855555555555555,
+      "acc_stderr,none": 0.010617576963634284,
+      "acc_norm,none": 0.8855555555555555,
+      "acc_norm_stderr,none": 0.010617576963634284,
+      "alias": "polish_belebele_mc"
+    }
+  },
+  "group_subtasks": {
+    "polish_belebele_mc": []
+  },
+  "configs": {
+    "polish_belebele_mc": {
+      "task": "polish_belebele_mc",
+      "dataset_path": "facebook/belebele",
+      "test_split": "pol_Latn",
+      "fewshot_split": "pol_Latn",
+      "doc_to_text": "Fragment: \"{{flores_passage}}\"\nPytanie: \"{{question}}\"\nMożliwe odpowiedzi:\nA - {{mc_answer1}}\nB - {{mc_answer2}}\nC - {{mc_answer3}}\nD - {{mc_answer4}}\nPrawidłowa odpowiedź:",
+      "doc_to_target": "{{['1', '2', '3', '4'].index(correct_answer_num)}}",
+      "doc_to_choice": [
+        "A",
+        "B",
+        "C",
+        "D"
+      ],
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "fewshot_config": {
+        "sampler": "first_n"
+      },
+      "num_fewshot": 5,
+      "metric_list": [
+        {
+          "metric": "acc",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "acc_norm",
+          "aggregation": "mean",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "multiple_choice",
+      "repeats": 1,
+      "should_decontaminate": true,
+      "doc_to_decontamination_query": "{{question}}",
+      "metadata": {
+        "version": 0.0
+      }
+    }
+  },
+  "versions": {
+    "polish_belebele_mc": 0.0
+  },
+  "n-shot": {
+    "polish_belebele_mc": 5
+  },
+  "higher_is_better": {
+    "polish_belebele_mc": {
+      "acc": true,
+      "acc_norm": true
+    }
+  },
+  "n-samples": {
+    "polish_belebele_mc": {
+      "original": 900,
+      "effective": 900
+    }
+  },
+  "config": {
+    "model": "hf",
+    "model_args": "pretrained=speakleash/Bielik-11B-v2.1-Instruct,dtype=bfloat16,trust_remote_code=True",
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": "sqlite_caches/plgchriso/models/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_belebele_mc/",
+    "limit": null,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": null,
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": "2132286",
+  "date": 1723381736.8325982,
+  "pretty_env_info": "PyTorch version: 2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Rocky Linux 9.3 (Blue Onyx) (x86_64)\nGCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)\nClang version: Could not collect\nCMake version: Could not collect\nLibc version: glibc-2.34\n\nPython version: 3.10.4 (main, Dec 14 2022, 11:01:42) [GCC 11.3.0] (64-bit runtime)\nPython platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34\nIs CUDA available: True\nCUDA runtime version: 12.1.105\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Could not collect\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                       x86_64\nCPU op-mode(s):                     32-bit, 64-bit\nAddress sizes:                      43 bits physical, 48 bits virtual\nByte Order:                         Little Endian\nCPU(s):                             128\nOn-line CPU(s) list:                0-127\nVendor ID:                          AuthenticAMD\nModel name:                         AMD EPYC 7742 64-Core Processor\nCPU family:                         23\nModel:                              49\nThread(s) per core:                 1\nCore(s) per socket:                 64\nSocket(s):                          2\nStepping:                           0\nFrequency boost:                    enabled\nCPU max MHz:                        2250.0000\nCPU min MHz:                        1500.0000\nBogoMIPS:                           4500.00\nFlags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es\nVirtualization:                     AMD-V\nL1d cache:                          4 MiB (128 instances)\nL1i cache:                          4 MiB (128 instances)\nL2 cache:                           64 MiB (128 instances)\nL3 cache:                           512 MiB (32 instances)\nNUMA node(s):                       4\nNUMA node0 CPU(s):                  0-31\nNUMA node1 CPU(s):                  32-63\nNUMA node2 CPU(s):                  64-95\nNUMA node3 CPU(s):                  96-127\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit:        Not affected\nVulnerability L1tf:                 Not affected\nVulnerability Mds:                  Not affected\nVulnerability Meltdown:             Not affected\nVulnerability Mmio stale data:      Not affected\nVulnerability Retbleed:             Mitigation; untrained return thunk; SMT disabled\nVulnerability Spec rstack overflow: Mitigation; SMT disabled\nVulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds:                Not affected\nVulnerability Tsx async abort:      Not 
affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] Could not collect",
+  "transformers_version": "4.43.1",
+  "upper_git_hash": "2132286315025b3abd7a22b7309f7052be200287",
+  "task_hashes": {
+    "polish_belebele_mc": "3617d71c141947146b1331680272d92dc45753002d91f496be692e189d2c3338"
+  },
+  "model_source": "hf",
+  "model_name": "speakleash/Bielik-11B-v2.1-Instruct",
+  "model_name_sanitized": "__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "chat_template": null,
+  "chat_template_sha": null,
+  "start_time": 779444.243477263,
+  "end_time": 779869.928050127,
+  "total_evaluation_time_seconds": "425.6845728639746"
+}

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_belebele_regex_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-36-23.654679.json ADDED

	@@ -0,0 +1,109 @@

+{
+  "results": {
+    "polish_belebele_regex": {
+      "exact_match,score-first": 0.8888888888888888,
+      "exact_match_stderr,score-first": 0.010481480680812841,
+      "alias": "polish_belebele_regex"
+    }
+  },
+  "group_subtasks": {
+    "polish_belebele_regex": []
+  },
+  "configs": {
+    "polish_belebele_regex": {
+      "task": "polish_belebele_regex",
+      "dataset_path": "facebook/belebele",
+      "test_split": "pol_Latn",
+      "doc_to_text": "Fragment: \"{{flores_passage}}\"\nPytanie: \"{{question}}\"\nMożliwe odpowiedzi:\nA - {{mc_answer1}}\nB - {{mc_answer2}}\nC - {{mc_answer3}}\nD - {{mc_answer4}}\nPrawidłowa odpowiedź:",
+      "doc_to_target": "{{{0: 'A', 1: 'B', 2: 'C', 3: 'D'}.get(correct_answer_num|int - 1)}}",
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "num_fewshot": 5,
+      "metric_list": [
+        {
+          "metric": "exact_match",
+          "aggregation": "mean",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "generate_until",
+      "generation_kwargs": {
+        "until": [
+          ".",
+          ","
+        ],
+        "do_sample": false,
+        "temperature": 0.0,
+        "max_gen_toks": 50
+      },
+      "repeats": 1,
+      "filter_list": [
+        {
+          "name": "score-first",
+          "filter": [
+            {
+              "function": "regex",
+              "regex_pattern": "(\\b[ABCD]\\b)"
+            },
+            {
+              "function": "take_first"
+            }
+          ]
+        }
+      ],
+      "should_decontaminate": true,
+      "doc_to_decontamination_query": "{{flores_passage}} {{question}} {{mc_answer1}} {{mc_answer2}} {{mc_answer3}} {{mc_answer4}}"
+    }
+  },
+  "versions": {
+    "polish_belebele_regex": "Yaml"
+  },
+  "n-shot": {
+    "polish_belebele_regex": 5
+  },
+  "higher_is_better": {
+    "polish_belebele_regex": {
+      "exact_match": true
+    }
+  },
+  "n-samples": {
+    "polish_belebele_regex": {
+      "original": 900,
+      "effective": 900
+    }
+  },
+  "config": {
+    "model": "hf",
+    "model_args": "pretrained=speakleash/Bielik-11B-v2.1-Instruct,dtype=bfloat16,trust_remote_code=True",
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": "sqlite_caches/plgchriso/models/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_belebele_regex/",
+    "limit": null,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": null,
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": "2132286",
+  "date": 1723381748.8774083,
+  "pretty_env_info": "PyTorch version: 2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Rocky Linux 9.3 (Blue Onyx) (x86_64)\nGCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)\nClang version: Could not collect\nCMake version: Could not collect\nLibc version: glibc-2.34\n\nPython version: 3.10.4 (main, Dec 14 2022, 11:01:42) [GCC 11.3.0] (64-bit runtime)\nPython platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34\nIs CUDA available: True\nCUDA runtime version: 12.1.105\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Could not collect\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                       x86_64\nCPU op-mode(s):                     32-bit, 64-bit\nAddress sizes:                      43 bits physical, 48 bits virtual\nByte Order:                         Little Endian\nCPU(s):                             128\nOn-line CPU(s) list:                0-127\nVendor ID:                          AuthenticAMD\nModel name:                         AMD EPYC 7742 64-Core Processor\nCPU family:                         23\nModel:                              49\nThread(s) per core:                 1\nCore(s) per socket:                 64\nSocket(s):                          2\nStepping:                           0\nFrequency boost:                    enabled\nCPU max MHz:                        2250.0000\nCPU min MHz:                        1500.0000\nBogoMIPS:                           4500.00\nFlags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es\nVirtualization:                     AMD-V\nL1d cache:                          4 MiB (128 instances)\nL1i cache:                          4 MiB (128 instances)\nL2 cache:                           64 MiB (128 instances)\nL3 cache:                           512 MiB (32 instances)\nNUMA node(s):                       4\nNUMA node0 CPU(s):                  0-31\nNUMA node1 CPU(s):                  32-63\nNUMA node2 CPU(s):                  64-95\nNUMA node3 CPU(s):                  96-127\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit:        Not affected\nVulnerability L1tf:                 Not affected\nVulnerability Mds:                  Not affected\nVulnerability Meltdown:             Not affected\nVulnerability Mmio stale data:      Not affected\nVulnerability Retbleed:             Mitigation; untrained return thunk; SMT disabled\nVulnerability Spec rstack overflow: Mitigation; SMT disabled\nVulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds:                Not affected\nVulnerability Tsx async abort:      Not 
affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] Could not collect",
+  "transformers_version": "4.43.1",
+  "upper_git_hash": "2132286315025b3abd7a22b7309f7052be200287",
+  "task_hashes": {
+    "polish_belebele_regex": "f24c47726a598a1d1eea361393c09e061f3bbf93fc16ed74e92c70bd969e71f2"
+  },
+  "model_source": "hf",
+  "model_name": "speakleash/Bielik-11B-v2.1-Instruct",
+  "model_name_sanitized": "__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "chat_template": null,
+  "chat_template_sha": null,
+  "start_time": 2272911.893726246,
+  "end_time": 2274555.15416838,
+  "total_evaluation_time_seconds": "1643.260442133993"
+}

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_cbd_multiple_choice_1723381711/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-21-52.993123.json ADDED

	@@ -0,0 +1,111 @@

+{
+  "results": {
+    "polish_cbd_multiple_choice": {
+      "acc,none": 0.74,
+      "acc_stderr,none": 0.013877773329774166,
+      "f1,none": 0.3516898467962298,
+      "f1_stderr,none": "N/A",
+      "acc_norm,none": 0.747,
+      "acc_norm_stderr,none": 0.01375427861358708,
+      "alias": "polish_cbd_multiple_choice"
+    }
+  },
+  "group_subtasks": {
+    "polish_cbd_multiple_choice": []
+  },
+  "configs": {
+    "polish_cbd_multiple_choice": {
+      "task": "polish_cbd_multiple_choice",
+      "dataset_path": "ptaszynski/PolishCyberbullyingDataset",
+      "training_split": "train",
+      "test_split": "test",
+      "doc_to_text": "Wypowiedź: \"{{TEXT}}\"\nDo podanej wypowiedzi przyporządkuj jedną, najlepiej pasującą kategorię z podanych: nieszkodliwa, szyderstwo, obelga, insynuacja, groźba, molestowanie.\nKategoria:",
+      "doc_to_target": "{{{'szyderstwo': 1, 'obelga': 2, 'insynuacja': 3, 'grozba': 4, 'molestowanie': 5}.get(CATEGORIES, 0)}}",
+      "doc_to_choice": [
+        "nieszkodliwa",
+        "szyderstwo",
+        "obelga",
+        "insynuacja",
+        "groźba",
+        "molestowanie"
+      ],
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "num_fewshot": 5,
+      "metric_list": [
+        {
+          "metric": "acc",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "acc_norm",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "def f1(predictions, references):\n    _prediction = predictions[0]\n    _reference = references[0]\n    string_label = [\"A\", \"B\", \"C\", \"D\", \"E\", \"F\"]\n    reference = string_label.index(_reference)\n    prediction = (\n        string_label.index(_prediction)\n        if _prediction in string_label\n        else 0\n    )\n\n    return (prediction, reference)\n",
+          "aggregation": "def agg_f1_macro(items):\n    predictions, references = zip(*items)\n    references, predictions = np.asarray(references), np.asarray(predictions)\n\n    return sklearn.metrics.f1_score(references, predictions, average='macro')\n",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "multiple_choice",
+      "repeats": 1,
+      "should_decontaminate": true,
+      "doc_to_decontamination_query": "{{TEXT}}"
+    }
+  },
+  "versions": {
+    "polish_cbd_multiple_choice": "Yaml"
+  },
+  "n-shot": {
+    "polish_cbd_multiple_choice": 5
+  },
+  "higher_is_better": {
+    "polish_cbd_multiple_choice": {
+      "acc": true,
+      "acc_norm": true,
+      "f1": true
+    }
+  },
+  "n-samples": {
+    "polish_cbd_multiple_choice": {
+      "original": 1000,
+      "effective": 1000
+    }
+  },
+  "config": {
+    "model": "hf",
+    "model_args": "pretrained=speakleash/Bielik-11B-v2.1-Instruct,dtype=bfloat16,trust_remote_code=True",
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": "sqlite_caches/plgchriso/models/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_cbd_multiple_choice/",
+    "limit": null,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": null,
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": "2132286",
+  "date": 1723381736.832983,
+  "pretty_env_info": "PyTorch version: 2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Rocky Linux 9.3 (Blue Onyx) (x86_64)\nGCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)\nClang version: Could not collect\nCMake version: Could not collect\nLibc version: glibc-2.34\n\nPython version: 3.10.4 (main, Dec 14 2022, 11:01:42) [GCC 11.3.0] (64-bit runtime)\nPython platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34\nIs CUDA available: True\nCUDA runtime version: 12.1.105\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Could not collect\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                       x86_64\nCPU op-mode(s):                     32-bit, 64-bit\nAddress sizes:                      43 bits physical, 48 bits virtual\nByte Order:                         Little Endian\nCPU(s):                             128\nOn-line CPU(s) list:                0-127\nVendor ID:                          AuthenticAMD\nModel name:                         AMD EPYC 7742 64-Core Processor\nCPU family:                         23\nModel:                              49\nThread(s) per core:                 1\nCore(s) per socket:                 64\nSocket(s):                          2\nStepping:                           0\nFrequency boost:                    enabled\nCPU max MHz:                        2250.0000\nCPU min MHz:                        1500.0000\nBogoMIPS:                           4500.00\nFlags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es\nVirtualization:                     AMD-V\nL1d cache:                          4 MiB (128 instances)\nL1i cache:                          4 MiB (128 instances)\nL2 cache:                           64 MiB (128 instances)\nL3 cache:                           512 MiB (32 instances)\nNUMA node(s):                       4\nNUMA node0 CPU(s):                  0-31\nNUMA node1 CPU(s):                  32-63\nNUMA node2 CPU(s):                  64-95\nNUMA node3 CPU(s):                  96-127\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit:        Not affected\nVulnerability L1tf:                 Not affected\nVulnerability Mds:                  Not affected\nVulnerability Meltdown:             Not affected\nVulnerability Mmio stale data:      Not affected\nVulnerability Retbleed:             Mitigation; untrained return thunk; SMT disabled\nVulnerability Spec rstack overflow: Mitigation; SMT disabled\nVulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds:                Not affected\nVulnerability Tsx async abort:      Not 
affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] Could not collect",
+  "transformers_version": "4.43.1",
+  "upper_git_hash": "2132286315025b3abd7a22b7309f7052be200287",
+  "task_hashes": {
+    "polish_cbd_multiple_choice": "7f04a198edb8f2a8d7c7854adaca6f42c6ab2547d80482066cd86becf9e6cd6c"
+  },
+  "model_source": "hf",
+  "model_name": "speakleash/Bielik-11B-v2.1-Instruct",
+  "model_name_sanitized": "__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "chat_template": null,
+  "chat_template_sha": null,
+  "start_time": 779444.243050853,
+  "end_time": 780228.642777935,
+  "total_evaluation_time_seconds": "784.3997270819964"
+}

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_cbd_regex_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-33-25.750066.json ADDED

	@@ -0,0 +1,119 @@

+{
+  "results": {
+    "polish_cbd_regex": {
+      "exact_match,score-first": 0.75,
+      "exact_match_stderr,score-first": 0.013699915608779773,
+      "f1,score-first": 0.3634343551926929,
+      "f1_stderr,score-first": "N/A",
+      "alias": "polish_cbd_regex"
+    }
+  },
+  "group_subtasks": {
+    "polish_cbd_regex": []
+  },
+  "configs": {
+    "polish_cbd_regex": {
+      "task": "polish_cbd_regex",
+      "dataset_path": "ptaszynski/PolishCyberbullyingDataset",
+      "training_split": "train",
+      "test_split": "test",
+      "doc_to_text": "Wypowiedź: \"{{TEXT}}\"\nPytanie: Jaka kategoria najlepiej pasuje do podanej wypowiedzi?\nMożliwe odpowiedzi:\nA - nieszkodliwa\nB - szyderstwo\nC - obelga\nD - insynuacja\nE - groźba\nF - molestowanie\nPrawidłowa odpowiedź:",
+      "doc_to_target": "{{{'szyderstwo': 'B', 'obelga': 'C', 'insynuacja': 'D', 'grozba': 'E', 'molestowanie': 'F'}.get(CATEGORIES, 'A')}}",
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "num_fewshot": 5,
+      "metric_list": [
+        {
+          "metric": "exact_match",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "def f1(predictions, references):\n    _prediction = predictions[0]\n    _reference = references[0]\n    string_label = [\"A\", \"B\", \"C\", \"D\", \"E\", \"F\"]\n    reference = string_label.index(_reference)\n    prediction = (\n        string_label.index(_prediction)\n        if _prediction in string_label\n        else 0\n    )\n\n    return (prediction, reference)\n",
+          "aggregation": "def agg_f1_macro(items):\n    predictions, references = zip(*items)\n    references, predictions = np.asarray(references), np.asarray(predictions)\n\n    return sklearn.metrics.f1_score(references, predictions, average='macro')\n",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "generate_until",
+      "generation_kwargs": {
+        "until": [
+          ".",
+          ",",
+          ";"
+        ],
+        "do_sample": false,
+        "temperature": 0.0,
+        "max_gen_toks": 50
+      },
+      "repeats": 1,
+      "filter_list": [
+        {
+          "name": "score-first",
+          "filter": [
+            {
+              "function": "regex",
+              "regex_pattern": "(\\b[ABCDEF]\\b)"
+            },
+            {
+              "function": "take_first"
+            }
+          ]
+        }
+      ],
+      "should_decontaminate": true,
+      "doc_to_decontamination_query": "{{TEXT}}"
+    }
+  },
+  "versions": {
+    "polish_cbd_regex": "Yaml"
+  },
+  "n-shot": {
+    "polish_cbd_regex": 5
+  },
+  "higher_is_better": {
+    "polish_cbd_regex": {
+      "exact_match": true,
+      "f1": true
+    }
+  },
+  "n-samples": {
+    "polish_cbd_regex": {
+      "original": 1000,
+      "effective": 1000
+    }
+  },
+  "config": {
+    "model": "hf",
+    "model_args": "pretrained=speakleash/Bielik-11B-v2.1-Instruct,dtype=bfloat16,trust_remote_code=True",
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": "sqlite_caches/plgchriso/models/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_cbd_regex/",
+    "limit": null,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": null,
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": "2132286",
+  "date": 1723381748.6048338,
+  "pretty_env_info": "PyTorch version: 2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Rocky Linux 9.3 (Blue Onyx) (x86_64)\nGCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)\nClang version: Could not collect\nCMake version: Could not collect\nLibc version: glibc-2.34\n\nPython version: 3.10.4 (main, Dec 14 2022, 11:01:42) [GCC 11.3.0] (64-bit runtime)\nPython platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34\nIs CUDA available: True\nCUDA runtime version: 12.1.105\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Could not collect\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                       x86_64\nCPU op-mode(s):                     32-bit, 64-bit\nAddress sizes:                      43 bits physical, 48 bits virtual\nByte Order:                         Little Endian\nCPU(s):                             128\nOn-line CPU(s) list:                0-127\nVendor ID:                          AuthenticAMD\nModel name:                         AMD EPYC 7742 64-Core Processor\nCPU family:                         23\nModel:                              49\nThread(s) per core:                 1\nCore(s) per socket:                 64\nSocket(s):                          2\nStepping:                           0\nFrequency boost:                    enabled\nCPU max MHz:                        2250.0000\nCPU min MHz:                        1500.0000\nBogoMIPS:                           4500.15\nFlags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es\nVirtualization:                     AMD-V\nL1d cache:                          4 MiB (128 instances)\nL1i cache:                          4 MiB (128 instances)\nL2 cache:                           64 MiB (128 instances)\nL3 cache:                           512 MiB (32 instances)\nNUMA node(s):                       4\nNUMA node0 CPU(s):                  0-31\nNUMA node1 CPU(s):                  32-63\nNUMA node2 CPU(s):                  64-95\nNUMA node3 CPU(s):                  96-127\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit:        Not affected\nVulnerability L1tf:                 Not affected\nVulnerability Mds:                  Not affected\nVulnerability Meltdown:             Not affected\nVulnerability Mmio stale data:      Not affected\nVulnerability Retbleed:             Mitigation; untrained return thunk; SMT disabled\nVulnerability Spec rstack overflow: Mitigation; SMT disabled\nVulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds:                Not affected\nVulnerability Tsx async abort:      Not 
affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] Could not collect",
+  "transformers_version": "4.43.1",
+  "upper_git_hash": "2132286315025b3abd7a22b7309f7052be200287",
+  "task_hashes": {
+    "polish_cbd_regex": "71dc0083f6f8b533188cbedcb2ea9d61ba63ef8ff3f6bb1c08f1844c9335ddf4"
+  },
+  "model_source": "hf",
+  "model_name": "speakleash/Bielik-11B-v2.1-Instruct",
+  "model_name_sanitized": "__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "chat_template": null,
+  "chat_template_sha": null,
+  "start_time": 780407.632129005,
+  "end_time": 781873.251077335,
+  "total_evaluation_time_seconds": "1465.6189483299386"
+}

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_dyk_multiple_choice_1723381711/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-14-08.007826.json ADDED

	@@ -0,0 +1,107 @@

+{
+  "results": {
+    "polish_dyk_multiple_choice": {
+      "acc,none": 0.8794946550048591,
+      "acc_stderr,none": 0.010153673638096375,
+      "f1,none": 0.7004830917874396,
+      "f1_stderr,none": "N/A",
+      "acc_norm,none": 0.8794946550048591,
+      "acc_norm_stderr,none": 0.010153673638096375,
+      "alias": "polish_dyk_multiple_choice"
+    }
+  },
+  "group_subtasks": {
+    "polish_dyk_multiple_choice": []
+  },
+  "configs": {
+    "polish_dyk_multiple_choice": {
+      "task": "polish_dyk_multiple_choice",
+      "dataset_path": "allegro/klej-dyk",
+      "training_split": "train",
+      "test_split": "test",
+      "doc_to_text": "Pytanie: \"{{question}}\"\nSugerowana odpowiedź: \"{{answer}}\"\nPytanie: Czy sugerowana odpowiedź na zadane pytanie jest poprawna?\nOdpowiedz krótko \"Tak\" lub \"Nie\". Prawidłowa odpowiedź:",
+      "doc_to_target": "{{target|int}}",
+      "doc_to_choice": [
+        "Nie",
+        "Tak"
+      ],
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "num_fewshot": 5,
+      "metric_list": [
+        {
+          "metric": "acc",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "acc_norm",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "def f1(predictions, references):\n    _prediction = predictions[0]\n    _reference = references[0]\n    string_label = [\"B\", \"C\"]\n    reference = string_label.index(_reference)\n    prediction = (\n        string_label.index(_prediction)\n        if _prediction in string_label\n        else 0\n    )\n\n    return (prediction, reference)\n",
+          "aggregation": "def agg_f1(items):\n    predictions, references = zip(*items)\n    references, predictions = np.asarray(references), np.asarray(predictions)\n\n    return sklearn.metrics.f1_score(references, predictions)\n",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "multiple_choice",
+      "repeats": 1,
+      "should_decontaminate": true,
+      "doc_to_decontamination_query": "{{question}} {{answer}}"
+    }
+  },
+  "versions": {
+    "polish_dyk_multiple_choice": "Yaml"
+  },
+  "n-shot": {
+    "polish_dyk_multiple_choice": 5
+  },
+  "higher_is_better": {
+    "polish_dyk_multiple_choice": {
+      "acc": true,
+      "acc_norm": true,
+      "f1": true
+    }
+  },
+  "n-samples": {
+    "polish_dyk_multiple_choice": {
+      "original": 1029,
+      "effective": 1029
+    }
+  },
+  "config": {
+    "model": "hf",
+    "model_args": "pretrained=speakleash/Bielik-11B-v2.1-Instruct,dtype=bfloat16,trust_remote_code=True",
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": "sqlite_caches/plgchriso/models/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_dyk_multiple_choice/",
+    "limit": null,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": null,
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": "2132286",
+  "date": 1723381736.8329315,
+  "pretty_env_info": "PyTorch version: 2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Rocky Linux 9.3 (Blue Onyx) (x86_64)\nGCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)\nClang version: Could not collect\nCMake version: Could not collect\nLibc version: glibc-2.34\n\nPython version: 3.10.4 (main, Dec 14 2022, 11:01:42) [GCC 11.3.0] (64-bit runtime)\nPython platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34\nIs CUDA available: True\nCUDA runtime version: 12.1.105\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Could not collect\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                       x86_64\nCPU op-mode(s):                     32-bit, 64-bit\nAddress sizes:                      43 bits physical, 48 bits virtual\nByte Order:                         Little Endian\nCPU(s):                             128\nOn-line CPU(s) list:                0-127\nVendor ID:                          AuthenticAMD\nModel name:                         AMD EPYC 7742 64-Core Processor\nCPU family:                         23\nModel:                              49\nThread(s) per core:                 1\nCore(s) per socket:                 64\nSocket(s):                          2\nStepping:                           0\nFrequency boost:                    enabled\nCPU max MHz:                        2250.0000\nCPU min MHz:                        1500.0000\nBogoMIPS:                           4500.00\nFlags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es\nVirtualization:                     AMD-V\nL1d cache:                          4 MiB (128 instances)\nL1i cache:                          4 MiB (128 instances)\nL2 cache:                           64 MiB (128 instances)\nL3 cache:                           512 MiB (32 instances)\nNUMA node(s):                       4\nNUMA node0 CPU(s):                  0-31\nNUMA node1 CPU(s):                  32-63\nNUMA node2 CPU(s):                  64-95\nNUMA node3 CPU(s):                  96-127\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit:        Not affected\nVulnerability L1tf:                 Not affected\nVulnerability Mds:                  Not affected\nVulnerability Meltdown:             Not affected\nVulnerability Mmio stale data:      Not affected\nVulnerability Retbleed:             Mitigation; untrained return thunk; SMT disabled\nVulnerability Spec rstack overflow: Mitigation; SMT disabled\nVulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds:                Not affected\nVulnerability Tsx async abort:      Not 
affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] Could not collect",
+  "transformers_version": "4.43.1",
+  "upper_git_hash": "2132286315025b3abd7a22b7309f7052be200287",
+  "task_hashes": {
+    "polish_dyk_multiple_choice": "90a835c3521affda43e1b7e595ec145d189a8781186b0d67f0a20cbb60069d75"
+  },
+  "model_source": "hf",
+  "model_name": "speakleash/Bielik-11B-v2.1-Instruct",
+  "model_name_sanitized": "__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "chat_template": null,
+  "chat_template_sha": null,
+  "start_time": 779444.242757443,
+  "end_time": 779763.657474826,
+  "total_evaluation_time_seconds": "319.41471738298424"
+}

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_dyk_regex_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-54-27.674557.json ADDED

	@@ -0,0 +1,118 @@

+{
+  "results": {
+    "polish_dyk_regex": {
+      "exact_match,score-first": 0.8785228377065112,
+      "exact_match_stderr,score-first": 0.010188899761066529,
+      "f1,score-first": 0.7126436781609196,
+      "f1_stderr,score-first": "N/A",
+      "alias": "polish_dyk_regex"
+    }
+  },
+  "group_subtasks": {
+    "polish_dyk_regex": []
+  },
+  "configs": {
+    "polish_dyk_regex": {
+      "task": "polish_dyk_regex",
+      "dataset_path": "allegro/klej-dyk",
+      "training_split": "train",
+      "test_split": "test",
+      "doc_to_text": "Pytanie: \"{{question}}\"\nSugerowana odpowiedź: \"{{answer}}\"\nCzy sugerowana odpowiedź na zadane pytanie jest poprawna? Możliwe opcje:\nA - brakuje sugerowanej odpowiedzi\nB - nie, sugerowana odpowiedź nie jest poprawna\nC - tak, sugerowana odpowiedź jest poprawna\nD - brakuje pytania\nPrawidłowa opcja:",
+      "doc_to_target": "{{{0: 'A', 1: 'B', 2: 'C', 3: 'D'}.get(target|int + 1)}}",
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "num_fewshot": 5,
+      "metric_list": [
+        {
+          "metric": "exact_match",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "def f1(predictions, references):\n    _prediction = predictions[0]\n    _reference = references[0]\n    string_label = [\"B\", \"C\"]\n    reference = string_label.index(_reference)\n    prediction = (\n        string_label.index(_prediction)\n        if _prediction in string_label\n        else 0\n    )\n\n    return (prediction, reference)\n",
+          "aggregation": "def agg_f1(items):\n    predictions, references = zip(*items)\n    references, predictions = np.asarray(references), np.asarray(predictions)\n\n    return sklearn.metrics.f1_score(references, predictions)\n",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "generate_until",
+      "generation_kwargs": {
+        "until": [
+          ".",
+          ","
+        ],
+        "do_sample": false,
+        "temperature": 0.0,
+        "max_gen_toks": 50
+      },
+      "repeats": 1,
+      "filter_list": [
+        {
+          "name": "score-first",
+          "filter": [
+            {
+              "function": "regex",
+              "regex_pattern": "(\\b[ABCD]\\b)"
+            },
+            {
+              "function": "take_first"
+            }
+          ]
+        }
+      ],
+      "should_decontaminate": true,
+      "doc_to_decontamination_query": "{{question}} {{answer}}"
+    }
+  },
+  "versions": {
+    "polish_dyk_regex": "Yaml"
+  },
+  "n-shot": {
+    "polish_dyk_regex": 5
+  },
+  "higher_is_better": {
+    "polish_dyk_regex": {
+      "exact_match": true,
+      "f1": true
+    }
+  },
+  "n-samples": {
+    "polish_dyk_regex": {
+      "original": 1029,
+      "effective": 1029
+    }
+  },
+  "config": {
+    "model": "hf",
+    "model_args": "pretrained=speakleash/Bielik-11B-v2.1-Instruct,dtype=bfloat16,trust_remote_code=True",
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": "sqlite_caches/plgchriso/models/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_dyk_regex/",
+    "limit": null,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": null,
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": "2132286",
+  "date": 1723381748.877224,
+  "pretty_env_info": "PyTorch version: 2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Rocky Linux 9.3 (Blue Onyx) (x86_64)\nGCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)\nClang version: Could not collect\nCMake version: Could not collect\nLibc version: glibc-2.34\n\nPython version: 3.10.4 (main, Dec 14 2022, 11:01:42) [GCC 11.3.0] (64-bit runtime)\nPython platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34\nIs CUDA available: True\nCUDA runtime version: 12.1.105\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Could not collect\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                       x86_64\nCPU op-mode(s):                     32-bit, 64-bit\nAddress sizes:                      43 bits physical, 48 bits virtual\nByte Order:                         Little Endian\nCPU(s):                             128\nOn-line CPU(s) list:                0-127\nVendor ID:                          AuthenticAMD\nModel name:                         AMD EPYC 7742 64-Core Processor\nCPU family:                         23\nModel:                              49\nThread(s) per core:                 1\nCore(s) per socket:                 64\nSocket(s):                          2\nStepping:                           0\nFrequency boost:                    enabled\nCPU max MHz:                        2250.0000\nCPU min MHz:                        1500.0000\nBogoMIPS:                           4500.00\nFlags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es\nVirtualization:                     AMD-V\nL1d cache:                          4 MiB (128 instances)\nL1i cache:                          4 MiB (128 instances)\nL2 cache:                           64 MiB (128 instances)\nL3 cache:                           512 MiB (32 instances)\nNUMA node(s):                       4\nNUMA node0 CPU(s):                  0-31\nNUMA node1 CPU(s):                  32-63\nNUMA node2 CPU(s):                  64-95\nNUMA node3 CPU(s):                  96-127\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit:        Not affected\nVulnerability L1tf:                 Not affected\nVulnerability Mds:                  Not affected\nVulnerability Meltdown:             Not affected\nVulnerability Mmio stale data:      Not affected\nVulnerability Retbleed:             Mitigation; untrained return thunk; SMT disabled\nVulnerability Spec rstack overflow: Mitigation; SMT disabled\nVulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds:                Not affected\nVulnerability Tsx async abort:      Not 
affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] Could not collect",
+  "transformers_version": "4.43.1",
+  "upper_git_hash": "2132286315025b3abd7a22b7309f7052be200287",
+  "task_hashes": {
+    "polish_dyk_regex": "ff511210f55c111bbc6d0c4cd80c3d7b334eaf5227fb2ed749d0a0530e518b27"
+  },
+  "model_source": "hf",
+  "model_name": "speakleash/Bielik-11B-v2.1-Instruct",
+  "model_name_sanitized": "__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "chat_template": null,
+  "chat_template_sha": null,
+  "start_time": 2272911.894157125,
+  "end_time": 2275639.174286722,
+  "total_evaluation_time_seconds": "2727.2801295970567"
+}
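
The `polish_dyk_regex` config above scores a generation in two steps: the `score-first` filter keeps the first standalone option letter, and the custom `f1` metric maps "B"/"C" to 0/1 before `sklearn.metrics.f1_score` is applied. A minimal sketch of that flow follows. It is not the harness code: `score_first`, `to_label`, `binary_f1`, the `"[invalid]"` fallback and the sample generations are illustrative assumptions, and F1 is computed by hand so the snippet needs no third-party packages.

```python
import re

# Pattern copied from the "score-first" filter in the config above.
ANSWER_PATTERN = re.compile(r"(\b[ABCD]\b)")


def score_first(generation: str, fallback: str = "[invalid]") -> str:
    """Return the first standalone option letter, or a fallback (assumed value)."""
    match = ANSWER_PATTERN.search(generation)
    return match.group(1) if match else fallback


def to_label(letter: str) -> int:
    """Mirror the recorded f1 metric: "B" -> 0, "C" -> 1, anything else -> 0."""
    labels = ["B", "C"]
    return labels.index(letter) if letter in labels else 0


def binary_f1(pairs) -> float:
    """F1 of the positive class (1, i.e. "C") over (prediction, reference) pairs."""
    tp = sum(1 for pred, ref in pairs if pred == 1 and ref == 1)
    fp = sum(1 for pred, ref in pairs if pred == 1 and ref == 0)
    fn = sum(1 for pred, ref in pairs if pred == 0 and ref == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical model outputs and gold letters, for illustration only.
generations = [" C", " B", "Odpowiedź: C", " A"]
references = ["C", "C", "B", "B"]

pairs = [
    (to_label(score_first(gen)), to_label(ref))
    for gen, ref in zip(generations, references)
]
print(pairs)
print(binary_f1(pairs))
```

Note that an "A" or "D" answer counts as an `exact_match` miss but is folded into class 0 for F1, which is one reason the two reported numbers can diverge.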

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_eq_bench_first_turn_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-18-22.563512.json ADDED

	@@ -0,0 +1,107 @@

+{
+  "results": {
+    "polish_eq_bench_first_turn": {
+      "first_eqbench,none": 70.08076901067246,
+      "first_eqbench_stderr,none": 2.1051510636673663,
+      "first_percent_parseable,none": 100.0,
+      "first_percent_parseable_stderr,none": 0.0,
+      "alias": "polish_eq_bench_first_turn"
+    }
+  },
+  "group_subtasks": {
+    "polish_eq_bench_first_turn": []
+  },
+  "configs": {
+    "polish_eq_bench_first_turn": {
+      "task": "polish_eq_bench_first_turn",
+      "dataset_path": "speakleash/EQ-Bench-PL-first-turn",
+      "validation_split": "validation",
+      "doc_to_text": "{{prompt}}\nOceny:\n",
+      "doc_to_target": "def doc_to_target(doc):\n    reference = eval(doc[\"reference_answer\"])\n\n    target = \"\"\n    for i in range(1, 5):\n        emotion = reference[f\"emotion{i}\"]\n        emotion_score = reference[f\"emotion{i}_score\"]\n        target += f\"{emotion}: {emotion_score}\\n\"\n    target += \"\\n\"\n\n    return target\n",
+      "process_results": "def score_first(docs, results):\n    first_pass_answers = dict(list(re.findall(r'(\\w+(?: \\w+)*):\\s+(\\d+)', results[0]))[:4])\n    reference = eval(docs[\"reference_answer\"])\n    first_pass_score = calculate_score(reference, first_pass_answers)\n    scores= {'first_'+k: v for k, v in first_pass_score.items()}\n    return scores\n",
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "num_fewshot": 5,
+      "metric_list": [
+        {
+          "metric": "first_eqbench",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "first_percent_parseable",
+          "aggregation": "mean",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "generate_until",
+      "generation_kwargs": {
+        "max_gen_toks": 512,
+        "do_sample": false,
+        "temperature": 0.0,
+        "until": [
+          "</s>",
+          "[Koniec odpowiedzi]",
+          "Masz za zadanie"
+        ]
+      },
+      "repeats": 1,
+      "should_decontaminate": false,
+      "metadata": {
+        "version": 2.4
+      }
+    }
+  },
+  "versions": {
+    "polish_eq_bench_first_turn": 2.4
+  },
+  "n-shot": {
+    "polish_eq_bench_first_turn": 5
+  },
+  "higher_is_better": {
+    "polish_eq_bench_first_turn": {
+      "first_eqbench": true,
+      "first_percent_parseable": true
+    }
+  },
+  "n-samples": {
+    "polish_eq_bench_first_turn": {
+      "original": 171,
+      "effective": 171
+    }
+  },
+  "config": {
+    "model": "hf",
+    "model_args": "pretrained=speakleash/Bielik-11B-v2.1-Instruct,dtype=bfloat16,trust_remote_code=True",
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": "sqlite_caches/plgchriso/models/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_eq_bench_first_turn/",
+    "limit": null,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": null,
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": "2132286",
+  "date": 1723381748.6045775,
+  "pretty_env_info": "PyTorch version: 2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Rocky Linux 9.3 (Blue Onyx) (x86_64)\nGCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)\nClang version: Could not collect\nCMake version: Could not collect\nLibc version: glibc-2.34\n\nPython version: 3.10.4 (main, Dec 14 2022, 11:01:42) [GCC 11.3.0] (64-bit runtime)\nPython platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34\nIs CUDA available: True\nCUDA runtime version: 12.1.105\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Could not collect\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                       x86_64\nCPU op-mode(s):                     32-bit, 64-bit\nAddress sizes:                      43 bits physical, 48 bits virtual\nByte Order:                         Little Endian\nCPU(s):                             128\nOn-line CPU(s) list:                0-127\nVendor ID:                          AuthenticAMD\nModel name:                         AMD EPYC 7742 64-Core Processor\nCPU family:                         23\nModel:                              49\nThread(s) per core:                 1\nCore(s) per socket:                 64\nSocket(s):                          2\nStepping:                           0\nFrequency boost:                    enabled\nCPU max MHz:                        2250.0000\nCPU min MHz:                        1500.0000\nBogoMIPS:                           4500.15\nFlags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es\nVirtualization:                     AMD-V\nL1d cache:                          4 MiB (128 instances)\nL1i cache:                          4 MiB (128 instances)\nL2 cache:                           64 MiB (128 instances)\nL3 cache:                           512 MiB (32 instances)\nNUMA node(s):                       4\nNUMA node0 CPU(s):                  0-31\nNUMA node1 CPU(s):                  32-63\nNUMA node2 CPU(s):                  64-95\nNUMA node3 CPU(s):                  96-127\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit:        Not affected\nVulnerability L1tf:                 Not affected\nVulnerability Mds:                  Not affected\nVulnerability Meltdown:             Not affected\nVulnerability Mmio stale data:      Not affected\nVulnerability Retbleed:             Mitigation; untrained return thunk; SMT disabled\nVulnerability Spec rstack overflow: Mitigation; SMT disabled\nVulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds:                Not affected\nVulnerability Tsx async abort:      Not 
affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] Could not collect",
+  "transformers_version": "4.43.1",
+  "upper_git_hash": "2132286315025b3abd7a22b7309f7052be200287",
+  "task_hashes": {
+    "polish_eq_bench_first_turn": "80a40657adcfe9c62884d65078de0204ecd846ef1614217065f11a87cbb0ad87"
+  },
+  "model_source": "hf",
+  "model_name": "speakleash/Bielik-11B-v2.1-Instruct",
+  "model_name_sanitized": "__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "chat_template": null,
+  "chat_template_sha": null,
+  "start_time": 780407.631921335,
+  "end_time": 780970.064839895,
+  "total_evaluation_time_seconds": "562.4329185599927"
+}
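
The recorded `process_results` for `polish_eq_bench_first_turn` extracts "emotion: score" pairs with `re.findall` and keeps the first four. The sketch below isolates only that parsing step; `parse_ratings` and the sample text are hypothetical, and the `calculate_score` helper referenced in the config is not shown in this diff, so it is not reproduced here.

```python
import re

# Pattern copied from the recorded process_results function.
RATING_PATTERN = re.compile(r"(\w+(?: \w+)*):\s+(\d+)")


def parse_ratings(generation: str, limit: int = 4) -> dict:
    """Return the first `limit` (emotion, score) pairs; scores stay strings."""
    return dict(RATING_PATTERN.findall(generation)[:limit])


# Hypothetical generation, for illustration only.
sample = "Złość: 7\nPoczucie winy: 2\nUlga: 0\nDuma: 5\nRadość: 9\n"
ratings = parse_ratings(sample)
print(ratings)

# The pattern requires whitespace after the colon, so "Złość:7" is not parsed.
print(parse_ratings("Złość:7"))
```

Because the pattern only allows single spaces between words of an emotion name, a label cannot span a line break, which keeps each pair tied to its own line.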

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_klej_ner_multiple_choice_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-35-52.497622.json ADDED

	@@ -0,0 +1,105 @@

+{
+  "results": {
+    "polish_klej_ner_multiple_choice": {
+      "acc,none": 0.5383867832847424,
+      "acc_stderr,none": 0.010991808831354909,
+      "acc_norm,none": 0.5291545189504373,
+      "acc_norm_stderr,none": 0.011005589555788344,
+      "alias": "polish_klej_ner_multiple_choice"
+    }
+  },
+  "group_subtasks": {
+    "polish_klej_ner_multiple_choice": []
+  },
+  "configs": {
+    "polish_klej_ner_multiple_choice": {
+      "task": "polish_klej_ner_multiple_choice",
+      "dataset_path": "allegro/klej-nkjp-ner",
+      "training_split": "train",
+      "validation_split": "validation",
+      "test_split": "test",
+      "fewshot_split": "train",
+      "doc_to_text": "Zdanie: \"{{sentence}}\"\nJakiego rodzaju jest nazwana jednostka, jeżeli występuje w podanym zdaniu?\nMożliwe odpowiedzi: Brak nazwanej jednostki, Nazwa miejsca, Nazwa osoby, Nazwa organizacji, Czas, Nazwa geograficzna.\nRodzaj:",
+      "doc_to_target": "{{{'noEntity': 0, 'placeName': 1, 'persName': 2, 'orgName': 3, 'time': 4, 'geogName': 5}.get(target)}}",
+      "doc_to_choice": [
+        "Brak nazwanej jednostki",
+        "Nazwa miejsca",
+        "Nazwa osoby",
+        "Nazwa organizacji",
+        "Czas",
+        "Nazwa geograficzna"
+      ],
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "num_fewshot": 5,
+      "metric_list": [
+        {
+          "metric": "acc",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "acc_norm",
+          "aggregation": "mean",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "multiple_choice",
+      "repeats": 1,
+      "should_decontaminate": true,
+      "doc_to_decontamination_query": "{{sentence}}"
+    }
+  },
+  "versions": {
+    "polish_klej_ner_multiple_choice": "Yaml"
+  },
+  "n-shot": {
+    "polish_klej_ner_multiple_choice": 5
+  },
+  "higher_is_better": {
+    "polish_klej_ner_multiple_choice": {
+      "acc": true,
+      "acc_norm": true
+    }
+  },
+  "n-samples": {
+    "polish_klej_ner_multiple_choice": {
+      "original": 2058,
+      "effective": 2058
+    }
+  },
+  "config": {
+    "model": "hf",
+    "model_args": "pretrained=speakleash/Bielik-11B-v2.1-Instruct,dtype=bfloat16,trust_remote_code=True",
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": "sqlite_caches/plgchriso/models/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_klej_ner_multiple_choice/",
+    "limit": null,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": null,
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": "2132286",
+  "date": 1723381748.8774905,
+  "pretty_env_info": "PyTorch version: 2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Rocky Linux 9.3 (Blue Onyx) (x86_64)\nGCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)\nClang version: Could not collect\nCMake version: Could not collect\nLibc version: glibc-2.34\n\nPython version: 3.10.4 (main, Dec 14 2022, 11:01:42) [GCC 11.3.0] (64-bit runtime)\nPython platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34\nIs CUDA available: True\nCUDA runtime version: 12.1.105\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Could not collect\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                       x86_64\nCPU op-mode(s):                     32-bit, 64-bit\nAddress sizes:                      43 bits physical, 48 bits virtual\nByte Order:                         Little Endian\nCPU(s):                             128\nOn-line CPU(s) list:                0-127\nVendor ID:                          AuthenticAMD\nModel name:                         AMD EPYC 7742 64-Core Processor\nCPU family:                         23\nModel:                              49\nThread(s) per core:                 1\nCore(s) per socket:                 64\nSocket(s):                          2\nStepping:                           0\nFrequency boost:                    enabled\nCPU max MHz:                        2250.0000\nCPU min MHz:                        1500.0000\nBogoMIPS:                           4500.00\nFlags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es\nVirtualization:                     AMD-V\nL1d cache:                          4 MiB (128 instances)\nL1i cache:                          4 MiB (128 instances)\nL2 cache:                           64 MiB (128 instances)\nL3 cache:                           512 MiB (32 instances)\nNUMA node(s):                       4\nNUMA node0 CPU(s):                  0-31\nNUMA node1 CPU(s):                  32-63\nNUMA node2 CPU(s):                  64-95\nNUMA node3 CPU(s):                  96-127\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit:        Not affected\nVulnerability L1tf:                 Not affected\nVulnerability Mds:                  Not affected\nVulnerability Meltdown:             Not affected\nVulnerability Mmio stale data:      Not affected\nVulnerability Retbleed:             Mitigation; untrained return thunk; SMT disabled\nVulnerability Spec rstack overflow: Mitigation; SMT disabled\nVulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds:                Not affected\nVulnerability Tsx async abort:      Not 
affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] Could not collect",
+  "transformers_version": "4.43.1",
+  "upper_git_hash": "2132286315025b3abd7a22b7309f7052be200287",
+  "task_hashes": {
+    "polish_klej_ner_multiple_choice": "382e085067293307f61df6d4b8dde438e9a35b2296d59d664ba9e1861a8fb319"
+  },
+  "model_source": "hf",
+  "model_name": "speakleash/Bielik-11B-v2.1-Instruct",
+  "model_name_sanitized": "__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "chat_template": null,
+  "chat_template_sha": null,
+  "start_time": 2272911.893820256,
+  "end_time": 2274523.996078617,
+  "total_evaluation_time_seconds": "1612.1022583609447"
+}
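
The multiple-choice result above reports both `acc` and `acc_norm`. The sketch below illustrates why the two can differ, under the assumption that `acc` picks the choice with the highest log-likelihood while `acc_norm` first divides each log-likelihood by the character length of the choice string. That normalisation rule is my reading of the harness convention, not something stated in this diff, and the log-likelihood values are invented.

```python
# Choice strings copied from doc_to_choice in the config above.
CHOICES = [
    "Brak nazwanej jednostki",
    "Nazwa miejsca",
    "Nazwa osoby",
    "Nazwa organizacji",
    "Czas",
    "Nazwa geograficzna",
]


def pick_choices(loglikelihoods, choices):
    """Return (raw argmax, length-normalised argmax) over the choices."""
    indices = range(len(choices))
    raw = max(indices, key=lambda i: loglikelihoods[i])
    # Assumed normalisation: divide by the number of characters in the choice.
    normalised = max(indices, key=lambda i: loglikelihoods[i] / len(choices[i]))
    return raw, normalised


# Hypothetical per-choice log-likelihoods, for illustration only.
lls = [-6.0, -5.0, -5.5, -9.0, -4.8, -7.0]
raw_idx, norm_idx = pick_choices(lls, CHOICES)
print(CHOICES[raw_idx], "|", CHOICES[norm_idx])
```

With these made-up numbers the short label "Czas" wins on raw log-likelihood, while the longest label wins after normalisation, so a document whose gold label is one of the two would be scored differently by `acc` and `acc_norm`.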

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_klej_ner_regex_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-57-18.271570.json ADDED

	@@ -0,0 +1,112 @@

+{
+  "results": {
+    "polish_klej_ner_regex": {
+      "exact_match,score-first": 0.5515063168124392,
+      "exact_match_stderr,score-first": 0.010965697594667088,
+      "alias": "polish_klej_ner_regex"
+    }
+  },
+  "group_subtasks": {
+    "polish_klej_ner_regex": []
+  },
+  "configs": {
+    "polish_klej_ner_regex": {
+      "task": "polish_klej_ner_regex",
+      "dataset_path": "allegro/klej-nkjp-ner",
+      "training_split": "train",
+      "validation_split": "validation",
+      "test_split": "test",
+      "doc_to_text": "Zdanie: \"{{sentence}}\"\nPytanie: Jakiego rodzaju jest nazwana jednostka, jeżeli występuje w podanym zdaniu?\nMożliwe odpowiedzi:\nA - Brak nazwanej jednostki\nB - Nazwa miejsca\nC - Nazwa osoby\nD - Nazwa organizacji\nE - Czas\nF - Nazwa geograficzna\nPrawidłowa odpowiedź:",
+      "doc_to_target": "{{{'noEntity': 'A', 'placeName': 'B', 'persName': 'C', 'orgName': 'D', 'time': 'E', 'geogName': 'F'}.get(target)}}",
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "num_fewshot": 5,
+      "metric_list": [
+        {
+          "metric": "exact_match",
+          "aggregation": "mean",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "generate_until",
+      "generation_kwargs": {
+        "until": [
+          ".",
+          ",",
+          ";"
+        ],
+        "do_sample": false,
+        "temperature": 0.0,
+        "max_gen_toks": 50
+      },
+      "repeats": 1,
+      "filter_list": [
+        {
+          "name": "score-first",
+          "filter": [
+            {
+              "function": "regex",
+              "regex_pattern": "(\\b[ABCDEF]\\b)"
+            },
+            {
+              "function": "take_first"
+            }
+          ]
+        }
+      ],
+      "should_decontaminate": true,
+      "doc_to_decontamination_query": "{{sentence}}"
+    }
+  },
+  "versions": {
+    "polish_klej_ner_regex": "Yaml"
+  },
+  "n-shot": {
+    "polish_klej_ner_regex": 5
+  },
+  "higher_is_better": {
+    "polish_klej_ner_regex": {
+      "exact_match": true
+    }
+  },
+  "n-samples": {
+    "polish_klej_ner_regex": {
+      "original": 2058,
+      "effective": 2058
+    }
+  },
+  "config": {
+    "model": "hf",
+    "model_args": "pretrained=speakleash/Bielik-11B-v2.1-Instruct,dtype=bfloat16,trust_remote_code=True",
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": "sqlite_caches/plgchriso/models/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_klej_ner_regex/",
+    "limit": null,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": null,
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": "2132286",
+  "date": 1723381748.60427,
+  "pretty_env_info": "PyTorch version: 2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Rocky Linux 9.3 (Blue Onyx) (x86_64)\nGCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)\nClang version: Could not collect\nCMake version: Could not collect\nLibc version: glibc-2.34\n\nPython version: 3.10.4 (main, Dec 14 2022, 11:01:42) [GCC 11.3.0] (64-bit runtime)\nPython platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34\nIs CUDA available: True\nCUDA runtime version: 12.1.105\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Could not collect\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                       x86_64\nCPU op-mode(s):                     32-bit, 64-bit\nAddress sizes:                      43 bits physical, 48 bits virtual\nByte Order:                         Little Endian\nCPU(s):                             128\nOn-line CPU(s) list:                0-127\nVendor ID:                          AuthenticAMD\nModel name:                         AMD EPYC 7742 64-Core Processor\nCPU family:                         23\nModel:                              49\nThread(s) per core:                 1\nCore(s) per socket:                 64\nSocket(s):                          2\nStepping:                           0\nFrequency boost:                    enabled\nCPU max MHz:                        2250.0000\nCPU min MHz:                        1500.0000\nBogoMIPS:                           4500.15\nFlags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es\nVirtualization:                     AMD-V\nL1d cache:                          4 MiB (128 instances)\nL1i cache:                          4 MiB (128 instances)\nL2 cache:                           64 MiB (128 instances)\nL3 cache:                           512 MiB (32 instances)\nNUMA node(s):                       4\nNUMA node0 CPU(s):                  0-31\nNUMA node1 CPU(s):                  32-63\nNUMA node2 CPU(s):                  64-95\nNUMA node3 CPU(s):                  96-127\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit:        Not affected\nVulnerability L1tf:                 Not affected\nVulnerability Mds:                  Not affected\nVulnerability Meltdown:             Not affected\nVulnerability Mmio stale data:      Not affected\nVulnerability Retbleed:             Mitigation; untrained return thunk; SMT disabled\nVulnerability Spec rstack overflow: Mitigation; SMT disabled\nVulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds:                Not affected\nVulnerability Tsx async abort:      Not 
affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] Could not collect",
+  "transformers_version": "4.43.1",
+  "upper_git_hash": "2132286315025b3abd7a22b7309f7052be200287",
+  "task_hashes": {
+    "polish_klej_ner_regex": "ab6f4267720bbc662460a7390651b5fc2a339d12301ae3ba0cba80f4ffe4fe5f"
+  },
+  "model_source": "hf",
+  "model_name": "speakleash/Bielik-11B-v2.1-Instruct",
+  "model_name_sanitized": "__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "chat_template": null,
+  "chat_template_sha": null,
+  "start_time": 780407.631669254,
+  "end_time": 783305.771917303,
+  "total_evaluation_time_seconds": "2898.140248048934"
+}

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_pes_1723381722/results_2024-08-27T17-50-52.063138.json ADDED

The diff for this file is too large to render.

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_polqa_closed_book_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-16-23.588300.json ADDED

	@@ -0,0 +1,106 @@

+{
+  "results": {
+    "polish_polqa_closed_book": {
+      "exact_match,none": 0.7144340602284528,
+      "exact_match_stderr,none": 0.014562862295117392,
+      "levenshtein,none": 0.8328141225337488,
+      "levenshtein_stderr,none": "N/A",
+      "alias": "polish_polqa_closed_book"
+    }
+  },
+  "group_subtasks": {
+    "polish_polqa_closed_book": []
+  },
+  "configs": {
+    "polish_polqa_closed_book": {
+      "task": "polish_polqa_closed_book",
+      "dataset_path": "ipipan/polqa",
+      "training_split": "train",
+      "validation_split": "validation",
+      "test_split": "test",
+      "process_docs": "def process_docs_closed(dataset: datasets.Dataset):\n    def _helper(doc):\n      doc[\"answers\"] = ast.literal_eval(doc['answers'])\n      return doc\n\n    used = set()\n\n    return dataset.remove_columns(COLUMNS_TO_REMOVE).filter(lambda example: example[\"relevant\"] and example['question'] not in used and (used.add(example['question']) or True)).map(_helper)\n",
+      "doc_to_text": "Pytanie: {{question}} \n Prawidłowa odpowiedź:",
+      "doc_to_target": "answers",
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "num_fewshot": 5,
+      "metric_list": [
+        {
+          "metric": "exact_match",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "def levenshtein(predictions, references):\n    _prediction = predictions[0][0].lower()\n    prediction_number = get_number(_prediction)\n\n    _prediction = re.sub('\\.? ?(</s>)* ?$','',_prediction)\n\n    for reference in references:\n        reference_number = get_number(reference)\n\n        if reference_number is not None:\n            if reference_number == prediction_number:\n                return 1\n        else:\n            ld = distance(_prediction, reference.lower())\n            if ld<len(reference)/2:\n                return 1\n    return 0\n",
+          "aggregation": "def agg_levenshtein(items):\n    return sum(items)/len(items)\n",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "generate_until",
+      "generation_kwargs": {
+        "until": [
+          "\n",
+          "</s>"
+        ],
+        "do_sample": false,
+        "temperature": 0.0,
+        "max_gen_toks": 50
+      },
+      "repeats": 1,
+      "should_decontaminate": true,
+      "doc_to_decontamination_query": "{{question}}"
+    }
+  },
+  "versions": {
+    "polish_polqa_closed_book": "Yaml"
+  },
+  "n-shot": {
+    "polish_polqa_closed_book": 5
+  },
+  "higher_is_better": {
+    "polish_polqa_closed_book": {
+      "exact_match": true,
+      "levenshtein": true
+    }
+  },
+  "n-samples": {
+    "polish_polqa_closed_book": {
+      "original": 963,
+      "effective": 963
+    }
+  },
+  "config": {
+    "model": "hf",
+    "model_args": "pretrained=speakleash/Bielik-11B-v2.1-Instruct,dtype=bfloat16,trust_remote_code=True",
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": "sqlite_caches/plgchriso/models/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_polqa_closed_book/",
+    "limit": null,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": null,
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": "2132286",
+  "date": 1723381748.604107,
+  "pretty_env_info": "PyTorch version: 2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Rocky Linux 9.3 (Blue Onyx) (x86_64)\nGCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)\nClang version: Could not collect\nCMake version: Could not collect\nLibc version: glibc-2.34\n\nPython version: 3.10.4 (main, Dec 14 2022, 11:01:42) [GCC 11.3.0] (64-bit runtime)\nPython platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34\nIs CUDA available: True\nCUDA runtime version: 12.1.105\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Could not collect\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                       x86_64\nCPU op-mode(s):                     32-bit, 64-bit\nAddress sizes:                      43 bits physical, 48 bits virtual\nByte Order:                         Little Endian\nCPU(s):                             128\nOn-line CPU(s) list:                0-127\nVendor ID:                          AuthenticAMD\nModel name:                         AMD EPYC 7742 64-Core Processor\nCPU family:                         23\nModel:                              49\nThread(s) per core:                 1\nCore(s) per socket:                 64\nSocket(s):                          2\nStepping:                           0\nFrequency boost:                    enabled\nCPU max MHz:                        2250.0000\nCPU min MHz:                        1500.0000\nBogoMIPS:                           4500.15\nFlags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es\nVirtualization:                     AMD-V\nL1d cache:                          4 MiB (128 instances)\nL1i cache:                          4 MiB (128 instances)\nL2 cache:                           64 MiB (128 instances)\nL3 cache:                           512 MiB (32 instances)\nNUMA node(s):                       4\nNUMA node0 CPU(s):                  0-31\nNUMA node1 CPU(s):                  32-63\nNUMA node2 CPU(s):                  64-95\nNUMA node3 CPU(s):                  96-127\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit:        Not affected\nVulnerability L1tf:                 Not affected\nVulnerability Mds:                  Not affected\nVulnerability Meltdown:             Not affected\nVulnerability Mmio stale data:      Not affected\nVulnerability Retbleed:             Mitigation; untrained return thunk; SMT disabled\nVulnerability Spec rstack overflow: Mitigation; SMT disabled\nVulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds:                Not affected\nVulnerability Tsx async abort:      Not 
affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] Could not collect",
+  "transformers_version": "4.43.1",
+  "upper_git_hash": "2132286315025b3abd7a22b7309f7052be200287",
+  "task_hashes": {
+    "polish_polqa_closed_book": "87d8cfbe97dc8a4ad77df54784eea533389ce029734e03de52acd682a4293a8e"
+  },
+  "model_source": "hf",
+  "model_name": "speakleash/Bielik-11B-v2.1-Instruct",
+  "model_name_sanitized": "__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "chat_template": null,
+  "chat_template_sha": null,
+  "start_time": 780407.631446274,
+  "end_time": 780851.088669831,
+  "total_evaluation_time_seconds": "443.4572235570522"
+}

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_polqa_open_book_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-47-00.491423.json ADDED

	@@ -0,0 +1,106 @@

+{
+  "results": {
+    "polish_polqa_open_book": {
+      "exact_match,none": 0.803306342780027,
+      "exact_match_stderr,none": 0.005163192439920857,
+      "levenshtein,none": 0.9239203778677463,
+      "levenshtein_stderr,none": "N/A",
+      "alias": "polish_polqa_open_book"
+    }
+  },
+  "group_subtasks": {
+    "polish_polqa_open_book": []
+  },
+  "configs": {
+    "polish_polqa_open_book": {
+      "task": "polish_polqa_open_book",
+      "dataset_path": "ipipan/polqa",
+      "training_split": "train",
+      "validation_split": "validation",
+      "test_split": "test",
+      "process_docs": "def process_docs_open(dataset: datasets.Dataset):\n    def _helper(doc):\n      doc[\"answers\"] = ast.literal_eval(doc['answers'])\n      return doc\n\n    used = set()\n\n    return dataset.remove_columns(COLUMNS_TO_REMOVE).filter(lambda example: example[\"relevant\"] and (example['passage_text'],example['question']) not in used and (used.add((example['passage_text'],example['question'])) or True)).map(_helper)\n",
+      "doc_to_text": "Kontekst: {{passage_text}} \n Pytanie: {{question}} \n Prawidłowa odpowiedź:",
+      "doc_to_target": "answers",
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "num_fewshot": 5,
+      "metric_list": [
+        {
+          "metric": "exact_match",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "def levenshtein(predictions, references):\n    _prediction = predictions[0][0].lower()\n    prediction_number = get_number(_prediction)\n\n    _prediction = re.sub('\\.? ?(</s>)* ?$','',_prediction)\n\n    for reference in references:\n        reference_number = get_number(reference)\n\n        if reference_number is not None:\n            if reference_number == prediction_number:\n                return 1\n        else:\n            ld = distance(_prediction, reference.lower())\n            if ld<len(reference)/2:\n                return 1\n    return 0\n",
+          "aggregation": "def agg_levenshtein(items):\n    return sum(items)/len(items)\n",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "generate_until",
+      "generation_kwargs": {
+        "until": [
+          "\n",
+          "</s>"
+        ],
+        "do_sample": false,
+        "temperature": 0.0,
+        "max_gen_toks": 50
+      },
+      "repeats": 1,
+      "should_decontaminate": true,
+      "doc_to_decontamination_query": "{{passage_text}} {{question}}"
+    }
+  },
+  "versions": {
+    "polish_polqa_open_book": "Yaml"
+  },
+  "n-shot": {
+    "polish_polqa_open_book": 5
+  },
+  "higher_is_better": {
+    "polish_polqa_open_book": {
+      "exact_match": true,
+      "levenshtein": true
+    }
+  },
+  "n-samples": {
+    "polish_polqa_open_book": {
+      "original": 5928,
+      "effective": 5928
+    }
+  },
+  "config": {
+    "model": "hf",
+    "model_args": "pretrained=speakleash/Bielik-11B-v2.1-Instruct,dtype=bfloat16,trust_remote_code=True",
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": "sqlite_caches/plgchriso/models/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_polqa_open_book/",
+    "limit": null,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": null,
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": "2132286",
+  "date": 1723381748.6044483,
+  "pretty_env_info": "PyTorch version: 2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Rocky Linux 9.3 (Blue Onyx) (x86_64)\nGCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)\nClang version: Could not collect\nCMake version: Could not collect\nLibc version: glibc-2.34\n\nPython version: 3.10.4 (main, Dec 14 2022, 11:01:42) [GCC 11.3.0] (64-bit runtime)\nPython platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34\nIs CUDA available: True\nCUDA runtime version: 12.1.105\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Could not collect\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                       x86_64\nCPU op-mode(s):                     32-bit, 64-bit\nAddress sizes:                      43 bits physical, 48 bits virtual\nByte Order:                         Little Endian\nCPU(s):                             128\nOn-line CPU(s) list:                0-127\nVendor ID:                          AuthenticAMD\nModel name:                         AMD EPYC 7742 64-Core Processor\nCPU family:                         23\nModel:                              49\nThread(s) per core:                 1\nCore(s) per socket:                 64\nSocket(s):                          2\nStepping:                           0\nFrequency boost:                    enabled\nCPU max MHz:                        2250.0000\nCPU min MHz:                        1500.0000\nBogoMIPS:                           4500.15\nFlags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es\nVirtualization:                     AMD-V\nL1d cache:                          4 MiB (128 instances)\nL1i cache:                          4 MiB (128 instances)\nL2 cache:                           64 MiB (128 instances)\nL3 cache:                           512 MiB (32 instances)\nNUMA node(s):                       4\nNUMA node0 CPU(s):                  0-31\nNUMA node1 CPU(s):                  32-63\nNUMA node2 CPU(s):                  64-95\nNUMA node3 CPU(s):                  96-127\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit:        Not affected\nVulnerability L1tf:                 Not affected\nVulnerability Mds:                  Not affected\nVulnerability Meltdown:             Not affected\nVulnerability Mmio stale data:      Not affected\nVulnerability Retbleed:             Mitigation; untrained return thunk; SMT disabled\nVulnerability Spec rstack overflow: Mitigation; SMT disabled\nVulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds:                Not affected\nVulnerability Tsx async abort:      Not 
affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] Could not collect",
+  "transformers_version": "4.43.1",
+  "upper_git_hash": "2132286315025b3abd7a22b7309f7052be200287",
+  "task_hashes": {
+    "polish_polqa_open_book": "1b7dbda5fd3d68d2b8f1d9ca3aecb84324d7c15639dca3a82f584f73f81e734f"
+  },
+  "model_source": "hf",
+  "model_name": "speakleash/Bielik-11B-v2.1-Instruct",
+  "model_name_sanitized": "__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "chat_template": null,
+  "chat_template_sha": null,
+  "start_time": 780407.632126545,
+  "end_time": 782687.990413396,
+  "total_evaluation_time_seconds": "2280.3582868510857"
+}

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_polqa_reranking_multiple_choice_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-50-02.859037.json ADDED

	@@ -0,0 +1,101 @@

+{
+  "results": {
+    "polish_polqa_reranking_multiple_choice": {
+      "acc,none": 0.8563222912896372,
+      "acc_stderr,none": 0.0031115351999876245,
+      "acc_norm,none": 0.8563222912896372,
+      "acc_norm_stderr,none": 0.0031115351999876245,
+      "alias": "polish_polqa_reranking_multiple_choice"
+    }
+  },
+  "group_subtasks": {
+    "polish_polqa_reranking_multiple_choice": []
+  },
+  "configs": {
+    "polish_polqa_reranking_multiple_choice": {
+      "task": "polish_polqa_reranking_multiple_choice",
+      "dataset_path": "ipipan/polqa",
+      "training_split": "train",
+      "validation_split": "validation",
+      "test_split": "test",
+      "process_docs": "def process_docs(dataset: datasets.Dataset):\n    def _helper(doc):\n      return doc\n\n    used = set()\n\n    return dataset.remove_columns(COLUMNS_TO_REMOVE).filter(lambda example: (example['passage_text'],example['question']) not in used and (used.add((example['passage_text'],example['question'])) or True)).map(_helper)\n",
+      "doc_to_text": "Kontekst: {{passage_text}} \n Pytanie: {{question}} \n Czy kontekst jest relewantny dla pytania? \n Odpowiedz krótko \"Tak\" lub \"Nie\". Prawidłowa odpowiedź:",
+      "doc_to_target": "{{relevant|int}}",
+      "doc_to_choice": [
+        "Nie",
+        "Tak"
+      ],
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "num_fewshot": 5,
+      "metric_list": [
+        {
+          "metric": "acc",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "acc_norm",
+          "aggregation": "mean",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "multiple_choice",
+      "repeats": 1,
+      "should_decontaminate": true,
+      "doc_to_decontamination_query": "{{passage_text}} {{question}}"
+    }
+  },
+  "versions": {
+    "polish_polqa_reranking_multiple_choice": "Yaml"
+  },
+  "n-shot": {
+    "polish_polqa_reranking_multiple_choice": 5
+  },
+  "higher_is_better": {
+    "polish_polqa_reranking_multiple_choice": {
+      "acc": true,
+      "acc_norm": true
+    }
+  },
+  "n-samples": {
+    "polish_polqa_reranking_multiple_choice": {
+      "original": 12709,
+      "effective": 12709
+    }
+  },
+  "config": {
+    "model": "hf",
+    "model_args": "pretrained=speakleash/Bielik-11B-v2.1-Instruct,dtype=bfloat16,trust_remote_code=True",
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": "sqlite_caches/plgchriso/models/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_polqa_reranking_multiple_choice/",
+    "limit": null,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": null,
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": "2132286",
+  "date": 1723381748.8773608,
+  "pretty_env_info": "PyTorch version: 2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Rocky Linux 9.3 (Blue Onyx) (x86_64)\nGCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)\nClang version: Could not collect\nCMake version: Could not collect\nLibc version: glibc-2.34\n\nPython version: 3.10.4 (main, Dec 14 2022, 11:01:42) [GCC 11.3.0] (64-bit runtime)\nPython platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34\nIs CUDA available: True\nCUDA runtime version: 12.1.105\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Could not collect\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                       x86_64\nCPU op-mode(s):                     32-bit, 64-bit\nAddress sizes:                      43 bits physical, 48 bits virtual\nByte Order:                         Little Endian\nCPU(s):                             128\nOn-line CPU(s) list:                0-127\nVendor ID:                          AuthenticAMD\nModel name:                         AMD EPYC 7742 64-Core Processor\nCPU family:                         23\nModel:                              49\nThread(s) per core:                 1\nCore(s) per socket:                 64\nSocket(s):                          2\nStepping:                           0\nFrequency boost:                    enabled\nCPU max MHz:                        2250.0000\nCPU min MHz:                        1500.0000\nBogoMIPS:                           4500.00\nFlags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es\nVirtualization:                     AMD-V\nL1d cache:                          4 MiB (128 instances)\nL1i cache:                          4 MiB (128 instances)\nL2 cache:                           64 MiB (128 instances)\nL3 cache:                           512 MiB (32 instances)\nNUMA node(s):                       4\nNUMA node0 CPU(s):                  0-31\nNUMA node1 CPU(s):                  32-63\nNUMA node2 CPU(s):                  64-95\nNUMA node3 CPU(s):                  96-127\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit:        Not affected\nVulnerability L1tf:                 Not affected\nVulnerability Mds:                  Not affected\nVulnerability Meltdown:             Not affected\nVulnerability Mmio stale data:      Not affected\nVulnerability Retbleed:             Mitigation; untrained return thunk; SMT disabled\nVulnerability Spec rstack overflow: Mitigation; SMT disabled\nVulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds:                Not affected\nVulnerability Tsx async abort:      Not 
affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] Could not collect",
+  "transformers_version": "4.43.1",
+  "upper_git_hash": "2132286315025b3abd7a22b7309f7052be200287",
+  "task_hashes": {
+    "polish_polqa_reranking_multiple_choice": "81b0a5c9f7c49792c084d2efb013d9475b0a80d66176de68f5f7c09c2464494a"
+  },
+  "model_source": "hf",
+  "model_name": "speakleash/Bielik-11B-v2.1-Instruct",
+  "model_name_sanitized": "__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "chat_template": null,
+  "chat_template_sha": null,
+  "start_time": 2272911.894071405,
+  "end_time": 2275374.35065494,
+  "total_evaluation_time_seconds": "2462.4565835352987"
+}

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_poquad_open_book_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T17-09-42.653951.json ADDED

	@@ -0,0 +1,104 @@

+{
+  "results": {
+    "polish_poquad_open_book": {
+      "exact_match,none": 0.37682165163081194,
+      "exact_match_stderr,none": 0.0063833666826593255,
+      "levenshtein,none": 0.6878903539208883,
+      "levenshtein_stderr,none": "N/A",
+      "alias": "polish_poquad_open_book"
+    }
+  },
+  "group_subtasks": {
+    "polish_poquad_open_book": []
+  },
+  "configs": {
+    "polish_poquad_open_book": {
+      "task": "polish_poquad_open_book",
+      "dataset_path": "clarin-pl/poquad",
+      "training_split": "train",
+      "test_split": "validation",
+      "doc_to_text": "Tytuł: {{title}} \n Kontekst: {{context}} \n Pytanie: {{question}} \n Prawidłowa odpowiedź (krótki cytat z Kontekstu):",
+      "doc_to_target": "def doc_to_target(doc):\n    answer_list = doc[\"answers\"][\"text\"]\n    if len(answer_list) > 0:\n        answer = answer_list[0]\n    else:\n        answer = \"bez odpowiedzi\"\n    return \" \" + answer\n",
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "num_fewshot": 5,
+      "metric_list": [
+        {
+          "metric": "exact_match",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "def levenshtein(predictions, references):\n    _prediction = predictions[0].lower().lstrip()\n    prediction_number = get_number(_prediction)\n\n    _prediction = re.sub('.? ?(</s>)* ?$', '', _prediction)\n\n    for reference in references:\n        reference_number = get_number(reference)\n\n        if reference_number is not None:\n            if reference_number == prediction_number:\n                return 1\n        else:\n            ld = distance(_prediction, reference.lower().lstrip())\n            if ld < len(reference)/2:\n                return 1\n    return 0\n",
+          "aggregation": "def agg_levenshtein(items):\n    return sum(items)/len(items)\n",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "generate_until",
+      "generation_kwargs": {
+        "until": [
+          "\n",
+          "</s>"
+        ],
+        "do_sample": false,
+        "temperature": 0.0,
+        "max_gen_toks": 50
+      },
+      "repeats": 1,
+      "should_decontaminate": true,
+      "doc_to_decontamination_query": "{{context}} {{question}}"
+    }
+  },
+  "versions": {
+    "polish_poquad_open_book": "Yaml"
+  },
+  "n-shot": {
+    "polish_poquad_open_book": 5
+  },
+  "higher_is_better": {
+    "polish_poquad_open_book": {
+      "exact_match": true,
+      "levenshtein": true
+    }
+  },
+  "n-samples": {
+    "polish_poquad_open_book": {
+      "original": 5764,
+      "effective": 5764
+    }
+  },
+  "config": {
+    "model": "hf",
+    "model_args": "pretrained=speakleash/Bielik-11B-v2.1-Instruct,dtype=bfloat16,trust_remote_code=True",
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": "sqlite_caches/plgchriso/models/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_poquad_open_book/",
+    "limit": null,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": null,
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": "2132286",
+  "date": 1723381748.6042013,
+  "pretty_env_info": "PyTorch version: 2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Rocky Linux 9.3 (Blue Onyx) (x86_64)\nGCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)\nClang version: Could not collect\nCMake version: Could not collect\nLibc version: glibc-2.34\n\nPython version: 3.10.4 (main, Dec 14 2022, 11:01:42) [GCC 11.3.0] (64-bit runtime)\nPython platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34\nIs CUDA available: True\nCUDA runtime version: 12.1.105\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Could not collect\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                       x86_64\nCPU op-mode(s):                     32-bit, 64-bit\nAddress sizes:                      43 bits physical, 48 bits virtual\nByte Order:                         Little Endian\nCPU(s):                             128\nOn-line CPU(s) list:                0-127\nVendor ID:                          AuthenticAMD\nModel name:                         AMD EPYC 7742 64-Core Processor\nCPU family:                         23\nModel:                              49\nThread(s) per core:                 1\nCore(s) per socket:                 64\nSocket(s):                          2\nStepping:                           0\nFrequency boost:                    enabled\nCPU max MHz:                        2250.0000\nCPU min MHz:                        1500.0000\nBogoMIPS:                           4500.15\nFlags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es\nVirtualization:                     AMD-V\nL1d cache:                          4 MiB (128 instances)\nL1i cache:                          4 MiB (128 instances)\nL2 cache:                           64 MiB (128 instances)\nL3 cache:                           512 MiB (32 instances)\nNUMA node(s):                       4\nNUMA node0 CPU(s):                  0-31\nNUMA node1 CPU(s):                  32-63\nNUMA node2 CPU(s):                  64-95\nNUMA node3 CPU(s):                  96-127\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit:        Not affected\nVulnerability L1tf:                 Not affected\nVulnerability Mds:                  Not affected\nVulnerability Meltdown:             Not affected\nVulnerability Mmio stale data:      Not affected\nVulnerability Retbleed:             Mitigation; untrained return thunk; SMT disabled\nVulnerability Spec rstack overflow: Mitigation; SMT disabled\nVulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds:                Not affected\nVulnerability Tsx async abort:      Not 
affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] Could not collect",
+  "transformers_version": "4.43.1",
+  "upper_git_hash": "2132286315025b3abd7a22b7309f7052be200287",
+  "task_hashes": {
+    "polish_poquad_open_book": "4052fd29bcd59435f258c0169cde1f29c3f22a618395f32cebf32e166bd3bf38"
+  },
+  "model_source": "hf",
+  "model_name": "speakleash/Bielik-11B-v2.1-Instruct",
+  "model_name_sanitized": "__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "chat_template": null,
+  "chat_template_sha": null,
+  "start_time": 780407.631651954,
+  "end_time": 787650.152180919,
+  "total_evaluation_time_seconds": "7242.520528964931"
+}

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_ppc_multiple_choice_1723381711/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-13-23.877449.json ADDED

	@@ -0,0 +1,101 @@

+{
+  "results": {
+    "polish_ppc_multiple_choice": {
+      "acc,none": 0.789,
+      "acc_stderr,none": 0.012909130321042095,
+      "acc_norm,none": 0.789,
+      "acc_norm_stderr,none": 0.012909130321042095,
+      "alias": "polish_ppc_multiple_choice"
+    }
+  },
+  "group_subtasks": {
+    "polish_ppc_multiple_choice": []
+  },
+  "configs": {
+    "polish_ppc_multiple_choice": {
+      "task": "polish_ppc_multiple_choice",
+      "dataset_path": "sdadas/ppc",
+      "training_split": "train",
+      "validation_split": "validation",
+      "test_split": "test",
+      "doc_to_text": "Zdanie A: \"{{sentence_A}}\"\nZdanie B: \"{{sentence_B}}\"\nPytanie: jaka jest zależność między zdaniami A i B? Możliwe odpowiedzi:\nA - znaczą dokładnie to samo\nB - mają podobne znaczenie\nC - mają różne znaczenie\nPrawidłowa odpowiedź:",
+      "doc_to_target": "{{label|int - 1}}",
+      "doc_to_choice": [
+        "A",
+        "B",
+        "C"
+      ],
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "num_fewshot": 5,
+      "metric_list": [
+        {
+          "metric": "acc",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "acc_norm",
+          "aggregation": "mean",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "multiple_choice",
+      "repeats": 1,
+      "should_decontaminate": true,
+      "doc_to_decontamination_query": "{{sentence_A}} {{sentence_B}}"
+    }
+  },
+  "versions": {
+    "polish_ppc_multiple_choice": "Yaml"
+  },
+  "n-shot": {
+    "polish_ppc_multiple_choice": 5
+  },
+  "higher_is_better": {
+    "polish_ppc_multiple_choice": {
+      "acc": true,
+      "acc_norm": true
+    }
+  },
+  "n-samples": {
+    "polish_ppc_multiple_choice": {
+      "original": 1000,
+      "effective": 1000
+    }
+  },
+  "config": {
+    "model": "hf",
+    "model_args": "pretrained=speakleash/Bielik-11B-v2.1-Instruct,dtype=bfloat16,trust_remote_code=True",
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": "sqlite_caches/plgchriso/models/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_ppc_multiple_choice/",
+    "limit": null,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": null,
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": "2132286",
+  "date": 1723381736.8326252,
+  "pretty_env_info": "PyTorch version: 2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Rocky Linux 9.3 (Blue Onyx) (x86_64)\nGCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)\nClang version: Could not collect\nCMake version: Could not collect\nLibc version: glibc-2.34\n\nPython version: 3.10.4 (main, Dec 14 2022, 11:01:42) [GCC 11.3.0] (64-bit runtime)\nPython platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34\nIs CUDA available: True\nCUDA runtime version: 12.1.105\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Could not collect\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                       x86_64\nCPU op-mode(s):                     32-bit, 64-bit\nAddress sizes:                      43 bits physical, 48 bits virtual\nByte Order:                         Little Endian\nCPU(s):                             128\nOn-line CPU(s) list:                0-127\nVendor ID:                          AuthenticAMD\nModel name:                         AMD EPYC 7742 64-Core Processor\nCPU family:                         23\nModel:                              49\nThread(s) per core:                 1\nCore(s) per socket:                 64\nSocket(s):                          2\nStepping:                           0\nFrequency boost:                    enabled\nCPU max MHz:                        2250.0000\nCPU min MHz:                        1500.0000\nBogoMIPS:                           4500.00\nFlags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es\nVirtualization:                     AMD-V\nL1d cache:                          4 MiB (128 instances)\nL1i cache:                          4 MiB (128 instances)\nL2 cache:                           64 MiB (128 instances)\nL3 cache:                           512 MiB (32 instances)\nNUMA node(s):                       4\nNUMA node0 CPU(s):                  0-31\nNUMA node1 CPU(s):                  32-63\nNUMA node2 CPU(s):                  64-95\nNUMA node3 CPU(s):                  96-127\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit:        Not affected\nVulnerability L1tf:                 Not affected\nVulnerability Mds:                  Not affected\nVulnerability Meltdown:             Not affected\nVulnerability Mmio stale data:      Not affected\nVulnerability Retbleed:             Mitigation; untrained return thunk; SMT disabled\nVulnerability Spec rstack overflow: Mitigation; SMT disabled\nVulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds:                Not affected\nVulnerability Tsx async abort:      Not 
affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] Could not collect",
+  "transformers_version": "4.43.1",
+  "upper_git_hash": "2132286315025b3abd7a22b7309f7052be200287",
+  "task_hashes": {
+    "polish_ppc_multiple_choice": "c3554bdb1ae93597ea2150e4ff1a633019458db699b0cb1639d96dd3970b6939"
+  },
+  "model_source": "hf",
+  "model_name": "speakleash/Bielik-11B-v2.1-Instruct",
+  "model_name_sanitized": "__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "chat_template": null,
+  "chat_template_sha": null,
+  "start_time": 779444.242971983,
+  "end_time": 779719.526449868,
+  "total_evaluation_time_seconds": "275.2834778849501"
+}
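
The `acc_stderr,none` value reported for `polish_ppc_multiple_choice` is consistent with the standard error of a mean over 1000 binary (correct/incorrect) scores. The sketch below reproduces that arithmetic from `acc,none` and the `effective` sample count. It is not the harness's own code: the function name `binary_mean_stderr` is hypothetical, and the use of the sample variance (n - 1 denominator) is an assumption inferred from the reported numbers.

```python
import math


def binary_mean_stderr(acc: float, n: int) -> float:
    """Standard error of the mean of n 0/1 scores whose mean is `acc`.

    Assumption: the sample variance uses an (n - 1) denominator.
    """
    # For 0/1 scores, sum((x - acc)^2) equals n * acc * (1 - acc).
    sample_variance = n * acc * (1.0 - acc) / (n - 1)
    return math.sqrt(sample_variance / n)


# Values taken from the results file above.
stderr = binary_mean_stderr(acc=0.789, n=1000)
print(stderr)
```

Under that assumption, the printed value agrees with the reported 0.0129 figure, which suggests the standard error follows directly from the accuracy and sample count.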

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_ppc_regex_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-30-24.424865.json ADDED

	@@ -0,0 +1,111 @@

+{
+  "results": {
+    "polish_ppc_regex": {
+      "exact_match,score-first": 0.793,
+      "exact_match_stderr,score-first": 0.01281855355784399,
+      "alias": "polish_ppc_regex"
+    }
+  },
+  "group_subtasks": {
+    "polish_ppc_regex": []
+  },
+  "configs": {
+    "polish_ppc_regex": {
+      "task": "polish_ppc_regex",
+      "dataset_path": "sdadas/ppc",
+      "training_split": "train",
+      "validation_split": "validation",
+      "test_split": "test",
+      "doc_to_text": "Zdanie A: \"{{sentence_A}}\"\nZdanie B: \"{{sentence_B}}\"\nPytanie: jaka jest zależność między zdaniami A i B? Możliwe odpowiedzi:\nA - wszystkie odpowiedzi poprawne\nB - znaczą dokładnie to samo\nC - mają podobne znaczenie\nD - mają różne znaczenie\nPrawidłowa odpowiedź:",
+      "doc_to_target": "{{{0: 'A', 1: 'B', 2: 'C', 3: 'D'}.get(label|int)}}",
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "num_fewshot": 5,
+      "metric_list": [
+        {
+          "metric": "exact_match",
+          "aggregation": "mean",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "generate_until",
+      "generation_kwargs": {
+        "until": [
+          ".",
+          ","
+        ],
+        "do_sample": false,
+        "temperature": 0.0,
+        "max_gen_toks": 50
+      },
+      "repeats": 1,
+      "filter_list": [
+        {
+          "name": "score-first",
+          "filter": [
+            {
+              "function": "regex",
+              "regex_pattern": "(\\b[ABCD]\\b)"
+            },
+            {
+              "function": "take_first"
+            }
+          ]
+        }
+      ],
+      "should_decontaminate": true,
+      "doc_to_decontamination_query": "{{sentence_A}} {{sentence_B}}"
+    }
+  },
+  "versions": {
+    "polish_ppc_regex": "Yaml"
+  },
+  "n-shot": {
+    "polish_ppc_regex": 5
+  },
+  "higher_is_better": {
+    "polish_ppc_regex": {
+      "exact_match": true
+    }
+  },
+  "n-samples": {
+    "polish_ppc_regex": {
+      "original": 1000,
+      "effective": 1000
+    }
+  },
+  "config": {
+    "model": "hf",
+    "model_args": "pretrained=speakleash/Bielik-11B-v2.1-Instruct,dtype=bfloat16,trust_remote_code=True",
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": "sqlite_caches/plgchriso/models/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_ppc_regex/",
+    "limit": null,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": null,
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": "2132286",
+  "date": 1723381748.8771076,
+  "pretty_env_info": "PyTorch version: 2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Rocky Linux 9.3 (Blue Onyx) (x86_64)\nGCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)\nClang version: Could not collect\nCMake version: Could not collect\nLibc version: glibc-2.34\n\nPython version: 3.10.4 (main, Dec 14 2022, 11:01:42) [GCC 11.3.0] (64-bit runtime)\nPython platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34\nIs CUDA available: True\nCUDA runtime version: 12.1.105\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Could not collect\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                       x86_64\nCPU op-mode(s):                     32-bit, 64-bit\nAddress sizes:                      43 bits physical, 48 bits virtual\nByte Order:                         Little Endian\nCPU(s):                             128\nOn-line CPU(s) list:                0-127\nVendor ID:                          AuthenticAMD\nModel name:                         AMD EPYC 7742 64-Core Processor\nCPU family:                         23\nModel:                              49\nThread(s) per core:                 1\nCore(s) per socket:                 64\nSocket(s):                          2\nStepping:                           0\nFrequency boost:                    enabled\nCPU max MHz:                        2250.0000\nCPU min MHz:                        1500.0000\nBogoMIPS:                           4500.00\nFlags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es\nVirtualization:                     AMD-V\nL1d cache:                          4 MiB (128 instances)\nL1i cache:                          4 MiB (128 instances)\nL2 cache:                           64 MiB (128 instances)\nL3 cache:                           512 MiB (32 instances)\nNUMA node(s):                       4\nNUMA node0 CPU(s):                  0-31\nNUMA node1 CPU(s):                  32-63\nNUMA node2 CPU(s):                  64-95\nNUMA node3 CPU(s):                  96-127\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit:        Not affected\nVulnerability L1tf:                 Not affected\nVulnerability Mds:                  Not affected\nVulnerability Meltdown:             Not affected\nVulnerability Mmio stale data:      Not affected\nVulnerability Retbleed:             Mitigation; untrained return thunk; SMT disabled\nVulnerability Spec rstack overflow: Mitigation; SMT disabled\nVulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds:                Not affected\nVulnerability Tsx async abort:      Not 
affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] Could not collect",
+  "transformers_version": "4.43.1",
+  "upper_git_hash": "2132286315025b3abd7a22b7309f7052be200287",
+  "task_hashes": {
+    "polish_ppc_regex": "a218e651c94f2f850a86a4c0b91c5b5a37007e52526c54eff802a95592defbe3"
+  },
+  "model_source": "hf",
+  "model_name": "speakleash/Bielik-11B-v2.1-Instruct",
+  "model_name_sanitized": "__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "chat_template": null,
+  "chat_template_sha": null,
+  "start_time": 2272911.893985835,
+  "end_time": 2274195.92427883,
+  "total_evaluation_time_seconds": "1284.030292995274"
+}
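
The `score-first` filter in the `polish_ppc_regex` config chains a `regex` step with `take_first`, and the result is scored with `exact_match`. The sketch below approximates that pipeline for a single generation using only the standard `re` module. It is not the harness's implementation: `score_first`, `exact_match`, and the `FALLBACK` placeholder are hypothetical names chosen for illustration.

```python
import re

# Same pattern as "regex_pattern" in the config above.
PATTERN = re.compile(r"(\b[ABCD]\b)")
# Hypothetical placeholder returned when no answer letter is found.
FALLBACK = "[invalid]"


def score_first(generation: str) -> str:
    """Return the first standalone A/B/C/D in the generation, else FALLBACK."""
    match = PATTERN.search(generation)
    return match.group(1) if match else FALLBACK


def exact_match(generation: str, target: str) -> bool:
    """Compare the extracted letter with the target letter."""
    return score_first(generation) == target


print(score_first(" C - mają podobne znaczenie"))
print(exact_match(" D", "D"))
```

Because of the `\b` word boundaries, a letter embedded in a longer token (for example `ABC`) is not extracted, so such a generation would be scored as a miss in this sketch.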

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_psc_multiple_choice_1723381711/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-18-48.485190.json ADDED

	@@ -0,0 +1,107 @@

+{
+  "results": {
+    "polish_psc_multiple_choice": {
+      "acc,none": 0.9461966604823747,
+      "acc_stderr,none": 0.006875233780063374,
+      "f1,none": 0.9042904290429042,
+      "f1_stderr,none": "N/A",
+      "acc_norm,none": 0.9461966604823747,
+      "acc_norm_stderr,none": 0.006875233780063374,
+      "alias": "polish_psc_multiple_choice"
+    }
+  },
+  "group_subtasks": {
+    "polish_psc_multiple_choice": []
+  },
+  "configs": {
+    "polish_psc_multiple_choice": {
+      "task": "polish_psc_multiple_choice",
+      "dataset_path": "allegro/klej-psc",
+      "training_split": "train",
+      "test_split": "test",
+      "doc_to_text": "Tekst: \"{{extract_text}}\"\nPodsumowanie: \"{{summary_text}}\"\nPytanie: Czy podsumowanie dla podanego tekstu jest poprawne?\nOdpowiedz krótko \"Tak\" lub \"Nie\". Prawidłowa odpowiedź:",
+      "doc_to_target": "{{label|int}}",
+      "doc_to_choice": [
+        "Nie",
+        "Tak"
+      ],
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "num_fewshot": 5,
+      "metric_list": [
+        {
+          "metric": "acc",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "acc_norm",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "def f1(predictions, references):\n    _prediction = predictions[0]\n    _reference = references[0]\n    string_label = [\"B\", \"C\"]\n    reference = string_label.index(_reference)\n    prediction = (\n        string_label.index(_prediction)\n        if _prediction in string_label\n        else 0\n    )\n\n    return (prediction, reference)\n",
+          "aggregation": "def agg_f1(items):\n    predictions, references = zip(*items)\n    references, predictions = np.asarray(references), np.asarray(predictions)\n\n    return sklearn.metrics.f1_score(references, predictions)\n",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "multiple_choice",
+      "repeats": 1,
+      "should_decontaminate": true,
+      "doc_to_decontamination_query": "{{extract_text}} {{summary_text}}"
+    }
+  },
+  "versions": {
+    "polish_psc_multiple_choice": "Yaml"
+  },
+  "n-shot": {
+    "polish_psc_multiple_choice": 5
+  },
+  "higher_is_better": {
+    "polish_psc_multiple_choice": {
+      "acc": true,
+      "acc_norm": true,
+      "f1": true
+    }
+  },
+  "n-samples": {
+    "polish_psc_multiple_choice": {
+      "original": 1078,
+      "effective": 1078
+    }
+  },
+  "config": {
+    "model": "hf",
+    "model_args": "pretrained=speakleash/Bielik-11B-v2.1-Instruct,dtype=bfloat16,trust_remote_code=True",
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": "sqlite_caches/plgchriso/models/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_psc_multiple_choice/",
+    "limit": null,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": null,
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": "2132286",
+  "date": 1723381736.8328693,
+  "pretty_env_info": "PyTorch version: 2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Rocky Linux 9.3 (Blue Onyx) (x86_64)\nGCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)\nClang version: Could not collect\nCMake version: Could not collect\nLibc version: glibc-2.34\n\nPython version: 3.10.4 (main, Dec 14 2022, 11:01:42) [GCC 11.3.0] (64-bit runtime)\nPython platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34\nIs CUDA available: True\nCUDA runtime version: 12.1.105\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Could not collect\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                       x86_64\nCPU op-mode(s):                     32-bit, 64-bit\nAddress sizes:                      43 bits physical, 48 bits virtual\nByte Order:                         Little Endian\nCPU(s):                             128\nOn-line CPU(s) list:                0-127\nVendor ID:                          AuthenticAMD\nModel name:                         AMD EPYC 7742 64-Core Processor\nCPU family:                         23\nModel:                              49\nThread(s) per core:                 1\nCore(s) per socket:                 64\nSocket(s):                          2\nStepping:                           0\nFrequency boost:                    enabled\nCPU max MHz:                        2250.0000\nCPU min MHz:                        1500.0000\nBogoMIPS:                           4500.00\nFlags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es\nVirtualization:                     AMD-V\nL1d cache:                          4 MiB (128 instances)\nL1i cache:                          4 MiB (128 instances)\nL2 cache:                           64 MiB (128 instances)\nL3 cache:                           512 MiB (32 instances)\nNUMA node(s):                       4\nNUMA node0 CPU(s):                  0-31\nNUMA node1 CPU(s):                  32-63\nNUMA node2 CPU(s):                  64-95\nNUMA node3 CPU(s):                  96-127\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit:        Not affected\nVulnerability L1tf:                 Not affected\nVulnerability Mds:                  Not affected\nVulnerability Meltdown:             Not affected\nVulnerability Mmio stale data:      Not affected\nVulnerability Retbleed:             Mitigation; untrained return thunk; SMT disabled\nVulnerability Spec rstack overflow: Mitigation; SMT disabled\nVulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds:                Not affected\nVulnerability Tsx async abort:      Not 
affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] Could not collect",
+  "transformers_version": "4.43.1",
+  "upper_git_hash": "2132286315025b3abd7a22b7309f7052be200287",
+  "task_hashes": {
+    "polish_psc_multiple_choice": "20f66e13606e4708007e9a49fc374f8348f8309b56413b4ea31956ce9f49c601"
+  },
+  "model_source": "hf",
+  "model_name": "speakleash/Bielik-11B-v2.1-Instruct",
+  "model_name_sanitized": "__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "chat_template": null,
+  "chat_template_sha": null,
+  "start_time": 779444.242721803,
+  "end_time": 780044.13483785,
+  "total_evaluation_time_seconds": "599.8921160469763"
+}
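
In this results file, `total_evaluation_time_seconds` is stored as a string and matches the difference between `end_time` and `start_time`. The sketch below checks that relationship; the helper name `elapsed_seconds` is hypothetical. Note that `start_time` and `end_time` do not look like Unix timestamps (unlike `date`), so they are treated here only as two readings of the same clock.

```python
import math


def elapsed_seconds(start_time: float, end_time: float) -> float:
    """Elapsed time between two readings of the same clock, in seconds."""
    return end_time - start_time


# Values taken from the results file above.
reported = float("599.8921160469763")
computed = elapsed_seconds(779444.242721803, 780044.13483785)

# Compare with a tolerance rather than exact float equality.
print(math.isclose(computed, reported, abs_tol=1e-6))
```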

eval-results/bielik2/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_psc_regex_1723381722/__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2/results_2024-08-11T15-47-28.998766.json ADDED

	@@ -0,0 +1,118 @@

+{
+  "results": {
+    "polish_psc_regex": {
+      "exact_match,score-first": 0.8942486085343229,
+      "exact_match_stderr,score-first": 0.00937053376963659,
+      "f1,score-first": 0.9228687415426252,
+      "f1_stderr,score-first": "N/A",
+      "alias": "polish_psc_regex"
+    }
+  },
+  "group_subtasks": {
+    "polish_psc_regex": []
+  },
+  "configs": {
+    "polish_psc_regex": {
+      "task": "polish_psc_regex",
+      "dataset_path": "allegro/klej-psc",
+      "training_split": "train",
+      "test_split": "test",
+      "doc_to_text": "Fragment 1: \"{{extract_text}}\"\nFragment 2: \"{{summary_text}}\"\nPytanie: jaka jest zależność między fragmentami 1 i 2?\nMożliwe odpowiedzi:\nA - wszystkie odpowiedzi poprawne\nB - dotyczą tego samego artykułu\nC - dotyczą różnych artykułów\nD - brak poprawnej odpowiedzi\nPrawidłowa odpowiedź:",
+      "doc_to_target": "{{{0: 'A', 1: 'C', 2: 'B', 3: 'D'}.get(label|int + 1)}}",
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "num_fewshot": 5,
+      "metric_list": [
+        {
+          "metric": "exact_match",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "def f1(predictions, references):\n    _prediction = predictions[0]\n    _reference = references[0]\n    string_label = [\"B\", \"C\"]\n    reference = string_label.index(_reference)\n    prediction = (\n        string_label.index(_prediction)\n        if _prediction in string_label\n        else 0\n    )\n\n    return (prediction, reference)\n",
+          "aggregation": "def agg_f1(items):\n    predictions, references = zip(*items)\n    references, predictions = np.asarray(references), np.asarray(predictions)\n\n    return sklearn.metrics.f1_score(references, predictions)\n",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "generate_until",
+      "generation_kwargs": {
+        "until": [
+          ".",
+          ","
+        ],
+        "do_sample": false,
+        "temperature": 0.0,
+        "max_gen_toks": 50
+      },
+      "repeats": 1,
+      "filter_list": [
+        {
+          "name": "score-first",
+          "filter": [
+            {
+              "function": "regex",
+              "regex_pattern": "(\\b[ABCD]\\b)"
+            },
+            {
+              "function": "take_first"
+            }
+          ]
+        }
+      ],
+      "should_decontaminate": true,
+      "doc_to_decontamination_query": "{{extract_text}} {{summary_text}}"
+    }
+  },
+  "versions": {
+    "polish_psc_regex": "Yaml"
+  },
+  "n-shot": {
+    "polish_psc_regex": 5
+  },
+  "higher_is_better": {
+    "polish_psc_regex": {
+      "exact_match": true,
+      "f1": true
+    }
+  },
+  "n-samples": {
+    "polish_psc_regex": {
+      "original": 1078,
+      "effective": 1078
+    }
+  },
+  "config": {
+    "model": "hf",
+    "model_args": "pretrained=speakleash/Bielik-11B-v2.1-Instruct,dtype=bfloat16,trust_remote_code=True",
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": "sqlite_caches/plgchriso/models/bielik_11B-v2_dpo/dpo5-001_e2_-5_polish_psc_regex/",
+    "limit": null,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": null,
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": "2132286",
+  "date": 1723381748.6041322,
+  "pretty_env_info": "PyTorch version: 2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Rocky Linux 9.3 (Blue Onyx) (x86_64)\nGCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)\nClang version: Could not collect\nCMake version: Could not collect\nLibc version: glibc-2.34\n\nPython version: 3.10.4 (main, Dec 14 2022, 11:01:42) [GCC 11.3.0] (64-bit runtime)\nPython platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34\nIs CUDA available: True\nCUDA runtime version: 12.1.105\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Could not collect\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                       x86_64\nCPU op-mode(s):                     32-bit, 64-bit\nAddress sizes:                      43 bits physical, 48 bits virtual\nByte Order:                         Little Endian\nCPU(s):                             128\nOn-line CPU(s) list:                0-127\nVendor ID:                          AuthenticAMD\nModel name:                         AMD EPYC 7742 64-Core Processor\nCPU family:                         23\nModel:                              49\nThread(s) per core:                 1\nCore(s) per socket:                 64\nSocket(s):                          2\nStepping:                           0\nFrequency boost:                    enabled\nCPU max MHz:                        2250.0000\nCPU min MHz:                        1500.0000\nBogoMIPS:                           4500.15\nFlags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es\nVirtualization:                     AMD-V\nL1d cache:                          4 MiB (128 instances)\nL1i cache:                          4 MiB (128 instances)\nL2 cache:                           64 MiB (128 instances)\nL3 cache:                           512 MiB (32 instances)\nNUMA node(s):                       4\nNUMA node0 CPU(s):                  0-31\nNUMA node1 CPU(s):                  32-63\nNUMA node2 CPU(s):                  64-95\nNUMA node3 CPU(s):                  96-127\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit:        Not affected\nVulnerability L1tf:                 Not affected\nVulnerability Mds:                  Not affected\nVulnerability Meltdown:             Not affected\nVulnerability Mmio stale data:      Not affected\nVulnerability Retbleed:             Mitigation; untrained return thunk; SMT disabled\nVulnerability Spec rstack overflow: Mitigation; SMT disabled\nVulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds:                Not affected\nVulnerability Tsx async abort:      Not 
affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] Could not collect",
+  "transformers_version": "4.43.1",
+  "upper_git_hash": "2132286315025b3abd7a22b7309f7052be200287",
+  "task_hashes": {
+    "polish_psc_regex": "62e1c0c7b4494ec1f99bb0c7eeaad898f5b3e48f9263f8212e2a9759d5499045"
+  },
+  "model_source": "hf",
+  "model_name": "speakleash/Bielik-11B-v2.1-Instruct",
+  "model_name_sanitized": "__net__pr2__projects__plgrid__plggspkl__plgchriso__models__bielik_11B-v2_dpo__dpo5-001_e2",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "chat_template": null,
+  "chat_template_sha": null,
+  "start_time": 780407.631957345,
+  "end_time": 782716.499648762,
+  "total_evaluation_time_seconds": "2308.8676914171083"
+}
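
The `doc_to_target` template and the embedded `f1` metric in the `polish_psc_regex` config together determine which answer letter counts as which class. The sketch below restates both in plain Python. It assumes that the template expression `label|int + 1` is evaluated as `int(label) + 1` (the filter applied before the addition) and that dataset labels are 0 or 1. The names `target_letter` and `f1_pair` are hypothetical.

```python
# Mapping copied from "doc_to_target" in the config above.
TARGET_BY_KEY = {0: "A", 1: "C", 2: "B", 3: "D"}
# Label order copied from the embedded f1 metric.
STRING_LABEL = ["B", "C"]


def target_letter(label) -> str:
    """Target letter for a dataset label, assuming int(label) + 1 as the key."""
    return TARGET_BY_KEY.get(int(label) + 1)


def f1_pair(prediction: str, reference: str) -> tuple[int, int]:
    """Mirror of the embedded f1 function for one (prediction, reference) pair."""
    ref = STRING_LABEL.index(reference)
    # Any prediction other than "B" or "C" falls back to class 0 ("B").
    pred = STRING_LABEL.index(prediction) if prediction in STRING_LABEL else 0
    return (pred, ref)


print(target_letter(0), target_letter(1))
print(f1_pair("C", "C"), f1_pair("A", "C"))
```

Under these assumptions, label 0 maps to `C` and label 1 maps to `B`. Since `C` has index 1 in `string_label`, it would be the positive class for `sklearn.metrics.f1_score` with its default `pos_label`, and an out-of-range answer such as `A` or `D` is counted as a `B` prediction.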