Update README.md
README.md
CHANGED
@@ -1,65 +1,55 @@
-qlora
-max_seq_length=1024
-num_train_epochs=3
-per_device_train_batch_size=8
-gradient_accumulation_steps=32,
-evaluation_strategy="steps"
-eval_steps=2000,
-logging_steps=25,
-optim="paged_adamw_8bit",
-learning_rate=2e-4,
-lr_scheduler_type="cosine",
-warmup_steps=10,
-warmup_ratio=0.05,
-report_to="tensorboard",
-weight_decay=0.01,
-
-| Model | rouge-1 | rouge-2 | rouge-l |
-|-------|---------|---------|---------|
-| **Book** | | | |
-| yanolja/EEVE-Korean-Instruct-2.8B-v1.0 | 0.2095 | 0.0866 | 0.1985 |
-| ryanu/EEVE-10.8-BOOK-v0.1 | **0.2454** | **0.1158** | **0.2404** |
-| meta-llama/llama-3-8b-instruct | 0.2137 | 0.0883 | 0.2020 |
-| meta-llama/llama-3-70b-instruct | 0.2269 | 0.0925 | 0.2186 |
-| **Paper** | | | |
-| yanolja/EEVE-Korean-Instruct-2.8B-v1.0 | 0.1934 | 0.0829 | 0.1832 |
-| meta-llama/llama-3-8b-instruct | **0.2044** | **0.0868** | 0.1895 |
-| meta-llama/llama-3-70b-instruct | 0.1935 | 0.0783 | 0.1836 |
-| ryanu/EEVE-10.8-BOOK-v0.1 | 0.2004 | 0.0860 | **0.1938** |
+| Parameter | Value |
+|-----------|-------|
+| Task | Book (social sciences, technology, philosophy, law, the arts, etc.) |
+| Data size | 5,000 examples |
+| Model | qlora |
+| max_seq_length | 1024 |
+| num_train_epochs | 3 |
+| per_device_train_batch_size | 8 |
+| gradient_accumulation_steps | 32 |
+| evaluation_strategy | "steps" |
+| eval_steps | 2000 |
+| logging_steps | 25 |
+| optim | "paged_adamw_8bit" |
+| learning_rate | 2e-4 |
+| lr_scheduler_type | "cosine" |
+| warmup_steps | 10 |
+| warmup_ratio | 0.05 |
+| report_to | "tensorboard" |
+| weight_decay | 0.01 |
+| max_steps | -1 |
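The added parameter table maps directly onto a Hugging Face `TrainingArguments` plus a QLoRA (4-bit quantization + LoRA adapter) setup. Below is a minimal sketch of that mapping; the base model name, data file paths, `text` column, and LoRA rank/alpha/dropout are illustrative assumptions not stated in this commit, while the `TrainingArguments` values are taken verbatim from the table. Note that when both are set, Transformers uses `warmup_steps` and ignores `warmup_ratio`, so here warmup is effectively 10 steps. The added results tables continue after the sketch.

```python
# Illustrative sketch only: reconstructs the parameter table as a QLoRA run.
# Base model, data files, and LoRA hyperparameters are assumptions.
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from peft import LoraConfig
from trl import SFTTrainer

base_model = "yanolja/EEVE-Korean-Instruct-10.8B-v1.0"  # assumed base model

# 4-bit quantization — the "qlora" row in the table.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(base_model, quantization_config=bnb_config)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# LoRA adapter; r/alpha/dropout are common defaults, not values from the README.
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

# Hypothetical data files; the commit only states "Book, 5,000 examples".
data = load_dataset("json", data_files={"train": "book_train.json", "eval": "book_eval.json"})

# Values below are copied from the parameter table.
args = TrainingArguments(
    output_dir="outputs",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=32,
    evaluation_strategy="steps",
    eval_steps=2000,
    logging_steps=25,
    optim="paged_adamw_8bit",
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_steps=10,
    warmup_ratio=0.05,   # overridden by warmup_steps > 0
    report_to="tensorboard",
    weight_decay=0.01,
    max_steps=-1,        # train for num_train_epochs instead of a step budget
)

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=data["train"],
    eval_dataset=data["eval"],
    peft_config=peft_config,
    dataset_text_field="text",  # assumes pre-formatted prompt+summary strings
    max_seq_length=1024,        # max_seq_length row from the table
    tokenizer=tokenizer,
)
trainer.train()
```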
+**Book**
+
+| Model | Rouge-1 | Rouge-2 | Rouge-L |
+|----------------------------------------------|---------|---------|---------|
+| **ryanu/EEVE-10.8-BOOK-v0.1** | 0.2454 | 0.1158 | 0.2404 |
+| meta-llama/llama-3-70b-instruct | 0.2269 | 0.0925 | 0.2186 |
+| meta-llama/llama-3-8b-instruct | 0.2137 | 0.0883 | 0.2020 |
+| yanolja/EEVE-Korean-Instruct-2.8B-v1.0 | 0.2095 | 0.0866 | 0.1985 |
+| mistralai/mixtral-8x7b-instruct-v0-1 | 0.1735 | 0.0516 | 0.1668 |
+| ibm-mistralai/mixtral-8x7b-instruct-v01-q | 0.1724 | 0.0534 | 0.1630 |
+
+**Paper**
+
+| Model | Rouge-1 | Rouge-2 | Rouge-L |
+|----------------------------------------------|---------|---------|---------|
+| **meta-llama/llama-3-8b-instruct** | 0.2044 | 0.0868 | 0.1895 |
+| ryanu/EEVE-10.8-BOOK-v0.1 | 0.2004 | 0.0860 | 0.1938 |
+| meta-llama/llama-3-70b-instruct | 0.1935 | 0.0783 | 0.1836 |
+| yanolja/EEVE-Korean-Instruct-2.8B-v1.0 | 0.1934 | 0.0829 | 0.1832 |
+| mistralai/mixtral-8x7b-instruct-v0-1 | 0.1774 | 0.0601 | 0.1684 |
+| ibm-mistralai/mixtral-8x7b-instruct-v01-q | 0.1702 | 0.0561 | 0.1605 |
+
+**RAG Q&A**
+
+| Model | Rouge-1 | Rouge-2 | Rouge-L |
+|----------------------------------------------|---------|---------|---------|
+| **meta-llama/llama-3-70b-instruct** | 0.4418 | 0.2986 | 0.4297 |
+| **meta-llama/llama-3-8b-instruct** | 0.4391 | 0.3100 | 0.4273 |
+| mistralai/mixtral-8x7b-instruct-v0-1 | 0.4022 | 0.2653 | 0.3916 |
+| ibm-mistralai/mixtral-8x7b-instruct-v01-q | 0.3105 | 0.1763 | 0.2960 |
+| yanolja/EEVE-Korean-Instruct-10.8B-v1.0 | 0.3191 | 0.2069 | 0.3136 |
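The Rouge-1/2/L columns above correspond to the standard ROUGE family of overlap metrics. As a minimal sketch of how one such score can be computed with the `rouge_score` package, under the assumptions that the reported numbers are F1 values and that the placeholder strings below stand in for real gold/generated summaries; Korean evaluation usually also plugs in a language-appropriate tokenizer, which this sketch omits:

```python
# Minimal ROUGE sketch (pip install rouge-score). Strings are placeholders;
# the README does not specify its exact scoring pipeline or tokenization.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=False)

reference = "reference summary text"   # gold summary (placeholder)
prediction = "generated summary text"  # model output (placeholder)

scores = scorer.score(reference, prediction)
for name, s in scores.items():
    # Assuming the tables report F1 (fmeasure), the usual single-number choice.
    print(f"{name}: {s.fmeasure:.4f}")
```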
@@ -71,4 +61,5 @@ prompt template
 문장: {context}

 요약: {summary}
+
 -------------------------------
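The second hunk only adds a blank line inside the README's prompt template, whose two visible fields are 문장 ("sentence/passage"), wrapping `{context}`, and 요약 ("summary"), wrapping `{summary}`. As a sketch of how such a template is typically filled — at training time with the gold summary, at inference time with the summary slot left empty — noting that any instruction text surrounding these two lines in the full README is not shown in this diff and so is not reproduced here:

```python
# Sketch of using the two-field template shown in the diff context lines.
# Surrounding instruction text (if any) is unknown and intentionally omitted.
PROMPT_TEMPLATE = "문장: {context}\n\n요약: {summary}"

def build_train_example(context: str, summary: str) -> str:
    """Format one supervised example: passage plus its gold summary."""
    return PROMPT_TEMPLATE.format(context=context, summary=summary)

def build_inference_prompt(context: str) -> str:
    """Leave the summary slot empty so the model completes it."""
    return PROMPT_TEMPLATE.format(context=context, summary="").rstrip()

print(build_inference_prompt("요약할 한국어 문단 …"))  # placeholder passage
```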