---
library_name: transformers
license: apache-2.0
base_model: meta-llama/Meta-Llama-3-8B-Instruct
datasets:
- Saxo/total_ko_train_set_1_without_wiki_with_orca
language:
- ko
- en
pipeline_tag: text-generation
---

# Model Card

<div align="center">
<img src="https://www.linkbricks.com/wp-content/uploads/2022/03/%E1%84%85%E1%85%B5%E1%86%BC%E1%84%8F%E1%85%B3%E1%84%87%E1%85%B3%E1%84%85%E1%85%B5%E1%86%A8%E1%84%89%E1%85%B3%E1%84%85%E1%85%A9%E1%84%80%E1%85%A9-2-1024x804.png" />
</div>

A Korean-language model built by Dr. Yunsung Ji (Saxo), a data scientist at Linkbricks, a company specializing in AI and big data analytics. Using meta-llama/Meta-Llama-3-8B as the base model, it was trained with SFT-DPO for about 4 hours on eight H100-80G GPUs on GCP, with an 8,000-token sequence length.

The tokenizer is identical to Llama 3's; this version does not extend the Korean vocabulary. If you need the dedicated Korean tokenizer model, which contains more than 200,000 Korean tokens, please contact us separately.
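
For reference, here is a minimal loading sketch using the standard transformers text-generation stack. The repository id is a placeholder (this card does not state the model's Hub path), and the dtype and prompt are assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "<this-model-hub-id>"  # placeholder: replace with this model's Hub path

tokenizer = AutoTokenizer.from_pretrained(model_id)  # stock Llama 3 tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16-capable GPU
    device_map="auto",
)

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(generator("한국의 수도는 어디인가요?", max_new_tokens=64)[0]["generated_text"])
```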

Training used the Accelerate and DeepSpeed ZeRO-3 libraries.
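
The exact DeepSpeed configuration is not included in this card; the snippet below is a minimal ZeRO-3 sketch with assumed values, passed to TrainingArguments through its deepspeed parameter (which accepts either a dict or a path to a JSON file):

```python
from transformers import TrainingArguments

# Minimal ZeRO-3 config (assumed values; "auto" defers to TrainingArguments).
ds_zero3 = {
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "bf16": {"enabled": "auto"},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

ds_args = TrainingArguments(
    output_dir="outputs",
    per_device_train_batch_size=1,
    bf16=True,
    deepspeed=ds_zero3,
)
```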

www.linkbricks.com, www.linkbricks.vc
## Configuration including BitsAndBytes

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments

# Compute dtype for 4-bit quantization: bf16 on GPUs that support it, else fp16.
torch_dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=False,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch_dtype,
)

args = TrainingArguments(
    output_dir=project_name,        # project_name / run_name_str are defined elsewhere
    run_name=run_name_str,
    overwrite_output_dir=True,
    num_train_epochs=20,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,  # 1
    gradient_checkpointing=True,
    optim="paged_adamw_32bit",
    # optim="adamw_8bit",
    logging_steps=10,
    save_steps=100,                 # ignored while save_strategy="epoch"
    save_strategy="epoch",
    learning_rate=2e-4,
    weight_decay=0.01,
    max_grad_norm=1,                # 0.3
    max_steps=-1,
    warmup_ratio=0.1,
    group_by_length=False,
    fp16=not torch.cuda.is_bf16_supported(),
    bf16=torch.cuda.is_bf16_supported(),
    lr_scheduler_type="cosine",     # "constant"
    disable_tqdm=False,
    report_to="wandb",
    push_to_hub=False,
)
```
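
The bnb_config above is defined but never shown being applied; below is a minimal sketch (reusing bnb_config from the block above) of how it would typically be passed when loading the base model, with device_map as an assumption:

```python
from transformers import AutoModelForCausalLM

# Load the base model in 4-bit NF4 using the bnb_config defined above.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    quantization_config=bnb_config,
    device_map="auto",  # assumption: place layers on available GPUs automatically
)
```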