Built with Axolotl

See axolotl config

axolotl version: 0.4.1

base_model: meta-llama/Meta-Llama-3-8B
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

# load_in_4bit: true

chat_template: chatml
datasets:
  - path: /workspace/datasets/dolphin201-sharegpt2.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/SystemChat_filtered_sharegpt.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/SystemChat_multilingual_sharegpt.jsonl
    type: sharegpt
    conversation: chatml
  # - path: /workspace/datasets/SystemChat-2.0-Arabic/SystemChatArabic_sharegpt.jsonl
  #   type: sharegpt
  #   conversation: chatml
  - path: /workspace/datasets/dolphin-coder-translate-sharegpt2.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-coder-codegen-sharegpt2.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/m-a-p_Code-Feedback-sharegpt-unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/m-a-p_CodeFeedback-Filtered-Instruction-sharegpt-unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/not_samantha_norefusals.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/Orca-Math-resort-unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/agent_instruct_react_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/toolbench_instruct_j1s1_3k_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/toolbench_negative_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/toolbench_react_10p_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/toolbench_tflan_cot_30p_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/openhermes200k_unfiltered.jsonl
    type: sharegpt
    conversation: chatml

dataset_prepared_path: last_run_prepared
val_set_size: 0.01
output_dir: ./llama-3-8b-2.9.3

sequence_len: 8192
sample_packing: false
pad_to_sequence_len: false

# adapter: qlora
# lora_r: 16
# lora_alpha: 32
# lora_dropout: 0.05
# lora_target_modules:
#   - q_proj
#   - k_proj
#   - v_proj
#   - o_proj
#   - gate_proj
#   - up_proj
#   - down_proj

wandb_project: 2.9.3-llama-3-8b
# wandb_entity: oaaic
# wandb_watch:
# wandb_name:
# wandb_log_model:

gradient_accumulation_steps: 8
micro_batch_size: 2
num_epochs: 3
optimizer: adamw_8bit
lr_scheduler: cosine
learning_rate: 1e-5
# max_grad_norm: 1.0

train_on_inputs: false
group_by_length: false
bf16: true
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: true
logging_steps: 1
flash_attention: true
deepspeed: /workspace/axolotl/deepspeed_configs/zero3_bf16.json
warmup_steps: 10
evals_per_epoch: 2
saves_per_epoch: 2
save_total_limit: 2
weight_decay: 0.1
special_tokens:
  eos_token: "<|im_end|>"
  pad_token: "<|end_of_text|>"
tokens:
  - "<|im_start|>"
  - "<|im_end|>"
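
The config sets `chat_template: chatml` and registers `<|im_start|>` and `<|im_end|>` as tokens, with `<|im_end|>` as the EOS token. As a rough illustration of what a prompt in that format looks like, here is a minimal sketch that assumes the standard ChatML layout. The helper name `build_chatml_prompt` and the example messages are hypothetical and not part of this repository; the tokenizer's own chat template remains the authoritative source.

```python
from typing import Dict, List

# Delimiter tokens as listed under `tokens:` in the axolotl config.
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def build_chatml_prompt(messages: List[Dict[str, str]],
                        add_generation_prompt: bool = True) -> str:
    """Render {"role", "content"} messages as ChatML text.

    Hypothetical helper for illustration; assumes the standard ChatML layout.
    """
    parts = []
    for message in messages:
        # Each turn is: <|im_start|>role\ncontent<|im_end|>\n
        parts.append(f"{IM_START}{message['role']}\n{message['content']}{IM_END}\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


# Example messages (made up for this sketch).
example_messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Name one prime number."},
]
prompt = build_chatml_prompt(example_messages)
print(prompt)
```

Because `eos_token` is `<|im_end|>`, generation would normally stop when the model closes its assistant turn with that token.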


llama-3-8b-2.9.3

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B, trained on the datasets listed in the axolotl config above. It achieves the following results on the evaluation set:

  • Loss: 0.5771

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

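The training files are the ShareGPT-format JSONL datasets listed under `datasets` in the axolotl config above. With `val_set_size: 0.01`, 1% of the data is held out as the evaluation set.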

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 128
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 (the axolotl config sets optimizer: adamw_8bit)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 10
  • num_epochs: 3
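
The total batch sizes in the list above follow from the per-device values. A short sketch of the arithmetic, using only numbers reported in this card (the variable names are illustrative):

```python
# Values taken from the hyperparameter list above.
micro_batch_size = 2             # train_batch_size per device
eval_batch_size = 2              # eval_batch_size per device
gradient_accumulation_steps = 8
num_devices = 8

# One optimizer step sees every device's micro-batch, accumulated 8 times.
total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices

# Evaluation does not accumulate gradients, so only the device count multiplies.
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size)  # 128
print(total_eval_batch_size)   # 16
```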

Training results

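| Training Loss | Epoch  | Step  | Validation Loss |
|---------------|--------|-------|-----------------|
| 1.005         | 0.0001 | 1     | 0.9649          |
| 0.6468        | 0.5000 | 5058  | 0.6022          |
| 0.6648        | 1.0000 | 10116 | 0.5731          |
| 0.4983        | 1.5000 | 15174 | 0.5668          |
| 0.394         | 2.0000 | 20232 | 0.5478          |
| 0.3182        | 2.4999 | 25290 | 0.5781          |
| 0.2916        | 2.9999 | 30348 | 0.5771          |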
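
In the results above, validation loss is lowest at the end of epoch 2 and rises afterwards while training loss keeps falling. The sketch below re-enters the reported rows and locates that minimum. It also estimates the number of training sequences per epoch, assuming the Step column counts optimizer steps and every step consumes a full batch of 128 sequences; that estimate is approximate.

```python
# (training_loss, epoch, step, validation_loss), copied from the results above.
rows = [
    (1.005, 0.0001, 1, 0.9649),
    (0.6468, 0.5000, 5058, 0.6022),
    (0.6648, 1.0000, 10116, 0.5731),
    (0.4983, 1.5000, 15174, 0.5668),
    (0.394, 2.0000, 20232, 0.5478),
    (0.3182, 2.4999, 25290, 0.5781),
    (0.2916, 2.9999, 30348, 0.5771),
]

# Evaluation point with the lowest validation loss.
best = min(rows, key=lambda row: row[3])
final = rows[-1]
gap = round(final[3] - best[3], 4)

print(f"best validation loss {best[3]} at step {best[2]} (epoch {best[1]})")
print(f"final validation loss {final[3]}, {gap} above the best")

# Rough size of one epoch, under the assumptions stated above.
steps_per_epoch = 10116          # step count reported at epoch 1.0
total_train_batch_size = 128     # from the hyperparameter list
approx_sequences_per_epoch = steps_per_epoch * total_train_batch_size
print(approx_sequences_per_epoch)  # 1294848
```

The reported evaluation loss of 0.5771 is the final row, not the minimum. With `save_total_limit: 2` in the config, the card does not say which intermediate checkpoints were retained.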

Framework versions

  • Transformers 4.42.0.dev0
  • Pytorch 2.1.2+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1