Hi AlzarTakkarsen

#4
by Teera - opened

Hi, I can share some hyperparameters.

This model is based on maywell/Synatra-7B-v0.3-RP

lora_r: 64
lora_alpha: 128
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj
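
For anyone reading the LoRA values above, here is a rough sketch of what they imply under the standard LoRA formulation, where the low-rank update is scaled by alpha / r. The helper `lora_param_count` and the 4096 x 4096 layer shape are hypothetical illustrations and not taken from the training setup.

```python
# Values copied from the config above.
lora_r = 64
lora_alpha = 128

# In standard LoRA the update (B @ A) is multiplied by alpha / r.
scaling = lora_alpha / lora_r
print(scaling)  # 2.0


def lora_param_count(in_features: int, out_features: int, r: int) -> int:
    """Trainable parameters that one LoRA adapter adds to a linear layer.

    A has shape (r, in_features) and B has shape (out_features, r).
    Hypothetical helper for illustration only.
    """
    return r * (in_features + out_features)


# Assumed example shape (4096 x 4096); the real layer sizes depend on the base model.
example_params = lora_param_count(4096, 4096, lora_r)
print(example_params)
```

With alpha set to twice the rank, the adapter output is weighted by 2 relative to an unscaled update.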

optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002
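
As a rough illustration of what `lr_scheduler: cosine` does with this peak learning rate, the sketch below computes a plain cosine decay. The total step count, the absence of warmup, and the floor of 0 are assumptions, since none of them are stated here; `cosine_lr` is a hypothetical helper, not the trainer's own function.

```python
import math

PEAK_LR = 0.0002  # learning_rate from the config above


def cosine_lr(step: int, total_steps: int, peak_lr: float = PEAK_LR,
              min_lr: float = 0.0) -> float:
    """Cosine decay from peak_lr at step 0 to min_lr at total_steps.

    Assumes no warmup phase; warmup settings are not given in the post.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# total_steps=1000 is an arbitrary value chosen for the example.
for step in (0, 250, 500, 750, 1000):
    print(step, cosine_lr(step, total_steps=1000))
```

The rate starts at 0.0002, passes through roughly half of that at the midpoint, and approaches 0 at the end of training.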

with a batch size of 20.
