Hi AlzarTakkarsen
#4 opened by Teera
Hi, I can share some hyperparameters.
This model is based on maywell/Synatra-7B-v0.3-RP:
lora_r: 64
lora_alpha: 128
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_target_modules:
- gate_proj
- down_proj
- up_proj
- q_proj
- v_proj
- k_proj
- o_proj
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002
The batch size was 20.
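For reference, here is a rough sketch of how these axolotl-style settings could map onto a Hugging Face PEFT `LoraConfig` and `TrainingArguments` if you are not using axolotl. The `task_type`, `bias`, and `output_dir` values are assumptions on my part, as is the reading of "batch size 20" as the per-device train batch size; dataset, tokenizer, and `Trainer` setup are omitted.

```python
# Approximate PEFT/transformers equivalent of the settings shared above.
# Assumptions (not from the original post): bias handling, task_type,
# output_dir, and that batch size 20 means per-device train batch size.
from transformers import AutoModelForCausalLM, TrainingArguments
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("maywell/Synatra-7B-v0.3-RP")

lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=[
        "gate_proj", "down_proj", "up_proj",
        "q_proj", "v_proj", "k_proj", "o_proj",
    ],
    bias="none",           # assumption; the post does not mention bias handling
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)

training_args = TrainingArguments(
    output_dir="synatra-7b-rp-lora",  # hypothetical output path
    optim="adamw_bnb_8bit",
    lr_scheduler_type="cosine",
    learning_rate=2e-4,
    per_device_train_batch_size=20,
)
# Dataset, tokenizer, and Trainer wiring are left out of this sketch.
```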