Model Summary

Hanscripter is an instruction-tuned language model focused on translation classical Chinese (i.e WenYanwen 文言文) to English. Our Github repo.

  • Base Model: Meta-Llama-3-8B-Instruct
  • SFT Dataset: KaifengGGG/WenYanWen_English_Parallel
  • Fine-tune Method: QLoRA

Version

Usage

Fine-tuning Details

Below are detailed descriptions of the various parameters and technologies used.

LoRA Parameters

  • lora_r: 64
  • lora_alpha: 16
  • lora_dropout: 0.1

Quantization

The model uses Bitsandbytes for state-of-the-art model quantization, enhancing computational efficiency:

  • use_4bit: True - Enables the use of 4-bit quantization.
  • bnb_4bit_compute_dtype: "float16" - The datatype used for computation in quantized state.
  • bnb_4bit_quant_type: "nf4" - Specifies the quantization type.
  • use_nested_quant: False - Nested quantization is not used.

Training Arguments

Settings for training the model are as follows:

  • num_train_epochs: 10
  • fp16: False
  • bf16: True - Optimized for use with A100 GPUs, employing Brain Floating Point (bf16).
  • per_device_train_batch_size: 2
  • per_device_eval_batch_size: 2
  • gradient_accumulation_steps: 4
  • gradient_checkpointing: True
  • max_grad_norm: 0.3
  • learning_rate: 0.0002
  • weight_decay: 0.001
  • optim: "paged_adamw_32bit"
  • lr_scheduler_type: "cosine"
  • max_steps: -1
  • warmup_ratio: 0.03
  • group_by_length: True
Downloads last month
13
Safetensors
Model size
8.03B params
Tensor type
F32
·
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Dataset used to train KaifengGGG/Llama3-8b-Hanscripter