---
language:
- en
- lug
tags:
- llama-3.1
- gemma-2b
- finetuned
- english-luganda
- translation
- peft
- qlora
---

# final_model_8b_16

This model is finetuned for bidirectional English-Luganda translation. It was trained with QLoRA (Quantized Low-Rank Adaptation) on top of the LLaMA-3.1-8B base model.

## Model Details

### Base Model Information

- Base model: unsloth/Meta-Llama-3.1-8B
- Model family: LLaMA-3.1-8B
- Type: Base
- Original model size: 8B parameters

### Training Configuration

- Training method: QLoRA (4-bit quantization)
- LoRA rank (r): 16
- LoRA alpha: 16
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- LoRA dropout: 0
- Learning rate: 2e-5
- Batch size: 2
- Gradient accumulation steps: 4
- Max sequence length: 2048
- Weight decay: 0.01
- Training steps: 100,000
- Warmup steps: 1000
- Save interval: 10,000 steps
- Optimizer: AdamW (8-bit)
- LR scheduler: Cosine
- Mixed precision: bf16
- Gradient checkpointing: Enabled (unsloth)

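For reference, the configuration above maps roughly onto the following unsloth + transformers setup. This is a minimal sketch rather than the exact training script: the dataset pipeline is omitted and `output_dir` is a placeholder.

```python
# Minimal sketch of the QLoRA setup described above (not the exact training script).
from unsloth import FastLanguageModel
from transformers import TrainingArguments

# Load the base model in 4-bit for QLoRA-style finetuning.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters with the rank, alpha, and target modules listed above.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
)

# Optimizer and schedule settings from the list above; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="outputs",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=2e-5,
    weight_decay=0.01,
    max_steps=100_000,
    warmup_steps=1000,
    save_steps=10_000,
    optim="adamw_8bit",
    lr_scheduler_type="cosine",
    bf16=True,
)
```
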
### Dataset Information

- Training data: Parallel English-Luganda corpus
- Data sources:
  - SALT dataset (salt-train-v1.4)
  - Extracted parallel sentences
  - Synthetic code-mixed data
- Bidirectional translation: Trained on both English→Luganda and Luganda→English
- Total training examples: Varies by direction

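As an illustration of the bidirectional setup, each parallel sentence pair can be expanded into two training examples, one per direction. The helper below is hypothetical (the function name and record layout are assumptions, not the actual preprocessing code):

```python
# Hypothetical preprocessing helper: one parallel pair -> two training directions.
def expand_pair(english: str, luganda: str) -> list[dict]:
    """Return one English→Luganda and one Luganda→English example."""
    return [
        {"instruction": "Translate the following text to Luganda",
         "input": english, "output": luganda},
        {"instruction": "Translate the following text to English",
         "input": luganda, "output": english},
    ]

# Example with a sample greeting pair.
print(expand_pair("Good morning.", "Wasuze otya."))
```
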
### Usage

This model uses an instruction-based prompt format:

```
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Translate the following text to [target_lang]

### Input:
[input text]

### Response:
[translation]
```

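A minimal inference sketch with transformers + peft is shown below. The adapter path (`path/to/final_model_8b_16`) is a placeholder, not a confirmed repository id; adjust it to wherever the adapter weights are stored.

```python
# Inference sketch; the adapter path below is a placeholder, not a confirmed repo id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "unsloth/Meta-Llama-3.1-8B"
adapter_id = "path/to/final_model_8b_16"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)

# Build the prompt using the template above.
prompt = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nTranslate the following text to Luganda\n\n"
    "### Input:\nGood morning, how are you?\n\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens (the translation).
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
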
## Training Infrastructure

- Trained using the unsloth optimization library
- Hardware: single A100 GPU
- Quantization: 4-bit training enabled

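To make "4-bit training enabled" concrete: in a plain Hugging Face + bitsandbytes stack (without unsloth), the standard QLoRA quantization setup looks roughly like the sketch below. The NF4 and double-quantization settings are the usual QLoRA defaults and are assumed here, not confirmed from the training script.

```python
# Assumed standard QLoRA 4-bit settings (NF4, double quantization, bf16 compute).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "unsloth/Meta-Llama-3.1-8B",
    quantization_config=bnb_config,
    device_map="auto",
)
```
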
## Limitations

- The model is specialized for English-Luganda translation
- Performance may vary based on domain and complexity of the text
- Limited to the training maximum sequence length of 2,048 tokens

## Citation and Contact

If you use this model, please cite:

- Original LLaMA-3.1 model by Meta AI
- QLoRA paper: Dettmers et al. (2023)
- unsloth optimization library