---
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
model_name: llama-8b-south-africa
language:
- xh
- zu
- tn
- nso
- af
license: apache-2.0
tags:
- african-languages
- multilingual
- instruction-tuning
- transfer-learning
library_name: peft
model_description: |
  This model is a fine-tuned version of Meta's LLaMA-3.1-8B-Instruct, adapted for South African languages. The training data consists of the Alpaca Cleaned dataset machine-translated into five South African languages: Xhosa, Zulu, Tswana, Northern Sotho, and Afrikaans.
  Key features:
  - Base architecture: LLaMA-3.1-8B-Instruct
  - Training approach: instruction tuning on translated datasets
  - Target languages: 5 South African languages
  - Cost-efficient: total cost ~$1,870 ($370/language for translation + $15 for training)
training_details:
  hyperparameters:
    learning_rate: 0.0002
    train_batch_size: 4
    eval_batch_size: 8
    gradient_accumulation_steps: 2
    total_train_batch_size: 8
    optimizer: "Adam with betas=(0.9,0.999) and epsilon=1e-08"
    lr_scheduler_type: cosine
    lr_scheduler_warmup_ratio: 0.1
    num_epochs: 1
    seed: 42
    distributed_type: multi-GPU
  results:
    final_loss: 1.0959
    validation_loss: 0.0571
    total_steps: 5596
    completed_epochs: 0.9999
model_evaluation:
  xhosa:
    afrimgsm:
      accuracy: 0.02
    afrimmlu:
      accuracy: 0.29
    afrixnli:
      accuracy: 0.44
  zulu:
    afrimgsm:
      accuracy: 0.045
    afrimmlu:
      accuracy: 0.29
    afrixnli:
      accuracy: 0.43
limitations: |
  - Current evaluation metrics are limited to Xhosa and Zulu due to the language coverage of the IrokoBench evaluation suite
  - Machine translation was used to generate the training data, which may affect its quality
  - Low performance on certain tasks (particularly AfriMGSM) suggests room for improvement
framework_versions:
  pytorch: 2.4.1+cu121
  transformers: 4.44.2
  peft: 0.12.0
  datasets: 3.0.0
  tokenizers: 0.19.1
resources:
  benchmark_visualization: assets/Benchmarks_(1).pdf
  training_dataset: https://huggingface.co/datasets/yahma/alpaca-cleaned
---
# LLaMA-3.1-8B South African Languages Model
This model card provides detailed information about the LLaMA-3.1-8B model fine-tuned for South African languages. The model demonstrates cost-effective cross-lingual transfer learning for African language processing.
## Model Overview
The model is based on Meta's LLaMA-3.1-8B-Instruct architecture and has been fine-tuned on translated versions of the Alpaca Cleaned dataset. The training approach leverages machine translation to create instruction-tuning data in five South African languages, making it a cost-effective solution for multilingual AI development.
## Training Methodology
### Dataset Preparation
The training data was created by translating the Alpaca Cleaned dataset into five target languages:
- Xhosa
- Zulu
- Tswana
- Northern Sotho
- Afrikaans
Machine translation was used to generate the training data at a cost of $370 per language; an illustrative sketch of this preparation step follows.
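The card does not name the translation provider, so the sketch below is only an illustration of the preparation step: it loads the source dataset named in this card and maps a hypothetical `translate` function over each Alpaca-style field. The language codes and the `translate` stub are assumptions, not the pipeline that was actually run.

```python
from datasets import load_dataset

def translate(text: str, target_lang: str) -> str:
    # Hypothetical placeholder: substitute whichever machine-translation
    # service was actually used (the card does not specify one).
    raise NotImplementedError

# The source instruction-tuning data named in this card
alpaca = load_dataset("yahma/alpaca-cleaned", split="train")

target_languages = ["xho", "zul", "tsn", "nso", "afr"]

def translate_example(example, lang):
    # Translate each field of an Alpaca-style record (instruction/input/output)
    return {
        "instruction": translate(example["instruction"], lang),
        "input": translate(example["input"], lang) if example["input"] else "",
        "output": translate(example["output"], lang),
    }

# One translated copy of the dataset per target language
translated_splits = {
    lang: alpaca.map(lambda ex, l=lang: translate_example(ex, l))
    for lang in target_languages
}
```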
### Training Process
The model was trained using the PEFT (Parameter-Efficient Fine-Tuning) library on the Akash Compute Network. Key aspects of the training process include the following (see the sketch after this list):
- Single epoch training
- Multi-GPU distributed training setup
- Cosine learning rate schedule with 10% warmup
- Adam optimizer with β1=0.9, β2=0.999, ε=1e-08
- Total training cost: $15
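As a hedged illustration of this setup, the sketch below wires the hyperparameters listed above into `transformers.TrainingArguments`. It assumes a LoRA adapter, which the card does not confirm beyond naming PEFT, and `train_dataset` stands for the tokenized translated data from the previous step (not defined here). Adam's default betas (0.9, 0.999) and epsilon (1e-08) already match the values in the card.

```python
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

base = "meta-llama/Meta-Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Adapter settings are assumptions; the card only states that PEFT was used.
model = get_peft_model(model, LoraConfig(task_type="CAUSAL_LM"))

# Hyperparameters taken from this card's metadata
args = TrainingArguments(
    output_dir="llama-8b-south-africa",
    learning_rate=2e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
)

trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()
```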
## Performance Evaluation
### Evaluation Scope
Current evaluation metrics are available for two languages:
1. Xhosa (xho)
2. Zulu (zul)
Evaluation was conducted using three benchmark datasets from the IrokoBench suite: AfriMGSM (grade-school math reasoning), AfriMMLU (multiple-choice knowledge), and AfriXNLI (natural language inference). A hedged sketch of running these benchmarks follows the results table.

### Benchmark Results

| Benchmark | Xhosa | Zulu  |
|-----------|-------|-------|
| AfriMGSM  | 2.0%  | 4.5%  |
| AfriMMLU  | 29.0% | 29.0% |
| AfriXNLI  | 44.0% | 43.0% |
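The card does not state how these scores were produced. One plausible route is EleutherAI's lm-evaluation-harness, which ships IrokoBench tasks; the task identifiers below are illustrative placeholders, not confirmed names, and should be checked against the installed version's task list.

```python
import lm_eval

# Task names are illustrative; list the exact IrokoBench task identifiers
# with lm_eval.tasks.TaskManager().all_tasks before running.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args=(
        "pretrained=meta-llama/Meta-Llama-3.1-8B-Instruct,"
        "peft=llama-8b-south-africa"
    ),
    tasks=["afrimgsm_xho", "afrimmlu_xho", "afrixnli_xho"],
)
print(results["results"])
```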
## Limitations and Considerations
1. **Evaluation Coverage**
- Only Xhosa and Zulu could be evaluated because the available benchmarks (the IrokoBench suite) do not yet cover the model's other target languages
- Performance on other supported languages remains unknown
2. **Training Data Quality**
- Reliance on machine translation may impact the quality of training data
- Potential artifacts or errors from the translation process could affect model performance
3. **Performance Gaps**
- Notably low performance on AfriMGSM tasks indicates room for improvement
- Further investigation needed to understand performance disparities across tasks
## Technical Requirements
The model requires the following framework versions:
- PyTorch: 2.4.1+cu121
- Transformers: 4.44.2
- PEFT: 0.12.0
- Datasets: 3.0.0
- Tokenizers: 0.19.1
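As a quick check, the snippet below prints the installed versions for comparison against the list above:

```python
import datasets
import peft
import tokenizers
import torch
import transformers

# Print installed versions to compare with the tested versions listed above
for lib in (torch, transformers, peft, datasets, tokenizers):
    print(f"{lib.__name__}: {lib.__version__}")
```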
## Usage Example
Because the card lists `library_name: peft`, the fine-tuned weights load as a PEFT adapter on top of the base model:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model and tokenizer
base_model_name = "meta-llama/Meta-Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = AutoModelForCausalLM.from_pretrained(base_model_name)

# Attach the fine-tuned adapter (replace with this repository's id)
model = PeftModel.from_pretrained(model, "llama-8b-south-africa")

# Example usage for text generation
text = "Translate to Xhosa: Hello, how are you?"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
```
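Because the base model is instruction-tuned, wrapping prompts with the tokenizer's chat template (`tokenizer.apply_chat_template`) rather than passing raw text may produce better-formatted responses.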
## License
This model is released under the Apache 2.0 license. The full license text can be found at https://www.apache.org/licenses/LICENSE-2.0.txt
## Acknowledgments
- Meta AI for the base LLaMA-3.1-8B-Instruct model
- Akash Network for providing computing resources
- Contributors to the Alpaca Cleaned dataset
- The African NLP community for benchmark datasets and evaluation tools