chad-brouze committed · Commit 71afc1a · verified · 1 Parent(s): 271b4d4

Update README.md

Files changed (1): README.md (+103, −64)

README.md CHANGED
@@ -1,70 +1,109 @@
- ---
  base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
- datasets:
- - generator
  library_name: peft
  license: llama3.1
- tags:
- - trl
- - sft
- - generated_from_trainer
  model-index:
- - name: llama-8b-south-africa
-   results: []
- ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- # llama-8b-south-africa
-
- This model is a fine-tuned version of [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) on the generator dataset.
- It achieves the following results on the evaluation set:
- - Loss: 1.0571
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 0.0002
- - train_batch_size: 4
- - eval_batch_size: 8
- - seed: 42
- - distributed_type: multi-GPU
- - gradient_accumulation_steps: 2
- - total_train_batch_size: 8
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_ratio: 0.1
- - num_epochs: 1
-
- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:------:|:----:|:---------------:|
- | 1.0959        | 0.9999 | 5596 | 1.0571          |
-
- ### Framework versions
-
- - PEFT 0.12.0
- - Transformers 4.44.2
- - Pytorch 2.4.1+cu121
- - Datasets 3.0.0
- - Tokenizers 0.19.1

- ### Terms of Use
- This model is governed by a Apache 2.0 License.
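The removed card records the training hyperparameters but no training code. For reference, a minimal sketch of an equivalent TRL `SFTTrainer` run under the versions pinned above; the dataset id, text formatting, and LoRA values here are illustrative assumptions, not taken from the card:

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Placeholder data: the card's "generator" dataset is the machine-translated
# Alpaca-Cleaned corpus, which is not linked here, so the English source is
# used to make the sketch runnable.
ds = load_dataset("yahma/alpaca-cleaned", split="train")
ds = ds.map(lambda r: {"text": f"{r['instruction']}\n{r['input']}\n{r['output']}"})

args = SFTConfig(
    output_dir="llama-8b-south-africa",
    dataset_text_field="text",
    learning_rate=2e-4,               # matches the card
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,    # total train batch size 8
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
)

trainer = SFTTrainer(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    args=args,
    train_dataset=ds,
    # LoRA values are illustrative; the card does not record them.
    peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),
)
trainer.train()
```

The card's Adam settings (betas 0.9/0.999, epsilon 1e-08) are the Transformers optimizer defaults, so they need no explicit arguments.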
  base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
  library_name: peft
  license: llama3.1
  model-index:
+ - name: llama-8b-south-africa
+
+ model_description:
+   name: llama-8b-south-africa
+   description: |
+     This model is a fine-tuned version of [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) on the generator dataset:
+     [Alpaca Cleaned](https://huggingface.co/datasets/yahma/alpaca-cleaned) translated into Xhosa, Zulu, Tswana, Northern Sotho and Afrikaans using machine translation.
+
+   details: |
+     The model could only be evaluated in Xhosa and Zulu due to IrokoBench language availability. Its aim is to show that cross-lingual transfer can be achieved at low cost. Translation cost roughly $370 per language, and training cost roughly $15 using an Akash Compute Network GPU.
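The card ships no usage snippet; below is a minimal inference sketch that loads the LoRA adapter on top of the base model via `peft`. The repo id is inferred from the card's name field and may differ:

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Repo id assumed from the model name on this card; adjust if it differs.
repo = "chad-brouze/llama-8b-south-africa"

tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoPeftModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.bfloat16, device_map="auto"
)

# Llama 3.1 Instruct chat template; the prompt is illustrative Zulu ("Hello!").
messages = [{"role": "user", "content": "Sawubona!"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

out = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```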
+
+ intended_use: This model is intended to be used for research.
+
+ evaluation_results:
+ - task:
+     type: text-generation
+     name: African Language Evaluation
+   dataset:
+     name: afrimgsm_direct_xho
+     type: text-classification
+     split: test
+   metrics:
+   - name: Accuracy
+     type: accuracy
+     value: 0.02
+   - name: Dataset
+     type: dataset
+     value: MGSM-Xho Direct
+
+ - task:
+     type: text-generation
+     name: African Language Evaluation
+   dataset:
+     name: afrimmlu_direct_xho
+     type: text-classification
+     split: test
+   metrics:
+   - name: Accuracy
+     type: accuracy
+     value: 0.29
+   - name: Dataset
+     type: dataset
+     value: MMLU-Xho Direct
+
+ - task:
+     type: text-generation
+     name: African Language Evaluation
+   dataset:
+     name: afrixnli_en_direct_xho
+     type: text-classification
+     split: test
+   metrics:
+   - name: Accuracy
+     type: accuracy
+     value: 0.44
+   - name: Dataset
+     type: dataset
+     value: XNLI-Xho Direct
+
+ - task:
+     type: text-generation
+     name: African Language Evaluation
+   dataset:
+     name: afrimgsm_direct_zul
+     type: text-classification
+     split: test
+   metrics:
+   - name: Accuracy
+     type: accuracy
+     value: 0.045
+   - name: Dataset
+     type: dataset
+     value: MGSM-Zul Direct
+
+ - task:
+     type: text-generation
+     name: African Language Evaluation
+   dataset:
+     name: afrimmlu_direct_zul
+     type: text-classification
+     split: test
+   metrics:
+   - name: Accuracy
+     type: accuracy
+     value: 0.29
+   - name: Dataset
+     type: dataset
+     value: MMLU-Zul Direct
+
+ - task:
+     type: text-generation
+     name: African Language Evaluation
+   dataset:
+     name: afrixnli_en_direct_zul
+     type: text-classification
+     split: test
+   metrics:
+   - name: Accuracy
+     type: accuracy
+     value: 0.43
+   - name: Dataset
+     type: dataset
+     value: XNLI-Zul Direct
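The dataset names above match IrokoBench task ids as shipped in EleutherAI's lm-evaluation-harness, so the table can in principle be reproduced along these lines. A sketch only: task availability depends on the installed harness version, and the adapter repo id is assumed as before:

```python
import lm_eval

# "peft=..." stacks the adapter on the base model inside the harness's HF
# backend; both the repo id and task availability are assumptions here.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args=(
        "pretrained=meta-llama/Meta-Llama-3.1-8B-Instruct,"
        "peft=chad-brouze/llama-8b-south-africa"
    ),
    tasks=[
        "afrimgsm_direct_xho", "afrimmlu_direct_xho", "afrixnli_en_direct_xho",
        "afrimgsm_direct_zul", "afrimmlu_direct_zul", "afrixnli_en_direct_zul",
    ],
    batch_size=8,
)
for task, metrics in results["results"].items():
    print(task, metrics)
```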
+
+ terms_of_use: This model is governed by an Apache 2.0 license.