---
license: openrail
---
Experimental Tagalog LoRAs: safe or accurate outputs are not guaranteed (not for production use)!

Note: better results with:
* Prompting in Tagalog
* Using format "Human: (prompt)\nAssistant:"

Example:
"Ito ay isang chat log sa pagitan ng AI Assistant na nagta-Tagalog at isang Pilipino. Magsimula ng chat:\nHuman: Hello po?\nAssistant:"
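As a minimal sketch, the recommended prompt format can be assembled like this (the `build_prompt` helper is illustrative, not part of any released API; the preamble is the example line above):

```python
def build_prompt(user_message: str) -> str:
    """Wrap a user message in the "Human: ... Assistant:" chat format.

    The Tagalog preamble is the example system line from this card.
    """
    preamble = (
        "Ito ay isang chat log sa pagitan ng AI Assistant na "
        "nagta-Tagalog at isang Pilipino. Magsimula ng chat:"
    )
    return f"{preamble}\nHuman: {user_message}\nAssistant:"

# The resulting string is what you would pass to the model for generation.
print(build_prompt("Hello po?"))
```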

# lt2_08162023
* Fine tuned on a small dataset of 14 items, manually edited
* 1 epoch (barely any noticeable results)
* From chat LLaMA-2-7b
* Lora of chat-tagalog v0.1

# lt2_08162023a
* Fine tuned on a small dataset of 14 items, manually edited
* 20 epochs (more observable effects)
* From chat LLaMA-2-7b
* Lora of [chat-tagalog v0.1a](https://huggingface.co/922-Narra/llama-2-7b-chat-tagalog-v0.1a)

# lt2_08162023b
* Fine tuned on a small dataset of 14 items, manually edited
* 10 epochs
* From chat LLaMA-2-7b
* Lora of chat-tagalog v0.1b

# lt2_08162023c
* Fine tuned on a small dataset of 14 items, manually edited
* 50 epochs (overfitted)
* From chat LLaMA-2-7b
* Lora of chat-tagalog v0.1c

# lt2_08162023d
* Fine tuned on a small dataset of 14 items, manually edited
* 30 epochs (v0.1a further trained and cut-off before overfit)
* From chat LLaMA-2-7b
* Lora of [chat-tagalog v0.1d](https://huggingface.co/922-Narra/llama-2-7b-chat-tagalog-v0.1d)

# llama-2-7b-tagalog-v0.2 loras (08/26/2023)
* Fine tuned on dataset of ~10k items (mixed)
* 2/2a/2b fine-tuned for 1/2/3 epochs
* From chat LLaMA-2-7b
* Future attempt planned with cleaner chat/dialogue data

# hopia-3b-v0.1 (08/26/2023)
* Fine tuned on a small dataset of 14 items, manually edited
* 20 epochs
* From Open LLaMA 3b

# llama-2-7b-tagalog-v0.3 loras (09/01/2023)
* Fine tuned on a dataset of ~1k items (Tagalog-focused dataset, based on Tagalog sentences augmented by the LLaMA-2-13b base model into a 3-turn dialogue dataset between Human and Assistant)
* 3/3a fine-tuned for 1/2 epochs
* From chat LLaMA-2-7b
* Experiment on partially synthetic data (also observing how capable the LLaMA-2 base model is at generating Tagalog); the dataset will be further curated
* Loras for [chat-tagalog v0.3](https://huggingface.co/922-Narra/llama-2-7b-chat-tagalog-v0.3) and [chat-tagalog v0.3a](https://huggingface.co/922-Narra/llama-2-7b-chat-tagalog-v0.3a)

# llama-2-7b-tagalog-v0.3WC2 (09/01/2023)
* Fine tuned on an experimental dataset of ~6k items (Tagalog-focused dataset, based on Tagalog sentences and Wiki entries augmented by LLaMA-2-13b into a dialogue-QnA dataset between Human and Assistant)
* 1 epoch
* From chat LLaMA-2-7b

# llama-2-13b-tagalog-v0.3 loras (09/01-02/2023)
* Fine tuned on experimental datasets of ~1k items (Tagalog-focused dataset, based on Tagalog sentences augmented by the LLaMA-2-13b base model into a 3-turn dialogue dataset between Human and Assistant)
* 3 fine-tuned for 1 epoch, rank = 16, lora alpha = 32
* 3a with rank = 8
* 3b for 2 epochs
* 3c for 1 epoch, lr = 1e-4, warmup steps = 0.1
* 3d with lr = 2e-4, rank = 32, lora alpha = 64
* 3e for 2 epochs
* From LLaMA-2-13b
* Next attempts will try the LLaMA-2-13b chat model (or another base) and a curated dataset
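Assuming these LoRAs were trained with the `peft` library, the variant-3 settings above (rank = 16, lora alpha = 32) could be expressed as a `LoraConfig` like the following sketch; `lora_dropout` and `target_modules` are illustrative assumptions not stated in this card:

```python
from peft import LoraConfig

# Sketch of the llama-2-13b-tagalog-v0.3 (variant 3) LoRA settings.
# r and lora_alpha come from the list above; dropout and target
# modules are assumptions, not documented values.
lora_config = LoraConfig(
    r=16,                                  # rank = 16 (variant 3a used rank = 8)
    lora_alpha=32,                         # lora alpha = 32 (3d used 64 with rank 32)
    lora_dropout=0.05,                     # assumption
    target_modules=["q_proj", "v_proj"],   # assumption
    bias="none",
    task_type="CAUSAL_LM",
)
```

Variants 3c/3d would additionally change the learning rate (1e-4 and 2e-4 respectively) in the trainer arguments rather than in the `LoraConfig` itself.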