---
license: openrail
---
Experimental Tagalog LoRAs: safe or accurate outputs are not guaranteed (not for production use)!

Note: better results are obtained by
* Prompting in Tagalog
* Using the format "Human: (prompt)\nAssistant:"

Example:
"Ito ay isang chat log ni Assistant, isang Tagalog na AI, at ni Human, isang taong Pilipino. Nakaintindi at nagsasalita ni Assistant ng Tagalog lang. Magsimula ng usapan:\nHuman: Hello po?\nAssistant: Kumusta ka naman?"

# lt2_08162023
* Fine-tuned on a small dataset of 14 items, manually edited
* 1 epoch (barely any noticeable results)
* From chat LLaMA-2-7b
* LoRA of chat-tagalog v0.1

# lt2_08162023a
* Fine-tuned on a small dataset of 14 items, manually edited
* 20 epochs (more observable effects)
* From chat LLaMA-2-7b
* LoRA of [chat-tagalog v0.1a](https://huggingface.co/922-Narra/llama-2-7b-chat-tagalog-v0.1a)

# lt2_08162023b
* Fine-tuned on a small dataset of 14 items, manually edited
* 10 epochs
* From chat LLaMA-2-7b
* LoRA of chat-tagalog v0.1b

# lt2_08162023c
* Fine-tuned on a small dataset of 14 items, manually edited
* 50 epochs (overfitted)
* From chat LLaMA-2-7b
* LoRA of chat-tagalog v0.1c

# lt2_08162023d
* Fine-tuned on a small dataset of 14 items, manually edited
* 30 epochs (v0.1a further trained and cut off before overfitting; see the training sketch below)
* From chat LLaMA-2-7b
* LoRA of [chat-tagalog v0.1d](https://huggingface.co/922-Narra/llama-2-7b-chat-tagalog-v0.1d)
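
As referenced above, a rough sketch of how these small-dataset LoRA runs could be reproduced. The card only specifies the base model, the 14-item dataset, and the epoch counts; the LoRA rank, learning rate, target modules, and other hyperparameters below are assumptions:

```python
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base_id = "meta-llama/Llama-2-7b-chat-hf"   # assumed hub ID for "chat LLaMA-2-7b"
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

# Placeholder for the 14 manually edited chat items, in the "Human:/Assistant:" format.
examples = [{"text": "Human: Hello po?\nAssistant: Kumusta ka naman?"}] * 14
dataset = Dataset.from_list(examples).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    remove_columns=["text"],
)

# Rank/alpha/dropout/targets are illustrative; the card does not state them.
lora = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="lt2_sketch",
                           num_train_epochs=20,   # e.g. the 20-epoch lt2_08162023a run
                           per_device_train_batch_size=1,
                           learning_rate=2e-4),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("lt2_sketch")  # writes only the LoRA adapter weights
```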

# llama-2-7b-tagalog-v0.2 loras (08/26/2023)
* Fine-tuned on a dataset of ~10k items (mixed)
* 2/2a/2b fine-tuned for 1/2/3 epochs
* From chat LLaMA-2-7b
* A future attempt is planned with cleaner chat/dialogue data

# hopia-3b-v0.1 (08/26/2023)
* Fine-tuned on a small dataset of 14 items, manually edited
* 20 epochs
* From Open LLaMA 3b

# llama-2-7b-tagalog-v0.3 loras (09/01/2023)
* Fine-tuned on a dataset of ~1k items (a Tagalog/Taglish dataset, based on Tagalog sentences augmented by the LLaMA-2-13b base model into a 3-turn dialogue dataset between Human and Assistant; see the augmentation sketch after this list)
* 3/3a fine-tuned for 1/2 epochs
* From chat LLaMA-2-7b
* v0.3 appears better balanced between Tagalog output and leveraging pretrained data than v0.3a (which may produce more Tagalog but be less accurate or helpful); the dataset will be further curated
* LoRAs of [chat-tagalog v0.3 (recommended)](https://huggingface.co/922-Narra/llama-2-7b-chat-tagalog-v0.3) and [chat-tagalog v0.3a](https://huggingface.co/922-Narra/llama-2-7b-chat-tagalog-v0.3a)
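
A sketch of the augmentation step referenced above. The exact prompt used to expand a seed sentence into a dialogue is not given in this card, so the prompt, seed sentence, and sampling settings below are purely illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-2-13b-hf"   # base (non-chat) 13b model, as described above
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16, device_map="auto")

seed = "Masarap ang adobo na luto ng nanay ko."  # hypothetical seed Tagalog sentence
prompt = (
    "Ito ay isang chat log ni Assistant, isang Tagalog na AI, at ni Human, isang taong Pilipino.\n"
    f"Human: {seed}\nAssistant:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.8)
completion = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
# The completion would then be trimmed to a 3-turn Human/Assistant exchange and added to the dataset.
print(prompt + completion)
```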

# llama-2-7b-tagalog-v0.3WC2 (09/01/2023)
* Fine-tuned on an experimental dataset of ~6k items (a Tagalog/Taglish dataset, based on Tagalog sentences and Wiki entries augmented by LLaMA-2-13b into a dialogue/QnA dataset between Human and Assistant)
* 1 epoch
* From chat LLaMA-2-7b
* Tends to fall into repetition loops (a decoding-side mitigation is sketched below)
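
For the repetition issue noted above, one common decoding-side mitigation is to constrain generation; this snippet reuses `model`, `tokenizer`, and `inputs` from the inference sketch near the top of this card, and the specific values are illustrative:

```python
# Standard transformers generation flags that discourage repetition loops.
output = model.generate(
    **inputs,
    max_new_tokens=200,
    do_sample=True,
    temperature=0.7,
    repetition_penalty=1.15,   # down-weight tokens that have already been generated
    no_repeat_ngram_size=3,    # block exact repetition of any 3-gram
)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```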

# llama-2-13b-tagalog-v0.3 loras (09/01/2023)
* Fine-tuned on a dataset of ~1k items (a Tagalog/Taglish dataset, based on Tagalog sentences augmented by the LLaMA-2-13b base model into a 3-turn dialogue dataset between Human and Assistant)
* 3/3a fine-tuned for 1 epoch, with LoRA rank = 16/8 (see the configuration sketch below)
* From LLaMA-2-13b
* Less helpful results than the 7b versions (the base model and dataset are suspected; LLaMA-2-13b chat and a curated dataset will be tried in the next attempts)
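
The rank difference above corresponds to the `r` parameter of the LoRA configuration; a minimal illustration, where the alpha and target modules are assumptions rather than values taken from this card:

```python
from peft import LoraConfig

# v0.3 vs v0.3a for the 13b runs: same setup, only the adapter rank differs.
lora_13b_v03 = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
                          task_type="CAUSAL_LM")
lora_13b_v03a = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
                           task_type="CAUSAL_LM")
```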