---
license: openrail
---

Experimental Tagalog LoRAs: safe or accurate outputs are not guaranteed (not for production use)!

Note: better results when:

- Prompting in Tagalog
- Using the format "Human: (prompt)\nAssistant:"

Example: "Ito ay isang chat log ni Assistant, isang Tagalog na AI, at ni Human, isang taong Pilipino. Nakaintindi at nagsasalita ni Assistant ng Tagalog lang. Magsimula ng usapan:\nHuman: Hello po?\nAssistant: Kumusta ka naman?"

## lt2_08162023

- Fine-tuned on a small dataset of 14 items, manually edited
- 1 epoch (barely any noticeable results)
- From chat LLaMA-2-7b
- LoRA for chat-tagalog v0.1

## lt2_08162023a

- Fine-tuned on a small dataset of 14 items, manually edited
- 20 epochs (more observable effects)
- From chat LLaMA-2-7b
- LoRA for chat-tagalog v0.1a

## lt2_08162023b

- Fine-tuned on a small dataset of 14 items, manually edited
- 10 epochs
- From chat LLaMA-2-7b
- LoRA for chat-tagalog v0.1b

## lt2_08162023c

- Fine-tuned on a small dataset of 14 items, manually edited
- 50 epochs (overfitted)
- From chat LLaMA-2-7b
- LoRA for chat-tagalog v0.1c

## lt2_08162023d

- Fine-tuned on a small dataset of 14 items, manually edited
- 30 epochs (v0.1a trained further and stopped before overfitting)
- From chat LLaMA-2-7b
- LoRA for chat-tagalog v0.1d

## llama-2-7b-tagalog-v0.2 loras (08/26/2023)

- Fine-tuned on a dataset of ~10k items (mixed)
- Variants 2/2a/2b fine-tuned for 1/2/3 epochs respectively
- From chat LLaMA-2-7b
- A future attempt is planned with cleaner chat/dialogue data

## hopia-3b-v0.1 (08/26/2023)

- Fine-tuned on a small dataset of 14 items, manually edited
- 20 epochs
- From Open LLaMA 3b

## llama-2-7b-tagalog-v0.3 loras (09/01/2023)

- Fine-tuned on a dataset of ~1k items (a Tagalog/Taglish dataset based on Tagalog sentences, augmented by the LLaMA-2-13b base model to create a 3-turn Human/Assistant dialogue dataset; see the sketch after this list)
- Variants 3/3a fine-tuned for 1/2 epochs respectively
- From chat LLaMA-2-7b
- An experiment with partially synthetic data (and with observing how well the LLaMA-2 base model generates Tagalog); the dataset will be further curated for better attempts
- LoRAs for chat-tagalog v0.3 and chat-tagalog v0.3a
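A hypothetical sketch of the kind of augmentation described above: seeding the LLaMA-2-13b base model with a Tagalog sentence and letting it continue a Human/Assistant dialogue. The prompt wording, seed sentence, and sampling settings are assumptions for illustration, not the exact pipeline used.

```python
# Hypothetical data-augmentation sketch: use the LLaMA-2-13b base model to extend a
# Tagalog seed sentence into a short Human/Assistant dialogue. All specifics here
# (prompt text, seed sentence, sampling parameters) are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-2-13b-hf"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

seed_sentence = "Masarap ang adobo."  # placeholder Tagalog seed sentence
prompt = (
    "Ito ay isang chat log sa Tagalog sa pagitan ni Human at ni Assistant.\n"
    f"Human: {seed_sentence}\nAssistant:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```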

## llama-2-7b-tagalog-v0.3WC2 (09/01/2023)

- Fine-tuned on an experimental dataset of ~6k items (a Tagalog/Taglish dataset based on Tagalog sentences and Wiki entries, augmented by LLaMA-2-13b to create a dialogue/QnA dataset between Human and Assistant)
- 1 epoch
- From chat LLaMA-2-7b

## llama-2-13b-tagalog-v0.3 loras (09/01/2023)

- Fine-tuned on a dataset of ~1k items (a Tagalog/Taglish dataset based on Tagalog sentences, augmented by the LLaMA-2-13b base model to create a 3-turn Human/Assistant dialogue dataset)
- Variants 3/3a fine-tuned for 1 epoch, with LoRA rank 16/8 respectively (see the config sketch below)
- From LLaMA-2-13b
- Will try LLaMA-2-13b chat (or another base) and a curated dataset for the next attempts
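For the rank 16/8 variants above, a minimal PEFT LoRA configuration sketch follows; apart from the rank, the hyperparameters and target modules are assumptions typical of LLaMA fine-tunes, not the exact training recipe.

```python
# Minimal LoRA config sketch with PEFT. r=16 corresponds to the v0.3 variant and r=8
# to v0.3a (per the list above); the other hyperparameters are illustrative assumptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-13b-hf", device_map="auto")
lora_config = LoraConfig(
    r=16,                                  # 16 for v0.3, 8 for v0.3a
    lora_alpha=32,                         # assumed
    lora_dropout=0.05,                     # assumed
    target_modules=["q_proj", "v_proj"],   # common choice for LLaMA models; assumed
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```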