Update README.md
README.md CHANGED
@@ -2,10 +2,9 @@
 license: llama2
 ---
 # Taga-llama-v0.3:
-* Test Tagalog model
-* Fine tuned on dataset of ~1k items (
-* 
-* May still switch to Taglish or English: please see usage
+* Test Tagalog model
+* Fine-tuned on an experimental Tagalog-focused dataset of ~1k items (based off Tagalog sentences augmented by LLaMA-2-13b base to create a mostly 3-turn dialogue dataset between Human and Assistant)
+* Base: LLaMA-2 7b chat
 * [QLoras (hf and GGML)](https://huggingface.co/922-Narra/tagalog-lm-lora-tests/tree/main/llama-2-7b-chat-tagalog-0.3)
 
 ### USAGE
@@ -26,8 +25,10 @@ Use "Human" and "Assistant" and prompt with Tagalog. Example:
 * grad steps: 4
 
 ### WARNINGS AND DISCLAIMERS
-Note that aside from formatting and other minor edits, dataset used is mostly as is
+Note that aside from formatting and other minor edits, the dataset used is mostly as-is as augmented by the LM. As such, while this version may be better at coherency or chatting than previous Tagalog ones, conversations may still switch between languages or easily derail.
 
-There is a chance that the model may switch back to English (albeit still understand Tagalog inputs) as conversations grow longer, resulting in English-Tagalog conversations: this may be because of the limited 3-turn nature of the dataset.
+There is a chance that the model may switch back to English (albeit still understanding Tagalog inputs) as conversations grow longer, resulting in English-Tagalog conversations: this may be because of the limited 3-turn nature of the dataset. Additionally, Taglish occurring in the dataset or any use of English may sometimes make the model more likely to output Taglish or even English responses.
+
+Note that we use a partially synthetic dataset due to the lack of readily available Tagalog dialogue datasets, but take this as an opportunity to observe the Tagalog capability of LLaMA-2. However, we plan to further curate the dataset (and fine-tune later model versions on it) and release a final cleaned version.
 
 Finally, this model is not guaranteed to output aligned or safe outputs nor is it meant for production use - use at your own risk!
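For reference, a minimal sketch of how the pieces this README update describes fit together: loading the base chat model ("Base: LLaMA-2 7b chat") with the linked QLoRA adapter via transformers + peft, then prompting with "Human" and "Assistant" turns in Tagalog as the USAGE section instructs. The base checkpoint id, the adapter repo and subfolder (read off the QLoras link), and the generation settings are assumptions, not taken from the README itself:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Assumed identifiers: base id matches "Base: LLaMA-2 7b chat"; adapter repo
# and subfolder are read off the README's QLoras link.
base_id = "meta-llama/Llama-2-7b-chat-hf"
adapter_repo = "922-Narra/tagalog-lm-lora-tests"
adapter_subfolder = "llama-2-7b-chat-tagalog-0.3"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_repo, subfolder=adapter_subfolder)

# Prompt with "Human" and "Assistant" turns in Tagalog, per USAGE.
# ("Kumusta ka?" = "How are you?")
prompt = "Human: Kumusta ka?\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)

# Print only the newly generated Assistant turn, not the echoed prompt.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Keeping the prompt entirely in Tagalog matters here: per the warnings above, any English (or Taglish) in context makes Taglish or English responses more likely.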