---
license: apache-2.0
language:
- su
- en
- id
---
This is a fine-tune of Mistral-7B-v0.1 on the very limited range of Sundanese-language datasets that are available.
This is a learning project: I wanted to see whether a QLoRA fine-tune alone could teach a model a language it does not inherently support. The model does not speak only Sundanese. The fine-tune adds Sundanese capability on top of what the base model already does, which I find impressive given the limited data and short training time.

Datasets used:
Sundanese sources from the repository below, which I cleaned and deduplicated myself.
https://github.com/w11wo/nlp-datasets
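The exact cleaning and deduplication steps are not described here. As a minimal sketch, assuming line-level exact deduplication after Unicode NFC normalization, whitespace collapsing, and case-folding (an illustration, not necessarily the procedure used for this model), a pass over raw text lines might look like this. The function names `normalize` and `clean_and_dedupe` are hypothetical.

```python
import unicodedata


def normalize(line: str) -> str:
    # Hypothetical helper: NFC-normalize so visually identical strings compare equal,
    # then collapse runs of whitespace into single spaces and trim the ends.
    text = unicodedata.normalize("NFC", line)
    return " ".join(text.split())


def clean_and_dedupe(lines):
    # Hypothetical helper: drop blank lines and exact duplicates (ignoring case),
    # keeping the first occurrence of each line in its original order.
    seen = set()
    result = []
    for line in lines:
        text = normalize(line)
        if not text:
            continue  # skip lines that are empty after normalization
        key = text.casefold()
        if key in seen:
            continue  # already kept an equivalent line
        seen.add(key)
        result.append(text)
    return result


if __name__ == "__main__":
    sample = ["Wilujeng enjing!", "  Wilujeng   enjing! ", "", "Hatur nuhun."]
    print(clean_and_dedupe(sample))
```

Keeping the first occurrence preserves the original ordering of the corpus; near-duplicate detection (for example, MinHash) would catch more overlap but is beyond this sketch.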