---
license: apache-2.0
language:
- su
- en
- id
---
This is a fine-tune of Mistral-7B-v0.1 on the very limited range of Sundanese-language datasets that are publicly available.
This is a learning project for me: I wanted to see whether it is possible to teach a model a language it does not inherently support using only a QLoRA fine-tune. The model does not speak only Sundanese; the fine-tune simply adds Sundanese capability on top of the base model, which I find impressive given the limited data and short training time. A usage sketch is shown below.
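A minimal inference sketch, assuming the usual transformers/peft/bitsandbytes stack. The adapter repository id below is a placeholder, not this repo's actual id; if this repo ships merged weights rather than a LoRA adapter, load it directly with `AutoModelForCausalLM.from_pretrained` instead of attaching a `PeftModel`.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "mistralai/Mistral-7B-v0.1"
adapter_id = "your-username/sundanese-qlora"  # placeholder: replace with this repo's id

# Load the base model in 4-bit, matching a typical QLoRA setup.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach the Sundanese LoRA adapter on top of the base model.
model = PeftModel.from_pretrained(model, adapter_id)

prompt = "Kumaha damang?"  # "How are you?" in Sundanese
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```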
Datasets used:
- Sundanese sources from this repo, cleaned and deduplicated myself (see the sketch after this list): https://github.com/w11wo/nlp-datasets
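The exact cleanup steps used for this model are not documented; the following is only a sketch of the kind of cleaning and exact-match deduplication described above, assuming plain-text files with one example per line.

```python
from pathlib import Path

def clean_and_dedupe(paths, out_path):
    """Normalize whitespace, drop short fragments, and remove exact duplicates."""
    seen = set()
    kept = []
    for path in paths:
        for line in Path(path).read_text(encoding="utf-8").splitlines():
            text = " ".join(line.split())   # collapse runs of whitespace
            if len(text) < 20:              # drop very short fragments (assumed threshold)
                continue
            key = text.lower()
            if key in seen:                 # exact-match dedup, case-insensitive
                continue
            seen.add(key)
            kept.append(text)
    Path(out_path).write_text("\n".join(kept), encoding="utf-8")

# Example: clean_and_dedupe(["raw_sundanese.txt"], "sundanese_clean.txt")
```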