---
license: apache-2.0
language:
- su
- en
- id
---
This is a fine-tune of Mistral-7B-v0.1 on the very limited range of Sundanese-language datasets that are available.
This is a learning project for me: I wanted to see whether a QLoRA fine-tune alone can teach a model a language it does not inherently support. The model does not speak only Sundanese; the fine-tune adds Sundanese capability on top of what it already knows, which I find impressive given the limited data and short training time.

Datasets used:
Sundanese sources from this repo, cleaned and deduplicated by me:
https://github.com/w11wo/nlp-datasets
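The exact cleaning and deduplication steps are not described here, so the following is only a hypothetical sketch of what such a pass could look like. It assumes line-level text, Unicode NFC normalization, whitespace collapsing and case-insensitive exact-match deduplication; the helper names `normalize` and `clean_and_dedupe` and the `min_chars` threshold are illustrative, not part of any released code.

```python
import re
import unicodedata
from typing import Iterable, List

# Matches any run of whitespace (spaces, tabs, newlines).
_WS = re.compile(r"\s+")


def normalize(text: str) -> str:
    """Hypothetical cleaning rule: NFC-normalize and collapse whitespace."""
    text = unicodedata.normalize("NFC", text)
    return _WS.sub(" ", text).strip()


def clean_and_dedupe(lines: Iterable[str], min_chars: int = 1) -> List[str]:
    """Normalize each line, drop lines shorter than min_chars,
    and keep only the first occurrence of each duplicate (order preserved)."""
    seen = set()
    out = []
    for raw in lines:
        text = normalize(raw)
        if len(text) < min_chars:
            continue  # skip empty or too-short lines
        key = text.casefold()  # assumption: duplicates are compared case-insensitively
        if key in seen:
            continue
        seen.add(key)
        out.append(text)
    return out
```

For example, `clean_and_dedupe(["Wilujeng sumping!", "wilujeng  sumping!", "", "Hatur nuhun."])` returns `["Wilujeng sumping!", "Hatur nuhun."]`. Exact-match deduplication is the simplest option; near-duplicate detection (for example MinHash) would catch more repeats, but this sketch does not attempt it.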