---
license: llama2
language:
- hu
- en
tags:
- puli
- llama
- finetuned
base_model: ariel-ml/PULI-LlumiX-32K-instruct-f16-0.2
pipeline_tag: text-generation
---

# PULI LlumiX 32K instruct (6.74 billion parameters)

<img src="logo.webp" width="340" style="margin-left: auto; margin-right: auto; display: block;"/>

Instruction-finetuned version of NYTK/PULI-LlumiX-32K.

## Provided files

| Quant method | Bits | Use case |
| ---- | ---- | ---- |
| Q3_K_M | 3 | very small, high quality loss |
| Q4_K_S | 4 | small, greater quality loss |
| Q4_K_M | 4 | medium, balanced quality - recommended |
| Q5_K_S | 5 | large, low quality loss - recommended |
| Q5_K_M | 5 | large, very low quality loss - recommended |
| Q6_K | 6 | very large, extremely low quality loss |
| Q8_0 | 8 | very large, extremely low quality loss - not recommended |
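
These GGUF files can be run with llama.cpp or its Python bindings. A minimal sketch with llama-cpp-python follows; the filename is hypothetical, so substitute the actual file you downloaded from this repo.

```python
from llama_cpp import Llama

# Load one of the provided quants (hypothetical filename).
# n_ctx can be raised up to the model's 32K context window.
llm = Llama(
    model_path="PULI-LlumiX-32K-instruct.Q4_K_M.gguf",  # hypothetical filename
    n_ctx=32768,
)

# Plain completion call; see the ChatML section below for how
# instructions should actually be formatted.
out = llm("Magyarország fővárosa", max_tokens=32)  # prompt: "The capital of Hungary"
print(out["choices"][0]["text"])
```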

## Training platform

Trained on [RunPod](https://runpod.io) with an RTX 4090 GPU.

## Hyperparameters

- Epochs: 3
- LoRA rank (r): 16
- LoRA alpha: 16
- Learning rate: 2e-4
- Learning rate scheduler: cosine
- Optimizer: adamw_8bit
- Weight decay: 0.01
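
The training code is not included in this card, but as a rough sketch, these settings map onto a standard PEFT + Trainer setup as follows; the target modules and output directory are assumptions, not values taken from the card.

```python
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA settings as listed above; target_modules is an assumption.
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)

# Optimizer and schedule as listed above.
training_args = TrainingArguments(
    output_dir="puli-llumix-32k-instruct-lora",  # assumption
    num_train_epochs=3,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    optim="adamw_8bit",  # "adamw_bnb_8bit" in older transformers releases
    weight_decay=0.01,
)
```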

## Dataset

[boapps/szurkemarha](https://huggingface.co/datasets/boapps/szurkemarha)

Only the Hungarian instructions were selected: ~53,000 prompts.
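
A hypothetical selection sketch with the `datasets` library; the `language` column name is an assumption about the dataset schema, not something this card confirms.

```python
from datasets import load_dataset

ds = load_dataset("boapps/szurkemarha", split="train")

# Keep only the Hungarian rows; "language" is an assumed column name.
hu = ds.filter(lambda row: row.get("language") == "hu")
print(len(hu))  # roughly 53,000 prompts were used for finetuning
```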

## Prompt format: ChatML

```
<|im_start|>system
Egy segítőkész mesterséges intelligencia asszisztens vagy. Válaszold meg a kérdést legjobb tudásod szerint!<|im_end|>
<|im_start|>user
Ki a legerősebb szuperhős?<|im_end|>
<|im_start|>assistant
A legerősebb szuperhős a Marvel univerzumában Hulk.<|im_end|>
```
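
A minimal generation sketch with transformers, assembling the ChatML prompt by hand. The repo id is the f16 model listed in the metadata above; whether `<|im_end|>` is registered as an EOS token in this tokenizer is not stated, so the output is cut at that marker manually.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ariel-ml/PULI-LlumiX-32K-instruct-f16-0.2"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Assemble the ChatML prompt exactly as shown above, ending with the
# opening of the assistant turn so the model continues from there.
prompt = (
    "<|im_start|>system\n"
    "Egy segítőkész mesterséges intelligencia asszisztens vagy. "
    "Válaszold meg a kérdést legjobb tudásod szerint!<|im_end|>\n"
    "<|im_start|>user\n"
    "Ki a legerősebb szuperhős?<|im_end|>\n"
    "<|im_start|>assistant\n"
)

inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
text = tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(text.split("<|im_end|>")[0])  # cut at the end-of-turn marker if present
```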

## Base model

- Trained with [OpenChatKit](https://github.com/togethercomputer/OpenChatKit)
- The [LLaMA-2-7B-32K](https://huggingface.co/togethercomputer/LLaMA-2-7B-32K) model was continually pretrained on a Hungarian dataset
- The context length was extended to 32K with position interpolation (see the sketch below)
- Checkpoint: 100,000 steps
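
For illustration: linear position interpolation rescales RoPE positions by a constant factor, and going from LLaMA-2's native 4,096-token context to 32,768 tokens implies a factor of 32768 / 4096 = 8. The released checkpoints should already carry this in their config, so the explicit `rope_scaling` override below only demonstrates the mechanism.

```python
from transformers import AutoModelForCausalLM

# Linear RoPE position interpolation: 32768 / 4096 = 8.
# The published config presumably sets this already; passing it
# explicitly here is only to illustrate the mechanism.
model = AutoModelForCausalLM.from_pretrained(
    "togethercomputer/LLaMA-2-7B-32K",
    rope_scaling={"type": "linear", "factor": 8.0},
)
```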

## Base model dataset for continued pretraining

- Hungarian: 7.9 billion words from 763K documents, each exceeding 5,000 words in length
- English: Long Context QA (2 billion words), BookSum (78 million words)

## Limitations

- max_seq_length: 32,768
- dtype: float16
- vocab size: 32,000
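
These figures can be checked against the published config; a quick sketch, assuming the f16 repo listed in the metadata above:

```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("ariel-ml/PULI-LlumiX-32K-instruct-f16-0.2")
print(cfg.max_position_embeddings)  # expected: 32768
print(cfg.vocab_size)               # expected: 32000
print(cfg.torch_dtype)              # expected: torch.float16
```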