---
license: mit
---
### SuperHOT Prototype 2 w/ 16K Context
This is a second prototype of SuperHOT, an NSFW-focused LoRA, this time with 16K context and no RLHF, using the same technique described in [the GitHub blog](https://kaiokendev.github.io/til#extending-context-to-8k).
Tests have shown that the model does indeed leverage the extended context at 8K, so naturally, let's try going even further.
#### Looking for Merged & Quantized Models?
- 13B 16K GGML: [tmpupload/superhot-13b-16k-no-rlhf-test-GGML](https://huggingface.co/tmpupload/superhot-13b-16k-no-rlhf-test-GGML)
- 13B 16K CUDA (no groupsize): [tmpupload/superhot-13b-16k-no-rlhf-test-GPTQ](https://huggingface.co/tmpupload/superhot-13b-16k-no-rlhf-test-GPTQ)
#### Using the monkey-patch?
You will need to **use the monkeypatch**. If you are already using it, **change the scaling factor to 0.125 and the maximum sequence length to 16384**.
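The scaling factor follows from the position-interpolation idea in the linked blog post: instead of extrapolating rotary embeddings (RoPE) to positions the model never saw, each position index is multiplied by the scaling factor so that a 16384-token sequence maps back into the range the base model was trained on (2048 / 16384 = 0.125). Below is a minimal pure-Python sketch of that idea, not the monkeypatch itself. It assumes a 2048-token base context, a head dimension of 128, and a RoPE base of 10000, which are typical LLaMA values but are not stated in this card.

```python
# Assumptions (not from this card): LLaMA's original 2048-token context,
# a per-head dimension of 128, and the standard RoPE base of 10000.
ORIGINAL_CTX = 2048
TARGET_CTX = 16384
HEAD_DIM = 128
ROPE_BASE = 10000.0

# The scaling factor from the card: 2048 / 16384 = 0.125
SCALE = ORIGINAL_CTX / TARGET_CTX


def rope_angles(position: int, scale: float = SCALE) -> list[float]:
    """Rotation angles for one position, with the position linearly interpolated."""
    scaled_pos = position * scale  # compress the position index into the trained range
    return [
        scaled_pos / (ROPE_BASE ** (2 * i / HEAD_DIM))
        for i in range(HEAD_DIM // 2)
    ]


if __name__ == "__main__":
    print(f"scale = {SCALE}")
    last = TARGET_CTX - 1
    # The last position of a 16K sequence lands inside the original 0..2048 range.
    print(f"position {last} -> effective position {last * SCALE}")
```

With this scaling, position 64 in the extended context produces the same angles that position 8 did in the original model, which is why the LoRA only has to adapt to finer-grained positions rather than to unseen ones.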
#### Using Oobabooga with Exllama?
- `python server.py --max_seq_len 16384 --compress_pos_emb 8 --loader exllama_hf`
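In this command, `--compress_pos_emb 8` is the reciprocal of the monkeypatch's 0.125 scaling factor (16384 / 2048 = 8). The small helper below derives the flags from a target context length. It is a hypothetical convenience function, assumes a 2048-token base context, and, as a simplification, only accepts multiples of that base; it does not call Oobabooga or Exllama.

```python
import shlex

# Assumption: the base LLaMA model was trained with a 2048-token context.
BASE_CTX = 2048


def exllama_args(max_seq_len: int, base_ctx: int = BASE_CTX) -> list[str]:
    """Build server.py arguments for a linearly interpolated context length.

    --compress_pos_emb is the inverse of the monkeypatch scaling factor,
    e.g. 16384 / 2048 = 8 and 1 / 8 = 0.125.
    """
    if max_seq_len % base_ctx != 0:
        # Simplification for this sketch: keep the compression factor an integer.
        raise ValueError("max_seq_len should be a multiple of the base context")
    compress = max_seq_len // base_ctx
    return [
        "python", "server.py",
        "--max_seq_len", str(max_seq_len),
        "--compress_pos_emb", str(compress),
        "--loader", "exllama_hf",
    ]


if __name__ == "__main__":
    # Reproduces the command above for the 16K model.
    print(shlex.join(exllama_args(16384)))
```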
I trained the LoRA with the following configuration:
- 1200 samples (~400 of them longer than 2048 tokens)
- learning rate of 3e-4
- 3 epochs
- The exported modules are:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
- no bias
- Rank = 4
- Alpha = 8
- no dropout
- weight decay of 0.1
- AdamW with beta1 of 0.9, beta2 of 0.99, and epsilon of 1e-5
- Trained on 4-bit base model
- Cutoff length of 4096 tokens
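For reference, the hyperparameters above can be collected into one object, which also makes the LoRA arithmetic explicit: each adapted module gains two low-rank matrices holding rank × (d_in + d_out) parameters, and the update is scaled by alpha / rank = 8 / 4 = 2. This is a hypothetical container, not the author's training script (the card does not name the trainer used). The parameter count further assumes LLaMA-13B shapes (hidden size 5120, 40 decoder layers, square attention projections), which are not stated in this card.

```python
from dataclasses import dataclass


@dataclass
class LoraTrainingConfig:
    """Hyperparameters listed in this card (hypothetical container)."""
    samples: int = 1200
    learning_rate: float = 3e-4
    epochs: int = 3
    target_modules: tuple[str, ...] = ("q_proj", "k_proj", "v_proj", "o_proj")
    rank: int = 4
    alpha: int = 8
    dropout: float = 0.0
    weight_decay: float = 0.1
    adam_betas: tuple[float, float] = (0.9, 0.99)
    adam_epsilon: float = 1e-5
    cutoff_len: int = 4096

    @property
    def scaling(self) -> float:
        # Standard LoRA scales the low-rank update B @ A by alpha / rank.
        return self.alpha / self.rank


def lora_param_count(cfg: LoraTrainingConfig, hidden: int, layers: int) -> int:
    """Trainable LoRA parameters when every target module is a hidden x hidden projection.

    Each adapted module adds A (rank x d_in) and B (d_out x rank), i.e.
    rank * (d_in + d_out) parameters; no bias terms are trained.
    """
    per_module = cfg.rank * (hidden + hidden)
    return per_module * len(cfg.target_modules) * layers


if __name__ == "__main__":
    cfg = LoraTrainingConfig()
    print(f"LoRA scaling alpha/rank = {cfg.scaling}")
    # Assumed LLaMA-13B shape: hidden size 5120, 40 decoder layers.
    print(f"trainable params: {lora_param_count(cfg, hidden=5120, layers=40):,}")
```

Under those assumptions, rank 4 on the four attention projections trains roughly 6.5M parameters, a tiny fraction of the 13B base weights.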