|
---
license: apache-2.0
datasets:
- allenai/tulu-3-sft-personas-instruction-following
- PocketDoc/Dans-Prosemaxx-Gutenberg
- ToastyPigeon/SpringDragon-Instruct
- allura-org/fujin-cleaned-stage-2
base_model:
- internlm/internlm3-8b-instruct
---
|
|
|
# Ruby-Music-8B |
|
|
|
*Note that this model is based on InternLM3, **not** LLaMA 3.* |
|
|
|
A roleplaying/creative-writing fine-tune of [internlm/internlm3-8b-instruct](https://huggingface.co/internlm/internlm3-8b-instruct), provided as an alternative to L3 8B for folks with 8GB of VRAM.
|
|
|
This was trained on a mix of private instruct data (~1k samples) and roleplaying data (~2.5k human and ~1k synthetic samples), along with the following public datasets:

- allenai/tulu-3-sft-personas-instruction-following (~500 samples)
- PocketDoc/Dans-Prosemaxx-Gutenberg (all samples)
- ToastyPigeon/SpringDragon-Instruct (~500 samples)
- allura-org/fujin-cleaned-stage-2 (~500 samples)
|
|
|
The instruct format is standard ChatML: |
|
```
<|im_start|>system
{system prompt}<|im_end|>
<|im_start|>user
{user message}<|im_end|>
<|im_start|>assistant
{assistant response}<|im_end|>
```
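
If you're prompting through `transformers`, here is a minimal sketch of building this prompt with the tokenizer's chat-template support. It assumes the fine-tune ships the same ChatML template shown above, and uses the base repo's id as a stand-in; `trust_remote_code` may be needed for InternLM3's custom tokenizer code:

```python
# Minimal sketch: building a ChatML prompt via the tokenizer's chat template.
# Assumes the model ships the ChatML template shown above; swap in this
# fine-tune's repo id for the base model's.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained(
    "internlm/internlm3-8b-instruct",  # base repo; substitute this model's id
    trust_remote_code=True,            # InternLM3 ships custom tokenizer code
)

messages = [
    {"role": "system", "content": "You are a creative co-writer."},
    {"role": "user", "content": "Open a scene in a rain-soaked harbor town."},
]

prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)  # should end with "<|im_start|>assistant\n", ready for generation
```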
|
|
|
## Recommended sampler settings:

- temperature 1
- smoothing factor 0.5, smoothing curve 1
- DRY 0.5/1.75/5/1024 (multiplier/base/allowed length/penalty range)
|
|
|
There may be better sampler settings, but these have at least proven stable in my testing. InternLM3 requires aggressive tail filtering (a high min-p, top-a, or something similar) to avoid strange typos and spelling mistakes. *Note: this might be an issue with current llama.cpp and the GGUF versions I tested.*
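
For reference, here is a sketch of how that shorthand might map onto backend request parameters. The field names below follow common frontend conventions (SillyTavern/KoboldCpp style) and are assumptions, not a guaranteed API; check your backend's documentation for the exact names it accepts:

```python
# Hypothetical field names following common frontend conventions;
# your backend's actual parameter names may differ.
sampler_settings = {
    "temperature": 1.0,
    "smoothing_factor": 0.5,   # quadratic sampling
    "smoothing_curve": 1.0,
    "dry_multiplier": 0.5,     # DRY 0.5/1.75/5/1024, unpacked below
    "dry_base": 1.75,
    "dry_allowed_length": 5,
    "dry_penalty_range": 1024,
    "min_p": 0.1,              # example tail filter; placeholder value, tune to taste
}
print(sampler_settings)
```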
|
|
|
## Notes: |
|
I noticed this model sometimes has trouble outputting the EOS token (despite my confirming that `<|im_end|>` appears at the end of every turn in the training data). This can cause it to ramble at the end of a message instead of ending its turn.
|
|
|
You can either trim the ends off its messages until it picks up the right response length, or use logit bias. I've had success getting right-sized turns by setting a logit bias of 2 on `<|im_end|>`.
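
As an example, here is a minimal sketch of applying that bias through llama.cpp's server API. It assumes `llama-server` is running locally on the default port, and looks up the token id at runtime rather than hard-coding it:

```python
# Minimal sketch: biasing <|im_end|> upward via llama.cpp's /completion endpoint.
# Assumes llama-server is running locally on the default port (8080).
import requests
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("internlm/internlm3-8b-instruct", trust_remote_code=True)
im_end_id = tok.convert_tokens_to_ids("<|im_end|>")  # look up the id, don't hard-code it

resp = requests.post(
    "http://localhost:8080/completion",
    json={
        "prompt": "<|im_start|>user\nHello!<|im_end|>\n<|im_start|>assistant\n",
        "n_predict": 256,
        "logit_bias": [[im_end_id, 2.0]],  # nudge the model toward ending its turn
    },
)
print(resp.json()["content"])
```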