---
datasets:
- cognitivecomputations/WizardLM_evol_instruct_V2_196k_unfiltered_merged_split
- cognitivecomputations/Code-74k-ShareGPT-Vicuna
- jondurbin/airoboros-3.1
- Norquinal/claude_multiround_chat_30k
- Doctor-Shotgun/no-robots-sharegpt
language:
- en
tags:
- llama
- llama 2
- smol_llama
---

# smol_llama-220M-GQA-32k-theta-sft
An experimental model intended to serve as a draft model for long-context speculative decoding.
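As a rough illustration of how a draft model like this is used, the sketch below wires it into assisted generation in `transformers`; the target model id is a placeholder for whichever Llama-2-vocabulary long-context model you want to speed up, and the exact repo ids are assumptions rather than part of this card.

```python
# Minimal sketch (not from this card): assisted generation with this model as the draft.
# Assumptions: the target model shares the Llama 2 tokenizer/vocabulary, and the
# repo ids below are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

target_id = "meta-llama/Llama-2-7b-hf"                          # placeholder target model
draft_id = "Doctor-Shotgun/smol_llama-220M-GQA-32k-theta-sft"   # this model (assumed repo id)

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(target_id, torch_dtype=torch.float16, device_map="auto")
draft = AutoModelForCausalLM.from_pretrained(draft_id, torch_dtype=torch.float16, device_map="auto")

prompt = "### Instruction:\nSummarize the text below.\n\n### Input:\nSome long document...\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(target.device)

# The small draft model proposes candidate tokens and the target model verifies
# them, which can reduce wall-clock latency while the large model still checks
# every generated token.
outputs = target.generate(**inputs, assistant_model=draft, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```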
Created by fine-tuning [Doctor-Shotgun/smol_llama-220M-GQA-32k-theta](https://huggingface.co/Doctor-Shotgun/smol_llama-220M-GQA-32k-theta) at a context length of 32768 tokens on several instruction datasets.
This variant uses the RoPE theta (RoPE frequency base) method for context extension.
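For a concrete sense of what the RoPE theta method means, the short sketch below reads the frequency base from the checkpoint config and shows the per-dimension rotary frequencies it implies; the repo id and the comparison base values are assumptions for illustration, and the checkpoint's `config.json` is the authoritative source for the real values.

```python
# Sketch only: where the RoPE frequency base lives and what it controls.
# The repo id is assumed; check the checkpoint's config.json for the actual values.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Doctor-Shotgun/smol_llama-220M-GQA-32k-theta-sft")
print(config.rope_theta)               # RoPE frequency base stored in the config
print(config.max_position_embeddings)  # context length the model was trained at

# RoPE rotates each pair of hidden dimensions at frequency theta ** (-2i / head_dim);
# raising theta slows the rotation, so distant positions remain distinguishable
# and the usable context window grows.
def rope_frequencies(theta: float, head_dim: int) -> list[float]:
    return [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]

print(rope_frequencies(10_000.0, 64)[:4])     # Llama 2 default base (example)
print(rope_frequencies(1_000_000.0, 64)[:4])  # a larger base, as used for context extension
```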
The trained instruction format is Alpaca:

```
### Instruction:
{{instruction}}

### Input:
{{user input}}

### Response:
{{model response}}
```
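A small helper such as the sketch below can assemble prompts in this format; dropping the `### Input:` block when there is no input follows the common Alpaca convention and is an assumption here rather than something stated on this card.

```python
# Hypothetical helper (not part of this card): build an Alpaca-style prompt string.
def build_alpaca_prompt(instruction: str, user_input: str = "") -> str:
    prompt = f"### Instruction:\n{instruction}\n\n"
    if user_input:
        # The Input block is optional by common Alpaca convention (assumption).
        prompt += f"### Input:\n{user_input}\n\n"
    prompt += "### Response:\n"
    return prompt

print(build_alpaca_prompt(
    "Summarize the document below in two sentences.",
    "Long document text goes here...",
))
```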