---
datasets:
- cognitivecomputations/WizardLM_evol_instruct_V2_196k_unfiltered_merged_split
- cognitivecomputations/Code-74k-ShareGPT-Vicuna
- jondurbin/airoboros-3.1
- Norquinal/claude_multiround_chat_30k
- Doctor-Shotgun/no-robots-sharegpt
language:
- en
tags:
- llama
- llama 2
- smol_llama
---

# smol_llama-220M-GQA-32k-theta-sft
An experimental model intended to serve as a draft model for long-context speculative decoding.
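As a rough illustration of how a draft model like this is used, the sketch below wires it into assisted generation in `transformers`; the target model id is a placeholder for whichever Llama-2-vocabulary long-context model you want to speed up, and the exact repo ids are assumptions rather than part of this card.

```python
# Minimal sketch (not from this card): assisted generation with this model as the draft.
# Assumptions: the target model shares the Llama 2 tokenizer/vocabulary, and the
# repo ids below are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

target_id = "meta-llama/Llama-2-7b-hf"                          # placeholder target model
draft_id = "Doctor-Shotgun/smol_llama-220M-GQA-32k-theta-sft"   # this model (assumed repo id)

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(target_id, torch_dtype=torch.float16, device_map="auto")
draft = AutoModelForCausalLM.from_pretrained(draft_id, torch_dtype=torch.float16, device_map="auto")

prompt = "### Instruction:\nSummarize the text below.\n\n### Input:\nSome long document...\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(target.device)

# The small draft model proposes candidate tokens and the target model verifies
# them, which can reduce wall-clock latency while the large model still checks
# every generated token.
outputs = target.generate(**inputs, assistant_model=draft, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```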
Created by fine-tuning [Doctor-Shotgun/smol_llama-220M-GQA-32k-theta](https://huggingface.co/Doctor-Shotgun/smol_llama-220M-GQA-32k-theta) at a context length of 32768 tokens on several instruction datasets.
This variant uses the RoPE theta (RoPE frequency base) method for context extension.
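For a concrete sense of what the RoPE theta method means, the short sketch below reads the frequency base from the checkpoint config and shows the per-dimension rotary frequencies it implies; the repo id and the comparison base values are assumptions for illustration, and the checkpoint's `config.json` is the authoritative source for the real values.

```python
# Sketch only: where the RoPE frequency base lives and what it controls.
# The repo id is assumed; check the checkpoint's config.json for the actual values.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Doctor-Shotgun/smol_llama-220M-GQA-32k-theta-sft")
print(config.rope_theta)               # RoPE frequency base stored in the config
print(config.max_position_embeddings)  # context length the model was trained at

# RoPE rotates each pair of hidden dimensions at frequency theta ** (-2i / head_dim);
# raising theta slows the rotation, so distant positions remain distinguishable
# and the usable context window grows.
def rope_frequencies(theta: float, head_dim: int) -> list[float]:
    return [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]

print(rope_frequencies(10_000.0, 64)[:4])     # Llama 2 default base (example)
print(rope_frequencies(1_000_000.0, 64)[:4])  # a larger base, as used for context extension
```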
The trained instruction format is Alpaca:

```
### Instruction:
{{instruction}}

### Input:
{{user input}}

### Response:
{{model response}}
```
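A small helper such as the sketch below can assemble prompts in this format; dropping the `### Input:` block when there is no input follows the common Alpaca convention and is an assumption here rather than something stated on this card.

```python
# Hypothetical helper (not part of this card): build an Alpaca-style prompt string.
def build_alpaca_prompt(instruction: str, user_input: str = "") -> str:
    prompt = f"### Instruction:\n{instruction}\n\n"
    if user_input:
        # The Input block is optional by common Alpaca convention (assumption).
        prompt += f"### Input:\n{user_input}\n\n"
    prompt += "### Response:\n"
    return prompt

print(build_alpaca_prompt(
    "Summarize the document below in two sentences.",
    "Long document text goes here...",
))
```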