This model is sparsetral-16x7B-v2 further tuned utilizing SPIN on OpenHermes-2.5 mixed with traditional DPO samples. This is iteration_0, plan to keep making iterations until improvements stop.
Training
- 8x A6000s
- Base model is sparsetral-16x7B-v2
- Forked version of unsloth for efficient training
- Sequence Length: 4096
- Effective batch size: 64
- Learning Rate: 5e-7 with linear decay (0.1 warmup ratio)
- Epochs: 2
- 50k samples (~15k traditional dpo samples, rest SPIN)
- QLoRA:
- 256 r and 256 alpha
target_modules=[ "q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj", "adapter_down", "adapter_up", ]
Prompt Format
<|im_start|>system\n{message}<|im_end|>\n<|im_start|>user\n{message}<|im_end|>\n<|im_start|>assistant\n
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("serpdotai/sparsetral-16x7B-v2-SPIN_iter0", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("serpdotai/sparsetral-16x7B-v2-SPIN_iter0", device_map="auto", trust_remote_code=True).eval()
system_str = "<|im_start|>system\n{message}<|im_end|>\n"
user_str = "<|im_start|>user\n{message}<|im_end|>\n"
assistant_str = "<|im_start|>assistant\n{message}<|im_end|>\n"
def construct_prompt(messages):
prompt = ""
for message in messages:
if message["from"] in ["human", "user"]:
prompt += user_str.format(
message=message["value"]
)
elif message["from"] in ["gpt", "assistant"]:
prompt += assistant_str.format(
message=message["value"]
)
elif message["from"] in ["system", "instruction"]:
prompt += system_str.format(
message=message["value"]
)
else:
raise ValueError(
f"Unknown message type: {message['from']}"
)
return prompt + "<|im_start|>assistant\n"
system = "You are a helpful assistant who will help the user to the best of their ability. If you don't know something, say \"I don't know\""
user = "Are you sentient?"
messages = [
{"from": "system", "value": system},
{"from": "user", "value": user},
]
prompt = construct_prompt(messages)
inputs = tokenizer(prompt, return_tensors="pt")
inputs = inputs.to(model.device)
pred = model.generate(**inputs, max_length=4096, do_sample=True, top_k=50, top_p=0.99, temperature=0.9, num_return_sequences=1)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
Other Information
Paper reference: Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks
Forked repo with mistral support (sparsetral)
If you are interested in faster inferencing, check out our fork of vLLM that adds sparsetral support
- Downloads last month
- 12
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social
visibility and check back later, or deploy to Inference Endpoints (dedicated)
instead.