francislabounty committed
Commit 2847e85
1 Parent(s): 357aeab

Update README.md

Files changed (1)
  1. README.md +8 -1
README.md CHANGED
@@ -66,4 +66,11 @@ inputs = tokenizer(prompt, return_tensors="pt")
  inputs = inputs.to(model.device)
  pred = model.generate(**inputs, max_length=4096, do_sample=True, top_k=50, top_p=0.99, temperature=0.9, num_return_sequences=1)
  print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
- ```
+ ```
+
+ ## Other Information
+ Paper reference: [Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks](https://arxiv.org/abs/2401.02731)
+ [Original Paper repo](https://github.com/wuhy68/Parameter-Efficient-MoE)
+ [Forked repo with mistral support (sparsetral)](https://github.com/serp-ai/Parameter-Efficient-MoE)
+
+ If you are interested in faster inference, check out our [fork of vLLM](https://github.com/serp-ai/vllm) that adds sparsetral support.
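
For context, the generation snippet this commit touches can be run end to end roughly as follows. This is a minimal sketch, not part of the commit: the model id `serp-ai/sparsetral-16x7B-v2`, the prompt, and the dtype/device settings are placeholder assumptions; only the sampling arguments are taken from the diff above.

```python
# Minimal sketch of the README's generation snippet, end to end.
# Assumptions (not from the commit): model id, prompt, and dtype/device choices.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "serp-ai/sparsetral-16x7B-v2"  # placeholder: use the actual checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: half precision to fit a single GPU
    device_map="auto",
    trust_remote_code=True,      # assumption: the repo ships custom modeling code
)

prompt = "Explain mixture-of-experts in one paragraph."  # placeholder prompt
inputs = tokenizer(prompt, return_tensors="pt")
inputs = inputs.to(model.device)

# Sampling settings copied verbatim from the diff above.
pred = model.generate(
    **inputs,
    max_length=4096,
    do_sample=True,
    top_k=50,
    top_p=0.99,
    temperature=0.9,
    num_return_sequences=1,
)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
```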
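
For the faster-inference path, here is a hedged sketch of what serving the model through the linked vLLM fork might look like, assuming the fork keeps vLLM's standard `LLM`/`SamplingParams` Python API; the model id is again a placeholder, and stock vLLM would not load sparsetral without that fork.

```python
# Sketch only: assumes the serp-ai vLLM fork exposes the standard vLLM API
# and adds sparsetral model support. Model id is a placeholder.
from vllm import LLM, SamplingParams

llm = LLM(model="serp-ai/sparsetral-16x7B-v2", trust_remote_code=True)
params = SamplingParams(temperature=0.9, top_p=0.99, top_k=50, max_tokens=512)

# Batched generation over a list of prompts.
outputs = llm.generate(["Explain mixture-of-experts in one paragraph."], params)
for out in outputs:
    print(out.outputs[0].text)
```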