dsarfati committed
Commit 62016c4 · verified · 1 Parent(s): b9ac124

Added vLLM run commands

Added an example of using vLLM in a basic configuration, as well as an advanced configuration with n-gram speculative decoding.

Files changed (1)
  1. README.md +18 -0
README.md CHANGED
@@ -22,6 +22,24 @@ The model has been fine-tuned using the [zeta dataset](https://huggingface.co/da
  The dataset used for training is available at:
  [zed-industries/zeta](https://huggingface.co/datasets/zed-industries/zeta)
 
+ ## Running Zeta
+
+ ### vLLM - Simple
+
+ `vllm serve zed-industries/zeta --served-model-name zeta`
+
+ ### vLLM - Advanced
+
+ - [Quantization](https://docs.vllm.ai/en/latest/features/quantization/fp8.html#): vLLM supports FP8 (8-bit floating point) weight and activation quantization, with hardware acceleration on GPUs such as the NVIDIA H100 and AMD MI300x.
+
+ - [NGram Speculative Decoding](https://docs.vllm.ai/en/latest/features/spec_decode.html#speculating-by-matching-n-grams-in-the-prompt): configures vLLM to use speculative decoding in which proposals are generated by matching n-grams in the prompt. This is a great fit for edit predictions, since many of the tokens are already present in the prompt and the model only needs to generate the changes to the code file.
+
+ `vllm serve zed-industries/zeta --served-model-name zeta --enable-prefix-caching --enable-chunked-prefill --quantization="fp8" --speculative-model [ngram] --ngram-prompt-lookup-max 4 --ngram-prompt-lookup-min 2 --num-speculative-tokens 8`
+
  ## Learn More
 
  For more insights about the model and its integration in Zed, check out the official blog post:
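
Once either `vllm serve` command above is running, the model is exposed through vLLM's OpenAI-compatible HTTP API (default `http://localhost:8000`). A minimal sketch of a request, assuming the `openai` Python client is installed; the prompt is a placeholder, not Zeta's actual edit-prediction format:

```python
# Minimal sketch: query the OpenAI-compatible server started by `vllm serve`.
# Assumes the default port (8000); "zeta" matches --served-model-name above.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="zeta",
    prompt="<edit-prediction prompt>",  # placeholder, not Zeta's real format
    max_tokens=256,
    temperature=0,
)
print(completion.choices[0].text)
```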
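
For intuition on why n-gram speculative decoding fits edit prediction: the proposal step is a prompt lookup that finds the most recent earlier occurrence of the context's trailing n-gram and proposes the tokens that followed it, which pays off when the output largely restates the prompt. A rough illustrative sketch of that lookup, not vLLM's actual implementation:

```python
def propose_ngram(tokens, lookup_min=2, lookup_max=4, num_speculative=8):
    # Try the longest trailing n-gram first, mirroring the
    # --ngram-prompt-lookup-min/max bounds used above.
    for n in range(lookup_max, lookup_min - 1, -1):
        if len(tokens) <= n:
            continue
        suffix = tokens[-n:]
        # Scan backwards for the most recent earlier occurrence of the suffix.
        for start in range(len(tokens) - n - 1, -1, -1):
            if tokens[start:start + n] == suffix:
                # Propose the tokens that followed that earlier occurrence;
                # the target model then verifies them in a single forward pass.
                return tokens[start + n:start + n + num_speculative]
    return []  # no match: fall back to ordinary decoding

print(propose_ngram([5, 1, 2, 3, 4, 1, 2]))  # -> [3, 4, 1, 2]
```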