lmzheng committed on
Commit 1b15dde · verified · 1 Parent(s): 04deaba

Update README.md

Files changed (1): README.md +20 -12
@@ -1,18 +1,26 @@
  ## Usage: Serving with SGLang
 
- Download the weights. You can replace `/local/grok-2` with any other folder name you prefer.
+ - Download the weights. You can replace `/local/grok-2` with any other folder name you prefer.
 
- ```
- hf download xai-org/grok-2 --local-dir /local/grok-2
- ```
+ ```
+ hf download xai-org/grok-2 --local-dir /local/grok-2
+ ```
+ You might encounter some errors during the download. Please retry until the download is successful.
+ If the download succeeds, the folder should contain **42 files** and be approximately 500 GB.
 
- Launch a server.
- ```
- python3 -m sglang.launch_server --model /local/grok-2 --tokenizer-path /local/grok-2/tokenizer.tok.json --tp 8 --quantization fp8 --attention-backend triton
- ```
+ - Launch a server.
 
- Send a request.
+ Install the latest SGLang inference engine by following the instructions at https://docs.sglang.ai/get_started/install.html.
 
- ```
- python3 -m sglang.test.send_one --prompt "Human: What is your name? <|separator|>\n\nAssistant:"
- ```
+ Then, you can launch an inference server. This checkpoint is TP=8, so you will need 8 GPUs (each with > 40GB of memory).
+ ```
+ python3 -m sglang.launch_server --model /local/grok-2 --tokenizer-path /local/grok-2/tokenizer.tok.json --tp 8 --quantization fp8 --attention-backend triton
+ ```
+
+ - Send a request.
+
+ ```
+ python3 -m sglang.test.send_one --prompt "Human: What is your name? <|separator|>\n\nAssistant:"
+ ```
+
+ Learn more about other ways of sending requests [here](https://docs.sglang.ai/basic_usage/send_request.html).
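
The updated README asks the reader to confirm that the download produced 42 files of roughly 500 GB and then to send a request to the running server. A minimal Python sketch of both checks follows. It is not part of the commit and rests on several assumptions: the helper names `summarize_dir`, `build_generate_payload`, and `send_generate` are hypothetical; the file count covers only top-level, non-hidden files, on the assumption that the downloader may leave a hidden cache folder inside the target directory; and the server is assumed to listen on SGLang's default `http://localhost:30000` and to accept its native `/generate` JSON endpoint.

```python
import json
import os
import urllib.request

MODEL_DIR = "/local/grok-2"           # folder name taken from the README
EXPECTED_FILES = 42                   # file count stated in the README
BASE_URL = "http://localhost:30000"   # assumption: SGLang default host and port


def summarize_dir(path):
    """Return (file_count, total_bytes) for top-level, non-hidden files in path."""
    file_count = 0
    total_bytes = 0
    with os.scandir(path) as entries:
        for entry in entries:
            # Skip subfolders and hidden entries such as a downloader cache.
            if entry.is_file() and not entry.name.startswith("."):
                file_count += 1
                total_bytes += entry.stat().st_size
    return file_count, total_bytes


def build_generate_payload(prompt, max_new_tokens=64, temperature=0.0):
    """Build a request body for SGLang's native /generate endpoint."""
    return {
        "text": prompt,
        "sampling_params": {
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
        },
    }


def send_generate(prompt, base_url=BASE_URL):
    """POST the prompt to the server and return the decoded JSON response."""
    body = json.dumps(build_generate_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        base_url + "/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    count, size = summarize_dir(MODEL_DIR)
    print(f"{count} files, {size / 1e9:.1f} GB")
    if count != EXPECTED_FILES:
        print(f"Expected {EXPECTED_FILES} files; the download may be incomplete.")

    # Same prompt text as the README's send_one command.
    prompt = "Human: What is your name? <|separator|>\n\nAssistant:"
    print(send_generate(prompt))
```

Run the script only after the server from the launch step is up. If the file count differs from 42, rerunning the `hf download` command is the remedy the README itself suggests.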