lmzheng committed on
Commit 2880060 · verified · 1 Parent(s): 1b15dde

Update README.md

Files changed (1)
  1. README.md +8 -3
README.md CHANGED
@@ -5,22 +5,27 @@
 ```
 hf download xai-org/grok-2 --local-dir /local/grok-2
 ```
+
 You might encounter some errors during the download. Please retry until the download is successful.
 If the download succeeds, the folder should contain **42 files** and be approximately 500 GB.

 - Launch a server.

- Install the latest SGLang inference engine by following the instructions at https://docs.sglang.ai/get_started/install.html.
+ Install the latest SGLang inference engine (>= v0.5.1) from https://github.com/sgl-project/sglang/

- Then, you can launch an inference server. This checkpoint is TP=8, so you will need 8 GPUs (each with > 40GB of memory).
+ Use the command below to launch an inference server. This checkpoint is TP=8, so you will need 8 GPUs (each with > 40GB of memory).
 ```
 python3 -m sglang.launch_server --model /local/grok-2 --tokenizer-path /local/grok-2/tokenizer.tok.json --tp 8 --quantization fp8 --attention-backend triton
 ```

 - Send a request.

+ This is a post-trained model, so please use the correct [chat template](https://github.com/sgl-project/sglang/blob/97a38ee85ba62e268bde6388f1bf8edfe2ca9d76/python/sglang/srt/tokenizer/tiktoken_tokenizer.py#L106).
+
 ```
 python3 -m sglang.test.send_one --prompt "Human: What is your name? <|separator|>\n\nAssistant:"
 ```

- Learn more about other ways of sending requests [here](https://docs.sglang.ai/basic_usage/send_request.html).
+ You should be able to see the model output its name, Grok.
+
+ Learn more about other ways to send requests [here](https://docs.sglang.ai/basic_usage/send_request.html).
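
As a rough sketch of the "other ways to send requests" mentioned in the new last line, the same prompt can also be sent to the launched server over plain HTTP. This assumes the server is listening on SGLang's default port 30000 and that its native `/generate` endpoint and `sampling_params` payload are available; check the SGLang docs linked in the README if your setup differs.

```python
# Minimal sketch: query the server launched above over HTTP (assumptions:
# default port 30000, native /generate endpoint, `requests` installed).
import requests

# Prompt built with the chat template format shown in the README diff.
prompt = "Human: What is your name? <|separator|>\n\nAssistant:"

response = requests.post(
    "http://localhost:30000/generate",
    json={
        "text": prompt,
        "sampling_params": {"temperature": 0, "max_new_tokens": 64},
    },
)
response.raise_for_status()

# The completion is expected in the "text" field of the JSON response.
print(response.json()["text"])
```

The prompt string reuses the `Human: ... <|separator|>\n\nAssistant:` format from the diff, since the checkpoint is post-trained and expects that chat template.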