jerryzh168 committed
Commit d7d66d0 · verified · 1 Parent(s): 8b3ab58

Update README.md

Files changed (1):
  1. README.md +0 -14
README.md CHANGED
@@ -101,11 +101,6 @@ print(f"{save_to} model:", benchmark_fn(quantized_model.generate, **inputs, max_
  # Model Quality
  We rely on [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) to evaluate the quality of the quantized model.
 
- ## Installing the nightly version to get most recent updates
- ```
- pip install git+https://github.com/EleutherAI/lm-evaluation-harness
- ```
-
  ## baseline
  ```
  lm_eval --model hf --model_args pretrained=microsoft/Phi-4-mini-instruct --tasks hellaswag --device cuda:0 --batch_size 8
@@ -116,9 +111,6 @@ lm_eval --model hf --model_args pretrained=microsoft/Phi-4-mini-instruct --tasks
  lm_eval --model hf --model_args pretrained=pytorch/Phi-4-mini-instruct-float8dq --tasks hellaswag --device cuda:0 --batch_size 8
  ```
 
- `TODO: more complete eval results`
-
-
  | Benchmark | | |
  |----------------------------------|----------------|---------------------|
  | | Phi-4 mini-Ins | phi4-mini-int4wo |
@@ -155,12 +147,6 @@ lm_eval --model hf --model_args pretrained=pytorch/Phi-4-mini-instruct-float8dq
 
  Note the result of latency (benchmark_latency) is in seconds, and serving (benchmark_serving) is in number of requests per second.
 
- ## Download vllm source code and install vllm
- ```
- git clone git@github.com:vllm-project/vllm.git
- VLLM_USE_PRECOMPILED=1 pip install .
- ```
-
  ## Download dataset
  Download sharegpt dataset: `wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json`
 
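After running the baseline and float8dq `lm_eval` commands shown in the diff, the two hellaswag scores can be compared programmatically. A minimal sketch, assuming the `results -> task -> metric` layout that lm-evaluation-harness writes when given `--output_path`; the numbers below are placeholders, not real eval results:

```python
# Sketch: compare baseline vs. quantized hellaswag accuracy from
# lm-evaluation-harness result dicts. Values are placeholders, NOT
# measured results for Phi-4-mini-instruct.

baseline = {"results": {"hellaswag": {"acc_norm,none": 0.73}}}
quantized = {"results": {"hellaswag": {"acc_norm,none": 0.72}}}

def metric(res: dict, task: str, key: str = "acc_norm,none") -> float:
    """Pull one metric for one task out of a results dict."""
    return res["results"][task][key]

drop = metric(baseline, "hellaswag") - metric(quantized, "hellaswag")
print(f"acc_norm drop after quantization: {drop:.3f}")
```

The same helper works for any task/metric pair the harness reports, which makes it easy to tabulate several benchmarks at once.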
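The README's note on units (benchmark_latency reports seconds, benchmark_serving reports requests per second) can be related roughly via the number of in-flight requests. A minimal sketch under that steady-state assumption; the numbers are illustrative, not measured:

```python
# Sketch: relating the two benchmark units. benchmark_latency gives
# seconds per request; benchmark_serving gives requests per second.
# With `concurrency` requests in flight at steady state, throughput is
# roughly concurrency / latency. Illustrative numbers only.

def throughput_rps(latency_s: float, concurrency: int = 1) -> float:
    """Approximate serving throughput (req/s) from per-request latency."""
    return concurrency / latency_s

print(throughput_rps(0.5, concurrency=8))  # 16.0
```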