jerryzh168 committed (verified)
Commit e38437b · 1 Parent(s): 8d76f91

Update README.md

Files changed (1):
  1. README.md +11 -4
README.md CHANGED
@@ -62,21 +62,27 @@ print(f"{save_to} model:", benchmark_fn(quantized_model.generate, **inputs, max_
 
 # Model Quality
 We rely on [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) to evaluate the quality of the quantized model.
-
 ```
+
 # Installing the nightly version to get most recent updates
+```
 pip install git+https://github.com/EleutherAI/lm-evaluation-harness
+```
 
 # baseline
+```
 lm_eval --model hf --model_args pretrained=microsoft/Phi-4-mini-instruct --tasks hellaswag --device cuda:0 --batch_size 8
+```
 
 # int4wo-hqq
+```
 lm_eval --model hf --model_args pretrained=jerryzh168/phi4-mini-int4wo-hqq --tasks hellaswag --device cuda:0 --batch_size 8
 ```
 
 `TODO: more complete eval results`
 
-| Benchmark | |
+
+| Benchmark | | |
 |----------------------------------|-------------|-------------------|
 | | Phi-4 mini-Ins | phi4-mini-int4wo |
 | **Popular aggregated benchmark** | | |
@@ -91,12 +97,13 @@ lm_eval --model hf --model_args pretrained=jerryzh168/phi4-mini-int4wo-hqq --tas
 Our int4wo is only optimized for batch size 1, so we'll only benchmark the batch size 1 performance with vllm.
 For batch size N, please see our [gemlite checkpoint](https://huggingface.co/jerryzh168/phi4-mini-int4wo-gemlite).
 
-```
 # Install latest vllm to get the most recent changes
+```
 pip install git+https://github.com/vllm-project/vllm.git
+```
 
 # Download dataset
-Download sharegpt dataset: wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json
+Download sharegpt dataset: `wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json`
 
 Other datasets can be found in: https://github.com/vllm-project/vllm/tree/main/benchmarks
 # benchmark_latency
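
The eval commands in the diff are CLI invocations; the same baseline-vs-quantized comparison can also be scripted. Below is a minimal sketch using lm-evaluation-harness's Python API (`lm_eval.simple_evaluate` is the harness's entry point; the model names, task, device, and batch size are taken verbatim from the commands above):

```
# Minimal sketch of the baseline vs. int4wo-hqq comparison from the diff,
# using lm-evaluation-harness's Python API instead of the lm_eval CLI.
# Assumes the nightly install shown in the diff:
#   pip install git+https://github.com/EleutherAI/lm-evaluation-harness
import lm_eval

for pretrained in (
    "microsoft/Phi-4-mini-instruct",    # baseline
    "jerryzh168/phi4-mini-int4wo-hqq",  # int4 weight-only (hqq) checkpoint
):
    results = lm_eval.simple_evaluate(
        model="hf",
        model_args=f"pretrained={pretrained}",
        tasks=["hellaswag"],
        device="cuda:0",
        batch_size=8,
    )
    # Print the hellaswag metrics for this model
    print(pretrained, results["results"]["hellaswag"])
```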
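As an alternative to the `wget` command in the diff, the same ShareGPT file can be fetched with `huggingface_hub`; a sketch, with the repo id and filename copied from the diff:

```
# Fetch the ShareGPT benchmark dataset via huggingface_hub instead of wget.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="anon8231489123/ShareGPT_Vicuna_unfiltered",
    filename="ShareGPT_V3_unfiltered_cleaned_split.json",
    repo_type="dataset",
)
print(path)  # local cache path to pass to the vllm benchmark scripts
```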
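The second hunk stops at the `# benchmark_latency` heading, so the author's actual latency command is not part of this commit. Below is a hedged sketch of a batch-size-1 latency measurement using vllm's Python API; `LLM`, `SamplingParams`, and `generate` are standard vllm calls, the prompt and `max_tokens` are illustrative, and loading this torchao checkpoint may require a sufficiently recent vllm (hence the nightly install in the diff):

```
# Hedged sketch of a batch-size-1 latency measurement with vllm's Python API;
# not the author's benchmark_latency invocation, which is outside this hunk.
import time
from vllm import LLM, SamplingParams

llm = LLM(model="jerryzh168/phi4-mini-int4wo-hqq")
params = SamplingParams(max_tokens=128)  # illustrative output length

start = time.perf_counter()
llm.generate(["Explain int4 weight-only quantization."], params)  # one prompt = batch size 1
print(f"batch-size-1 latency: {time.perf_counter() - start:.2f}s")
```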