jerryzh168 committed
Commit 4989b4d · verified · 1 parent: e38437b

Update README.md

Files changed (1):
  1. README.md (+2, -3)
README.md CHANGED
@@ -59,10 +59,9 @@ def benchmark_fn(f, *args, **kwargs):
 torchao.quantization.utils.recommended_inductor_config_setter()
 quantized_model = torch.compile(quantized_model, mode="max-autotune")
 print(f"{save_to} model:", benchmark_fn(quantized_model.generate, **inputs, max_new_tokens=128))
-
+```
 # Model Quality
 We rely on [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) to evaluate the quality of the quantized model.
-```
 
 # Installing the nightly version to get most recent updates
 ```
@@ -119,7 +118,7 @@ python benchmarks/benchmark_latency.py --input-len 256 --output-len 256 --model
 
 # benchmark_serving
 
-We also benchmarked the throughput with real serving environment.
+We also benchmarked the throughput in a serving environment.
 
 ## baseline
 Server:
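For context, the "Model Quality" section touched by the first hunk relies on [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness). A minimal sketch of such a run follows; the model ID, task, and batch size are placeholder assumptions, not values from this README:

```
# Minimal lm-evaluation-harness sketch; the model ID, task, and
# batch size are placeholders, not values taken from this README.
pip install lm-eval
lm_eval --model hf \
    --model_args pretrained=<quantized-model-id> \
    --tasks wikitext \
    --device cuda:0 \
    --batch_size 8
```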
 
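The "benchmark_serving" section touched by the second hunk pairs a server with a load-generating client. A sketch of that setup, assuming vLLM's bundled benchmark scripts; the model ID, prompt count, and exact flags are assumptions and vary across vLLM versions:

```
# Hypothetical serving-benchmark sketch using vLLM's bundled scripts;
# the model ID and flags are assumptions, not taken from this diff.
# Server:
python -m vllm.entrypoints.openai.api_server --model <quantized-model-id>

# Client, run from a vLLM source checkout:
python benchmarks/benchmark_serving.py \
    --model <quantized-model-id> \
    --num-prompts 100
```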