Commit 8355945 by jerryzh168 · verified · 1 Parent(s): f961b02

Update README.md

Files changed (1)
  1. README.md +7 -7
README.md CHANGED
@@ -273,13 +273,13 @@ print(f"Peak Memory Usage: {mem:.02f} GB")
  # Model Performance

  ## Results (H100 machine)
- | Benchmark | | |
- |----------------------------------|----------------|--------------------------|
- | | Phi-4 mini-Ins | phi4-mini-float8dq |
- | latency (batch_size=1) | 1.64s | 1.41s (16% speedup) |
- | latency (batch_size=128) | 3.1s | 2.72s (14% speedup) |
- | serving (num_prompts=1) | 1.35 req/s | 1.57 req/s (16% speedup) |
- | serving (num_prompts=1000) | 66.68 req/s | 80.53 req/s (21% speedup)|
+ | Benchmark | | |
+ |----------------------------------|----------------|-------------------------------|
+ | | Phi-4 mini-Ins | Phi-4-mini-instruct-float8dq |
+ | latency (batch_size=1) | 1.64s | 1.41s (16% speedup) |
+ | latency (batch_size=128) | 3.1s | 2.72s (14% speedup) |
+ | serving (num_prompts=1) | 1.35 req/s | 1.57 req/s (16% speedup) |
+ | serving (num_prompts=1000) | 66.68 req/s | 80.53 req/s (21% speedup) |

  Note the result of latency (benchmark_latency) is in seconds, and serving (benchmark_serving) is in number of requests per second.
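
The README does not state how the speedup percentages in the table are computed. As a hedged sketch, the figures are consistent with a simple ratio: baseline time over quantized time for latency (lower is better), and quantized throughput over baseline throughput for serving (higher is better). The function names below are hypothetical and only illustrate that arithmetic.

```python
# Sketch only: the speedup formula is an assumption inferred from the
# table values, not something the README documents.

def latency_speedup_pct(baseline_s: float, quantized_s: float) -> float:
    """Percent speedup for a lower-is-better metric (seconds)."""
    return (baseline_s / quantized_s - 1.0) * 100.0


def throughput_speedup_pct(baseline_rps: float, quantized_rps: float) -> float:
    """Percent speedup for a higher-is-better metric (requests per second)."""
    return (quantized_rps / baseline_rps - 1.0) * 100.0


# Values copied from the results table (baseline, float8dq).
latency_rows = {
    "latency (batch_size=1)": (1.64, 1.41),
    "latency (batch_size=128)": (3.1, 2.72),
}
serving_rows = {
    "serving (num_prompts=1)": (1.35, 1.57),
    "serving (num_prompts=1000)": (66.68, 80.53),
}

for name, (base, quant) in latency_rows.items():
    print(f"{name}: {latency_speedup_pct(base, quant):.0f}% speedup")

for name, (base, quant) in serving_rows.items():
    print(f"{name}: {throughput_speedup_pct(base, quant):.0f}% speedup")
```

Rounded to whole percentages, these ratios agree with the four speedup figures listed in the table.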