shubhrapandit committed on
Commit ee3f8f2 · verified · 1 Parent(s): 8a7ddc7

Update README.md

  1. README.md +16 -7
README.md CHANGED
@@ -234,11 +234,11 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
234
  <th>Model</th>
235
  <th>Average Cost Reduction</th>
236
  <th>Latency (s)</th>
237
- <th>QPD</th>
238
  <th>Latency (s)</th>
239
- <th>QPD</th>
240
  <th>Latency (s)</th>
241
- <th>QPD</th>
242
  </tr>
243
  </thead>
244
  <tbody>
@@ -311,6 +311,9 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
311
  </tbody>
312
  </table>
313
 
314
 
315
  ### Multi-stream asynchronous performance (measured with vLLM version 0.7.2)
316
 
@@ -329,11 +332,11 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
329
  <th>Model</th>
330
  <th>Average Cost Reduction</th>
331
  <th>Maximum throughput (QPS)</th>
332
- <th>QPD</th>
333
  <th>Maximum throughput (QPS)</th>
334
- <th>QPD</th>
335
  <th>Maximum throughput (QPS)</th>
336
- <th>QPD</th>
337
  </tr>
338
  </thead>
339
  <tbody style="text-align: center">
@@ -404,4 +407,10 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
404
  <td>6777</td>
405
  </tr>
406
  </tbody>
407
- </table>
 
234
  <th>Model</th>
235
  <th>Average Cost Reduction</th>
236
  <th>Latency (s)</th>
237
+ <th>Queries Per Dollar</th>
238
  <th>Latency (s)</th>
239
+ <th>Queries Per Dollar</th>
240
  <th>Latency (s)</th>
241
+ <th>Queries Per Dollar</th>
242
  </tr>
243
  </thead>
244
  <tbody>
 
311
  </tbody>
312
  </table>
313
 
314
+ **Use case profiles: Image Size (WxH) / prompt tokens / generation tokens
315
+
316
+ **QPD: Queries per dollar, based on on-demand cost at [Lambda Labs](https://lambdalabs.com/service/gpu-cloud) (observed on 2/18/2025).
317
 
318
  ### Multi-stream asynchronous performance (measured with vLLM version 0.7.2)
319
 
 
332
  <th>Model</th>
333
  <th>Average Cost Reduction</th>
334
  <th>Maximum throughput (QPS)</th>
335
+ <th>Queries Per Dollar</th>
336
  <th>Maximum throughput (QPS)</th>
337
+ <th>Queries Per Dollar</th>
338
  <th>Maximum throughput (QPS)</th>
339
+ <th>Queries Per Dollar</th>
340
  </tr>
341
  </thead>
342
  <tbody style="text-align: center">
 
407
  <td>6777</td>
408
  </tr>
409
  </tbody>
410
+ </table>
411
+
412
+ **Use case profiles: Image Size (WxH) / prompt tokens / generation tokens
413
+
414
+ **QPS: Queries per second.
415
+
416
+ **QPD: Queries per dollar, based on on-demand cost at [Lambda Labs](https://lambdalabs.com/service/gpu-cloud) (observed on 2/18/2025).
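
The QPD footnote above does not spell out the arithmetic. A minimal sketch of one plausible reading, assuming queries per dollar is the number of queries served in one hour divided by the hourly on-demand GPU price. The function names and the example price are hypothetical and do not come from the README or from Lambda Labs pricing.

```python
def queries_per_dollar(qps: float, hourly_cost_usd: float) -> float:
    """Queries served per dollar at a sustained throughput of `qps`.

    Assumption: cost is billed per hour, so one dollar buys
    3600 / hourly_cost_usd seconds of GPU time.
    """
    if hourly_cost_usd <= 0:
        raise ValueError("hourly_cost_usd must be positive")
    return qps * 3600.0 / hourly_cost_usd


def queries_per_dollar_from_latency(latency_s: float, hourly_cost_usd: float) -> float:
    """Single-stream variant: one query at a time, each taking `latency_s` seconds."""
    if latency_s <= 0:
        raise ValueError("latency_s must be positive")
    return queries_per_dollar(1.0 / latency_s, hourly_cost_usd)


# Hypothetical numbers for illustration only: 2 queries/s on a $1.50/hour GPU.
example_qpd = queries_per_dollar(2.0, 1.5)
```

Under this reading, the single-stream tables would derive QPD from latency and the multi-stream tables from maximum throughput (QPS); the README itself does not confirm which formula was used.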