Update README.md
README.md
CHANGED
@@ -234,11 +234,11 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 <th>Model</th>
 <th>Average Cost Reduction</th>
 <th>Latency (s)</th>
-<th>
+<th>Queries Per Dollar</th>
 <th>Latency (s)th>
-<th>
+<th>Queries Per Dollar</th>
 <th>Latency (s)</th>
-<th>
+<th>Queries Per Dollar</th>
 </tr>
 </thead>
 <tbody>
@@ -311,6 +311,9 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 </tbody>
 </table>
 
+**Use case profiles: Image Size (WxH) / prompt tokens / generation tokens
+
+**QPD: Queries per dollar, based on on-demand cost at [Lambda Labs](https://lambdalabs.com/service/gpu-cloud) (observed on 2/18/2025).
 
 ### Multi-stream asynchronous performance (measured with vLLM version 0.7.2)
 
@@ -329,11 +332,11 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 <th>Model</th>
 <th>Average Cost Reduction</th>
 <th>Maximum throughput (QPS)</th>
-<th>
+<th>Queries Per Dollar</th>
 <th>Maximum throughput (QPS)</th>
-<th>
+<th>Queries Per Dollar</th>
 <th>Maximum throughput (QPS)</th>
-<th>
+<th>Queries Per Dollar</th>
 </tr>
 </thead>
 <tbody style="text-align: center">
@@ -404,4 +407,10 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 <td>6777</td>
 </tr>
 </tbody>
-</table>
+</table>
+
+**Use case profiles: Image Size (WxH) / prompt tokens / generation tokens
+
+**QPS: Queries per second.
+
+**QPD: Queries per dollar, based on on-demand cost at [Lambda Labs](https://lambdalabs.com/service/gpu-cloud) (observed on 2/18/2025).
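The QPD footnote added in this change says the figure is based on on-demand cost but does not spell out the formula. A minimal sketch of one plausible reading follows, assuming QPD is the number of queries served in one hour divided by the hourly instance price. The function names, the formula itself, and the sample numbers are illustrative assumptions and do not come from the README.

```python
def queries_per_dollar(qps: float, hourly_cost_usd: float) -> float:
    """Queries served per dollar at a sustained rate of `qps` queries/second.

    Assumed formula (not confirmed by the README):
        QPD = QPS * 3600 / hourly on-demand price in USD
    """
    if qps < 0:
        raise ValueError("qps must be non-negative")
    if hourly_cost_usd <= 0:
        raise ValueError("hourly_cost_usd must be positive")
    return qps * 3600.0 / hourly_cost_usd


def queries_per_dollar_from_latency(latency_s: float, hourly_cost_usd: float) -> float:
    """Single-stream variant: one query at a time, so QPS = 1 / latency."""
    if latency_s <= 0:
        raise ValueError("latency_s must be positive")
    return queries_per_dollar(1.0 / latency_s, hourly_cost_usd)


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
multi_stream_qpd = queries_per_dollar(qps=2.0, hourly_cost_usd=1.5)
single_stream_qpd = queries_per_dollar_from_latency(latency_s=0.5, hourly_cost_usd=1.5)
```

Under this reading, a lower hourly price or a higher sustained throughput both raise QPD, which is consistent with the new column sitting next to the latency and throughput columns in each table.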