EmbeddedLLM/Phi-3-mini-4k-instruct-onnx-cpu-int4-rtn-block-32

Performance Metrics

CPU-INT4-RTN-BLOCK-32

We measured the performance of CPU-INT4-RTN-BLOCK-32 on an AMD Ryzen 9 7940HS with Radeon 78

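| Prompt Length (tokens) | Generation Length (tokens) | Average Throughput (tps) |
|---|---|---|
| 128 | 128 | - |
| 128 | 256 | - |
| 128 | 512 | - |
| 128 | 1024 | - |
| 256 | 128 | - |
| 256 | 256 | - |
| 256 | 512 | - |
| 256 | 1024 | - |
| 512 | 128 | - |
| 512 | 256 | - |
| 512 | 512 | - |
| 512 | 1024 | - |
| 1024 | 128 | - |
| 1024 | 256 | - |
| 1024 | 512 | - |
| 1024 | 1024 | - |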
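
The card does not say how average throughput was measured. As a rough sketch, one way to fill in a table like this is to time several generation runs per (prompt length, generation length) pair and average the tokens-per-second rates. Everything named here is hypothetical: `average_throughput`, `dummy_generate`, and the choice to count only generated tokens over total wall-clock time (prompt processing included) are assumptions, not the authors' method. Replace `dummy_generate` with a call into your actual inference runtime.

```python
import time
from statistics import mean
from typing import Callable


def average_throughput(generate: Callable[[int, int], int],
                       prompt_len: int, gen_len: int, runs: int = 3) -> float:
    """Return the mean generated-tokens-per-second rate over `runs` timed calls.

    `generate(prompt_len, gen_len)` is a hypothetical callable that runs one
    generation and returns the number of tokens it actually produced.
    """
    rates = []
    for _ in range(runs):
        start = time.perf_counter()
        produced = generate(prompt_len, gen_len)
        elapsed = time.perf_counter() - start
        rates.append(produced / elapsed)  # tokens per second for this run
    return mean(rates)


def dummy_generate(prompt_len: int, gen_len: int) -> int:
    """Stand-in for a real model call (assumption): sleeps briefly, no inference."""
    time.sleep(0.01)
    return gen_len


if __name__ == "__main__":
    # Same grid of prompt and generation lengths as the table above.
    for prompt_len in (128, 256, 512, 1024):
        for gen_len in (128, 256, 512, 1024):
            tps = average_throughput(dummy_generate, prompt_len, gen_len, runs=1)
            print(f"{prompt_len:>5} {gen_len:>5} {tps:10.1f}")
```

With the dummy generator the printed numbers are meaningless; they only show the shape of the measurement loop.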