---
library_name: optimum
tags:
- onnx
- quantized
- int8
- intent-classification
base_model: rbojja/intent-classification-small
---
# Intent Classification ONNX Quantized

Quantized (INT8) ONNX export of [rbojja/intent-classification-small](https://huggingface.co/rbojja/intent-classification-small) for fast inference with ONNX Runtime.
## Usage

```python
from optimum.onnxruntime import ORTModelForFeatureExtraction
from transformers import AutoTokenizer

# Load the quantized ONNX model and its tokenizer from the Hub
model = ORTModelForFeatureExtraction.from_pretrained("pythn/intent-classification-onnx-quantized")
tokenizer = AutoTokenizer.from_pretrained("pythn/intent-classification-onnx-quantized")

# Tokenize an utterance and run inference
text = "I want to book a flight"
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)
```
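The raw `outputs` are tensors, not intent names; assuming the export carries a classification head, a softmax followed by argmax converts its logits into a label and confidence score. A minimal sketch of that postprocessing is below — the `id2label` map here is a placeholder for illustration only; the real mapping comes from the model's `config.id2label`.

```python
import numpy as np

def logits_to_intent(logits, id2label):
    """Map raw classifier logits to a (label, confidence) pair via softmax + argmax."""
    logits = np.asarray(logits, dtype=np.float64)
    # Numerically stable softmax: subtract the max before exponentiating
    exps = np.exp(logits - logits.max())
    probs = exps / exps.sum()
    idx = int(probs.argmax())
    return id2label[idx], float(probs[idx])

# Placeholder label map -- use model.config.id2label with the real model
id2label = {0: "book_flight", 1: "cancel_flight", 2: "check_status"}
label, score = logits_to_intent([2.1, -0.3, 0.4], id2label)
# label == "book_flight", score is its softmax probability
```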
## Performance

- ~4x smaller model size (INT8 vs. FP32 weights)
- 2-4x faster inference
- Minimal accuracy loss