Quantized BERT-base MNLI model with 90% unstructured sparsity

This is the pruned and quantized model in OpenVINO IR format. The pruned model, neuralmagic/oBERT-12-downstream-pruned-unstructured-90-mnli, was quantized with the code below using HF Optimum for OpenVINO:

from functools import partial
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from optimum.intel.openvino import OVConfig, OVQuantizer

model_id = "neuralmagic/oBERT-12-downstream-pruned-unstructured-90-mnli"
model = AutoModelForSequenceClassification.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
save_dir = "./nm_mnli_90"

def preprocess_function(examples, tokenizer):
   return tokenizer(examples["premise"], examples["hypothesis"], padding="max_length", max_length=128, truncation=True)

# Load the default quantization configuration detailing the quantization we wish to apply
quantization_config = OVConfig()
# Instantiate our OVQuantizer using the desired configuration
quantizer = OVQuantizer.from_pretrained(model, feature="sequence-classification")
# Create the calibration dataset used to perform static quantization
calibration_dataset = quantizer.get_calibration_dataset(
   "glue",
   dataset_config_name="mnli",
   preprocess_function=partial(preprocess_function, tokenizer=tokenizer),
   num_samples=100,
   dataset_split="train",
)
# Apply static quantization and export the resulting quantized model to OpenVINO IR format
quantizer.quantize(
   quantization_config=quantization_config,
   calibration_dataset=calibration_dataset,
   save_directory=save_dir,
)
# Save the tokenizer
tokenizer.save_pretrained(save_dir)
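
The quantized model saved above can then be loaded for inference. The following is a minimal sketch, not part of the original recipe: it assumes the IR and tokenizer were written to `./nm_mnli_90` as in the script above, that `optimum-intel` with OpenVINO support is installed, and that inputs are padded to 128 tokens to match the calibration preprocessing. The helper names `softmax`, `top_label`, and `classify` are hypothetical and introduced only for illustration.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits, id2label):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


def classify(premise, hypothesis, model_dir="./nm_mnli_90"):
    """Run one premise/hypothesis pair through the quantized OpenVINO model."""
    # Imported lazily so the helpers above work without optimum installed.
    from transformers import AutoTokenizer
    from optimum.intel.openvino import OVModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = OVModelForSequenceClassification.from_pretrained(model_dir)

    # Assumption: same padding/truncation settings as the calibration data.
    inputs = tokenizer(
        premise,
        hypothesis,
        padding="max_length",
        max_length=128,
        truncation=True,
        return_tensors="pt",
    )
    logits = model(**inputs).logits[0].tolist()

    # Label names are read from the model config; their order is not assumed.
    return top_label(logits, model.config.id2label)
```

Calling `classify("A man is playing a guitar.", "A person is making music.")` would return a label name taken from the model's own `id2label` mapping together with its probability.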