v1.4.0: OVModels for OpenVINO inference

@echarlaix echarlaix released this 26 Sep 15:32

OVModel classes are now integrated with the 🤗 Hub, making it easy to export models to the OpenVINO IR (Intermediate Representation) format, save and load the resulting models, and run inference with them.

  • Add OVModel classes enabling OpenVINO inference #21

The example below downloads a DistilBERT model from the Hub, exports it to the OpenVINO IR format and saves it:

from optimum.intel.openvino import OVModelForSequenceClassification

model_id = "distilbert-base-uncased-finetuned-sst-2-english"
model = OVModelForSequenceClassification.from_pretrained(model_id, from_transformers=True)
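# Placeholder path: replace with any local directory of your choice
save_directory = "a_local_path"
model.save_pretrained(save_directory)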

The currently supported model topologies are the following:

  • OVModelForSequenceClassification
  • OVModelForTokenClassification
  • OVModelForQuestionAnswering
  • OVModelForFeatureExtraction
  • OVModelForMaskedLM
  • OVModelForImageClassification
  • OVModelForSeq2SeqLM

Pipelines

Support for Transformers pipelines was added, providing an easy way to use OVModels for inference. An OVModel can be passed to pipeline() in place of the corresponding Transformers model:

-from transformers import AutoModelForSeq2SeqLM
+from optimum.intel.openvino import OVModelForSeq2SeqLM
from transformers import AutoTokenizer, pipeline

model_id = "Helsinki-NLP/opus-mt-en-fr"
-model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
+model = OVModelForSeq2SeqLM.from_pretrained(model_id, from_transformers=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
pipe = pipeline("translation_en_to_fr", model=model, tokenizer=tokenizer)
text = "He never went out without a book under his arm, and he often came back with two."
outputs = pipe(text)

By default, OVModels support dynamic shapes, accepting inputs of any shape (with no constraint on the batch size or sequence length). To decrease latency, static shapes can be enabled by specifying the desired input shapes.

  • Add OVModel static shapes #41

model.reshape(1, 20)
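
Once static shapes are enabled, every input has to match the requested shape exactly, here presumably a batch size of 1 and a sequence length of 20. The sketch below shows in plain Python what that means for a tokenized input: shorter sequences are padded and longer ones truncated. The helper name pad_to_static_shape and the padding id 0 are illustrative assumptions, not part of the library. In practice, a Transformers tokenizer can do the same when called with padding="max_length", max_length=20 and truncation=True.

```python
from typing import List, Tuple


def pad_to_static_shape(
    token_ids: List[int], seq_len: int = 20, pad_id: int = 0
) -> Tuple[List[List[int]], List[List[int]]]:
    """Return (input_ids, attention_mask), each of shape (1, seq_len).

    Hypothetical helper for illustration only; pad_id=0 is an assumption.
    """
    kept = token_ids[:seq_len]                       # truncate long inputs
    n_pad = seq_len - len(kept)                      # padding still needed
    input_ids = kept + [pad_id] * n_pad              # pad short inputs
    attention_mask = [1] * len(kept) + [0] * n_pad   # 0 marks padding
    return [input_ids], [attention_mask]             # batch dimension of 1


# A 4-token input is padded up to 20 positions.
short_ids, short_mask = pad_to_static_shape([101, 2002, 2196, 102])
# A 30-token input is truncated down to 20 positions.
long_ids, long_mask = pad_to_static_shape(list(range(1, 31)))

print(len(short_ids), len(short_ids[0]))
print(len(long_ids), len(long_ids[0]))
```

Both calls produce a batch of one sequence with exactly 20 positions, which is the only shape a model reshaped with (1, 20) would accept.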

FP16 precision can also be enabled.

  • Add OVModel fp16 support #45

model.half()
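
As a rough illustration of the trade-off behind FP16, the sketch below uses only the Python standard library to compare how a single value is stored in 32-bit and 16-bit floating point. It does not show what model.half() does internally; the function name to_half and the sample value are assumptions chosen for the example.

```python
import struct


def to_half(value: float) -> float:
    """Round-trip a Python float through IEEE 754 half precision (FP16)."""
    return struct.unpack("e", struct.pack("e", value))[0]


fp32_bytes = struct.calcsize("f")  # bytes per FP32 value
fp16_bytes = struct.calcsize("e")  # bytes per FP16 value

weight = 0.1                       # arbitrary sample value
weight_fp16 = to_half(weight)
rounding_error = abs(weight - weight_fp16)

print(f"FP32: {fp32_bytes} bytes, FP16: {fp16_bytes} bytes")
print(f"{weight} stored as FP16 becomes {weight_fp16!r}")
print(f"absolute rounding error: {rounding_error:.2e}")
```

Each value takes half the storage in FP16, at the cost of a small rounding error, so it is worth checking model accuracy after switching precision.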