v1.4.0: OVModels for OpenVINO inference
OVModel classes were integrated with the 🤗 Hub, making it easy to export models to the OpenVINO IR format, save and load the resulting models, and perform inference with them.
- Add OVModel classes enabling OpenVINO inference #21
Below is an example that downloads a DistilBERT model from the Hub, exports it to the OpenVINO IR format and saves it:
```python
from optimum.intel.openvino import OVModelForSequenceClassification

model_id = "distilbert-base-uncased-finetuned-sst-2-english"
# from_transformers=True exports the original Transformers model to the OpenVINO IR format
model = OVModelForSequenceClassification.from_pretrained(model_id, from_transformers=True)
# Save the resulting OpenVINO model
save_directory = "a_local_path"  # illustrative directory name
model.save_pretrained(save_directory)
```
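The saved model can then be reloaded from the local directory and used for inference directly. Below is a minimal sketch, assuming the `save_directory` from the previous step (the example text is illustrative):

```python
from optimum.intel.openvino import OVModelForSequenceClassification
from transformers import AutoTokenizer

save_directory = "a_local_path"  # directory used in the previous step
# Loading a local OpenVINO IR requires no export step
model = OVModelForSequenceClassification.from_pretrained(save_directory)
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")

inputs = tokenizer("This movie was surprisingly good!", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits)
```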
The currently supported model topologies are the following:
- OVModelForSequenceClassification
- OVModelForTokenClassification
- OVModelForQuestionAnswering
- OVModelForFeatureExtraction
- OVModelForMaskedLM
- OVModelForImageClassification
- OVModelForSeq2SeqLM
Pipelines
Support for Transformers pipelines was added, providing an easy way to run inference with OVModels:
```diff
-from transformers import AutoModelForSeq2SeqLM
+from optimum.intel.openvino import OVModelForSeq2SeqLM
 from transformers import AutoTokenizer, pipeline

 model_id = "Helsinki-NLP/opus-mt-en-fr"
-model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
+model = OVModelForSeq2SeqLM.from_pretrained(model_id, from_transformers=True)
 tokenizer = AutoTokenizer.from_pretrained(model_id)
 pipe = pipeline("translation_en_to_fr", model=model, tokenizer=tokenizer)
 text = "He never went out without a book under his arm, and he often came back with two."
 outputs = pipe(text)
```
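The pipeline output keeps the usual Transformers format, a list of dictionaries with a `translation_text` key:

```python
print(outputs[0]["translation_text"])
```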
By default, OVModels support dynamic shapes, allowing inputs of any shape (no constraint on the batch size or sequence length). To decrease latency, static shapes can be enabled by specifying the desired input shapes.
- Add OVModel static shapes #41
```python
# Fix the batch size to 1 and the sequence length to 20
model.reshape(1, 20)
```
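Once static shapes are set, the inputs given at inference time must match them. A sketch of what this could look like, assuming the model above and padding to the fixed sequence length:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
# Inputs must be padded (and truncated) to the fixed sequence length of 20
inputs = tokenizer(
    "A short input sentence",
    padding="max_length",
    max_length=20,
    truncation=True,
    return_tensors="pt",
)
outputs = model(**inputs)
```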
FP16 precision can also be enabled.
- Add OVModel fp16 support #45
```python
# Convert the model to FP16 precision
model.half()
```
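A hedged sketch of running inference after the conversion, reusing the tokenizer from the previous example and keeping the padding required by the static shapes set above:

```python
inputs = tokenizer(
    "Inference now runs on the FP16 model",
    padding="max_length",
    max_length=20,
    truncation=True,
    return_tensors="pt",
)
outputs = model(**inputs)
print(outputs.logits)
```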