Releases: huggingface/optimum-intel
v1.6.1: Patch release
v1.6.0: INC API refactorization
Refactorization of the INC API for neural-compressor v2.0 (#118)
The INCQuantizer should be used to apply post-training (dynamic or static) quantization.
from transformers import AutoModelForQuestionAnswering
from neural_compressor.config import PostTrainingQuantConfig
from optimum.intel.neural_compressor import INCQuantizer
model_name = "distilbert-base-cased-distilled-squad"
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
# Load the quantization configuration detailing the quantization we wish to apply
quantization_config = PostTrainingQuantConfig(approach="dynamic")
quantizer = INCQuantizer.from_pretrained(model)
# Apply dynamic quantization and save the resulting model in the given directory
quantizer.quantize(quantization_config=quantization_config, save_directory="quantized_model")
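Static post-training quantization additionally needs a calibration dataset from which activation statistics are collected. Below is a minimal sketch of how this could look, assuming the get_calibration_dataset helper and the calibration_dataset argument of quantize (check the optimum-intel documentation for the exact signatures); the SQuAD preprocessing is only illustrative.
from functools import partial
from transformers import AutoModelForQuestionAnswering, AutoTokenizer
from neural_compressor.config import PostTrainingQuantConfig
from optimum.intel.neural_compressor import INCQuantizer
model_name = "distilbert-base-cased-distilled-squad"
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Illustrative preprocessing: tokenize question/context pairs from SQuAD
def preprocess_function(examples, tokenizer):
    return tokenizer(examples["question"], examples["context"], padding="max_length", max_length=384, truncation=True)
# Static quantization requires calibration data to compute the activation ranges
quantization_config = PostTrainingQuantConfig(approach="static")
quantizer = INCQuantizer.from_pretrained(model)
# Assumed helper building a small calibration set from the SQuAD training split
calibration_dataset = quantizer.get_calibration_dataset(
    "squad",
    preprocess_function=partial(preprocess_function, tokenizer=tokenizer),
    num_samples=100,
    dataset_split="train",
)
# Apply static quantization and save the resulting model in the given directory
quantizer.quantize(
    quantization_config=quantization_config,
    calibration_dataset=calibration_dataset,
    save_directory="quantized_model_static",
)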
The INCTrainer should be used to apply and combine compression techniques such as pruning, quantization and distillation during training.
from transformers import TrainingArguments, default_data_collator
-from transformers import Trainer
+from optimum.intel.neural_compressor import INCTrainer
+from neural_compressor import QuantizationAwareTrainingConfig
# Load the quantization configuration detailing the quantization we wish to apply
+quantization_config = QuantizationAwareTrainingConfig()
-trainer = Trainer(
+trainer = INCTrainer(
model=model,
+ quantization_config=quantization_config,
args=TrainingArguments("quantized_model", num_train_epochs=3.0),
train_dataset=train_dataset,
eval_dataset=eval_dataset,
compute_metrics=compute_metrics,
tokenizer=tokenizer,
data_collator=default_data_collator,
)
train_result = trainer.train()
trainer.save_model()
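Since the INCTrainer is designed to combine compression methods, a pruning or distillation configuration can in principle be passed alongside the quantization one. The sketch below is only an illustration and assumes a pruning_config argument together with neural-compressor's WeightPruningConfig; the exact argument names and defaults may differ.
from neural_compressor import QuantizationAwareTrainingConfig
from neural_compressor.config import WeightPruningConfig
# Assumed combination: quantization aware training together with magnitude pruning
quantization_config = QuantizationAwareTrainingConfig()
pruning_config = WeightPruningConfig(pruning_type="magnitude", target_sparsity=0.2)
trainer = INCTrainer(
    model=model,
    quantization_config=quantization_config,
    pruning_config=pruning_config,  # assumed argument name
    args=TrainingArguments("compressed_model", num_train_epochs=3.0),
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    data_collator=default_data_collator,
)
train_result = trainer.train()
trainer.save_model()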
To load a quantized model, you can just replace your AutoModelForXxx class with the corresponding INCModelForXxx class.
from optimum.intel.neural_compressor import INCModelForSequenceClassification
loaded_model_from_hub = INCModelForSequenceClassification.from_pretrained(
"Intel/distilbert-base-uncased-finetuned-sst-2-english-int8-dynamic"
)
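The loaded model can then be used like the original transformers model; a small sketch, assuming the quantized model returns the usual sequence classification output with logits:
import torch
from transformers import AutoTokenizer
model_id = "Intel/distilbert-base-uncased-finetuned-sst-2-english-int8-dynamic"
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("He's a dreadful magician.", return_tensors="pt")
with torch.no_grad():
    # Assumes the output exposes logits like the original model
    outputs = loaded_model_from_hub(**inputs)
predicted_class_id = outputs.logits.argmax(dim=-1).item()
print(loaded_model_from_hub.config.id2label[predicted_class_id])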
v1.5.5: Patch release
v1.5.4: Patch release
Fix the IPEX context manager enabling inference mode by returning the original model when IPEX cannot optimize it (#132)
v1.5.3: Patch release
v1.5.2: Patch release
v1.5.1: Patch release
v1.5.0: OpenVINO quantization
Quantization
- Add OVQuantizer enabling OpenVINO NNCF post-training static quantization (#50)
- Add OVTrainer enabling OpenVINO NNCF quantization aware training (#67)
- Add OVConfig, the configuration containing the information about the quantization process (#65)
The quantized models resulting from the OVQuantizer and the OVTrainer are exported to the OpenVINO IR and can be loaded with the corresponding OVModelForXxx class to perform inference with OpenVINO Runtime.
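For illustration, post-training static quantization with the OVQuantizer could look like the sketch below; the get_calibration_dataset helper and the quantize arguments follow the optimum-intel documentation of that release and may have evolved since.
from functools import partial
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from optimum.intel.openvino import OVQuantizer
model_id = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Illustrative preprocessing for the SST-2 sentences
def preprocess_function(examples, tokenizer):
    return tokenizer(examples["sentence"], padding="max_length", max_length=128, truncation=True)
quantizer = OVQuantizer.from_pretrained(model)
# Build a small calibration set used by NNCF to collect activation statistics
calibration_dataset = quantizer.get_calibration_dataset(
    "glue",
    dataset_config_name="sst2",
    preprocess_function=partial(preprocess_function, tokenizer=tokenizer),
    num_samples=100,
    dataset_split="train",
)
# Apply static quantization and export the quantized model to the OpenVINO IR
quantizer.quantize(calibration_dataset=calibration_dataset, save_directory="ov_quantized_model")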
OVModel
- Add OVModelForCausalLM enabling OpenVINO Runtime for models with a causal language modeling head (#76)
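As an illustration, the new class can be used for text generation in the usual way (GPT-2 is used here only as an example checkpoint):
from transformers import AutoTokenizer
from optimum.intel.openvino import OVModelForCausalLM
model_id = "gpt2"  # example checkpoint
# Export the PyTorch checkpoint to the OpenVINO IR on the fly
model = OVModelForCausalLM.from_pretrained(model_id, from_transformers=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("OpenVINO Runtime makes inference", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))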
v1.4.0: OVModels for OpenVINO inference
OVModel classes were integrated with the 🤗 Hub in order to easily export models to the OpenVINO IR, save and load the resulting models, and easily perform inference.
- Add OVModel classes enabling OpenVINO inference #21
Below is an example that downloads a DistilBERT model from the Hub, exports it to the OpenVINO IR and saves it:
from optimum.intel.openvino import OVModelForSequenceClassification
model_id = "distilbert-base-uncased-finetuned-sst-2-english"
model = OVModelForSequenceClassification.from_pretrained(model_id, from_transformers=True)
# The directory where the exported model will be saved
save_directory = "a_local_path"
model.save_pretrained(save_directory)
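The exported model can then be reloaded from the local directory without going through the export step again, for example:
# Reload the exported OpenVINO IR model from the local directory
model = OVModelForSequenceClassification.from_pretrained(save_directory)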
The currently supported model topologies are the following:
- OVModelForSequenceClassification
- OVModelForTokenClassification
- OVModelForQuestionAnswering
- OVModelForFeatureExtraction
- OVModelForMaskedLM
- OVModelForImageClassification
- OVModelForSeq2SeqLM
Pipelines
Support for Transformers pipelines was added, providing an easy way to use OVModels for inference.
-from transformers import AutoModelForSeq2SeqLM
+from optimum.intel.openvino import OVModelForSeq2SeqLM
from transformers import AutoTokenizer, pipeline
model_id = "Helsinki-NLP/opus-mt-en-fr"
-model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
+model = OVModelForSeq2SeqLM.from_pretrained(model_id, from_transformers=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
pipe = pipeline("translation_en_to_fr", model=model, tokenizer=tokenizer)
text = "He never went out without a book under his arm, and he often came back with two."
outputs = pipe(text)
By default, OVModels support dynamic shapes, enabling inputs of any shape (without any constraint on the batch size or sequence length). To decrease latency, static shapes can be enabled by giving the desired input shapes.
- Add OVModel static shapes #41
# Fix the batch size to 1 and the sequence length to 20
model.reshape(1, 20)
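Once the shapes are fixed, the inputs fed to the model have to match them; a small sketch, reusing the sequence classification model and model_id from the export example above:
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
# The inputs must now match the static shapes, so pad (and truncate) to the fixed length of 20
inputs = tokenizer(
    "He never went out without a book under his arm.",
    padding="max_length",
    max_length=20,
    truncation=True,
    return_tensors="pt",
)
outputs = model(**inputs)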
FP16 precision can also be enabled.
- Add OVModel fp16 support #45
model.half()