Releases: huggingface/optimum-intel

v1.6.1: Patch release

23 Jan 17:45
  • Fix INC ONNX export (#154)
  • Add IPEX backend support to INC quantization (#161)
  • Add Neural Coder support (#146)
  • Make the batch size and sequence length axes dynamic for causal language models with OpenVINO Runtime (#168)

v1.6.0: INC API refactorization

03 Jan 21:34

Refactorization of the INC API for neural-compressor v2.0 (#118)

The INCQuantizer should be used to apply post-training (dynamic or static) quantization.

from transformers import AutoModelForQuestionAnswering
from neural_compressor.config import PostTrainingQuantConfig
from optimum.intel.neural_compressor import INCQuantizer

model_name = "distilbert-base-cased-distilled-squad"
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
# Load the quantization configuration detailing the quantization we wish to apply
quantization_config = PostTrainingQuantConfig(approach="dynamic")
quantizer = INCQuantizer.from_pretrained(model)
# Apply dynamic quantization and save the resulting model in the given directory
quantizer.quantize(quantization_config=quantization_config, save_directory="quantized_model")

The INCTrainer should be used to apply and combine compression techniques such as pruning, quantization and distillation during training.

from transformers import TrainingArguments, default_data_collator
-from transformers import Trainer
+from optimum.intel.neural_compressor import INCTrainer
+from neural_compressor import QuantizationAwareTrainingConfig

# Load the quantization configuration detailing the quantization we wish to apply
+quantization_config = QuantizationAwareTrainingConfig()

-trainer = Trainer(
+trainer = INCTrainer(
    model=model,
+   quantization_config=quantization_config,
    args=TrainingArguments("quantized_model", num_train_epochs=3.0),
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    data_collator=default_data_collator,
)
# Launch the training and save the resulting quantized model
trainer.train()
trainer.save_model()

To load a quantized model, you can just replace your AutoModelForXxx class with the corresponding INCModelForXxx class.

from optimum.intel.neural_compressor import INCModelForSequenceClassification

loaded_model_from_hub = INCModelForSequenceClassification.from_pretrained(
    "Intel/distilbert-base-uncased-finetuned-sst-2-english-int8-dynamic"
)

v1.5.5: Patch release

25 Dec 23:42
  • Fix the output results of FP16 models for dynamic input shapes (#139)
  • Modify the required OpenVINO version (#141)
  • Improve inference latency for Seq2Seq models using OpenVINO Runtime (#131)

v1.5.4: Patch release

12 Dec 17:55

Fix the IPEX context manager enabling inference mode by returning the original model when IPEX cannot optimize it (#132)

v1.5.3: Patch release

05 Dec 14:30
  • Fix GenerationMixin import for transformers version >= 4.25.0 (#127)
  • Temporarily modify the maximum required transformers version until the OpenVINO export of the GPT2 model is fixed (#120)

v1.5.2: Patch release

23 Nov 13:28
  • Fix OVModel model configuration loading for optimum v1.5.0 (#110)
  • Add possibility to save the ONNX model resulting from the OpenVINO export (#99)
  • Add OVModel options for model compilation (#108)
  • Add default model loading for IncQuantizedModel (#113)

v1.5.1: Patch release

14 Nov 14:24
  • Add Neural Compressor torch 1.13 quantization support (#95)
  • Remove the IncTrainer's deprecated _load_state_dict_in_model method (#84)
  • Add ModelForVision2Seq INC support (#70)
  • Rename OVModel device attribute for transformers v4.24.0 compatibility (#94)
  • Rename openvino model file name (#93)

v1.5.0: OpenVINO quantization

21 Oct 10:07

Quantization

  • Add OVQuantizer enabling OpenVINO NNCF post-training static quantization (#50)
  • Add OVTrainer enabling OpenVINO NNCF quantization aware training (#67)
  • Add OVConfig, the configuration containing the quantization process information (#65)

The quantized models resulting from the OVQuantizer and the OVTrainer are exported to the OpenVINO IR and can be loaded with the corresponding OVModelForXxx class to perform inference with OpenVINO Runtime.
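As an illustration, below is a minimal sketch of post-training static quantization with the OVQuantizer; the calibration dataset (glue/sst2), the preprocessing function and the save directory are placeholders chosen for this example.

from functools import partial
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from optimum.intel.openvino import OVQuantizer

model_id = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

def preprocess_fn(examples, tokenizer):
    return tokenizer(examples["sentence"], padding="max_length", max_length=128, truncation=True)

quantizer = OVQuantizer.from_pretrained(model)
# Build a small calibration dataset used to compute the activation ranges
calibration_dataset = quantizer.get_calibration_dataset(
    "glue",
    dataset_config_name="sst2",
    preprocess_function=partial(preprocess_fn, tokenizer=tokenizer),
    num_samples=100,
    dataset_split="train",
)
# Apply static quantization and export the resulting model to the OpenVINO IR
quantizer.quantize(calibration_dataset=calibration_dataset, save_directory="ov_quantized_model")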

OVModel

Add OVModelForCausalLM enabling OpenVINO Runtime for models with a causal language modeling head (#76)
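For instance, a causal language model can be exported and used for generation as sketched below; the model (gpt2) and the prompt are only illustrative choices.

from transformers import AutoTokenizer
from optimum.intel.openvino import OVModelForCausalLM

model_id = "gpt2"
# Export the model to the OpenVINO IR on the fly and load it for inference
model = OVModelForCausalLM.from_pretrained(model_id, from_transformers=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("He never went out without a book under his arm", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0]))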

v1.4.0: OVModels for OpenVINO inference

26 Sep 15:32

OVModel classes were integrated with the 🤗 Hub, making it easy to export models to the OpenVINO IR, save and load the resulting models, and perform inference.

  • Add OVModel classes enabling OpenVINO inference #21

Below is an example that downloads a DistilBERT model from the Hub, exports it to the OpenVINO IR and saves it:

from optimum.intel.openvino import OVModelForSequenceClassification

model_id = "distilbert-base-uncased-finetuned-sst-2-english"
save_directory = "ov_distilbert_sst2"  # local directory where the exported model is saved (name chosen for the example)
model = OVModelForSequenceClassification.from_pretrained(model_id, from_transformers=True)
model.save_pretrained(save_directory)

The currently supported model topologies are the following:

  • OVModelForSequenceClassification
  • OVModelForTokenClassification
  • OVModelForQuestionAnswering
  • OVModelForFeatureExtraction
  • OVModelForMaskedLM
  • OVModelForImageClassification
  • OVModelForSeq2SeqLM

Pipelines

Support for the Transformers pipelines was added, providing an easy way to use OVModels for inference.

-from transformers import AutoModelForSeq2SeqLM
+from optimum.intel.openvino import OVModelForSeq2SeqLM
from transformers import AutoTokenizer, pipeline

model_id = "Helsinki-NLP/opus-mt-en-fr"
-model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
+model = OVModelForSeq2SeqLM.from_pretrained(model_id, from_transformers=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
pipe = pipeline("translation_en_to_fr", model=model, tokenizer=tokenizer)
text = "He never went out without a book under his arm, and he often came back with two."
outputs = pipe(text)

By default, OVModels support dynamic shapes, enabling inputs of any shape (without any constraint on the batch size or sequence length). To decrease latency, static shapes can be enabled by giving the desired input shapes.

  • Add OVModel static shapes #41
model.reshape(1, 20)
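Once static shapes are set, the inputs must match them. Below is a minimal sketch, assuming the inputs are padded and truncated to the fixed sequence length; the model and example sentence are only illustrative.

from transformers import AutoTokenizer
from optimum.intel.openvino import OVModelForSequenceClassification

model_id = "distilbert-base-uncased-finetuned-sst-2-english"
model = OVModelForSequenceClassification.from_pretrained(model_id, from_transformers=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Fix the shapes to a batch size of 1 and a sequence length of 20
model.reshape(1, 20)

# Inputs now need to be padded / truncated to the static sequence length
inputs = tokenizer("He never went out without a book.", padding="max_length", max_length=20, truncation=True, return_tensors="pt")
outputs = model(**inputs)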

FP16 precision can also be enabled.

  • Add OVModel fp16 support #45
model.half()

v1.3.1: Patch release

07 Sep 17:06
  • Adapt INC configuration and quantized model loading for transformers release 4.22 #27
  • Fix loss computation when distillation is activated while the weight corresponding to the distillation loss is set to 0 #26