diff --git a/notebooks/openvino/question_answering_quantization_jpqd.ipynb b/notebooks/openvino/question_answering_quantization_jpqd.ipynb
new file mode 100644
index 0000000000..b8d925850e
--- /dev/null
+++ b/notebooks/openvino/question_answering_quantization_jpqd.ipynb
@@ -0,0 +1,904 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "eaed3927-e315-46d3-8889-df3f3bbcbf6b",
+ "metadata": {},
+ "source": [
+ "# Joint Pruning, Quantization and Distillation with OpenVINO and NNCF\n",
+ "\n",
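+ "With quantization, we reduce the precision of the model's weights and activations from floating point (FP32) to integer (INT8). This results in a smaller model with faster inference times with OpenVINO Runtime. Joint Pruning, Quantization and Distillation (JPQD) combines quantization with pruning, which removes redundant weights, and distillation, which trains the compressed model to mimic a teacher model.\n",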
+ "\n",
+ "Please see the [Optimum OpenVINO model compression documentation](https://huggingface.co/docs/optimum/intel/optimization_ov#optimization) for more information about compressing models with NNCF and JPQD.\n",
+ "\n",
+ "JPQD is applied during training/finetuning of the model. Training models for a long time in a notebook is not ideal, so we recommend running the [question-answering example](https://github.com/huggingface/optimum-intel/tree/main/examples/openvino/question-answering) in a terminal if you want to compress the model yourself.\n",
+ "\n",
+ "You do not need to compress the model yourself to follow this notebook: you can use the already compressed model that we uploaded to the Hugging Face Hub.\n",
+ "\n",
+ "A laptop or desktop with a recent Intel Core processor is recommended for best results. To install the requirements for this notebook, run `pip install \"optimum-intel[openvino]\" \"evaluate[evaluator]\" ipywidgets datasets`, or uncomment the cell below to install the requirements in your current Python environment."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "id": "3d4e47b2-89cb-4ffa-84f3-11919fa367e6",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# %pip install \"optimum-intel[openvino]\" \"evaluate[evaluator]\" ipywidgets datasets"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "id": "0407fc92-c052-47b7-8721-01836adf3b54",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import random\n",
+ "import tempfile\n",
+ "from pathlib import Path\n",
+ "\n",
+ "import datasets\n",
+ "import evaluate\n",
+ "import pandas as pd\n",
+ "import transformers\n",
+ "from evaluate import evaluator\n",
+ "from optimum.intel.openvino import OVModelForQuestionAnswering\n",
+ "from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline\n",
+ "\n",
+ "from openvino.runtime import Core\n",
+ "\n",
+ "transformers.logging.set_verbosity_error()\n",
+ "datasets.logging.set_verbosity_error()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "889a16fe-2bc0-477e-b8d6-02a4f7508f03",
+ "metadata": {},
+ "source": [
+ "## Settings\n",
+ "\n",
+ "We will compare the accuracy and performance of the quantized and pruned model with that of an FP32 `bert-base-uncased` model that was also finetuned on the SQuAD dataset, following the [Transformers question-answering example](https://github.com/huggingface/transformers/tree/main/examples/pytorch/question-answering#fine-tuning-bert-on-squad10).\n",
+ "\n",
+ "We define the model IDs of the FP32 model and the INT8 model, and the dataset name. If you trained the models yourself, set `FP32_MODEL_ID` and `INT8_MODEL_ID` to the directories containing the model and tokenizer files.\n",
+ "\n",
+ "The models were finetuned on the [Stanford Question Answering Dataset (SQuAD)](https://huggingface.co/datasets/squad), a reading comprehension dataset consisting of questions on a set of Wikipedia articles, where the answer to every question is a segment of text from a given context. The models were finetuned on version 1 of the SQuAD dataset, so `VERSION_2_WITH_NEGATIVE` should be set to `False`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "id": "c32f9a76-414b-43d9-9769-af131223f1c1",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "FP32_MODEL_ID = \"helenai/bert-base-uncased-squad-v1\"\n",
+ "INT8_MODEL_ID = \"helenai/bert-base-uncased-squad-v1-jpqd-ov-int8\"\n",
+ "DATASET_NAME = \"squad\"\n",
+ "VERSION_2_WITH_NEGATIVE = False"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "124bd9ad-077c-4f41-b579-0bf978fe6a1e",
+ "metadata": {},
+ "source": [
+ "## Load the Dataset\n",
+ "\n",
+ "The `datasets` library makes it easy to load datasets. Common datasets can be loaded from the Hugging Face Hub by providing the name of the dataset. See https://github.com/huggingface/datasets. We load the validation split of the SQuAD dataset with `load_dataset`, then show a random dataset item and the set of categories in the dataset.\n",
+ "\n",
+ "Every item in the SQuAD dataset has a unique id, a title that denotes the category, a context, a question, and answers. Each answer is a span of the context, and both the text of the answer and its start position in the context (`answer_start`) are returned.\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "id": "602fe46f-c96a-4a0f-9338-58339d466f3a",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "{'id': '572fdbb004bcaa1900d76dd9',\n",
+ " 'title': 'Scottish_Parliament',\n",
+ " 'context': 'The election produced a majority SNP government, making this the first time in the Scottish Parliament where a party has commanded a parliamentary majority. The SNP took 16 seats from Labour, with many of their key figures not returned to parliament, although Labour leader Iain Gray retained East Lothian by 151 votes. The SNP took a further eight seats from the Liberal Democrats and one seat from the Conservatives. The SNP overall majority meant that there was sufficient support in the Scottish Parliament to hold a referendum on Scottish independence.',\n",
+ " 'question': 'When the election produced an SNP majority government, what was it the first occurrence of?',\n",
+ " 'answers': {'text': ['a party has commanded a parliamentary majority',\n",
+ " 'a parliamentary majority',\n",
+ " 'a party has commanded a parliamentary majority'],\n",
+ " 'answer_start': [109, 131, 109]}}"
+ ]
+ },
+ "execution_count": 4,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "examples = datasets.load_dataset(DATASET_NAME, split=\"validation\")\n",
+ "random.choice(examples)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "id": "d86d98b4-d3d6-4fb5-9b3e-53d61813e52a",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "{'Normans', 'United_Methodist_Church', 'Packet_switching', 'Fresno,_California', 'Private_school', 'Victoria_and_Albert_Museum', '1973_oil_crisis', 'Genghis_Khan', 'Jacksonville,_Florida', 'Intergovernmental_Panel_on_Climate_Change', 'Huguenot', 'Southern_California', 'Chloroplast', 'Apollo_program', 'Construction', 'Imperialism', 'Amazon_rainforest', 'Super_Bowl_50', 'Black_Death', 'Oxygen', 'Harvard_University', 'Martin_Luther', 'Doctor_Who', 'French_and_Indian_War', 'Economic_inequality', 'Teacher', 'Steam_engine', 'Pharmacy', 'Kenya', 'Sky_(United_Kingdom)', 'Rhine', 'Ctenophora', 'Force', 'Civil_disobedience', 'Scottish_Parliament', 'Computational_complexity_theory', 'Geology', 'European_Union_law', 'American_Broadcasting_Company', 'Warsaw', 'Prime_number', 'Immune_system', 'Yuan_dynasty', 'University_of_Chicago', 'Victoria_(Australia)', 'Newcastle_upon_Tyne', 'Nikola_Tesla', 'Islamism'}\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(set([item[\"title\"] for item in examples]))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "37d8bab7-6eed-4a75-9ee5-330586450453",
+ "metadata": {},
+ "source": [
+ "## Load Model and Tokenizer\n",
+ "\n",
+ "We load the PyTorch FP32 model and the OpenVINO INT8 model from the Hugging Face Hub. The models are downloaded automatically if they have not been downloaded before, and loaded from the cache otherwise. To load the quantized model with OpenVINO, we use the `OVModelForQuestionAnswering` class. It can be used in the same way as [`AutoModelForQuestionAnswering`](https://huggingface.co/docs/transformers/main/model_doc/auto).\n",
+ "\n",
+ "\n",
+ "We also load the tokenizer, which converts the questions and contexts from the dataset to tokens, in the format the model expects."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "id": "38641b14-07d0-49d5-af86-8b5247ae39d8",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "{'input_ids': [101, 7592, 2088, 999, 102], 'token_type_ids': [0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1]}"
+ ]
+ },
+ "execution_count": 6,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "fp32_model = AutoModelForQuestionAnswering.from_pretrained(FP32_MODEL_ID)\n",
+ "int8_model = OVModelForQuestionAnswering.from_pretrained(INT8_MODEL_ID)\n",
+ "tokenizer = AutoTokenizer.from_pretrained(FP32_MODEL_ID)\n",
+ "\n",
+ "# See how the tokenizer for the given model converts input text to model input values\n",
+ "tokenizer(\"hello world!\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2574cc63-aad3-4c28-aa6f-e553de911ce5",
+ "metadata": {},
+ "source": [
+ "## Compare INT8 and FP32 models\n",
+ "\n",
+ "We compare the accuracy, latency, inference results and model size of the FP32 and INT8 models.\n",
+ "### Inference Pipeline\n",
+ "\n",
+ "Transformers [Pipelines](https://huggingface.co/docs/transformers/main/en/pipeline_tutorial) simplify model inference. A `Pipeline` is created by adding a task, model and tokenizer to the `pipeline` function. Inference is then as simple as `qa_pipeline({\"question\": question, \"context\": context})`.\n",
+ "\n",
+ "We create two pipelines: `hf_qa_pipeline` and `ov_qa_pipeline` to compare the FP32 PyTorch model with the OpenVINO INT8 model. These pipelines will also be used for showing the accuracy difference and for benchmarking later in this notebook.\n",
+ "\n",
+ "For some Intel processors, it can be beneficial to reshape the OpenVINO model to a static shape of (1, 384) for faster inference. This requires padding or truncating inputs to the specified sequence length, which can be done by adding the `padding`, `max_seq_len` and `truncation` arguments to the `pipeline` function. See Hugging Face's [padding and truncation documentation](https://huggingface.co/docs/transformers/pad_truncation) for more information on the possible values.\n",
+ "\n",
+ "Setting a shorter sequence length in the cell below will speed up inference further, with the possibility of a drop in accuracy, since longer inputs will be truncated."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "id": "e02e40dd-b208-42b8-9413-6dac61b75476",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "USE_DYNAMIC_SHAPES = False\n",
+ "\n",
+ "if USE_DYNAMIC_SHAPES:\n",
+ "    ov_qa_pipeline = pipeline(\n",
+ "        \"question-answering\", model=int8_model, tokenizer=tokenizer\n",
+ "    )\n",
+ "else:\n",
+ "    # Reshape the model to a static (batch_size, sequence_length) shape and recompile it\n",
+ "    seq_length = 384\n",
+ "    int8_model.reshape(1, seq_length)\n",
+ "    int8_model.compile()\n",
+ "    ov_qa_pipeline = pipeline(\n",
+ "        \"question-answering\",\n",
+ "        model=int8_model,\n",
+ "        tokenizer=tokenizer,\n",
+ "        max_seq_len=seq_length,\n",
+ "        padding=\"max_length\",\n",
+ "        truncation=True,\n",
+ "    )\n",
+ "\n",
+ "hf_qa_pipeline = pipeline(\"question-answering\", model=fp32_model, tokenizer=tokenizer)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8132b4c8-7c06-4da4-a33a-d2e235a97fd9",
+ "metadata": {},
+ "source": [
+ "Show a dataset item and inference results on both pipelines."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "id": "2e23fe96-8d7f-4aa1-816f-707ca1a2f978",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the \"golden anniversary\" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as \"Super Bowl L\"), so that the logo could prominently feature the Arabic numerals 50.\n"
+ ]
+ }
+ ],
+ "source": [
+ "context = examples[0][\"context\"]\n",
+ "question = \"Who won the game?\"\n",
+ "print(context)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "id": "c1168f1c-14de-4aad-977d-122a8d366935",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "'Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers'"
+ ]
+ },
+ "execution_count": 9,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "hf_qa_pipeline({\"question\": question, \"context\": context})[\"answer\"]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "id": "c885d378-2842-49d0-b583-a2fc023558b5",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "'Denver Broncos'"
+ ]
+ },
+ "execution_count": 10,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "ov_qa_pipeline({\"question\": question, \"context\": context})[\"answer\"]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "97a52092-e352-47ef-9ed2-89508bc48d70",
+ "metadata": {},
+ "source": [
+ "### Accuracy\n",
+ "\n",
+ "We evaluate the quantized model and the original FP32 model, and compare the metrics of both models. The [evaluate](https://github.com/huggingface/evaluate) library makes it very easy to evaluate models on a given dataset, with a given metric. For the SQuAD dataset, the F1 score and Exact Match metrics are returned.\n",
+ "\n",
+ "The SQuAD dataset is fairly large and it can take some time to run the evaluation on the full dataset. For demonstration purposes, we evaluate the metrics on a random subset of 500 items of the validation dataset. The metrics on the full validation dataset are:\n",
+ "\n",
+ "```\n",
+ "FP32 exact match 81.5, F1 88.7\n",
+ "INT8 exact match 82.8, F1 89.5\n",
+ "```\n",
+ "\n",
+ "The evaluate function also keeps track of the time it takes to run. This provides an estimate of performance, but keep in mind that other programs running on the computer (including Jupyter), as well as power management settings, can affect performance.\n",
+ "\n",
+ "If you have a processor with an Intel integrated GPU, or a dedicated Intel GPU, you can run inference on the GPU for even faster performance. An 11th generation Intel Core processor or later with Xe graphics is recommended for iGPU inference. See the [OpenVINO documentation](https://docs.openvino.ai/latest/openvino_docs_install_guides_configurations_for_intel_gpu.html) about installing GPU drivers if you are on Linux.\n",
+ "\n",
+ "Currently, dynamic shapes are supported with some limitations on GPU. In the code below we enable GPU inference if a GPU is available to OpenVINO and if the model is compiled with static shapes. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "id": "bae78873-feed-408b-9d48-f4008cb5ca61",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "random.seed(2023)\n",
+ "num_items = 500\n",
+ "# Set num_items to len(examples) to validate on the entire dataset. That may take a long time!\n",
+ "# num_items = len(examples)\n",
+ "indices = sorted(random.sample(range(len(examples)), k=num_items))\n",
+ "filtered_examples = examples.select(indices)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "id": "f387b276-5b6b-43f0-924e-80f80ae453d2",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "squad_eval = evaluator(\"question-answering\")\n",
+ "\n",
+ "hf_eval_results = squad_eval.compute(\n",
+ "    model_or_pipeline=hf_qa_pipeline,\n",
+ "    data=filtered_examples,\n",
+ "    metric=\"squad\",\n",
+ "    squad_v2_format=VERSION_2_WITH_NEGATIVE,\n",
+ ")\n",
+ "\n",
+ "# Use the GPU as well if OpenVINO detects one and the model has static shapes\n",
+ "devices = (\"CPU\", \"GPU\") if (\"GPU\" in Core().available_devices and not int8_model.is_dynamic) else (\"CPU\",)\n",
+ "ov_eval_results = {}\n",
+ "for device in devices:\n",
+ "    int8_model.to(device)\n",
+ "    int8_model.compile()\n",
+ "\n",
+ "    # run a few warmup inferences\n",
+ "    for item in examples.select(range(10)):\n",
+ "        ov_qa_pipeline(item[\"question\"], item[\"context\"])\n",
+ "\n",
+ "    ov_eval_results[device] = squad_eval.compute(\n",
+ "        model_or_pipeline=ov_qa_pipeline,\n",
+ "        data=filtered_examples,\n",
+ "        metric=\"squad\",\n",
+ "        squad_v2_format=VERSION_2_WITH_NEGATIVE,\n",
+ "    )"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "id": "b349329d-7ab2-4a6b-afd7-460323eeb1c7",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
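+ "<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr style=\"text-align: right;\">\n",
+ "      <th></th>\n",
+ "      <th>exact_match</th>\n",
+ "      <th>f1</th>\n",
+ "      <th>latency</th>\n",
+ "    </tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr>\n",
+ "      <th>FP32</th>\n",
+ "      <td>80.8</td>\n",
+ "      <td>88.4116</td>\n",
+ "      <td>126.8</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>INT8 CPU</th>\n",
+ "      <td>82.6</td>\n",
+ "      <td>89.0472</td>\n",
+ "      <td>64.4</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>INT8 GPU</th>\n",
+ "      <td>82.0</td>\n",
+ "      <td>88.8044</td>\n",
+ "      <td>27.4</td>\n",
+ "    </tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "</div>"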
+ ],
+ "text/plain": [
+ "          exact_match       f1  latency\n",
+ "FP32             80.8  88.4116    126.8\n",
+ "INT8 CPU         82.6  89.0472     64.4\n",
+ "INT8 GPU         82.0  88.8044     27.4"
+ ]
+ },
+ "execution_count": 13,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "summary = (\n",
+ "    pd.DataFrame.from_records(\n",
+ "        [hf_eval_results, *ov_eval_results.values()],\n",
+ "        columns=[\"exact_match\", \"f1\", \"latency_in_seconds\"],\n",
+ "        index=[\"FP32\", *(f\"INT8 {device}\" for device in devices)],\n",
+ "    )\n",
+ "    .round(4)\n",
+ "    .dropna()\n",
+ ")\n",
+ "# Convert the latency from seconds to milliseconds\n",
+ "summary[\"latency_in_seconds\"] *= 1000\n",
+ "summary.columns = [\"exact_match\", \"f1\", \"latency\"]\n",
+ "summary"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "id": "3b435f18-5233-4e54-bc98-ad85d468f041",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "INT8 speedup on CPU: 1.97X\n",
+ "INT8 speedup on GPU: 4.63X\n",
+ "12th Gen Intel(R) Core(TM) i7-1270PE\n"
+ ]
+ }
+ ],
+ "source": [
+ "for device in devices:\n",
+ "    int8_speedup = summary.loc[\"FP32\"][\"latency\"] / summary.loc[f\"INT8 {device}\"][\"latency\"]\n",
+ "    print(f\"INT8 speedup on {device}: {int8_speedup:.2f}X\")\n",
+ "print(Core().get_property(\"CPU\", \"FULL_DEVICE_NAME\"))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "db183795-6dae-4ef6-847d-042223264149",
+ "metadata": {},
+ "source": [
+ "### Inference Results\n",
+ "\n",
+ "To fully understand the quality of a model, it is useful to look beyond metrics like Exact Match and F1 score and examine model predictions directly. This can give a more complete impression of the model's performance and help identify areas for improvement.\n",
+ "\n",
+ "In the next cell, we go over a selection of items in the filtered validation set, and display the items where the FP32 prediction score is different from the INT8 prediction score.\n",
+ "\n",
+ "The table displays the question and the set of correct answers from the dataset, the FP32 prediction and F1 score and the INT8 prediction and F1 score. The results show that for some predictions, the FP32 model is better, and for others, the INT8 model is, and that for the large majority of dataset items both models are equally accurate."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "id": "ab953c89-ed9d-4afa-8953-541c982174ff",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "results = []\n",
+ "int8_better = 0\n",
+ "num_items = 100\n",
+ "metric = evaluate.load(\"squad_v2\" if VERSION_2_WITH_NEGATIVE else \"squad\")\n",
+ "\n",
+ "for item in filtered_examples.select(range(num_items)):\n",
+ "    # Access fields by name instead of relying on column order, and avoid shadowing the built-in `id`\n",
+ "    item_id, context, question, answers = item[\"id\"], item[\"context\"], item[\"question\"], item[\"answers\"]\n",
+ "    fp32_answer = hf_qa_pipeline(question, context)[\"answer\"]\n",
+ "    int8_answer = ov_qa_pipeline(question, context)[\"answer\"]\n",
+ "\n",
+ "    references = [{\"id\": item_id, \"answers\": answers}]\n",
+ "    fp32_predictions = [{\"id\": item_id, \"prediction_text\": fp32_answer}]\n",
+ "    int8_predictions = [{\"id\": item_id, \"prediction_text\": int8_answer}]\n",
+ "\n",
+ "    fp32_score = round(metric.compute(references=references, predictions=fp32_predictions)[\"f1\"], 2)\n",
+ "    int8_score = round(metric.compute(references=references, predictions=int8_predictions)[\"f1\"], 2)\n",
+ "\n",
+ "    # Keep only the items where the two models score differently\n",
+ "    if int8_score != fp32_score:\n",
+ "        results.append((question, answers[\"text\"], fp32_answer, fp32_score, int8_answer, int8_score))\n",
+ "        if int8_score > fp32_score:\n",
+ "            int8_better += 1"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "id": "37b78ee3-c330-4ef8-8528-47d5a8b73424",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr style=\"text-align: right;\">\n",
+ "      <th></th>\n",
+ "      <th>Question</th>\n",
+ "      <th>Answer</th>\n",
+ "      <th>FP32 prediction</th>\n",
+ "      <th>FP32 F1</th>\n",
+ "      <th>INT8 prediction</th>\n",
+ "      <th>INT8 F1</th>\n",
+ "    </tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr>\n",
+ "      <th>0</th>\n",
+ "      <td>Prior to Super Bowl 50, when were the Carolina Panthers last there?</td>\n",
+ "      <td>[Super Bowl XXXVIII., Super Bowl XXXVIII, Super Bowl XXXVIII]</td>\n",
+ "      <td>Super Bowl XXXVIII</td>\n",
+ "      <td>100.00</td>\n",
+ "      <td>Super Bowl appearance prior to Super Bowl 50</td>\n",
+ "      <td>36.36</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>1</th>\n",
+ "      <td>Which smartphone customers were the only people who could stream the game on their phones?</td>\n",
+ "      <td>[Verizon Wireless customers, Verizon, Verizon]</td>\n",
+ "      <td>Verizon Wireless</td>\n",
+ "      <td>80.00</td>\n",
+ "      <td>Verizon Wireless customers</td>\n",
+ "      <td>100.00</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>2</th>\n",
+ "      <td>Who stripped the ball from Cam Newton while sacking him on this drive?</td>\n",
+ "      <td>[Von Miller, Von Miller, Miller]</td>\n",
+ "      <td>Von Miller</td>\n",
+ "      <td>100.00</td>\n",
+ "      <td>linebacker Von Miller</td>\n",
+ "      <td>80.00</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>3</th>\n",
+ "      <td>What campaign did the Communist regime initiate after WWII?</td>\n",
+ "      <td>[\"Bricks for Warsaw\", Bricks for Warsaw, Bricks for Warsaw]</td>\n",
+ "      <td>Bricks for Warsaw</td>\n",
+ "      <td>100.00</td>\n",
+ "      <td>Bricks for Warsaw\" campaign</td>\n",
+ "      <td>85.71</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>4</th>\n",
+ "      <td>What were Tesla's mother's special abilities?</td>\n",
+ "      <td>[making home craft tools, mechanical appliances, and the ability to memorize Serbian epic poems, making home craft tools, mechanical appliances, and the ability to memorize Serbian epic poems, making home craft tools, mechanical appliances, and the ability to memorize Serbian epic poems]</td>\n",
+ "      <td>memorize Serbian epic poems</td>\n",
+ "      <td>47.06</td>\n",
+ "      <td>craft tools, mechanical appliances, and the ability to memorize Serbian epic poems</td>\n",
+ "      <td>91.67</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>5</th>\n",
+ "      <td>What was Tesla's AC system used for in Pittsburgh?</td>\n",
+ "      <td>[to power the city's streetcars., the city's streetcars, street cars]</td>\n",
+ "      <td>create an alternating current system to power the city's streetcars</td>\n",
+ "      <td>66.67</td>\n",
+ "      <td>power the city's streetcars</td>\n",
+ "      <td>85.71</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>6</th>\n",
+ "      <td>What type of power was displayed at the World's fair by Westinghouse and Tesla?</td>\n",
+ "      <td>[AC power, alternating current, AC power]</td>\n",
+ "      <td>AC power</td>\n",
+ "      <td>100.00</td>\n",
+ "      <td>AC</td>\n",
+ "      <td>66.67</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>7</th>\n",
+ "      <td>Where can Tesla's theories as to what caused the skin damage be found?</td>\n",
+ "      <td>[In his many notes, In his many notes]</td>\n",
+ "      <td>Roentgen rays</td>\n",
+ "      <td>0.00</td>\n",
+ "      <td>ozone generated in contact with the skin</td>\n",
+ "      <td>20.00</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>8</th>\n",
+ "      <td>What did he think was everywhere in the universe?</td>\n",
+ "      <td>[ether, ether]</td>\n",
+ "      <td>ether</td>\n",
+ "      <td>100.00</td>\n",
+ "      <td>atoms</td>\n",
+ "      <td>0.00</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>9</th>\n",
+ "      <td>What fixed set of factors determine the actions of a deterministic Turing machine</td>\n",
+ "      <td>[rules, rules, a fixed set of rules to determine its future actions]</td>\n",
+ "      <td>a fixed set of rules to determine its future actions</td>\n",
+ "      <td>100.00</td>\n",
+ "      <td>a fixed set of rules</td>\n",
+ "      <td>61.54</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>10</th>\n",
+ "      <td>A non-deterministic Turing machine has the ability to capture what facet of useful analysis?</td>\n",
+ "      <td>[mathematical models, mathematical models, branching]</td>\n",
+ "      <td>mathematical models we want to analyze</td>\n",
+ "      <td>50.00</td>\n",
+ "      <td>mathematical models</td>\n",
+ "      <td>100.00</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>11</th>\n",
+ "      <td>What factor may make a teacher's role vary?</td>\n",
+ "      <td>[cultures, cultures, cultures]</td>\n",
+ "      <td>among cultures</td>\n",
+ "      <td>66.67</td>\n",
+ "      <td>cultures</td>\n",
+ "      <td>100.00</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>12</th>\n",
+ "      <td>What is the youngest student a teacher might have?</td>\n",
+ "      <td>[infants, infants, infants]</td>\n",
+ "      <td>infants to adults</td>\n",
+ "      <td>50.00</td>\n",
+ "      <td>infants</td>\n",
+ "      <td>100.00</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>13</th>\n",
+ "      <td>What's the biggest difference in the teaching relationship for primary and secondary school?</td>\n",
+ "      <td>[the relationship between teachers and children, the relationship between teachers and children., the relationship between teachers and children., the relationship between teachers and children]</td>\n",
+ "      <td>the relationship between teachers and children</td>\n",
+ "      <td>100.00</td>\n",
+ "      <td>teachers and children</td>\n",
+ "      <td>75.00</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>14</th>\n",
+ "      <td>Where are teachers recruited from?</td>\n",
+ "      <td>[Lehramtstudien (Teaching Education Studies), Lehramtstudien, special university classes]</td>\n",
+ "      <td>special university classes</td>\n",
+ "      <td>100.00</td>\n",
+ "      <td>Germany, teachers are mainly civil servants recruited in special university classes</td>\n",
+ "      <td>42.86</td>\n",
+ "    </tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " Question \\\n",
+ "0 Prior to Super Bowl 50, when were the Carolina Panthers last there? \n",
+ "1 Which smartphone customers were the only people who could stream the game on their phones? \n",
+ "2 Who stripped the ball from Cam Newton while sacking him on this drive? \n",
+ "3 What campaign did the Communist regime initiate after WWII? \n",
+ "4 What were Tesla's mother's special abilities? \n",
+ "5 What was Tesla's AC system used for in Pittsburgh? \n",
+ "6 What type of power was displayed at the World's fair by Westinghouse and Tesla? \n",
+ "7 Where can Tesla's theories as to what caused the skin damage be found? \n",
+ "8 What did he think was everywhere in the universe? \n",
+ "9 What fixed set of factors determine the actions of a deterministic Turing machine \n",
+ "10 A non-deterministic Turing machine has the ability to capture what facet of useful analysis? \n",
+ "11 What factor may make a teacher's role vary? \n",
+ "12 What is the youngest student a teacher might have? \n",
+ "13 What's the biggest difference in the teaching relationship for primary and secondary school? \n",
+ "14 Where are teachers recruited from? \n",
+ "\n",
+ " Answer \\\n",
+ "0 [Super Bowl XXXVIII., Super Bowl XXXVIII, Super Bowl XXXVIII] \n",
+ "1 [Verizon Wireless customers, Verizon, Verizon] \n",
+ "2 [Von Miller, Von Miller, Miller] \n",
+ "3 [\"Bricks for Warsaw\", Bricks for Warsaw, Bricks for Warsaw] \n",
+ "4 [making home craft tools, mechanical appliances, and the ability to memorize Serbian epic poems, making home craft tools, mechanical appliances, and the ability to memorize Serbian epic poems, making home craft tools, mechanical appliances, and the ability to memorize Serbian epic poems] \n",
+ "5 [to power the city's streetcars., the city's streetcars, street cars] \n",
+ "6 [AC power, alternating current, AC power] \n",
+ "7 [In his many notes, In his many notes] \n",
+ "8 [ether, ether] \n",
+ "9 [rules, rules, a fixed set of rules to determine its future actions] \n",
+ "10 [mathematical models, mathematical models, branching] \n",
+ "11 [cultures, cultures, cultures] \n",
+ "12 [infants, infants, infants] \n",
+ "13 [the relationship between teachers and children, the relationship between teachers and children., the relationship between teachers and children., the relationship between teachers and children] \n",
+ "14 [Lehramtstudien (Teaching Education Studies), Lehramtstudien, special university classes] \n",
+ "\n",
+ " FP32 prediction \\\n",
+ "0 Super Bowl XXXVIII \n",
+ "1 Verizon Wireless \n",
+ "2 Von Miller \n",
+ "3 Bricks for Warsaw \n",
+ "4 memorize Serbian epic poems \n",
+ "5 create an alternating current system to power the city's streetcars \n",
+ "6 AC power \n",
+ "7 Roentgen rays \n",
+ "8 ether \n",
+ "9 a fixed set of rules to determine its future actions \n",
+ "10 mathematical models we want to analyze \n",
+ "11 among cultures \n",
+ "12 infants to adults \n",
+ "13 the relationship between teachers and children \n",
+ "14 special university classes \n",
+ "\n",
+ " FP32 F1 \\\n",
+ "0 100.00 \n",
+ "1 80.00 \n",
+ "2 100.00 \n",
+ "3 100.00 \n",
+ "4 47.06 \n",
+ "5 66.67 \n",
+ "6 100.00 \n",
+ "7 0.00 \n",
+ "8 100.00 \n",
+ "9 100.00 \n",
+ "10 50.00 \n",
+ "11 66.67 \n",
+ "12 50.00 \n",
+ "13 100.00 \n",
+ "14 100.00 \n",
+ "\n",
+ " INT8 prediction \\\n",
+ "0 Super Bowl appearance prior to Super Bowl 50 \n",
+ "1 Verizon Wireless customers \n",
+ "2 linebacker Von Miller \n",
+ "3 Bricks for Warsaw\" campaign \n",
+ "4 craft tools, mechanical appliances, and the ability to memorize Serbian epic poems \n",
+ "5 power the city's streetcars \n",
+ "6 AC \n",
+ "7 ozone generated in contact with the skin \n",
+ "8 atoms \n",
+ "9 a fixed set of rules \n",
+ "10 mathematical models \n",
+ "11 cultures \n",
+ "12 infants \n",
+ "13 teachers and children \n",
+ "14 Germany, teachers are mainly civil servants recruited in special university classes \n",
+ "\n",
+ " INT8 F1 \n",
+ "0 36.36 \n",
+ "1 100.00 \n",
+ "2 80.00 \n",
+ "3 85.71 \n",
+ "4 91.67 \n",
+ "5 85.71 \n",
+ "6 66.67 \n",
+ "7 20.00 \n",
+ "8 0.00 \n",
+ "9 61.54 \n",
+ "10 100.00 \n",
+ "11 100.00 \n",
+ "12 100.00 \n",
+ "13 75.00 \n",
+ "14 42.86 "
+ ]
+ },
+ "execution_count": 16,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "pd.set_option(\"display.max_colwidth\", None)\n",
+ "df = pd.DataFrame(\n",
+ "    results,\n",
+ "    columns=[\"Question\", \"Answer\", \"FP32 prediction\", \"FP32 F1\", \"INT8 prediction\", \"INT8 F1\"],\n",
+ ")\n",
+ "df"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "58df445d-af43-4ba1-8195-7d8f00b8f82f",
+ "metadata": {},
+ "source": [
+ "### Model Size\n",
+ "\n",
+ "We save the FP32 and INT8 models to a temporary directory and define a function to show the model size for the PyTorch and OpenVINO models."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "id": "1eeaa81f-7fc5-49ba-80b8-2d95a1310a0c",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "FP32 model size: 435.64 MB\n",
+ "INT8 model size: 147.33 MB\n",
+ "INT8 size decrease: 2.96x\n"
+ ]
+ }
+ ],
+ "source": [
+ "def get_model_size(model_folder, framework):\n",
+ "    \"\"\"\n",
+ "    Return OpenVINO or PyTorch model size in MB.\n",
+ "    Arguments:\n",
+ "        model_folder:\n",
+ "            Directory containing a pytorch_model.bin for a PyTorch model, and an openvino_model.xml/.bin for an OpenVINO model.\n",
+ "        framework:\n",
+ "            Define whether the model is a PyTorch or an OpenVINO model.\n",
+ "    \"\"\"\n",
+ "    if framework.lower() == \"openvino\":\n",
+ "        # An OpenVINO model consists of an .xml file (topology) and a .bin file (weights)\n",
+ "        model_path = Path(model_folder) / \"openvino_model.xml\"\n",
+ "        model_size = model_path.stat().st_size + model_path.with_suffix(\".bin\").stat().st_size\n",
+ "    elif framework.lower() == \"pytorch\":\n",
+ "        model_path = Path(model_folder) / \"pytorch_model.bin\"\n",
+ "        model_size = model_path.stat().st_size\n",
+ "    else:\n",
+ "        raise ValueError(f\"Unsupported framework: {framework}\")\n",
+ "    # Convert bytes to megabytes\n",
+ "    model_size /= 1000 * 1000\n",
+ "    return model_size\n",
+ "\n",
+ "\n",
+ "with tempfile.TemporaryDirectory() as fp32_model_dir:\n",
+ "    fp32_model.save_pretrained(fp32_model_dir)\n",
+ "    fp32_model_size = get_model_size(fp32_model_dir, \"pytorch\")\n",
+ "\n",
+ "with tempfile.TemporaryDirectory() as int8_model_dir:\n",
+ "    int8_model.save_pretrained(int8_model_dir)\n",
+ "    int8_model_size = get_model_size(int8_model_dir, \"openvino\")\n",
+ "\n",
+ "print(f\"FP32 model size: {fp32_model_size:.2f} MB\")\n",
+ "print(f\"INT8 model size: {int8_model_size:.2f} MB\")\n",
+ "print(f\"INT8 size decrease: {fp32_model_size / int8_model_size:.2f}x\")"
+ ]
+ }
+ ],
+ "metadata": {
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}