Add JPQD evaluation notebook

huggingface · Mar 12, 2023 · 8f7b129
1 parent 60d4877
Showing 1 changed file with 1,145 additions and 0 deletions.
diff --git a/notebooks/openvino/question_answering_quantization_jpqd.ipynb b/notebooks/openvino/question_answering_quantization_jpqd.ipynb
@@ -0,0 +1,1145 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "eaed3927-e315-46d3-8889-df3f3bbcbf6b",
+   "metadata": {},
+   "source": [
+    "# Joint Pruning, Quantization and Distillation (JPQD) with OpenVINO and NNCF\n",
+    "\n",
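+    "With quantization, we reduce the precision of the model's weights and activations from floating point (FP32) to integer (INT8), which results in a smaller model with faster inference in OpenVINO Runtime. JPQD combines quantization with pruning, which removes weights from the model, and knowledge distillation, in which a teacher model guides the compressed model during training to preserve accuracy.\n",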
+    "\n",
+    "JPQD is applied during training or fine-tuning of the model. Because long training runs are impractical in a notebook, we recommend running the [question-answering example](https://github.com/huggingface/optimum-intel/tree/main/examples/openvino/question-answering) in a terminal if you want to compress the model yourself.\n",
+    "\n",
+    "To follow this notebook, you do not need to compress the model yourself: you can use the already compressed model that we uploaded to the Hugging Face Hub.\n",
+    "\n",
+    "A laptop or desktop with a recent Intel Core processor is recommended for best results. To install the requirements for this notebook, run `pip install \"optimum[openvino]\" \"evaluate[evaluator]\" ipywidgets datasets`, or uncomment the cell below to install them in your current Python environment."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "3d4e47b2-89cb-4ffa-84f3-11919fa367e6",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2023-03-12T22:12:31.378027Z",
+     "iopub.status.busy": "2023-03-12T22:12:31.377775Z",
+     "iopub.status.idle": "2023-03-12T22:12:31.380560Z",
+     "shell.execute_reply": "2023-03-12T22:12:31.380091Z",
+     "shell.execute_reply.started": "2023-03-12T22:12:31.378004Z"
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "# %pip install \"optimum[openvino]\" \"evaluate[evaluator]\" ipywidgets datasets"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "0407fc92-c052-47b7-8721-01836adf3b54",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2023-03-12T22:12:31.381690Z",
+     "iopub.status.busy": "2023-03-12T22:12:31.381461Z",
+     "iopub.status.idle": "2023-03-12T22:12:33.513287Z",
+     "shell.execute_reply": "2023-03-12T22:12:33.512844Z",
+     "shell.execute_reply.started": "2023-03-12T22:12:31.381670Z"
+    },
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "INFO:nncf:NNCF initialized successfully. Supported frameworks detected: torch, onnx, openvino\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/home/helena/venvs/openvino_env/lib/python3.10/site-packages/openvino/offline_transformations/__init__.py:10: FutureWarning: The module is private and following namespace `offline_transformations` will be removed in the future.\n",
+      "  warnings.warn(\n"
+     ]
+    }
+   ],
+   "source": [
+    "import random\n",
+    "import tempfile\n",
+    "from pathlib import Path\n",
+    "\n",
+    "import datasets\n",
+    "import evaluate\n",
+    "import pandas as pd\n",
+    "import transformers\n",
+    "from evaluate import evaluator\n",
+    "from openvino.runtime import Core\n",
+    "from optimum.intel.openvino import OVModelForQuestionAnswering\n",
+    "from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline\n",
+    "\n",
+    "transformers.logging.set_verbosity_error()\n",
+    "datasets.logging.set_verbosity_error()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "889a16fe-2bc0-477e-b8d6-02a4f7508f03",
+   "metadata": {},
+   "source": [
+    "## Settings\n",
+    "\n",
+    "We will compare the accuracy and performance of the quantized and pruned model with that of an FP32 bert-base-uncased model which was also finetuned on the SQuAD dataset, following the [Transformers question-answering example](https://github.com/huggingface/transformers/tree/main/examples/pytorch/question-answering#fine-tuning-bert-on-squad10). \n",
+    "\n",
+    "Below, we set the model IDs for the FP32 and INT8 models and the dataset name. If you trained the models yourself, set `FP32_MODEL_ID` and `INT8_MODEL_ID` to the directories containing the model and tokenizer files.\n",
+    "\n",
+    "The models were finetuned on the [Stanford Question Answering Dataset (SQuAD)](https://huggingface.co/datasets/squad), a reading comprehension dataset consisting of questions about a set of Wikipedia articles, where the answer to every question is a segment of text from the corresponding context. Because the models were finetuned on version 1 of SQuAD, `VERSION_2_WITH_NEGATIVE` should be set to `False`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "c32f9a76-414b-43d9-9769-af131223f1c1",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2023-03-12T22:12:33.514043Z",
+     "iopub.status.busy": "2023-03-12T22:12:33.513822Z",
+     "iopub.status.idle": "2023-03-12T22:12:33.516563Z",
+     "shell.execute_reply": "2023-03-12T22:12:33.516172Z",
+     "shell.execute_reply.started": "2023-03-12T22:12:33.514032Z"
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "FP32_MODEL_ID = \"helenai/bert-base-uncased-squad-v1\"\n",
+    "INT8_MODEL_ID = \"helenai/bert-base-uncased-squad-v1-jpqd-ov-int8\"\n",
+    "DATASET_NAME = \"squad\"\n",
+    "VERSION_2_WITH_NEGATIVE = False"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "124bd9ad-077c-4f41-b579-0bf978fe6a1e",
+   "metadata": {},
+   "source": [
+    "## Load the Dataset\n",
+    "\n",
+    "The `datasets` library makes it easy to load datasets: common datasets can be loaded from the Hugging Face Hub by providing the name of the dataset (see https://github.com/huggingface/datasets). We load the SQuAD validation split with `load_dataset`, then show a random dataset item and the list of categories in the dataset.\n",
+    "\n",
+    "Every item in the SQuAD dataset has a unique `id`, a `title` that denotes the category, a `context`, a `question`, and `answers`. Each answer is a span of the context; both the answer text (`text`) and the character position where the answer starts in the context (`answer_start`) are returned. Validation items can list several reference answers.\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "602fe46f-c96a-4a0f-9338-58339d466f3a",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2023-03-12T22:12:33.517147Z",
+     "iopub.status.busy": "2023-03-12T22:12:33.517032Z",
+     "iopub.status.idle": "2023-03-12T22:12:37.981651Z",
+     "shell.execute_reply": "2023-03-12T22:12:37.981135Z",
+     "shell.execute_reply.started": "2023-03-12T22:12:33.517136Z"
+    },
+    "tags": []
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "{'id': '56bec6ac3aeaaa14008c9400',\n",
+       " 'title': 'Super_Bowl_50',\n",
+       " 'context': 'Six-time Grammy winner and Academy Award nominee Lady Gaga performed the national anthem, while Academy Award winner Marlee Matlin provided American Sign Language (ASL) translation.',\n",
+       " 'question': 'What did Marlee Matlin translate?',\n",
+       " 'answers': {'text': ['the national anthem',\n",
+       "   'the national anthem',\n",
+       "   'national anthem'],\n",
+       "  'answer_start': [69, 69, 73]}}"
+      ]
+     },
+     "execution_count": 4,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "examples = datasets.load_dataset(DATASET_NAME, split=\"validation\")\n",
+    "random.choice(examples)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "id": "d86d98b4-d3d6-4fb5-9b3e-53d61813e52a",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2023-03-12T22:12:37.982277Z",
+     "iopub.status.busy": "2023-03-12T22:12:37.982122Z",
+     "iopub.status.idle": "2023-03-12T22:12:38.352371Z",
+     "shell.execute_reply": "2023-03-12T22:12:38.351844Z",
+     "shell.execute_reply.started": "2023-03-12T22:12:37.982265Z"
+    },
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'Force', 'French_and_Indian_War', 'Yuan_dynasty', 'Fresno,_California', 'Sky_(United_Kingdom)', 'Intergovernmental_Panel_on_Climate_Change', 'Apollo_program', 'American_Broadcasting_Company', 'Immune_system', 'Nikola_Tesla', 'Harvard_University', 'Super_Bowl_50', 'Martin_Luther', 'Geology', 'Teacher', 'Newcastle_upon_Tyne', '1973_oil_crisis', 'Construction', 'Oxygen', 'Ctenophora', 'Warsaw', 'Chloroplast', 'Jacksonville,_Florida', 'Islamism', 'Amazon_rainforest', 'Private_school', 'Victoria_(Australia)', 'European_Union_law', 'Genghis_Khan', 'United_Methodist_Church', 'Pharmacy', 'Imperialism', 'Rhine', 'Doctor_Who', 'Computational_complexity_theory', 'Kenya', 'Huguenot', 'Civil_disobedience', 'Scottish_Parliament', 'University_of_Chicago', 'Packet_switching', 'Normans', 'Southern_California', 'Black_Death', 'Economic_inequality', 'Victoria_and_Albert_Museum', 'Steam_engine', 'Prime_number'}\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(set([item[\"title\"] for item in examples]))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "37d8bab7-6eed-4a75-9ee5-330586450453",
+   "metadata": {},
+   "source": [
+    "## Load Model and Tokenizer\n",
+    "\n",
+    "We load the PyTorch FP32 model and the OpenVINO INT8 model from the Hugging Face Hub. The models are downloaded automatically the first time they are loaded, and read from the cache afterwards.\n",
+    "\n",
+    "We also load the tokenizer, which converts the questions and contexts from the dataset into tokens in the format the model expects."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "id": "38641b14-07d0-49d5-af86-8b5247ae39d8",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2023-03-12T22:12:38.353176Z",
+     "iopub.status.busy": "2023-03-12T22:12:38.352857Z",
+     "iopub.status.idle": "2023-03-12T22:12:47.472630Z",
+     "shell.execute_reply": "2023-03-12T22:12:47.472290Z",
+     "shell.execute_reply.started": "2023-03-12T22:12:38.353157Z"
+    },
+    "tags": []
+   },
+   "outputs": [
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "39dcf918f6a34901be0551770043fa03",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "Downloading (…)lve/main/config.json:   0%|          | 0.00/660 [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "bdf0a4c5d08a490cbcbf14e7cc5ce7df",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "Downloading (…)n/openvino_model.xml:   0%|          | 0.00/1.11M [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "bfa81ca6c462478885ffa8e4ecb4590c",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "Downloading openvino_model.bin:   0%|          | 0.00/75.5M [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/plain": [
+       "{'input_ids': [101, 7592, 2088, 999, 102], 'token_type_ids': [0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1]}"
+      ]
+     },
+     "execution_count": 6,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "fp32_model = AutoModelForQuestionAnswering.from_pretrained(FP32_MODEL_ID)\n",
+    "int8_model = OVModelForQuestionAnswering.from_pretrained(INT8_MODEL_ID)\n",
+    "tokenizer = AutoTokenizer.from_pretrained(FP32_MODEL_ID)\n",
+    "\n",
+    "# See how the tokenizer for the given model converts input text to model input values\n",
+    "tokenizer(\"hello world!\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2574cc63-aad3-4c28-aa6f-e553de911ce5",
+   "metadata": {},
+   "source": [
+    "## Compare INT8 and FP32 models\n",
+    "\n",
+    "We compare the accuracy, model size, inference results, and latency of the FP32 and INT8 models.\n",
+    "\n",
+    "### Inference Pipeline\n",
+    "\n",
+    "Transformers [Pipelines](https://huggingface.co/docs/transformers/main/en/pipeline_tutorial) simplify model inference. A `Pipeline` is created by passing a task, a model, and a tokenizer to the `pipeline` function. Inference is then as simple as `qa_pipeline({\"question\": question, \"context\": context})`.\n",
+    "\n",
+    "We create two pipelines, `hf_qa_pipeline` and `ov_qa_pipeline`, to compare the FP32 PyTorch model with the OpenVINO INT8 model. These pipelines are also used later in this notebook to compare accuracy and for benchmarking.\n",
+    "\n",
+    "For some Intel processors, reshaping the model to a static input shape of (1, 384) can speed up inference. This requires padding or truncating inputs to the specified sequence length, which can be done by passing the `padding`, `max_seq_len` and `truncation` arguments to the `pipeline` function. See Hugging Face's [padding and truncation documentation](https://huggingface.co/docs/transformers/pad_truncation) for more information on the possible values.\n",
+    "\n",
+    "Setting a shorter sequence length in the cell below speeds up inference further, but may reduce accuracy, since longer inputs will be truncated."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "id": "e02e40dd-b208-42b8-9413-6dac61b75476",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2023-03-12T22:12:47.474523Z",
+     "iopub.status.busy": "2023-03-12T22:12:47.474322Z",
+     "iopub.status.idle": "2023-03-12T22:12:48.209187Z",
+     "shell.execute_reply": "2023-03-12T22:12:48.208797Z",
+     "shell.execute_reply.started": "2023-03-12T22:12:47.474504Z"
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "seq_length = 384\n",
+    "int8_model.reshape(1, seq_length)\n",
+    "int8_model.compile()\n",
+    "ov_qa_pipeline = pipeline(\n",
+    "    \"question-answering\", model=int8_model, tokenizer=tokenizer, max_seq_len=seq_length, padding=\"max_length\", truncation=True\n",
+    ")\n",
+    "\n",
+    "hf_qa_pipeline = pipeline(\"question-answering\", model=fp32_model, tokenizer=tokenizer)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8132b4c8-7c06-4da4-a33a-d2e235a97fd9",
+   "metadata": {},
+   "source": [
+    "Let's look at a dataset item and the answers predicted by both pipelines."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "id": "2e23fe96-8d7f-4aa1-816f-707ca1a2f978",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2023-03-12T22:12:48.210031Z",
+     "iopub.status.busy": "2023-03-12T22:12:48.209658Z",
+     "iopub.status.idle": "2023-03-12T22:12:48.212599Z",
+     "shell.execute_reply": "2023-03-12T22:12:48.212259Z",
+     "shell.execute_reply.started": "2023-03-12T22:12:48.210017Z"
+    },
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the \"golden anniversary\" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as \"Super Bowl L\"), so that the logo could prominently feature the Arabic numerals 50.\n"
+     ]
+    }
+   ],
+   "source": [
+    "context = examples[0][\"context\"]\n",
+    "question = \"Who won the game?\"\n",
+    "print(context)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "id": "c1168f1c-14de-4aad-977d-122a8d366935",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2023-03-12T22:12:48.213230Z",
+     "iopub.status.busy": "2023-03-12T22:12:48.213050Z",
+     "iopub.status.idle": "2023-03-12T22:12:48.375304Z",
+     "shell.execute_reply": "2023-03-12T22:12:48.374779Z",
+     "shell.execute_reply.started": "2023-03-12T22:12:48.213214Z"
+    },
+    "tags": []
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "'Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers'"
+      ]
+     },
+     "execution_count": 9,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "hf_qa_pipeline({\"question\": question, \"context\": context})[\"answer\"]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "id": "c885d378-2842-49d0-b583-a2fc023558b5",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2023-03-12T22:12:48.376197Z",
+     "iopub.status.busy": "2023-03-12T22:12:48.376016Z",
+     "iopub.status.idle": "2023-03-12T22:12:48.438071Z",
+     "shell.execute_reply": "2023-03-12T22:12:48.437524Z",
+     "shell.execute_reply.started": "2023-03-12T22:12:48.376179Z"
+    },
+    "tags": []
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "'Denver Broncos'"
+      ]
+     },
+     "execution_count": 10,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "ov_qa_pipeline({\"question\": question, \"context\": context})[\"answer\"]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "97a52092-e352-47ef-9ed2-89508bc48d70",
+   "metadata": {},
+   "source": [
+    "### Accuracy\n",
+    "\n",
+    "We compare the metrics of the quantized model and the original FP32 model. The [evaluate](https://github.com/huggingface/evaluate) library makes it easy to evaluate models on a given dataset with a given metric. For the SQuAD dataset, the F1 score and Exact Match metrics are returned.\n",
+    "\n",
+    "The quantized model is loaded with OpenVINO through the `OVModelForQuestionAnswering` class, which can be used in the same way as [`AutoModelForQuestionAnswering`](https://huggingface.co/docs/transformers/main/model_doc/auto).\n",
+    "\n",
+    "The evaluation uses the pipelines created in the previous section.\n",
+    "\n",
+    "The SQuAD validation set is fairly large, so evaluating on the full set takes a while. For demonstration purposes, we evaluate the metrics on a random subset of 500 items."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "id": "bae78873-feed-408b-9d48-f4008cb5ca61",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2023-03-12T22:12:48.438793Z",
+     "iopub.status.busy": "2023-03-12T22:12:48.438632Z",
+     "iopub.status.idle": "2023-03-12T22:12:48.446644Z",
+     "shell.execute_reply": "2023-03-12T22:12:48.446169Z",
+     "shell.execute_reply.started": "2023-03-12T22:12:48.438776Z"
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "random.seed(42)\n",
+    "# Sample 500 unique item indices (without replacement) and keep them in dataset order\n",
+    "indices = sorted(random.sample(range(len(examples)), k=500))\n",
+    "filtered_examples = examples.select(indices)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "id": "f808cf9e-c821-4342-9e03-f40b92c8e39d",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2023-03-12T22:12:48.448842Z",
+     "iopub.status.busy": "2023-03-12T22:12:48.447233Z",
+     "iopub.status.idle": "2023-03-12T22:14:43.530079Z",
+     "shell.execute_reply": "2023-03-12T22:14:43.529332Z",
+     "shell.execute_reply.started": "2023-03-12T22:12:48.448819Z"
+    },
+    "tags": []
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>exact_match</th>\n",
+       "      <th>f1</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>FP32</th>\n",
+       "      <td>80.8</td>\n",
+       "      <td>87.95</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>INT8</th>\n",
+       "      <td>83.8</td>\n",
+       "      <td>89.85</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "      exact_match     f1\n",
+       "FP32         80.8  87.95\n",
+       "INT8         83.8  89.85"
+      ]
+     },
+     "execution_count": 12,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "squad_eval = evaluator(\"question-answering\")\n",
+    "\n",
+    "ov_eval_results = squad_eval.compute(\n",
+    "    model_or_pipeline=ov_qa_pipeline,\n",
+    "    data=filtered_examples,\n",
+    "    metric=\"squad\",\n",
+    "    squad_v2_format=VERSION_2_WITH_NEGATIVE,\n",
+    ")\n",
+    "\n",
+    "hf_eval_results = squad_eval.compute(\n",
+    "    model_or_pipeline=hf_qa_pipeline,\n",
+    "    data=filtered_examples,\n",
+    "    metric=\"squad\",\n",
+    "    squad_v2_format=VERSION_2_WITH_NEGATIVE,\n",
+    ")\n",
+    "\n",
+    "pd.DataFrame.from_records(\n",
+    "    [hf_eval_results, ov_eval_results],\n",
+    "    columns=[\"exact_match\", \"f1\"],\n",
+    "    index=[\"FP32\", \"INT8\"],\n",
+    ").round(2)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "db183795-6dae-4ef6-847d-042223264149",
+   "metadata": {},
+   "source": [
+    "### Inference Results\n",
+    "\n",
+    "To fully understand the quality of a model, it is useful to look beyond metrics like Exact Match and F1 score and examine model predictions directly. This can give a more complete impression of the model's performance and help identify areas for improvement.\n",
+    "\n",
+    "In the next cell, we go over the first 200 items of the filtered validation set and collect the items where the F1 score of the FP32 prediction differs from that of the INT8 prediction.\n",
+    "\n",
+    "The table shows the question and the set of correct answers from the dataset, followed by the prediction and F1 score of the FP32 model and of the INT8 model. For some items the FP32 model is better and for others the INT8 model is, but for the large majority of dataset items both models are equally accurate."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "id": "ab953c89-ed9d-4afa-8953-541c982174ff",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2023-03-12T22:14:43.531142Z",
+     "iopub.status.busy": "2023-03-12T22:14:43.530883Z",
+     "iopub.status.idle": "2023-03-12T22:15:32.547345Z",
+     "shell.execute_reply": "2023-03-12T22:15:32.546629Z",
+     "shell.execute_reply.started": "2023-03-12T22:14:43.531123Z"
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "results = []\n",
+    "int8_better = 0\n",
+    "num_items = 200\n",
+    "metric = evaluate.load(\"squad_v2\" if VERSION_2_WITH_NEGATIVE else \"squad\")\n",
+    "\n",
+    "for item in filtered_examples.select(range(num_items)):\n",
+    "    # Access fields by name instead of relying on dict order, and avoid shadowing the builtin `id`\n",
+    "    item_id, question, context, answers = item[\"id\"], item[\"question\"], item[\"context\"], item[\"answers\"]\n",
+    "    fp32_answer = hf_qa_pipeline(question, context)[\"answer\"]\n",
+    "    int8_answer = ov_qa_pipeline(question, context)[\"answer\"]\n",
+    "\n",
+    "    references = [{\"id\": item_id, \"answers\": answers}]\n",
+    "    fp32_predictions = [{\"id\": item_id, \"prediction_text\": fp32_answer}]\n",
+    "    int8_predictions = [{\"id\": item_id, \"prediction_text\": int8_answer}]\n",
+    "\n",
+    "    fp32_score = round(metric.compute(references=references, predictions=fp32_predictions)[\"f1\"], 2)\n",
+    "    int8_score = round(metric.compute(references=references, predictions=int8_predictions)[\"f1\"], 2)\n",
+    "\n",
+    "    if int8_score != fp32_score:\n",
+    "        results.append((question, answers[\"text\"], fp32_answer, fp32_score, int8_answer, int8_score))\n",
+    "        if int8_score > fp32_score:\n",
+    "            int8_better += 1"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 14,
+   "id": "37b78ee3-c330-4ef8-8528-47d5a8b73424",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2023-03-12T22:15:32.548330Z",
+     "iopub.status.busy": "2023-03-12T22:15:32.548128Z",
+     "iopub.status.idle": "2023-03-12T22:15:32.567532Z",
+     "shell.execute_reply": "2023-03-12T22:15:32.566728Z",
+     "shell.execute_reply.started": "2023-03-12T22:15:32.548311Z"
+    },
+    "tags": []
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>Question</th>\n",
+       "      <th>Answer</th>\n",
+       "      <th>FP32 prediction</th>\n",
+       "      <th>FP32 F1</th>\n",
+       "      <th>INT8 prediction</th>\n",
+       "      <th>INT8 F1</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>What city did Super Bowl 50 take place in?</td>\n",
+       "      <td>[Santa Clara, Santa Clara, Santa Clara]</td>\n",
+       "      <td>Santa Clara, California</td>\n",
+       "      <td>80.00</td>\n",
+       "      <td>Santa Clara</td>\n",
+       "      <td>100.00</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>Who is the head coach of the Broncos?</td>\n",
+       "      <td>[Gary Kubiak, Gary Kubiak, Kubiak]</td>\n",
+       "      <td>Gary Kubiak</td>\n",
+       "      <td>100.00</td>\n",
+       "      <td>John Fox</td>\n",
+       "      <td>0.00</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>How did the drive end for the Panthers?</td>\n",
+       "      <td>[punt, Newton was sacked, sacked]</td>\n",
+       "      <td>The Panthers could not gain any yards with their possession and had to punt</td>\n",
+       "      <td>14.29</td>\n",
+       "      <td>had to punt</td>\n",
+       "      <td>50.00</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>Warsaw's sidewalks and sanitation facilities are some examples of things which have what?</td>\n",
+       "      <td>[improved markedly, improved markedly, improved markedly]</td>\n",
+       "      <td>improved</td>\n",
+       "      <td>66.67</td>\n",
+       "      <td>improved markedly</td>\n",
+       "      <td>100.00</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>How often are elections for the counsel held?</td>\n",
+       "      <td>[every four years, four years, every four years.]</td>\n",
+       "      <td>every four years</td>\n",
+       "      <td>100.00</td>\n",
+       "      <td>directly every four years</td>\n",
+       "      <td>85.71</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>5</th>\n",
+       "      <td>What term corresponds to the maximum measurement of time across all functions of n?</td>\n",
+       "      <td>[worst-case time complexity, worst-case time complexity, the worst-case time complexity]</td>\n",
+       "      <td>T(n)</td>\n",
+       "      <td>0.00</td>\n",
+       "      <td>worst-case time complexity T(n)</td>\n",
+       "      <td>85.71</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>6</th>\n",
+       "      <td>Communication complexity is an example of what type of measure?</td>\n",
+       "      <td>[Complexity measures, complexity measures, complexity]</td>\n",
+       "      <td>complexity measures</td>\n",
+       "      <td>100.00</td>\n",
+       "      <td>complexity theory</td>\n",
+       "      <td>66.67</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>7</th>\n",
+       "      <td>What theorems are responsible for determining questions of time and space requirements?</td>\n",
+       "      <td>[time and space hierarchy theorems, time and space hierarchy theorems, time and space hierarchy theorems]</td>\n",
+       "      <td>time and space hierarchy theorems respectively. They are called hierarchy theorems</td>\n",
+       "      <td>62.50</td>\n",
+       "      <td>time and space hierarchy theorems</td>\n",
+       "      <td>100.00</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>8</th>\n",
+       "      <td>What countries is corporal punishment still a normal practice?</td>\n",
+       "      <td>[some Asian, African and Caribbean countries, Asian, African and Caribbean, Asian, African and Caribbean]</td>\n",
+       "      <td>Asian, African and Caribbean countries</td>\n",
+       "      <td>90.91</td>\n",
+       "      <td>Asian, African and Caribbean</td>\n",
+       "      <td>100.00</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>9</th>\n",
+       "      <td>What did Martin Luther's marriage allow?</td>\n",
+       "      <td>[Protestant clergy to marry., Protestant clergy to marry, clerical marriage]</td>\n",
+       "      <td>Protestant clergy</td>\n",
+       "      <td>66.67</td>\n",
+       "      <td>Protestant clergy to marry</td>\n",
+       "      <td>100.00</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>10</th>\n",
+       "      <td>How did Luther want people to bring about change?</td>\n",
+       "      <td>[trust God's word, trust God's word, love, patience, charity, and freedom]</td>\n",
+       "      <td>trust God's word</td>\n",
+       "      <td>100.00</td>\n",
+       "      <td>trust God's word rather than violence</td>\n",
+       "      <td>66.67</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>11</th>\n",
+       "      <td>What art forms did Luther use to connect his hymns?</td>\n",
+       "      <td>[high art and folk music, high art and folk music, singing of German hymns in connection with worship]</td>\n",
+       "      <td>folk music</td>\n",
+       "      <td>57.14</td>\n",
+       "      <td>high art and folk music</td>\n",
+       "      <td>100.00</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>12</th>\n",
+       "      <td>What does refusing to preach the Ten Commandments not do?</td>\n",
+       "      <td>[eliminate the accusing law, eliminate the accusing law., eliminate the accusing law]</td>\n",
+       "      <td>does not eliminate the accusing law</td>\n",
+       "      <td>75.00</td>\n",
+       "      <td>eliminate the accusing law</td>\n",
+       "      <td>100.00</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>13</th>\n",
+       "      <td>What other aspect of Luther's life was affected by his health?</td>\n",
+       "      <td>[writings and comments, his writings and comments, writings and comments.]</td>\n",
+       "      <td>writings and comments</td>\n",
+       "      <td>100.00</td>\n",
+       "      <td>physical</td>\n",
+       "      <td>0.00</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>14</th>\n",
+       "      <td>The reasons for the las two counties to be added are based on what?</td>\n",
+       "      <td>[historical political divisions, historical political divisions, historical political divisions]</td>\n",
+       "      <td>demographics and economic ties</td>\n",
+       "      <td>0.00</td>\n",
+       "      <td>historical political divisions</td>\n",
+       "      <td>100.00</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>15</th>\n",
+       "      <td>Which of the three heavily populated areas has the least number of inhabitants?</td>\n",
+       "      <td>[San Diego, the San Diego area, San Diego]</td>\n",
+       "      <td>El Centro area</td>\n",
+       "      <td>33.33</td>\n",
+       "      <td>San Diego area</td>\n",
+       "      <td>100.00</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>16</th>\n",
+       "      <td>Where does southern California's megalopolis standard in terms of population nationwide?</td>\n",
+       "      <td>[third, third, third]</td>\n",
+       "      <td>third most populated</td>\n",
+       "      <td>50.00</td>\n",
+       "      <td>third</td>\n",
+       "      <td>100.00</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>17</th>\n",
+       "      <td>Which conference do the teams in southern California play in?</td>\n",
+       "      <td>[Pac-12, the Pac-12, Pac-12]</td>\n",
+       "      <td>Pac-12</td>\n",
+       "      <td>100.00</td>\n",
+       "      <td>Pac-12 Conference</td>\n",
+       "      <td>66.67</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>18</th>\n",
+       "      <td>WHat allows customers to get Sky+ functions if they do not subscribe to BSkyB's channels?</td>\n",
+       "      <td>[monthly fee, a monthly fee, SkyHD box]</td>\n",
+       "      <td>a monthly fee</td>\n",
+       "      <td>100.00</td>\n",
+       "      <td>pay a monthly fee</td>\n",
+       "      <td>80.00</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>19</th>\n",
+       "      <td>WHat allows customers to get Sky+ functions if they do not subscribe to BSkyB's channels?</td>\n",
+       "      <td>[monthly fee, a monthly fee, SkyHD box]</td>\n",
+       "      <td>a monthly fee</td>\n",
+       "      <td>100.00</td>\n",
+       "      <td>pay a monthly fee</td>\n",
+       "      <td>80.00</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>20</th>\n",
+       "      <td>How does Victoria rank as to population density?</td>\n",
+       "      <td>[most densely populated, most, most densely populated state]</td>\n",
+       "      <td>second-most populous</td>\n",
+       "      <td>0.00</td>\n",
+       "      <td>second-most populous state overall</td>\n",
+       "      <td>25.00</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>21</th>\n",
+       "      <td>How much does Victoria produce in Australian pears?</td>\n",
+       "      <td>[90%, 90%, 90%]</td>\n",
+       "      <td>nearly 90%</td>\n",
+       "      <td>66.67</td>\n",
+       "      <td>90%</td>\n",
+       "      <td>100.00</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>22</th>\n",
+       "      <td>What did the Soviets intend to use in spacecraft after the success of Zond 5?</td>\n",
+       "      <td>[human cosmonauts, human cosmonauts, human cosmonauts, human cosmonauts]</td>\n",
+       "      <td>animals</td>\n",
+       "      <td>0.00</td>\n",
+       "      <td>human cosmonauts</td>\n",
+       "      <td>100.00</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>23</th>\n",
+       "      <td>What was the reason the Italian Constitutional court gave that resulted in Mr. Costa losing his his claim against ENEL?</td>\n",
+       "      <td>[nationalisation law was from 1962, and the treaty was in force from 1958, because the nationalisation law was from 1962, and the treaty was in force from 1958, Costa had no claim, because the nationalisation law was from 1962, and the treaty was in force from 1958, Costa had no claim, because the nationalisation law was from 1962, and the treaty was in force from 1958, Costa had no claim]</td>\n",
+       "      <td>nationalisation of the Italian energy corporations</td>\n",
+       "      <td>11.76</td>\n",
+       "      <td>the Treaty conflicted with national law</td>\n",
+       "      <td>23.53</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "                                                                                                                   Question  \\\n",
+       "0                                                                                What city did Super Bowl 50 take place in?   \n",
+       "1                                                                                     Who is the head coach of the Broncos?   \n",
+       "2                                                                                   How did the drive end for the Panthers?   \n",
+       "3                                 Warsaw's sidewalks and sanitation facilities are some examples of things which have what?   \n",
+       "4                                                                             How often are elections for the counsel held?   \n",
+       "5                                      What term corresponds to the maximum measurement of time across all functions of n?    \n",
+       "6                                                           Communication complexity is an example of what type of measure?   \n",
+       "7                                   What theorems are responsible for determining questions of time and space requirements?   \n",
+       "8                                                            What countries is corporal punishment still a normal practice?   \n",
+       "9                                                                                  What did Martin Luther's marriage allow?   \n",
+       "10                                                                        How did Luther want people to bring about change?   \n",
+       "11                                                                      What art forms did Luther use to connect his hymns?   \n",
+       "12                                                                What does refusing to preach the Ten Commandments not do?   \n",
+       "13                                                           What other aspect of Luther's life was affected by his health?   \n",
+       "14                                                      The reasons for the las two counties to be added are based on what?   \n",
+       "15                                          Which of the three heavily populated areas has the least number of inhabitants?   \n",
+       "16                                 Where does southern California's megalopolis standard in terms of population nationwide?   \n",
+       "17                                                            Which conference do the teams in southern California play in?   \n",
+       "18                                WHat allows customers to get Sky+ functions if they do not subscribe to BSkyB's channels?   \n",
+       "19                                WHat allows customers to get Sky+ functions if they do not subscribe to BSkyB's channels?   \n",
+       "20                                                                         How does Victoria rank as to population density?   \n",
+       "21                                                                      How much does Victoria produce in Australian pears?   \n",
+       "22                                            What did the Soviets intend to use in spacecraft after the success of Zond 5?   \n",
+       "23  What was the reason the Italian Constitutional court gave that resulted in Mr. Costa losing his his claim against ENEL?   \n",
+       "\n",
+       "                                                                                                                                                                                                                                                                                                                                                                                                      Answer  \\\n",
+       "0                                                                                                                                                                                                                                                                                                                                                                    [Santa Clara, Santa Clara, Santa Clara]   \n",
+       "1                                                                                                                                                                                                                                                                                                                                                                         [Gary Kubiak, Gary Kubiak, Kubiak]   \n",
+       "2                                                                                                                                                                                                                                                                                                                                                                          [punt, Newton was sacked, sacked]   \n",
+       "3                                                                                                                                                                                                                                                                                                                                                  [improved markedly, improved markedly, improved markedly]   \n",
+       "4                                                                                                                                                                                                                                                                                                                                                          [every four years, four years, every four years.]   \n",
+       "5                                                                                                                                                                                                                                                                                                                   [worst-case time complexity, worst-case time complexity, the worst-case time complexity]   \n",
+       "6                                                                                                                                                                                                                                                                                                                                                     [Complexity measures, complexity measures, complexity]   \n",
+       "7                                                                                                                                                                                                                                                                                                  [time and space hierarchy theorems, time and space hierarchy theorems, time and space hierarchy theorems]   \n",
+       "8                                                                                                                                                                                                                                                                                                  [some Asian, African and Caribbean countries, Asian, African and Caribbean, Asian, African and Caribbean]   \n",
+       "9                                                                                                                                                                                                                                                                                                                               [Protestant clergy to marry., Protestant clergy to marry, clerical marriage]   \n",
+       "10                                                                                                                                                                                                                                                                                                                                [trust God's word, trust God's word, love, patience, charity, and freedom]   \n",
+       "11                                                                                                                                                                                                                                                                                                    [high art and folk music, high art and folk music, singing of German hymns in connection with worship]   \n",
+       "12                                                                                                                                                                                                                                                                                                                     [eliminate the accusing law, eliminate the accusing law., eliminate the accusing law]   \n",
+       "13                                                                                                                                                                                                                                                                                                                                [writings and comments, his writings and comments, writings and comments.]   \n",
+       "14                                                                                                                                                                                                                                                                                                          [historical political divisions, historical political divisions, historical political divisions]   \n",
+       "15                                                                                                                                                                                                                                                                                                                                                                [San Diego, the San Diego area, San Diego]   \n",
+       "16                                                                                                                                                                                                                                                                                                                                                                                     [third, third, third]   \n",
+       "17                                                                                                                                                                                                                                                                                                                                                                              [Pac-12, the Pac-12, Pac-12]   \n",
+       "18                                                                                                                                                                                                                                                                                                                                                                   [monthly fee, a monthly fee, SkyHD box]   \n",
+       "19                                                                                                                                                                                                                                                                                                                                                                   [monthly fee, a monthly fee, SkyHD box]   \n",
+       "20                                                                                                                                                                                                                                                                                                                                              [most densely populated, most, most densely populated state]   \n",
+       "21                                                                                                                                                                                                                                                                                                                                                                                           [90%, 90%, 90%]   \n",
+       "22                                                                                                                                                                                                                                                                                                                                  [human cosmonauts, human cosmonauts, human cosmonauts, human cosmonauts]   \n",
+       "23  [nationalisation law was from 1962, and the treaty was in force from 1958, because the nationalisation law was from 1962, and the treaty was in force from 1958, Costa had no claim, because the nationalisation law was from 1962, and the treaty was in force from 1958, Costa had no claim, because the nationalisation law was from 1962, and the treaty was in force from 1958, Costa had no claim]   \n",
+       "\n",
+       "                                                                       FP32 prediction  \\\n",
+       "0                                                              Santa Clara, California   \n",
+       "1                                                                          Gary Kubiak   \n",
+       "2          The Panthers could not gain any yards with their possession and had to punt   \n",
+       "3                                                                             improved   \n",
+       "4                                                                     every four years   \n",
+       "5                                                                                 T(n)   \n",
+       "6                                                                  complexity measures   \n",
+       "7   time and space hierarchy theorems respectively. They are called hierarchy theorems   \n",
+       "8                                               Asian, African and Caribbean countries   \n",
+       "9                                                                    Protestant clergy   \n",
+       "10                                                                    trust God's word   \n",
+       "11                                                                          folk music   \n",
+       "12                                                 does not eliminate the accusing law   \n",
+       "13                                                               writings and comments   \n",
+       "14                                                      demographics and economic ties   \n",
+       "15                                                                      El Centro area   \n",
+       "16                                                                third most populated   \n",
+       "17                                                                              Pac-12   \n",
+       "18                                                                       a monthly fee   \n",
+       "19                                                                       a monthly fee   \n",
+       "20                                                                second-most populous   \n",
+       "21                                                                          nearly 90%   \n",
+       "22                                                                             animals   \n",
+       "23                                  nationalisation of the Italian energy corporations   \n",
+       "\n",
+       "    FP32 F1                          INT8 prediction  INT8 F1  \n",
+       "0     80.00                              Santa Clara   100.00  \n",
+       "1    100.00                                 John Fox     0.00  \n",
+       "2     14.29                              had to punt    50.00  \n",
+       "3     66.67                        improved markedly   100.00  \n",
+       "4    100.00                directly every four years    85.71  \n",
+       "5      0.00          worst-case time complexity T(n)    85.71  \n",
+       "6    100.00                        complexity theory    66.67  \n",
+       "7     62.50        time and space hierarchy theorems   100.00  \n",
+       "8     90.91             Asian, African and Caribbean   100.00  \n",
+       "9     66.67               Protestant clergy to marry   100.00  \n",
+       "10   100.00    trust God's word rather than violence    66.67  \n",
+       "11    57.14                  high art and folk music   100.00  \n",
+       "12    75.00               eliminate the accusing law   100.00  \n",
+       "13   100.00                                 physical     0.00  \n",
+       "14     0.00           historical political divisions   100.00  \n",
+       "15    33.33                           San Diego area   100.00  \n",
+       "16    50.00                                    third   100.00  \n",
+       "17   100.00                        Pac-12 Conference    66.67  \n",
+       "18   100.00                        pay a monthly fee    80.00  \n",
+       "19   100.00                        pay a monthly fee    80.00  \n",
+       "20     0.00       second-most populous state overall    25.00  \n",
+       "21    66.67                                      90%   100.00  \n",
+       "22     0.00                         human cosmonauts   100.00  \n",
+       "23    11.76  the Treaty conflicted with national law    23.53  "
+      ]
+     },
+     "execution_count": 14,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "pd.set_option(\"display.max_colwidth\", None)\n",
+    "df = pd.DataFrame(\n",
+    "    results,\n",
+    "    columns=[\"Question\", \"Answer\", \"FP32 prediction\", \"FP32 F1\", \"INT8 prediction\", \"INT8 F1\"],\n",
+    ")\n",
+    "df"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "58df445d-af43-4ba1-8195-7d8f00b8f82f",
+   "metadata": {},
+   "source": [
+    "### Model Size\n",
+    "\n",
+    "We save the FP32 PyTorch model and the INT8 OpenVINO model to temporary directories and define a function that returns the on-disk size of each model."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 15,
+   "id": "1eeaa81f-7fc5-49ba-80b8-2d95a1310a0c",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2023-03-12T22:15:32.568828Z",
+     "iopub.status.busy": "2023-03-12T22:15:32.568517Z",
+     "iopub.status.idle": "2023-03-12T22:15:33.105047Z",
+     "shell.execute_reply": "2023-03-12T22:15:33.104554Z",
+     "shell.execute_reply.started": "2023-03-12T22:15:32.568805Z"
+    },
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "FP32 model size: 435.64 MB\n",
+      "INT8 model size: 76.57 MB\n",
+      "INT8 size decrease: 5.69x\n"
+     ]
+    }
+   ],
+   "source": [
+    "def get_model_size(model_folder, framework):\n",
+    "    \"\"\"\n",
+    "    Return the size of an OpenVINO or PyTorch model in MB.\n",
+    "    Arguments:\n",
+    "        model_folder:\n",
+    "            Directory containing a pytorch_model.bin for a PyTorch model, or an openvino_model.xml/.bin pair for an OpenVINO model.\n",
+    "        framework:\n",
+    "            Either \"openvino\" or \"pytorch\" (case-insensitive).\n",
+    "    \"\"\"\n",
+    "    if framework.lower() == \"openvino\":\n",
+    "        # An OpenVINO IR model consists of an .xml topology file and a .bin weights file\n",
+    "        model_path = Path(model_folder) / \"openvino_model.xml\"\n",
+    "        model_size = model_path.stat().st_size + model_path.with_suffix(\".bin\").stat().st_size\n",
+    "    elif framework.lower() == \"pytorch\":\n",
+    "        model_path = Path(model_folder) / \"pytorch_model.bin\"\n",
+    "        model_size = model_path.stat().st_size\n",
+    "    else:\n",
+    "        raise ValueError(f\"Unsupported framework: {framework}\")\n",
+    "    model_size /= 1000 * 1000  # bytes to MB (decimal megabytes)\n",
+    "    return model_size\n",
+    "\n",
+    "\n",
+    "with tempfile.TemporaryDirectory() as fp32_model_dir:\n",
+    "    fp32_model.save_pretrained(fp32_model_dir)\n",
+    "    fp32_model_size = get_model_size(fp32_model_dir, \"pytorch\")\n",
+    "\n",
+    "with tempfile.TemporaryDirectory() as int8_model_dir:\n",
+    "    int8_model.save_pretrained(int8_model_dir)\n",
+    "    int8_model_size = get_model_size(int8_model_dir, \"openvino\")\n",
+    "\n",
+    "print(f\"FP32 model size: {fp32_model_size:.2f} MB\")\n",
+    "print(f\"INT8 model size: {int8_model_size:.2f} MB\")\n",
+    "print(f\"INT8 size decrease: {fp32_model_size / int8_model_size:.2f}x\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5f8697a2-0a51-427f-8245-cda56bb8cf18",
+   "metadata": {},
+   "source": [
+    "### Benchmarks\n",
+    "\n",
+    "We compare the inference speed of the quantized OpenVINO model with that of the original PyTorch model using the `total_time_in_seconds` value reported by the `evaluator` that we used in the previous section to compute accuracy.\n",
+    "\n",
+    "This benchmark provides only a rough estimate of performance: other programs running on the computer, as well as power management settings, can affect the measured timings."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 16,
+   "id": "99ac094c-48a1-40d4-86b7-bfc5c6da78a2",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2023-03-12T22:15:33.105827Z",
+     "iopub.status.busy": "2023-03-12T22:15:33.105634Z",
+     "iopub.status.idle": "2023-03-12T22:15:33.110537Z",
+     "shell.execute_reply": "2023-03-12T22:15:33.109767Z",
+     "shell.execute_reply.started": "2023-03-12T22:15:33.105812Z"
+    },
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Device: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz\n",
+      "INT8 speedup vs FP32 model: 2.71x\n"
+     ]
+    }
+   ],
+   "source": [
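+    "# Query the full name of the device the OpenVINO model runs on\n",
+    "device_name = Core().get_property(ov_qa_pipeline.model._device, \"FULL_DEVICE_NAME\")\n",
+    "# Speedup = FP32 (PyTorch) evaluation time / INT8 (OpenVINO) evaluation time\n",
+    "int8_speedup = hf_eval_results[\"total_time_in_seconds\"] / ov_eval_results[\"total_time_in_seconds\"]\n",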
+    "print(f\"Device: {device_name}\")\n",
+    "print(f\"INT8 speedup vs FP32 model: {int8_speedup:.2f}x\")"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.6"
+  },
+  "widgets": {
+   "application/vnd.jupyter.widget-state+json": {
+    "state": {},
+    "version_major": 2,
+    "version_minor": 0
+   }
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
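The per-question `FP32 F1` and `INT8 F1` columns in the results table are token-overlap F1 scores. The sketch below is a minimal re-implementation of SQuAD v1.1-style F1, assuming the notebook's `evaluator` uses the standard SQuAD metric (its configuration is not shown in this part of the diff). Answers are lowercased, punctuation and the articles a/an/the are removed, and each prediction is scored against every reference answer, keeping the best match. The helper names `normalize_answer`, `token_f1` and `squad_f1` are illustrative and are not part of the `evaluate` API.

```python
import collections
import re
import string

PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace (SQuAD-style)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall between two answers."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    # Multiset intersection counts each shared token at most min(count) times
    common = collections.Counter(pred_tokens) & collections.Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def squad_f1(prediction: str, references: list[str]) -> float:
    """Best F1 over all reference answers, scaled to 0-100 as in the table."""
    return 100.0 * max(token_f1(prediction, ref) for ref in references)


# Rows 0, 5 and 7 of the results table (FP32 and INT8 predictions)
print(f"{squad_f1('Santa Clara, California', ['Santa Clara'] * 3):.2f}")  # 80.00
print(f"{squad_f1('worst-case time complexity T(n)', ['worst-case time complexity', 'worst-case time complexity', 'the worst-case time complexity']):.2f}")  # 85.71
print(f"{squad_f1('time and space hierarchy theorems respectively. They are called hierarchy theorems', ['time and space hierarchy theorems'] * 3):.2f}")  # 62.50
```

For example, the FP32 prediction "Santa Clara, California" shares two of its three tokens with the reference "Santa Clara" (precision 2/3, recall 1), which gives the 80.00 shown in row 0. Extra words in an otherwise correct span lower precision, which explains why a verbose but correct answer can score well below 100.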