In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
"
+ ],
+ "text/plain": [
+ "XGBClassifier(base_score=0.5, booster='gbtree', callbacks=None,\n",
+ " colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1,\n",
+ " early_stopping_rounds=None, enable_categorical=False,\n",
+ " eval_metric=None, gamma=0, gpu_id=-1, grow_policy='depthwise',\n",
+ " importance_type=None, interaction_constraints='',\n",
+ " learning_rate=0.300000012, max_bin=256, max_cat_to_onehot=4,\n",
+ " max_delta_step=0, max_depth=6, max_leaves=0, min_child_weight=1,\n",
+ " missing=nan, monotone_constraints='()', n_estimators=100,\n",
+ " n_jobs=0, num_parallel_tree=1, predictor='auto', random_state=0,\n",
+ " reg_alpha=0, reg_lambda=1, ...)"
+ ]
+ },
+ "execution_count": 2,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "path_url = 'https://github.com/mattharrison/datasets/raw/master/data/kaggle-survey-2018.zip'\n",
+ "file_name = 'kaggle-survey-2018.zip'\n",
+ "dataset = 'multipleChoiceResponses.csv'\n",
+ "\n",
+ "raw = extract_dataset(path_url, file_name, dataset)\n",
+ "\n",
+ "# Create raw X and raw y\n",
+ "kag_X, kag_y = prepX_y(raw, 'Q6')\n",
+ "\n",
+ "# Split data\n",
+ "kag_X_train, kag_X_test, kag_y_train, kag_y_test = (model_selection\n",
+ " .train_test_split(kag_X, \n",
+ " kag_y, \n",
+ " test_size=.3, \n",
+ " random_state=42, \n",
+ " stratify=kag_y)\n",
+ " )\n",
+ "\n",
+ "\n",
+ "# Transform X with pipeline\n",
+ "pline = pipeline.Pipeline(\n",
+ " [('tweak', PrepDataTransformer()),\n",
+ " ('cat', encoding.OneHotEncoder(top_categories=5, drop_last=True,\n",
+ " variables=['Q1', 'Q3', 'major'])),\n",
+ " ('num_impute', imputation.MeanMedianImputer(imputation_method='median',\n",
+ " variables=['education', 'years_exp']))]\n",
+ " )\n",
+ "\n",
+ "X_train = pline.fit_transform(kag_X_train)\n",
+ "X_test = pline.transform(kag_X_test)\n",
+ "\n",
+ "# Transform y with label encoder\n",
+ "label_encoder = preprocessing.LabelEncoder()\n",
+ "label_encoder.fit(kag_y_train)\n",
+ "y_train = label_encoder.transform(kag_y_train)\n",
+ "y_test = label_encoder.transform(kag_y_test)\n",
+ "\n",
+ "# Combined Data for cross validation/etc\n",
+ "X = pd.concat([X_train, X_test], axis='index')\n",
+ "y = pd.Series([*y_train, *y_test], index=X.index)\n",
+ "\n",
+ "# Default training\n",
+ "xg = xgb.XGBClassifier()\n",
+ "xg.fit(X_train, y_train)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "id": "4adb845e-45b1-4c65-a986-1a5f8faa9668",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.7337016574585635"
+ ]
+ },
+ "execution_count": 4,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "from sklearn import linear_model, preprocessing\n",
+ "std = preprocessing.StandardScaler()\n",
+ "lr = linear_model.LogisticRegression(penalty='none')\n",
+ "lr.fit(std.fit_transform(X_train), y_train)\n",
+ "lr.score(std.transform(X_test), y_test)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "076c7d39-5f33-481e-a6b8-f40397e6fe1b",
+ "metadata": {},
+ "source": [
+ "The logistic regression scores about the same as the XGBoost model, so we would prefer the simpler model. Let us look more closely at the weights of the model using the `.coef_` attribute. The trailing\n",
+ "underscore follows scikit-learn's convention for attributes learned while training the model."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "id": "52d900fc-1061-4513-8af4-525d78434671",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "array([[-1.59051061e-01, -4.02827023e-01, 6.06203075e-01,\n",
+ " -1.45922113e-01, -8.12626418e-02, -6.05916337e-01,\n",
+ " 3.18051156e-02, 3.15043497e-02, -3.13556662e-02,\n",
+ " -4.75174987e-04, -8.04790381e-03, -5.22558905e-02,\n",
+ " -5.08749356e-03, 1.01987638e-01, 3.50075802e-01,\n",
+ " -1.79333801e-01, 2.44937585e-02, -3.38714464e-01]])"
+ ]
+ },
+ "execution_count": 5,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "lr.coef_"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "id": "051e217f-914b-4845-9e3d-77b41a2b2c59",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 6,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "fig, ax = plt.subplots(figsize=(8, 4))\n",
+ "\n",
+ "(pd.Series(lr.coef_[0], index=X_train.columns)\n",
+ ".sort_values()\n",
+ ".plot.barh(ax=ax)\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2b51f56c-55b6-4db8-b2ac-2bc6d04194b0",
+ "metadata": {},
+ "source": [
+ "The longer the bar, the higher the impact of the feature. Because the features were standardized before fitting, the magnitudes are comparable across features. Positive values push towards\n",
+ "the positive label (Software Engineer). Negative values push towards the negative\n",
+ "label (Data Scientist). The 'years_exp' (years of experience) column correlates with software engineering,\n",
+ "and using the R language (found in the 'r' column) correlates with data science. Also, the 'Q1_Prefer not to say' feature\n",
+ "does not have much impact on this model.\n",
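+ "\n",
+ "As a rough illustration of reading coefficients, the sketch below converts them to odds ratios with `np.exp`. It is self-contained and uses synthetic data from `make_classification`; the names `X_demo`, `y_demo`, `std_demo`, and `lr_demo` are hypothetical stand-ins for the notebook's variables, and the model here keeps scikit-learn's default L2 penalty.\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "import pandas as pd\n",
+ "from sklearn import datasets, linear_model, preprocessing\n",
+ "\n",
+ "# Synthetic stand-in data (hypothetical); in the notebook use X_train, y_train\n",
+ "X_demo, y_demo = datasets.make_classification(\n",
+ "    n_samples=500, n_features=4, random_state=42)\n",
+ "X_demo = pd.DataFrame(X_demo, columns=['f0', 'f1', 'f2', 'f3'])\n",
+ "\n",
+ "# Standardize so that coefficients are on a comparable scale\n",
+ "std_demo = preprocessing.StandardScaler()\n",
+ "lr_demo = linear_model.LogisticRegression()\n",
+ "lr_demo.fit(std_demo.fit_transform(X_demo), y_demo)\n",
+ "\n",
+ "# exp(coefficient) is the multiplicative change in the odds of the positive\n",
+ "# class for a one standard deviation increase in the feature\n",
+ "odds_ratios = pd.Series(np.exp(lr_demo.coef_[0]), index=X_demo.columns)\n",
+ "print(odds_ratios.sort_values())\n",
+ "```\n",
+ "\n",
+ "An odds ratio above 1 corresponds to a positive coefficient, and one below 1 to a negative coefficient.\n",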
+ "\n",
+ "Maybe we should make an even simpler model that only considers features whose coefficients have an _absolute_ value above 0.2.\n",
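+ "\n",
+ "One possible way to build such a reduced model is sketched below. It is an assumption about how the idea could be implemented, not a result from this notebook: the data is synthetic, and `X_demo`, `lr_full`, `lr_small`, and `keep` are hypothetical names.\n",
+ "\n",
+ "```python\n",
+ "import pandas as pd\n",
+ "from sklearn import datasets, linear_model, model_selection, preprocessing\n",
+ "\n",
+ "# Synthetic stand-in data (hypothetical); in the notebook use X_train, y_train\n",
+ "X_demo, y_demo = datasets.make_classification(\n",
+ "    n_samples=500, n_features=6, n_informative=3, random_state=42)\n",
+ "X_demo = pd.DataFrame(X_demo, columns=[f'f{i}' for i in range(6)])\n",
+ "X_tr, X_te, y_tr, y_te = model_selection.train_test_split(\n",
+ "    X_demo, y_demo, random_state=42)\n",
+ "\n",
+ "# Fit the full model on standardized features\n",
+ "std_full = preprocessing.StandardScaler().fit(X_tr)\n",
+ "lr_full = linear_model.LogisticRegression().fit(std_full.transform(X_tr), y_tr)\n",
+ "\n",
+ "# Keep only features whose coefficient magnitude exceeds the threshold\n",
+ "coefs = pd.Series(lr_full.coef_[0], index=X_tr.columns)\n",
+ "keep = list(coefs[coefs.abs() > 0.2].index)\n",
+ "\n",
+ "if keep:  # guard: the filter could, in principle, drop every feature\n",
+ "    std_small = preprocessing.StandardScaler().fit(X_tr[keep])\n",
+ "    lr_small = linear_model.LogisticRegression().fit(\n",
+ "        std_small.transform(X_tr[keep]), y_tr)\n",
+ "    print(keep, lr_small.score(std_small.transform(X_te[keep]), y_te))\n",
+ "```\n",
+ "\n",
+ "Comparing the reduced model's test score against the full model's shows whether the dropped features mattered.\n",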
+ "\n",
+ "## 17.2 Decision Tree Interpretation\n",
+ "\n",
+ "A decision tree is another white box model. Let us train one with a maximum depth of 7."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "id": "a7d09b59-6b6e-4bda-92e9-7fdf9c95aec2",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.7337016574585635"
+ ]
+ },
+ "execution_count": 7,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "tree7 = tree.DecisionTreeClassifier(max_depth=7)\n",
+ "tree7.fit(X_train, y_train)\n",
+ "tree7.score(X_test, y_test)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "id": "d1c4c0d1-63b8-4f61-8d39-046ec38adae7",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "array([0.05822293, 0.09703777, 0.16172175, 0.08914157, 0.00303678,\n",
+ " 0.28834526, 0.01570148, 0.00530231, 0.00684056, 0.00372007,\n",
+ " 0. , 0.05648182, 0.00414392, 0.0060086 , 0.17292726,\n",
+ " 0.00133682, 0.01253835, 0.01749275])"
+ ]
+ },
+ "execution_count": 8,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "tree7.feature_importances_"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "id": "24e5df60-d1c7-42df-8e45-21d56acd4cdb",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "fig, ax = plt.subplots(figsize=(8, 4))\n",
+ "\n",
+ "_=(pd.Series(tree7.feature_importances_, index=X_train.columns)\n",
+ ".sort_values()\n",
+ ".plot.barh(ax=ax)\n",
+ ")\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ffded519-2464-4043-8414-5ef46a5d1f02",
+ "metadata": {},
+ "source": [
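+ "In this case, the feature importances of the decision tree do not rank the features\n",
+ "the same way as the coefficients of the logistic regression model. Note that importances are never negative, so unlike coefficients they do not show which class a feature pushes towards."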
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "id": "95187d35-645d-4095-9f02-ae35eda56406",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "C:\\Users\\ricky\\miniconda3\\lib\\site-packages\\sklearn\\base.py:450: UserWarning: X does not have valid feature names, but DecisionTreeClassifier was fitted with feature names\n"
+ ]
+ },
+ {
+ "data": {
+ "image/svg+xml": [
+ ""
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 11,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "import dtreeviz\n",
+ "\n",
+ "# A shallow tree (depth 3) is easier to read as a diagram\n",
+ "dt3 = tree.DecisionTreeClassifier(max_depth=3)\n",
+ "dt3.fit(X_train, y_train)\n",
+ "viz = dtreeviz.model(dt3, X_train=X_train, y_train=y_train,\n",
+ "    feature_names=list(X_train.columns), target_name='Job',\n",
+ "    class_names=['DS', 'SE'])\n",
+ "viz.view()\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ee2a2cbe-75a0-4aa2-bd7c-e44f4ed9a810",
+ "metadata": {},
+ "source": [
+ "## 17.3 XGBoost Feature Importance"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "id": "f8620676-5fe4-44ac-9954-cdfb520dd79d",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "xgb_def = xgb.XGBClassifier()\n",
+ "xgb_def.fit(X_train, y_train)\n",
+ "\n",
+ "fig, ax = plt.subplots(figsize=(8, 4))\n",
+ "\n",
+ "_=(pd.Series(xgb_def.feature_importances_, index=X_train.columns)\n",
+ ".sort_values()\n",
+ ".plot.barh(ax=ax)\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8cb3801a-cddb-4524-b21c-9498fca7f293",
+ "metadata": {},
+ "source": [
+ "XGBoost also provides the `plot_importance` function for inspecting a fitted model. Its `importance_type` parameter changes how importance is measured. The main options are:\n",
+ "\n",
+ "- **Gain**: The average improvement in the training objective (loss reduction) across all splits that use the feature.\n",
+ "- **Weight**: The number of times the feature is used to split the data across all trees.\n",
+ "- **Cover**: The average coverage of splits that use the feature, where coverage reflects how many training samples those splits affect.\n",
+ "\n",
+ "Each type gives a different perspective on which features matter in an XGBoost model, so the rankings can disagree."
+ ]
+ },
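+ {
+ "cell_type": "markdown",
+ "id": "3f9a1c52-7b0e-4d18-a6c4-1e2d5b8f9a01",
+ "metadata": {},
+ "source": [
+ "The importance types can also be read as numbers instead of a plot. The following is a minimal sketch on synthetic data, not the survey data used above: the dataset, the column names `f0` to `f3`, and the small model settings are assumptions made for illustration. It uses `Booster.get_score`, which accepts an `importance_type` argument.\n",
+ "\n",
+ "```python\n",
+ "import pandas as pd\n",
+ "import xgboost as xgb\n",
+ "from sklearn.datasets import make_classification\n",
+ "\n",
+ "# Synthetic stand-in data (assumption: not the survey data used above)\n",
+ "X_arr, y_demo = make_classification(n_samples=300, n_features=4,\n",
+ "                                    n_informative=2, n_redundant=0,\n",
+ "                                    random_state=0)\n",
+ "X_demo = pd.DataFrame(X_arr, columns=['f0', 'f1', 'f2', 'f3'])\n",
+ "\n",
+ "model = xgb.XGBClassifier(n_estimators=20, max_depth=3, random_state=0)\n",
+ "model.fit(X_demo, y_demo)\n",
+ "\n",
+ "# One column per importance type; a feature never used in a split shows as NaN\n",
+ "booster = model.get_booster()\n",
+ "importances = pd.DataFrame(\n",
+ "    {kind: pd.Series(booster.get_score(importance_type=kind))\n",
+ "     for kind in ['weight', 'gain', 'cover']}\n",
+ ")\n",
+ "print(importances.sort_values('gain', ascending=False))\n",
+ "```\n",
+ "\n",
+ "Placing the three measures side by side makes it easier to spot features that are split on often (high weight) but contribute little gain."
+ ]
+ },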
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "id": "88febc50-9433-4646-afd0-d4a160a6b857",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "fig, ax = plt.subplots(figsize=(8, 4))\n",
+ "_=xgb.plot_importance(xgb_def, importance_type='cover', ax=ax)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "64750505-3455-42ca-a7f6-a24b99604ee6",
+ "metadata": {},
+ "source": [
+ "## 17.4 Surrogate Models\n",
+ "\n",
+ "Another way to tease apart the XGBoost model is to train a decision tree to its predictions and\n",
+ "then explore the interpretable decision tree. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "id": "bfb91913-29fb-4186-a80f-9f7aa2d4042c",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "DecisionTreeRegressor(max_depth=4)"
+ ],
+ "text/plain": [
+ "DecisionTreeRegressor(max_depth=4)"
+ ]
+ },
+ "execution_count": 17,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "from sklearn import tree\n",
+ "sur_reg_sk = tree.DecisionTreeRegressor(max_depth=4)\n",
+ "sur_reg_sk.fit(X_train, xgb_def.predict_proba(X_train)[:,-1])"
+ ]
+ },
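+ {
+ "cell_type": "markdown",
+ "id": "8c2d4e61-5a3b-4f27-b9d0-6e1f7a2c3b02",
+ "metadata": {},
+ "source": [
+ "How closely a surrogate tracks the model it imitates can be checked before reading the tree. The sketch below is self-contained and rests on assumptions made for illustration: synthetic data from `make_classification`, and scikit-learn's `GradientBoostingClassifier` as a stand-in black box in place of the XGBoost model above. It reports the R^2 between the surrogate's output and the black box's predicted probabilities.\n",
+ "\n",
+ "```python\n",
+ "from sklearn.datasets import make_classification\n",
+ "from sklearn.ensemble import GradientBoostingClassifier\n",
+ "from sklearn.tree import DecisionTreeRegressor\n",
+ "\n",
+ "# Synthetic data and a stand-in black box (assumptions for illustration)\n",
+ "X_demo, y_demo = make_classification(n_samples=500, n_features=6,\n",
+ "                                     random_state=0)\n",
+ "black_box = GradientBoostingClassifier(random_state=0).fit(X_demo, y_demo)\n",
+ "target = black_box.predict_proba(X_demo)[:, -1]\n",
+ "\n",
+ "# Fit the surrogate to the black box's predicted probabilities, not the labels\n",
+ "surrogate = DecisionTreeRegressor(max_depth=4, random_state=0)\n",
+ "surrogate.fit(X_demo, target)\n",
+ "\n",
+ "# R^2 between the surrogate's output and the black-box probabilities\n",
+ "fidelity = surrogate.score(X_demo, target)\n",
+ "print(f'Surrogate fidelity (R^2): {fidelity:.3f}')\n",
+ "```\n",
+ "\n",
+ "A value close to 1 suggests the shallow tree reproduces most of the black box's behavior on this data; a low value means conclusions drawn from the tree should be treated with caution."
+ ]
+ },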
+ {
+ "cell_type": "markdown",
+ "id": "30be8053-b8ca-47de-86c3-0b4f17ad572f",
+ "metadata": {},
+ "source": [
+ "To examine the surrogate tree, we export it with the `export_graphviz` function, passing `rotate=True` so the image reads from left to right. This writes a `.dot` file, which we then convert to a `.png`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "id": "257d5fde-96de-4954-8f8d-7a04c25af211",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Export the surrogate tree to a DOT file\n",
+ "tree.export_graphviz(sur_reg_sk, \n",
+ " out_file='img/sur-sk.dot',\n",
+ " feature_names=X_train.columns, \n",
+ " filled=True, rotate=True,fontname='Roboto Condensed')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 22,
+ "id": "12261bfe-d3eb-415a-9f6b-e4e951e4dcc8",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!dot -Gdpi=300 -Tpng -oimg/sur-sk.png img/sur-sk.dot"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "id": "44f4572d-bb25-4f32-b773-42edf58d5e08",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!dot -Tpng img/sur-sk.dot -o img/sur-sk.png"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 24,
+ "id": "cbb93ac6-305e-4abe-ab67-d4d41aa539eb",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "\n",
+ "(dot.exe:27028): Pango-WARNING **: couldn't load font \"Roboto Condensed Not-Rotated 14\", falling back to \"Sans Condensed Not-Rotated 14\", expect ugly output.\n",
+ "\n",
+ "(dot.exe:27028): Pango-WARNING **: couldn't load font \"Sans Condensed Not-Rotated 14\", falling back to \"Sans Not-Rotated 14\", expect ugly output.\n"
+ ]
+ },
+ {
+ "data": {
+ "image/svg+xml": [
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "Tree\n",
+ "\n",
+ "\n",
+ "\n",
+ "0\n",
+ "\n",
+ "r <= 0.5\n",
+ "squared_error = 0.121\n",
+ "samples = 2110\n",
+ "value = 0.454\n",
+ "\n",
+ "\n",
+ "\n",
+ "1\n",
+ "\n",
+ "major_cs <= 0.5\n",
+ "squared_error = 0.103\n",
+ "samples = 1484\n",
+ "value = 0.559\n",
+ "\n",
+ "\n",
+ "\n",
+ "0->1\n",
+ "\n",
+ "\n",
+ "True\n",
+ "\n",
+ "\n",
+ "\n",
+ "16\n",
+ "\n",
+ "major_cs <= 0.5\n",
+ "squared_error = 0.074\n",
+ "samples = 626\n",
+ "value = 0.203\n",
+ "\n",
+ "\n",
+ "\n",
+ "0->16\n",
+ "\n",
+ "\n",
+ "False\n",
+ "\n",
+ "\n",
+ "\n",
+ "2\n",
+ "\n",
+ "Q3_United States of America <= 0.5\n",
+ "squared_error = 0.115\n",
+ "samples = 718\n",
+ "value = 0.429\n",
+ "\n",
+ "\n",
+ "\n",
+ "1->2\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "9\n",
+ "\n",
+ "years_exp <= 3.5\n",
+ "squared_error = 0.061\n",
+ "samples = 766\n",
+ "value = 0.681\n",
+ "\n",
+ "\n",
+ "\n",
+ "1->9\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "3\n",
+ "\n",
+ "major_stat <= 0.5\n",
+ "squared_error = 0.086\n",
+ "samples = 322\n",
+ "value = 0.564\n",
+ "\n",
+ "\n",
+ "\n",
+ "2->3\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "6\n",
+ "\n",
+ "years_exp <= 4.5\n",
+ "squared_error = 0.111\n",
+ "samples = 396\n",
+ "value = 0.319\n",
+ "\n",
+ "\n",
+ "\n",
+ "2->6\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "4\n",
+ "\n",
+ "squared_error = 0.072\n",
+ "samples = 285\n",
+ "value = 0.603\n",
+ "\n",
+ "\n",
+ "\n",
+ "3->4\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "5\n",
+ "\n",
+ "squared_error = 0.098\n",
+ "samples = 37\n",
+ "value = 0.265\n",
+ "\n",
+ "\n",
+ "\n",
+ "3->5\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "7\n",
+ "\n",
+ "squared_error = 0.071\n",
+ "samples = 294\n",
+ "value = 0.237\n",
+ "\n",
+ "\n",
+ "\n",
+ "6->7\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "8\n",
+ "\n",
+ "squared_error = 0.153\n",
+ "samples = 102\n",
+ "value = 0.554\n",
+ "\n",
+ "\n",
+ "\n",
+ "6->8\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "10\n",
+ "\n",
+ "age <= 27.5\n",
+ "squared_error = 0.052\n",
+ "samples = 525\n",
+ "value = 0.625\n",
+ "\n",
+ "\n",
+ "\n",
+ "9->10\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "13\n",
+ "\n",
+ "education <= 17.0\n",
+ "squared_error = 0.057\n",
+ "samples = 241\n",
+ "value = 0.803\n",
+ "\n",
+ "\n",
+ "\n",
+ "9->13\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "11\n",
+ "\n",
+ "squared_error = 0.041\n",
+ "samples = 460\n",
+ "value = 0.65\n",
+ "\n",
+ "\n",
+ "\n",
+ "10->11\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "12\n",
+ "\n",
+ "squared_error = 0.1\n",
+ "samples = 65\n",
+ "value = 0.453\n",
+ "\n",
+ "\n",
+ "\n",
+ "10->12\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "14\n",
+ "\n",
+ "squared_error = 0.027\n",
+ "samples = 113\n",
+ "value = 0.885\n",
+ "\n",
+ "\n",
+ "\n",
+ "13->14\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "15\n",
+ "\n",
+ "squared_error = 0.072\n",
+ "samples = 128\n",
+ "value = 0.73\n",
+ "\n",
+ "\n",
+ "\n",
+ "13->15\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "17\n",
+ "\n",
+ "years_exp <= 7.5\n",
+ "squared_error = 0.043\n",
+ "samples = 443\n",
+ "value = 0.123\n",
+ "\n",
+ "\n",
+ "\n",
+ "16->17\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "24\n",
+ "\n",
+ "years_exp <= 4.5\n",
+ "squared_error = 0.096\n",
+ "samples = 183\n",
+ "value = 0.397\n",
+ "\n",
+ "\n",
+ "\n",
+ "16->24\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "18\n",
+ "\n",
+ "education <= 14.5\n",
+ "squared_error = 0.03\n",
+ "samples = 382\n",
+ "value = 0.105\n",
+ "\n",
+ "\n",
+ "\n",
+ "17->18\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "21\n",
+ "\n",
+ "education <= 19.5\n",
+ "squared_error = 0.11\n",
+ "samples = 61\n",
+ "value = 0.236\n",
+ "\n",
+ "\n",
+ "\n",
+ "17->21\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "19\n",
+ "\n",
+ "squared_error = 0.0\n",
+ "samples = 1\n",
+ "value = 0.873\n",
+ "\n",
+ "\n",
+ "\n",
+ "18->19\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "20\n",
+ "\n",
+ "squared_error = 0.029\n",
+ "samples = 381\n",
+ "value = 0.103\n",
+ "\n",
+ "\n",
+ "\n",
+ "18->20\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "22\n",
+ "\n",
+ "squared_error = 0.133\n",
+ "samples = 40\n",
+ "value = 0.32\n",
+ "\n",
+ "\n",
+ "\n",
+ "21->22\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "23\n",
+ "\n",
+ "squared_error = 0.027\n",
+ "samples = 21\n",
+ "value = 0.074\n",
+ "\n",
+ "\n",
+ "\n",
+ "21->23\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "25\n",
+ "\n",
+ "education <= 17.0\n",
+ "squared_error = 0.071\n",
+ "samples = 144\n",
+ "value = 0.34\n",
+ "\n",
+ "\n",
+ "\n",
+ "24->25\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "28\n",
+ "\n",
+ "education <= 17.0\n",
+ "squared_error = 0.131\n",
+ "samples = 39\n",
+ "value = 0.609\n",
+ "\n",
+ "\n",
+ "\n",
+ "24->28\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "26\n",
+ "\n",
+ "squared_error = 0.061\n",
+ "samples = 69\n",
+ "value = 0.43\n",
+ "\n",
+ "\n",
+ "\n",
+ "25->26\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "27\n",
+ "\n",
+ "squared_error = 0.066\n",
+ "samples = 75\n",
+ "value = 0.257\n",
+ "\n",
+ "\n",
+ "\n",
+ "25->27\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "29\n",
+ "\n",
+ "squared_error = 0.035\n",
+ "samples = 13\n",
+ "value = 0.862\n",
+ "\n",
+ "\n",
+ "\n",
+ "28->29\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "30\n",
+ "\n",
+ "squared_error = 0.131\n",
+ "samples = 26\n",
+ "value = 0.482\n",
+ "\n",
+ "\n",
+ "\n",
+ "28->30\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 24,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "import graphviz\n",
+ "# Load the DOT file\n",
+ "with open('img/sur-sk.dot') as f:\n",
+ " dot_graph = f.read()\n",
+ "\n",
+ "# Create a graph from the DOT data\n",
+ "graph = graphviz.Source(dot_graph)\n",
+ "\n",
+ "# Display the graph in the Jupyter notebook\n",
+ "graph\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "db6a3fc0-52d0-4e75-b0fe-5d37a22d997d",
+ "metadata": {},
+ "source": [
+ "The warnings above indicate that the `dot` command-line tool from Graphviz cannot find the requested font (\"Roboto Condensed\") on this system. It falls back first to \"Sans Condensed\" and then to \"Sans\", which may produce less polished output.\n",
+ "\n",
+ "To fix this issue, you can try installing the \"Roboto Condensed\" font on your system. The process for doing this varies depending on your operating system. On Windows, you can download the font from the Google Fonts website (https://fonts.google.com/specimen/Roboto+Condensed) and then double-click the downloaded file to install it. On macOS, you can use the Font Book application to install the font. On Linux, the process varies depending on your distribution and desktop environment.\n",
+ "\n",
+ "Alternatively, you can modify the code that generates the DOT file to use a different font that is already installed on your system. For example, you can change the `fontname` parameter of the `export_graphviz` function to specify a different font:\n",
+ "\n",
+ "```python\n",
+ "tree.export_graphviz(sur_reg_sk, out_file='img/sur-sk.dot',\n",
+ " feature_names=X_train.columns, filled=True,\n",
+ " rotate=True, fontname='Arial')\n",
+ "```\n",
+ "\n",
+ "This generates a DOT file that uses the \"Arial\" font instead of \"Roboto Condensed\". You can then use the `dot` command-line tool or the `graphviz` Python library, as shown earlier, to render an image from the DOT file.\n",
+ "\n",
+ "## Summary\n",
+ "\n",
+ "\n",
+ "1. **What is the difference between white box and black box models regarding interpretation?**\n",
+ " - White box models are interpretable, meaning that their internal workings and decision-making processes can be easily understood and explained.\n",
+ " - Black box models, on the other hand, are more opaque and their decision-making processes are not as easily understood or explained.\n",
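+ " - In practice, a white box model can be explained by inspecting its parameters or structure directly, while a black box model usually requires additional techniques such as feature importance or surrogate models.\n",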
+ "\n",
+ "2. **How is logistic regression used to make predictions and how can the coefficients be interpreted to understand the model’s decision-making process?**\n",
+ " - Logistic regression is a type of generalized linear model that is used for binary classification problems.\n",
+ " - The model makes predictions by calculating a weighted sum of the input features and passing the result through a sigmoid function to produce a probability value between 0 and 1.\n",
+ " - Each coefficient gives the change in the log-odds of the positive class for a one-unit increase in that feature, holding the other features fixed. A positive coefficient means an increase in the feature raises the predicted probability of a positive outcome, while a negative coefficient lowers it. Magnitudes are only comparable across features when the features are on similar scales.\n",
+ "\n",
+ "3. **How can decision trees be interpreted, and how do they visualize the model’s decision-making process?**\n",
+ " - Decision trees are a type of white box model that can be easily interpreted and visualized.\n",
+ " - The tree structure represents the decision-making process of the model, with each node representing a test on a feature and each branch representing an outcome of that test.\n",
+ " - By following the branches of the tree from the root to a leaf node, one can see how the model makes decisions based on the values of the input features.\n",
+ "\n",
+ "4. **What are some potential limitations of using interpretation techniques to understand the decision-making process of a machine learning model?**\n",
+ " - Interpretation techniques can provide valuable insights into how a machine learning model makes decisions, but they have limitations.\n",
+ " - For example, some interpretation techniques may only provide an approximate or partial understanding of the model's decision-making process.\n",
+ " - Additionally, interpretation techniques may not always be applicable or effective for all types of models or datasets.\n",
+ "\n",
+ "5. **How can the feature importance attribute be used to interpret the decision-making process of a black box model?**\n",
+ " - Feature importance is a measure of how much each feature contributes to making predictions with a machine learning model.\n",
+ " - For black box models, feature importance can provide some insight into which features are most important for making predictions.\n",
+ " - However, it's important to note that feature importance does not provide a complete understanding of how the model makes decisions and should be used in conjunction with other interpretation techniques.\n",
+ "\n",
+ "6. **In what situations might it be more appropriate to use a white box model over a black box model, and vice versa?**\n",
+ " - White box models may be more appropriate in situations where interpretability and explainability are important. For example, in regulated industries or when making decisions that have significant consequences for individuals or society.\n",
+ " - Black box models may be more appropriate in situations where predictive performance is more important than interpretability. For example, when making predictions in complex domains where it may be difficult to build an interpretable model with high accuracy."
+ ]
+ },
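+ {
+ "cell_type": "markdown",
+ "id": "d4b7e093-2c6a-4e85-8f1b-9a0c3d5e7f03",
+ "metadata": {},
+ "source": [
+ "To make the logistic regression answer in the summary concrete, here is a small worked example using only the standard library. The intercept, the coefficients, and the input row are hypothetical values chosen for illustration; they are not taken from the model fitted in this chapter.\n",
+ "\n",
+ "```python\n",
+ "import math\n",
+ "\n",
+ "def sigmoid(z):\n",
+ "    # Map a log-odds value to a probability between 0 and 1\n",
+ "    return 1 / (1 + math.exp(-z))\n",
+ "\n",
+ "# Hypothetical intercept, coefficients, and input row (illustration only)\n",
+ "intercept = -1.0\n",
+ "coefs = {'years_exp': 0.3, 'major_cs': -0.8}\n",
+ "row = {'years_exp': 5, 'major_cs': 1}\n",
+ "\n",
+ "# Weighted sum of the features, then the sigmoid\n",
+ "log_odds = intercept + sum(coefs[name] * row[name] for name in coefs)\n",
+ "prob = sigmoid(log_odds)\n",
+ "print(f'log-odds={log_odds:.2f}, probability={prob:.3f}')\n",
+ "\n",
+ "# exp(coefficient) is the multiplicative change in odds per one-unit increase\n",
+ "for name, coef in coefs.items():\n",
+ "    print(f'{name}: odds ratio {math.exp(coef):.3f}')\n",
+ "```\n",
+ "\n",
+ "Under these assumed values, the coefficient of 0.3 corresponds to an odds ratio of roughly 1.35 per additional unit of `years_exp`, while the coefficient of -0.8 corresponds to an odds ratio of roughly 0.45."
+ ]
+ },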
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "7fff02e4-0a9e-4a11-8116-d28e6cca688b",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.9.12"
+ }
+ },
"nbformat": 4,
"nbformat_minor": 5
}
diff --git a/chp17_ModelInterpretation.ipynb b/chp17_ModelInterpretation.ipynb
index eb06abc..b706d14 100644
--- a/chp17_ModelInterpretation.ipynb
+++ b/chp17_ModelInterpretation.ipynb
@@ -3510,7 +3510,39 @@
" rotate=True, fontname='Arial')\n",
"```\n",
"\n",
- "This will generate a DOT file that uses the \"Arial\" font instead of \"Roboto Condensed\". You can then use the `dot` command-line tool or the `graphviz` Python library to generate an image from the DOT file as described in my previous responses."
+ "This generates a DOT file that uses the \"Arial\" font instead of \"Roboto Condensed\". You can then use the `dot` command-line tool or the `graphviz` Python library, as shown earlier, to render an image from the DOT file.\n",
+ "\n",
+ "## Summary\n",
+ "\n",
+ "\n",
+ "1. **What is the difference between white box and black box models regarding interpretation?**\n",
+ " - White box models are interpretable, meaning that their internal workings and decision-making processes can be easily understood and explained.\n",
+ " - Black box models, on the other hand, are more opaque and their decision-making processes are not as easily understood or explained.\n",
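+ " - In practice, a white box model can be explained by inspecting its parameters or structure directly, while a black box model usually requires additional techniques such as feature importance or surrogate models.\n",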
+ "\n",
+ "2. **How is logistic regression used to make predictions and how can the coefficients be interpreted to understand the model’s decision-making process?**\n",
+ " - Logistic regression is a type of generalized linear model that is used for binary classification problems.\n",
+ " - The model makes predictions by calculating a weighted sum of the input features and passing the result through a sigmoid function to produce a probability value between 0 and 1.\n",
+ " - Each coefficient gives the change in the log-odds of the positive class for a one-unit increase in that feature, holding the other features fixed. A positive coefficient means an increase in the feature raises the predicted probability of a positive outcome, while a negative coefficient lowers it. Magnitudes are only comparable across features when the features are on similar scales.\n",
+ "\n",
+ "3. **How can decision trees be interpreted, and how do they visualize the model’s decision-making process?**\n",
+ " - Decision trees are a type of white box model that can be easily interpreted and visualized.\n",
+ " - The tree structure represents the decision-making process of the model, with each node representing a test on a feature and each branch representing an outcome of that test.\n",
+ " - By following the branches of the tree from the root to a leaf node, one can see how the model makes decisions based on the values of the input features.\n",
+ "\n",
+ "4. **What are some potential limitations of using interpretation techniques to understand the decision-making process of a machine learning model?**\n",
+ " - Interpretation techniques can provide valuable insights into how a machine learning model makes decisions, but they have limitations.\n",
+ " - For example, some interpretation techniques may only provide an approximate or partial understanding of the model's decision-making process.\n",
+ " - Additionally, interpretation techniques may not always be applicable or effective for all types of models or datasets.\n",
+ "\n",
+ "5. **How can the feature importance attribute be used to interpret the decision-making process of a black box model?**\n",
+ " - Feature importance is a measure of how much each feature contributes to making predictions with a machine learning model.\n",
+ " - For black box models, feature importance can provide some insight into which features are most important for making predictions.\n",
+ " - However, it's important to note that feature importance does not provide a complete understanding of how the model makes decisions and should be used in conjunction with other interpretation techniques.\n",
+ "\n",
+ "6. **In what situations might it be more appropriate to use a white box model over a black box model, and vice versa?**\n",
+ " - White box models may be more appropriate in situations where interpretability and explainability are important. For example, in regulated industries or when making decisions that have significant consequences for individuals or society.\n",
+ " - Black box models may be more appropriate in situations where predictive performance is more important than interpretability. For example, when making predictions in complex domains where it may be difficult to build an interpretable model with high accuracy."
]
},
{