diff --git a/.nojekyll b/.nojekyll
index 46cbd2e..9dc3c2e 100644
--- a/.nojekyll
+++ b/.nojekyll
@@ -1 +1 @@
-fa295913
\ No newline at end of file
+8ec74007
\ No newline at end of file
diff --git a/lectures/w09-l01.html b/lectures/w09-l01.html
index 67e9739..57441e5 100644
--- a/lectures/w09-l01.html
+++ b/lectures/w09-l01.html
@@ -362,6 +362,7 @@
Overall accuracy improves. Based on our earlier observations, we can predict class 1 much better than class 0.
+Overall accuracy improves. Based on our earlier observations, we can predict class 1 much better than class 0. Interestingly, reweighting can also be interpreted as a different form of regularization. Typically, one places a constraint on the norm of the parameters, which amounts to enforcing a smoothness constraint on the function space. Here, by reweighting the loss, the learning algorithm gives less importance to the difficult samples; the function being fit no longer has to work hard (i.e., become very complex) to accommodate them, so a simpler, smoother function suffices. The goal is the same (a smooth function), but the route to it differs. Norm-based regularization is, to a large extent, a brute-force approach, whereas with reweighting one knows exactly how much influence each example has on the training.
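As an illustrative sketch of the two routes (assuming the notebook's X_train, y_train, X_test, y_test and scikit-learn are available; the particular margin-to-weight mapping below is chosen only for illustration, not the notebook's exact recipe), one might compare an explicitly norm-regularized fit with a difficulty-reweighted one:

# Hedged sketch: norm-based regularization vs. loss reweighting by sample difficulty.
# Assumes X_train, y_train, X_test, y_test exist as in the notebook above.
import numpy as np
from sklearn import svm
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Difficulty proxy: absolute distance to a linear SVM decision boundary (small margin = hard).
margin_clf = svm.SVC(kernel="linear", C=1000).fit(X_train, y_train)
margin = np.abs(margin_clf.decision_function(X_train))

# Map margins to weights: the easiest (largest-margin) samples get weight 1,
# low-margin (hard) samples get smaller weights.
weights = np.exp(0.5 * margin)
weights = weights / np.max(weights)

# Explicit regularization: shrink the parameter norm (small C = strong penalty).
norm_reg = LogisticRegression(C=0.1).fit(X_train, y_train)

# Implicit regularization: essentially unpenalized loss (large C), but hard samples
# carry little weight, so a simple (smooth) decision function suffices.
reweighted = LogisticRegression(C=1e6).fit(X_train, y_train, sample_weight=weights)

print("norm-regularized model")
print(classification_report(y_test, norm_reg.predict(X_test)))
print("difficulty-reweighted model")
print(classification_report(y_test, reweighted.predict(X_test)))

Both fits aim at a smooth decision function; the first gets there by shrinking all coefficients uniformly, the second by letting the easy, high-margin examples dominate the loss.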
diff --git a/search.json b/search.json index 68f4a7e..512c7c2 100644 --- a/search.json +++ b/search.json @@ -15,7 +15,7 @@ "href": "notebooks/Sample-Hardness.html#margins", "title": "Sample Hardness", "section": "Margins", - "text": "Margins\nGiven some representation of the data (or embedding), we can use very well known ML techniques to come up similar statistics like RMD. For example, we can fit an SVM, and calculate the margins for each instance. Not only we solve the prediciton problem, we can get secondary statistics, which are useful in determining the difficulty of the sample to the model.\n\nfrom sklearn.svm import LinearSVC\nfrom sklearn.inspection import DecisionBoundaryDisplay\nfrom sklearn import svm\n\nclf = svm.SVC(kernel=\"linear\", C=1000)\nclf.fit(X_train, y_train)\n\ny_test = clf.predict(X_test)\nprint('on test set')\nprint(classification_report(y_train, yh_train, target_names=['versicolor','virginica']))\n\nprint('on test set')\nprint(classification_report(y_test, yh_test, target_names=['versicolor','virginica']))\n\non test set\n precision recall f1-score support\n\n versicolor 0.69 0.72 0.71 40\n virginica 0.71 0.68 0.69 40\n\n accuracy 0.70 80\n macro avg 0.70 0.70 0.70 80\nweighted avg 0.70 0.70 0.70 80\n\non test set\n precision recall f1-score support\n\n versicolor 1.00 0.85 0.92 13\n virginica 0.78 1.00 0.88 7\n\n accuracy 0.90 20\n macro avg 0.89 0.92 0.90 20\nweighted avg 0.92 0.90 0.90 20\n\n\n\n\n# calculate the margin of all data points in the training set, already available in sklearn\nconf = clf.decision_function(X_train)\n_, ax = plt.subplots(figsize=(5,5))\nscatter = ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train,s=50*np.abs(conf))\n\n\n\n\n\n\n\n\n\n# plot the decision function\n\nax = plt.gca()\nDecisionBoundaryDisplay.from_estimator(\n clf,\n X_train,\n plot_method=\"contour\",\n colors=\"k\",\n levels=[-1, 0, 1],\n alpha=0.5,\n linestyles=[\"--\", \"-\", \"--\"],\n ax=ax,\n)\n# plot support vectors\nax.scatter(\n clf.support_vectors_[:, 0],\n clf.support_vectors_[:, 1],\n s=100,\n linewidth=1,\n facecolors=\"none\",\n edgecolors=\"k\",\n)\nax.scatter(X_train[:, 0], X_train[:, 1], c=y_train,s=50*np.abs(conf))\nplt.show()\n\nprint('Train size', len(X_train))\nprint('# of support vectors', len(clf.support_vectors_))\n\n\n\n\n\n\n\n\nTrain size 80\n# of support vectors 56\n\n\nAt inference time, we can flag instances with low confidence. 
A simple heuristic to flag us, the confidence score has to be greater than the confidence of the suppost vectors.\n\n# get the smallest confidence that is not of a support vector\nsv = clf.support_\n# get the conf of those support vectors\nconf_sv = clf.decision_function(X_train[sv,:])\n\n\n# get conf of all points in the train set\nconf = clf.decision_function(X_train)\nprint('conf',conf.shape)\n\n# get the mix conf of SVs from the training data\nthresh = np.max(conf_sv)\n\nprint('max conf of support vectors is: ', thresh)\n\nconf (80,)\nmax conf of support vectors is: 1.2440925839422636\n\n\nEither we can remove the points with low confidence and re-train the model, or pass a sample weight based on the confidence, and retrain the model.\n\n# At inference time, flag test points as low conf or high conf\nconf_test = clf.decision_function(X_test)\n\nind_high_conf = np.where(conf_test > thresh)\nprint('Test points that with high confidence', ind_high_conf[0].tolist())\n\nind_low_conf = np.where(conf_test <= thresh)\nprint('Test points that with low confidence', ind_low_conf[0].tolist())\n\nTest points that with high confidence [13]\nTest points that with low confidence [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 15, 16, 17, 18, 19]\n\n\nWe can look at the accuracy of test data on high conf and low conf data points on train set, as we seem to have not many points in the test set.\n\nconf_train = clf.decision_function(X_train)\nind_high_conf = np.where(conf_train > thresh)[0].tolist()\n\nyh_high_conf = clf.predict(X_train[ind_high_conf,:])\nprint('on train set: high conf')\nprint(classification_report(y_train[ind_high_conf], yh_high_conf))\n\nind_low_conf = np.where(conf_train <= thresh)[0].tolist()\n\nyh_low_conf = clf.predict(X_train[ind_low_conf,:])\nprint('on train set: low conf')\nprint(classification_report(y_train[ind_low_conf], yh_low_conf))\n\non train set: high conf\n precision recall f1-score support\n\n 1 1.00 1.00 1.00 10\n\n accuracy 1.00 10\n macro avg 1.00 1.00 1.00 10\nweighted avg 1.00 1.00 1.00 10\n\non train set: low conf\n precision recall f1-score support\n\n 0 0.64 0.70 0.67 40\n 1 0.54 0.47 0.50 30\n\n accuracy 0.60 70\n macro avg 0.59 0.58 0.58 70\nweighted avg 0.59 0.60 0.60 70\n\n\n\nWe achieve perfect accuracy on the train set in which all samples have high confidence. And accuracy is around ~ 60% on the samples with low confidence. This demonstrates an important aspect – not all samples will have equal degree of confidence, and if there is a way to flag them, and deal with in the downstream task, we can bring reliability into the system.\nWe have chosen the threshold based on some intuition that, typically support vectors will be closed to the separating hyper plans and will exactly sit on the hyperplanes. So, if we choose points whose are farther from the support vectors, they should be farther away from the decision boundary and hence easy to classify.\nBut this way of choosing the thresholds does not give any statistical guarantees. One to has to choose the threshold via some cross-validation procedure. Later, we will see conformalization techniques which address this issue.\nOne way to incorporate the confidence or sample easiness into training procedure is to, remove all difficult examples and retrain the model. 
Or convert the RMD or other types scores into weights and use a weighted loss, instead.\nThis paper Learning Sample Difficulty from Pre-trained Models for Reliable Prediction uses a score based on RMD to reweigh the samples in loss function.\n\n# create weight from conf, and fit a logistic regression with weighed samples\nweights = np.exp(0.5*conf)\nscaler = np.max(weights)\nweights = weights/scaler\nplt.plot(weights)\nplt.show()\nplt.plot(conf)\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n# fit a logistic model with sampled weights\nweighed_model = LogisticRegression(random_state=0).fit(X_train, y_train, sample_weight=weights)\nyh_test = weighed_model.predict(X_test)\n\n# get conf from margins of svm\nconf_test = clf.decision_function(X_test)\ntest_weights = np.exp(conf_test)/scaler\n\nprint('on test set w/o weights')\nprint(classification_report(y_test, yh_test))\n\nprint('on test set with weights')\nprint(classification_report(y_test, yh_test, sample_weight = test_weights ))\n\non test set w/o weights\n precision recall f1-score support\n\n 0 1.00 0.62 0.76 13\n 1 0.58 1.00 0.74 7\n\n accuracy 0.75 20\n macro avg 0.79 0.81 0.75 20\nweighted avg 0.85 0.75 0.75 20\n\non test set with weights\n precision recall f1-score support\n\n 0 1.00 0.41 0.58 1.5365777483728735\n 1 0.78 1.00 0.88 3.298054168177313\n\n accuracy 0.81 4.834631916550187\n macro avg 0.89 0.70 0.73 4.834631916550187\nweighted avg 0.85 0.81 0.78 4.834631916550187\n\n\n\nOverall accuracy improves. Based on our earlier observations, we can predict class 1 much better than class 0.", + "text": "Margins\nGiven some representation of the data (or embedding), we can use very well known ML techniques to come up similar statistics like RMD. For example, we can fit an SVM, and calculate the margins for each instance. 
Not only we solve the prediciton problem, we can get secondary statistics, which are useful in determining the difficulty of the sample to the model.\n\nfrom sklearn.svm import LinearSVC\nfrom sklearn.inspection import DecisionBoundaryDisplay\nfrom sklearn import svm\n\nclf = svm.SVC(kernel=\"linear\", C=1000)\nclf.fit(X_train, y_train)\n\ny_test = clf.predict(X_test)\nprint('on test set')\nprint(classification_report(y_train, yh_train, target_names=['versicolor','virginica']))\n\nprint('on test set')\nprint(classification_report(y_test, yh_test, target_names=['versicolor','virginica']))\n\non test set\n precision recall f1-score support\n\n versicolor 0.69 0.72 0.71 40\n virginica 0.71 0.68 0.69 40\n\n accuracy 0.70 80\n macro avg 0.70 0.70 0.70 80\nweighted avg 0.70 0.70 0.70 80\n\non test set\n precision recall f1-score support\n\n versicolor 1.00 0.85 0.92 13\n virginica 0.78 1.00 0.88 7\n\n accuracy 0.90 20\n macro avg 0.89 0.92 0.90 20\nweighted avg 0.92 0.90 0.90 20\n\n\n\n\n# calculate the margin of all data points in the training set, already available in sklearn\nconf = clf.decision_function(X_train)\n_, ax = plt.subplots(figsize=(5,5))\nscatter = ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train,s=50*np.abs(conf))\n\n\n\n\n\n\n\n\n\n# plot the decision function\n\nax = plt.gca()\nDecisionBoundaryDisplay.from_estimator(\n clf,\n X_train,\n plot_method=\"contour\",\n colors=\"k\",\n levels=[-1, 0, 1],\n alpha=0.5,\n linestyles=[\"--\", \"-\", \"--\"],\n ax=ax,\n)\n# plot support vectors\nax.scatter(\n clf.support_vectors_[:, 0],\n clf.support_vectors_[:, 1],\n s=100,\n linewidth=1,\n facecolors=\"none\",\n edgecolors=\"k\",\n)\nax.scatter(X_train[:, 0], X_train[:, 1], c=y_train,s=50*np.abs(conf))\nplt.show()\n\nprint('Train size', len(X_train))\nprint('# of support vectors', len(clf.support_vectors_))\n\n\n\n\n\n\n\n\nTrain size 80\n# of support vectors 56\n\n\nAt inference time, we can flag instances with low confidence. 
A simple heuristic to flag us, the confidence score has to be greater than the confidence of the suppost vectors.\n\n# get the smallest confidence that is not of a support vector\nsv = clf.support_\n# get the conf of those support vectors\nconf_sv = clf.decision_function(X_train[sv,:])\n\n\n# get conf of all points in the train set\nconf = clf.decision_function(X_train)\nprint('conf',conf.shape)\n\n# get the mix conf of SVs from the training data\nthresh = np.max(conf_sv)\n\nprint('max conf of support vectors is: ', thresh)\n\nconf (80,)\nmax conf of support vectors is: 1.2440925839422636\n\n\nEither we can remove the points with low confidence and re-train the model, or pass a sample weight based on the confidence, and retrain the model.\n\n# At inference time, flag test points as low conf or high conf\nconf_test = clf.decision_function(X_test)\n\nind_high_conf = np.where(conf_test > thresh)\nprint('Test points that with high confidence', ind_high_conf[0].tolist())\n\nind_low_conf = np.where(conf_test <= thresh)\nprint('Test points that with low confidence', ind_low_conf[0].tolist())\n\nTest points that with high confidence [13]\nTest points that with low confidence [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 15, 16, 17, 18, 19]\n\n\nWe can look at the accuracy of test data on high conf and low conf data points on train set, as we seem to have not many points in the test set.\n\nconf_train = clf.decision_function(X_train)\nind_high_conf = np.where(conf_train > thresh)[0].tolist()\n\nyh_high_conf = clf.predict(X_train[ind_high_conf,:])\nprint('on train set: high conf')\nprint(classification_report(y_train[ind_high_conf], yh_high_conf))\n\nind_low_conf = np.where(conf_train <= thresh)[0].tolist()\n\nyh_low_conf = clf.predict(X_train[ind_low_conf,:])\nprint('on train set: low conf')\nprint(classification_report(y_train[ind_low_conf], yh_low_conf))\n\non train set: high conf\n precision recall f1-score support\n\n 1 1.00 1.00 1.00 10\n\n accuracy 1.00 10\n macro avg 1.00 1.00 1.00 10\nweighted avg 1.00 1.00 1.00 10\n\non train set: low conf\n precision recall f1-score support\n\n 0 0.64 0.70 0.67 40\n 1 0.54 0.47 0.50 30\n\n accuracy 0.60 70\n macro avg 0.59 0.58 0.58 70\nweighted avg 0.59 0.60 0.60 70\n\n\n\nWe achieve perfect accuracy on the train set in which all samples have high confidence. And accuracy is around ~ 60% on the samples with low confidence. This demonstrates an important aspect – not all samples will have equal degree of confidence, and if there is a way to flag them, and deal with in the downstream task, we can bring reliability into the system.\nWe have chosen the threshold based on some intuition that, typically support vectors will be closed to the separating hyper plans and will exactly sit on the hyperplanes. So, if we choose points whose are farther from the support vectors, they should be farther away from the decision boundary and hence easy to classify.\nBut this way of choosing the thresholds does not give any statistical guarantees. One to has to choose the threshold via some cross-validation procedure. Later, we will see conformalization techniques which address this issue.\nOne way to incorporate the confidence or sample easiness into training procedure is to, remove all difficult examples and retrain the model. 
Or convert the RMD or other types scores into weights and use a weighted loss, instead.\nThis paper Learning Sample Difficulty from Pre-trained Models for Reliable Prediction uses a score based on RMD to reweigh the samples in loss function.\n\n# create weight from conf, and fit a logistic regression with weighed samples\nweights = np.exp(0.5*conf)\nscaler = np.max(weights)\nweights = weights/scaler\nplt.plot(weights)\nplt.show()\nplt.plot(conf)\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n# fit a logistic model with sampled weights\nweighed_model = LogisticRegression(random_state=0).fit(X_train, y_train, sample_weight=weights)\nyh_test = weighed_model.predict(X_test)\n\n# get conf from margins of svm\nconf_test = clf.decision_function(X_test)\ntest_weights = np.exp(conf_test)/scaler\n\nprint('on test set w/o weights')\nprint(classification_report(y_test, yh_test))\n\nprint('on test set with weights')\nprint(classification_report(y_test, yh_test, sample_weight = test_weights ))\n\non test set w/o weights\n precision recall f1-score support\n\n 0 1.00 0.62 0.76 13\n 1 0.58 1.00 0.74 7\n\n accuracy 0.75 20\n macro avg 0.79 0.81 0.75 20\nweighted avg 0.85 0.75 0.75 20\n\non test set with weights\n precision recall f1-score support\n\n 0 1.00 0.41 0.58 1.5365777483728735\n 1 0.78 1.00 0.88 3.298054168177313\n\n accuracy 0.81 4.834631916550187\n macro avg 0.89 0.70 0.73 4.834631916550187\nweighted avg 0.85 0.81 0.78 4.834631916550187\n\n\n\nOverall accuracy improves. Based on our earlier observations, we can predict class 1 much better than class 0. Interestingly, this can also be interpreted as a different form of regularization. Typically, one would place a constraint on the norm of the parameters, implying, one is enforcing smoothness constraints on the functional space. Here, by reweighting the loss, the learning algorithm gives less importance is difficulty samples, there by, the function to be fit, need to do lot of hard work (i.e very complex function) but a simpler function (meaning smooth function) would suffice. So, while the goal is same (smooth function), the way one goes about can be different. 
The path of regularization, to a large extent, is a brute-force approach, but reweighting one exactly knowns what is the influence of each example in the training.", "crumbs": [ "Notebooks", "2 Sample Hardness" @@ -644,7 +644,7 @@ "href": "lectures/w09-l01.html#materials", "title": "09A: Uncertainty Quantification", "section": "", - "text": "Pre-work:\n\n[blog] Expected Calibration Error\n[paper] Calibration in Deep Learning: A Survey of the State-of-the-Art\n\n\n\nIn-Class\n\nA gentle introduction to Conformal Prediction and Distribution-free Uncertainty Quantification Video\n\n\n\nPost-class\n\n[paper] A tutorial on Conformal Prediction\n[paper] Towards Reliability using Pretrained Large Model Extensions\n[tools] awesome-conformal-prediction - a collection Conformal Prediction resources including implementations.\n[tools] crepes - Conformal Classifiers, Regressors, and Predictive Systems.\n[tools] TorchCP - a python toolbox for Conformal Prediction research in Deep Learning Models using PyTorch.", + "text": "Pre-work:\n\n[blog] Expected Calibration Error\n[paper] Calibration in Deep Learning: A Survey of the State-of-the-Art\n\n\n\nIn-Class\n\nA gentle introduction to Conformal Prediction and Distribution-free Uncertainty Quantification Video\ncolab from DEEL-PUNCC\n\n\n\nPost-class\n\n[paper] A tutorial on Conformal Prediction\n[paper] Towards Reliability using Pretrained Large Model Extensions\n[tools] awesome-conformal-prediction - a collection Conformal Prediction resources including implementations.\n[tools] crepes - Conformal Classifiers, Regressors, and Predictive Systems.\n[tools] TorchCP - a python toolbox for Conformal Prediction research in Deep Learning Models using PyTorch.\n[tools] MAPIE - a python toolbox for Conformal Prediction\n[tools] DEEL-PUNCC - a python toolbox for Conformal Prediction from DEEL.ai a project for Dependable, Certifiable, Explainable AI for Critical Systems. Checkout the sister projects from DEEl on Bias DEEL INFLUENCIAE, oodeel for OOD, xplique for XAI,", "crumbs": [ "Lectures[ML Science]", "09A: Uncertainty Quantification" diff --git a/sitemap.xml b/sitemap.xml index 5e20994..d3ffd2e 100644 --- a/sitemap.xml +++ b/sitemap.xml @@ -2,7 +2,7 @@