Resolved doc TODOs

Foggalong · Apr 19, 2024 · c593e8f · c593e8f
1 parent 9fa356b
commit c593e8f
Showing 1 changed file with 77 additions and 62 deletions.
diff --git a/robust-genetics.ipynb b/robust-genetics.ipynb
@@ -6,9 +6,7 @@
    "source": [
     "# Robust Optimization for Genetic Selection\n",
     "\n",
-    "This is a working notebook, looking at how the quadratic optimization problem (QP) which arises in the context of robust genetic selection can be solved with Python (tested with 3.10 specifically). There are some standard packages which this depends on, imported below.\n",
-    "\n",
-    "<!-- TODO this needs updating to reflect the current purpose of this file -->"
+    "This notebook provides a didactic for using Python to solve the multi-objective optimization problem which arises in the context of robust genetic selection. It has been tested with Python 3.10 specifically and there are some standard packages which this depends on, imported below."
    ]
   },
   {
@@ -247,35 +245,17 @@
     "The result of the inner problem determines the outer problem in a follower-leader setup. However, the inner problem is relatively simple and can be solved explicitly based on our choice of $U_{\\mu}$, representing an uncertainty set around the value of $\\mu$.\n"
    ]
   },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Box Uncertainty Sets\n",
-    "\n",
-    "We could define this using a box uncertainty set\n",
-    "$$\n",
-    "    U_{\\mu} := \\left\\lbrace \\mu :\\ | \\mu_i - \\bar{\\mu}_i| \\leq\\xi_i,\\ \\forall i \\right\\rbrace\n",
-    "$$\n",
-    "where $\\xi$ is some upper bound on the uncertainty value. In this case our objective function gains an additional absolute value term and becomes\n",
-    "$$\n",
-    "    \\min_w \\frac{1}{2}w^{T}\\Sigma w - \\lambda w^{T}\\mu - \\kappa\\|\\Omega^{\\frac{1}{2}} w\\|\\ \\text{ subject to }\\ Mw = \\begin{bmatrix} 0.5 \\\\ 0.5\\end{bmatrix},\\ l\\leq w\\leq u.\n",
-    "$$\n",
-    "For practical reasons relating to continuity and differentiability, this is not desirable to work with.\n",
-    "<!-- TODO even though we don't use this set, it could be worth adding a derivation, perhaps in a footnote, to explain how this emerges -->"
-   ]
-  },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
     "### Quadratic Uncertainty Sets\n",
     "\n",
-    "Alternatively, we could also define it using a quadratic uncertainty set\n",
+    "There are many ways to define the uncertainty set $U_{\\mu}$, but for practical reasons relating to continuity and differentiability² it's beneficial to define it using a quadratic uncertainty set. We say that\n",
     "$$\n",
     "    U_{\\mu} := \\left\\lbrace \\mu :\\ {(\\mu-\\bar{\\mu})}^{T}\\Omega^{-1}(\\mu-\\bar{\\mu}) \\leq \\kappa^2 \\right\\rbrace\n",
     "$$\n",
-    "which in effect bounds the uncertainty in a ball about $\\bar{\\mu}$ with parameter $\\kappa$.² This means that our lower level problem $\\max_{\\mu\\in U_{\\mu}} w^{T}\\mu$ becomes\n",
+    "which in effect bounds the uncertainty in a ball about $\\bar{\\mu}$ with parameter $\\kappa$.³ This means that our lower level problem $\\max_{\\mu\\in U_{\\mu}} w^{T}\\mu$ becomes\n",
     "$$\n",
     "    \\max_{\\mu} w^{T}\\mu\\quad \\text{subject to}\\quad {(\\mu-\\bar{\\mu})}^{T}\\Omega^{-1}(\\mu-\\bar{\\mu}) \\leq \\kappa^2,\n",
     "$$\n",
@@ -417,7 +397,8 @@
     "$$\n",
     "    \\min_w \\frac{1}{2}w^{T}\\Sigma w - \\lambda w^{T}\\mu - \\lambda\\kappa z\\ \\text{ s.t. }\\ z\\leq\\sqrt{w^{T}\\Omega w},\\ Mw = \\begin{bmatrix} 0.5 \\\\ 0.5\\end{bmatrix},\\ l\\leq w\\leq u.\n",
     "$$\n",
-    "<!-- TODO: better explain the idea of $z$ pushing up so that this switch doesn't have any practical difference -->\n",
+    "TODO: _better explain the idea of $z$ pushing up so that this switch doesn't have any practical difference._\n",
+    "\n",
     "However, Gurobi _still_ can't handle this due to the presence of the square root, so we further make use of both $z$ and $\\sqrt{w^{T}\\Omega w}$ being positive to note that $z\\leq\\sqrt{w^{T}\\Omega w}$ can be squared on both sides:\n",
     "$$\n",
     "    \\min_w \\frac{1}{2}w^{T}\\Sigma w - \\lambda w^{T}\\mu - \\lambda\\kappa z\\ \\text{ s.t. }\\ z^2\\leq w^{T}\\Omega w,\\ Mw = \\begin{bmatrix} 0.5 \\\\ 0.5\\end{bmatrix},\\ l\\leq w\\leq u.\n",
@@ -506,7 +487,7 @@
       "   5  -8.24036160e-01 -8.24039018e-01  1.54e-13 7.22e-15  2.38e-07     0s\n",
       "   6  -8.24036608e-01 -8.24036837e-01  1.10e-10 1.94e-13  1.90e-08     0s\n",
       "\n",
-      "Barrier solved model in 6 iterations and 0.02 seconds (0.00 work units)\n",
+      "Barrier solved model in 6 iterations and 0.03 seconds (0.00 work units)\n",
       "Optimal objective -8.24036608e-01\n",
       "\n",
       "w = [0.50000 0.36395 0.13605],\n",
@@ -559,11 +540,21 @@
     "| 32.0 | [0.50000 0.07511 0.42489] | 0.15022 | -2.85630 |\n",
     "\n",
     "We can see that in the case $\\kappa = 0$ we return the standard optimization solution, as expected. However, we do have $z\\neq0$ which suggests that a small amount of numerical error has been introduced as a result of asking Gurobi to optimize $z$ as well as $w$.\n",
-    "<!-- TODO check that these values are correct, talk about subbing into the KKT conditions -->\n",
     "\n",
     "We also note that there are not further changes once $\\kappa$ falls above a certain threshold."
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Verifying with KKTs\n",
+    "\n",
+    "As with other(?) optimization problems, we can verify that we have produced an optimum solution by checking that it satisfies the KKT conditions.\n",
+    "\n",
+    "TODO: _check that these values are correct, talk about subbing into the KKT conditions._"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -590,7 +581,13 @@
    "source": [
     "def load_problem(A_filename, E_filename, S_filename, dimension=False):\n",
     "    \"\"\"\n",
-    "    TODO write a docstring\n",
+    "    Used to load genetic selection problems into NumPy. It takes three\n",
+    "    string inputs for filenames where Sigma, Mu, and Omega are stored,\n",
+    "    as well as an optional integer input for problem dimension if this\n",
+    "    is known. If it's know know, it's worked out based on E_filename.\n",
+    "\n",
+    "    As output, it returns (A, E, S, n), where A and S are n-by-n NumPy\n",
+    "    arrays, E is a length n NumPy array, and n is an integer.\n",
     "    \"\"\"\n",
     "\n",
     "    def load_symmetric_matrix(filename, dimension):\n",
@@ -687,7 +684,7 @@
       "  10  -9.76215438e-01 -9.76215482e-01  7.03e-13 6.66e-16  8.86e-10     0s\n",
       "  11  -9.76215473e-01 -9.76215476e-01  4.24e-13 6.66e-16  4.91e-11     0s\n",
       "\n",
-      "Barrier solved model in 11 iterations and 0.03 seconds (0.00 work units)\n",
+      "Barrier solved model in 11 iterations and 0.04 seconds (0.00 work units)\n",
       "Optimal objective -9.76215473e-01\n",
       "\n",
       "Gurobi Optimizer version 11.0.0 build v11.0.0rc2 (linux64 - \"Ubuntu 22.04.4 LTS\")\n",
@@ -732,39 +729,42 @@
       "Cutting planes:\n",
       "  RLT: 6\n",
       "\n",
-      "Explored 1 nodes (1828 simplex iterations) in 0.27 seconds (0.10 work units)\n",
+      "Explored 1 nodes (1828 simplex iterations) in 0.29 seconds (0.10 work units)\n",
       "Thread count was 8 (of 8 available processors)\n",
       "\n",
       "Solution count 1: -0.976358 \n",
       "\n",
       "Optimal solution found (tolerance 5.00e-02)\n",
       "Best objective -9.763583540500e-01, best bound -1.025012483088e+00, gap 4.9832%\n",
-      " i   w_std   w_rbs   \t\t      i   w_std   w_rbs\n",
-      "00  0.00000 0.00002\t             01  0.08319 0.08339\n",
-      "02  0.11649 0.11508\t             03  0.00000 0.00001\n",
-      "04  0.00000 0.00002\t             05  0.00000 0.00002\n",
-      "06  0.00000 0.00001\t             07  0.06817 0.06830\n",
-      "08  0.00000 0.00001\t             09  0.03762 0.03701\n",
-      "10  0.00000 0.00002\t             11  0.00000 0.00002\n",
-      "12  0.00000 0.00001\t             13  0.00355 0.00564\n",
-      "14  0.11542 0.11450\t             15  0.06966 0.06913\n",
-      "16  0.00000 0.00006\t             17  0.00000 0.00001\n",
-      "18  0.00000 0.00002\t             19  0.00000 0.00003\n",
-      "20  0.00000 0.00002\t             21  0.00001 0.00062\n",
-      "22  0.00000 0.00017\t             23  0.00000 0.00004\n",
-      "24  0.03537 0.03493\t             25  0.00000 0.00001\n",
-      "26  0.00000 0.00001\t             27  0.00000 0.00002\n",
-      "28  0.00000 0.00002\t             29  0.00000 0.00002\n",
-      "30  0.00000 0.00001\t             31  0.03580 0.03612\n",
-      "32  0.08447 0.08517\t             33  0.00000 0.00003\n",
-      "34  0.00000 0.00001\t             35  0.02849 0.02747\n",
-      "36  0.13489 0.13487\t             37  0.07446 0.07264\n",
-      "38  0.00000 0.00001\t             39  0.00000 0.00001\n",
-      "40  0.00000 0.00003\t             41  0.07312 0.07388\n",
-      "42  0.00000 0.00004\t             43  0.00000 0.00003\n",
-      "44  0.01336 0.01485\t             45  0.00000 0.00002\n",
-      "46  0.00000 0.00009\t             47  0.02593 0.02555\n",
-      "48  0.00000 0.00003\t             49  0.00000 0.00000\n"
+      "\n",
+      "SIRE WEIGHTS\t\t\t DAM WEIGHTS\n",
+      "--------------------\t\t --------------------\n",
+      " i   w_std    w_rbs\t\t  i   w_std    w_rbs\n",
+      "00  0.00000  0.00002             01  0.08319  0.08339\n",
+      "02  0.11649  0.11508             03  0.00000  0.00001\n",
+      "04  0.00000  0.00002             05  0.00000  0.00002\n",
+      "06  0.00000  0.00001             07  0.06817  0.06830\n",
+      "08  0.00000  0.00001             09  0.03762  0.03701\n",
+      "10  0.00000  0.00002             11  0.00000  0.00002\n",
+      "12  0.00000  0.00001             13  0.00355  0.00564\n",
+      "14  0.11542  0.11450             15  0.06966  0.06913\n",
+      "16  0.00000  0.00006             17  0.00000  0.00001\n",
+      "18  0.00000  0.00002             19  0.00000  0.00003\n",
+      "20  0.00000  0.00002             21  0.00001  0.00062\n",
+      "22  0.00000  0.00017             23  0.00000  0.00004\n",
+      "24  0.03537  0.03493             25  0.00000  0.00001\n",
+      "26  0.00000  0.00001             27  0.00000  0.00002\n",
+      "28  0.00000  0.00002             29  0.00000  0.00002\n",
+      "30  0.00000  0.00001             31  0.03580  0.03612\n",
+      "32  0.08447  0.08517             33  0.00000  0.00003\n",
+      "34  0.00000  0.00001             35  0.02849  0.02747\n",
+      "36  0.13489  0.13487             37  0.07446  0.07264\n",
+      "38  0.00000  0.00001             39  0.00000  0.00001\n",
+      "40  0.00000  0.00003             41  0.07312  0.07388\n",
+      "42  0.00000  0.00004             43  0.00000  0.00003\n",
+      "44  0.01336  0.01485             45  0.00000  0.00002\n",
+      "46  0.00000  0.00009             47  0.02593  0.02555\n",
+      "48  0.00000  0.00003             49  0.00000  0.00000\n"
      ]
     }
    ],
@@ -800,8 +800,11 @@
     "GRB.MINIMIZE)\n",
     "\n",
     "model_rbs.setObjective(\n",
+    "    # Gurobi does offer a way to set one objective in terms of another, i.e.\n",
+    "    # we could use `model_std.getObjective() - lam*kappa*z_rbs` to define this\n",
+    "    # robust objective, but it results in a significant slowdown in code.\n",
     "    0.5*w_rbs@(sigma@w_rbs) - lam*w_rbs.transpose()@mubar - lam*kappa*z_rbs,\n",
-    "GRB.MINIMIZE)  # TODO check if there's a way to define this objective in terms of the other\n",
+    "GRB.MINIMIZE)\n",
     "\n",
     "# add sum-to-half constraints to both models\n",
     "model_std.addConstr(M @ w_std == m, name=\"sum-to-half\")\n",
@@ -826,10 +829,12 @@
     "model_rbs.optimize()\n",
     "\n",
     "# HACK code which prints the results for comparison in a nice format\n",
-    "print(\" i   w_std   w_rbs   \\t\\t      i   w_std   w_rbs\")\n",
-    "for candidate in range(0,25):\n",
-    "    print(f\"{candidate*2:02d}  {w_std.X[candidate*2]:.5f} {w_rbs.X[candidate*2]:.5f}\\t \\\n",
-    "            {candidate*2+1:02d}  {w_std.X[candidate*2+1]:.5f} {w_rbs.X[candidate*2+1]:.5f}\")"
+    "print(\"\\nSIRE WEIGHTS\\t\\t\\t DAM WEIGHTS\")\n",
+    "print(\"-\"*20 + \"\\t\\t \" + \"-\"*20)\n",
+    "print(\" i   w_std    w_rbs\\t\\t  i   w_std    w_rbs\")\n",
+    "for candidate in range(25):\n",
+    "    print(f\"{candidate*2:02d}  {w_std.X[candidate*2]:.5f}  {w_rbs.X[candidate*2]:.5f} \\\n",
+    "            {candidate*2+1:02d}  {w_std.X[candidate*2+1]:.5f}  {w_rbs.X[candidate*2+1]:.5f}\")"
    ]
   },
   {
@@ -847,7 +852,17 @@
     "\n",
     "1. In reality, $\\Omega$ is also an estimated variable but we can ignore this for now to avoid going down a rabbit hole of uncertainties.\n",
     "\n",
-    "2. By \"ball\" we do not just mean a sphere in the typical geometric sense, but a ball in the sense of metric spaces. If we recognise that in our definition we have the transpose of a vector, multiplied by a positive definite matrix, multiplied by the same vector, we can see that\n",
+    "2. For example, we could alternatively define $U_{\\mu}$ using a box uncertainty set\n",
+    "   $$\n",
+    "      U_{\\mu} := \\left\\lbrace \\mu :\\ | \\mu_i - \\bar{\\mu}_i| \\leq\\xi_i,\\ \\forall i \\right\\rbrace\n",
+    "   $$\n",
+    "   where $\\xi$ is some upper bound on the uncertainty value. In this case, work by [Heckel et al (2016)](https://www.risk.net/journal-of-investment-strategies/2475975/insights-into-robust-optimization-decomposing-into-mean-variance-and-risk-based-portfolios) shows that our objective function would gain an additional absolute value term, becoming\n",
+    "   $$\n",
+    "      \\min_w \\frac{1}{2}w^{T}\\Sigma w - \\lambda w^{T}\\mu - \\kappa\\|\\Omega^{\\frac{1}{2}} w\\|\\ \\text{ subject to }\\ Mw = \\begin{bmatrix} 0.5 \\\\ 0.5\\end{bmatrix},\\ l\\leq w\\leq u.\n",
+    "   $$\n",
+    "   This absolute value term means we lose many of the properties that we would like to work with in our objective functions.\n",
+    "\n",
+    "3. By \"ball\" we do not just mean a sphere in the typical geometric sense, but a ball in the sense of metric spaces. If we recognise that in our definition we have the transpose of a vector, multiplied by a positive definite matrix, multiplied by the same vector, we can see that\n",
     "   \\begin{align*}\n",
     "      {(\\mu-\\bar{\\mu})}^{T}\\Omega^{-1}(\\mu-\\bar{\\mu}) &=: x^{T}Ax \\\\\n",
     "                                                      &= x^{T}L^{T}Lx \\quad \\text{(using Cholesky factorisation)} \\\\\n",
@@ -858,7 +873,7 @@
     "   <!-- TODO I got up to the four line by myself but hadn't come across the last line before Julian suggested it, so need to make sure I'm comfortable with how these special matrix norms work. -->\n",
     "   In other words, in the norm of $\\Omega$'s Cholesky factor our uncertainty set is a ball of radius $\\kappa$ centred on $\\bar{\\mu}$.\n",
     "\n",
-    "3. Some third footnote which has yet to be written. When is, increment and retain. "
+    "3. Some fourth footnote which has yet to be written. When is, increment and retain. "
    ]
   }
  ],