Foggalong · Foggalong · Apr 19, 2024 · Apr 17, 2024 · Apr 18, 2024 · Apr 19, 2024
diff --git a/Scratch/numerical-error.ipynb b/Scratch/numerical-error.ipynb
@@ -282,7 +282,7 @@
     "\n",
     "# setup the robust objective function\n",
     "model.setObjective(\n",
-    "    0.5*w@(relationship_matrix@w) - lam*w.transpose()@expected_breeding_values - kappa*z,\n",
+    "    0.5*w@(relationship_matrix@w) - lam*w.transpose()@expected_breeding_values - lam*kappa*z,\n",
     "GRB.MINIMIZE)\n",
     "\n",
     "# add quadratic uncertainty constraint\n",

diff --git a/robust-genetics.ipynb b/robust-genetics.ipynb
@@ -13,7 +13,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 1,
+   "execution_count": 27,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -32,7 +32,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 2,
+   "execution_count": 28,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -113,7 +113,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 3,
+   "execution_count": 29,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -144,7 +144,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 4,
+   "execution_count": 30,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -167,15 +167,13 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 5,
+   "execution_count": 31,
    "metadata": {},
    "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "Set parameter Username\n",
-      "Academic license - for non-commercial use only - expires 2025-02-26\n",
       "QP solution: w = [0.50000 0.37500 0.12500]\n"
      ]
     }
@@ -207,7 +205,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 6,
+   "execution_count": 32,
    "metadata": {},
    "outputs": [
     {
@@ -240,19 +238,76 @@
     "\n",
     "A wrinkle to this is that we don't typically our problem variables with certainty. There _are_ ways to define $\\Sigma$ based on relationships within the cohort (which are known and discussed more later) but $\\mu$ is an estimated variable. In particular we say $\\mu$ has a univariate normal distribution, $\\mu\\sim N(\\bar{\\mu}, \\Omega)$. This means we must turn to optimization tools which can address this uncertainty.¹\n",
     "\n",
-    "Robust optimization is one such tool in which we adjust the objective function to model the inherent uncertainty in the problem. We may either do this with a quadratic uncertainty set, in which case our objective has an additional square-root term as in\n",
+    "Robust optimization is one such tool in which we adjust the objective function to model the inherent uncertainty in the problem. Within this framework, the problem would be the bilevel optimization problem\n",
     "$$\n",
-    "    \\min_w \\frac{1}{2}w^{T}\\Sigma w - \\lambda w^{T}\\mu - \\kappa\\sqrt{w^{T}\\Omega w}\\ \\text{ subject to }\\ Mw = \\begin{bmatrix} 0.5 \\\\ 0.5\\end{bmatrix},\\ l\\leq w\\leq u,\n",
+    "    \\min_w \\frac{1}{2}w^{T}\\Sigma w - \\lambda\\left( \\max_{\\mu\\in U_{\\mu}} w^{T}\\mu \\right)\\ \\text{ subject to }\\ Mw = \\begin{bmatrix} 0.5 \\\\ 0.5\\end{bmatrix},\\ l\\leq w\\leq u.\n",
+    "$$\n",
+    "The result of the inner problem determines the outer problem in a follower-leader setup. However, the inner problem is relatively simple and can be solved explicitly based on our choice of $U_{\\mu}$, representing an uncertainty set around the value of $\\mu$.\n",
+    "\n",
+    "### Box Uncertainty Sets\n",
+    "\n",
+    "We could define this using a box uncertainty set\n",
+    "$$\n",
+    "    U_{\\mu} := \\left\\lbrace \\mu :\\ | \\mu_i - \\bar{\\mu}_i| \\leq\\xi_i,\\ \\forall i \\right\\rbrace\n",
+    "$$\n",
+    "where $\\xi$ is some upper bound on the uncertainty value. In this case our objective function gains an additional absolute value term and becomes\n",
+    "$$\n",
+    "    \\min_w \\frac{1}{2}w^{T}\\Sigma w - \\lambda w^{T}\\mu - \\kappa\\|\\Omega^{\\frac{1}{2}} w\\|\\ \\text{ subject to }\\ Mw = \\begin{bmatrix} 0.5 \\\\ 0.5\\end{bmatrix},\\ l\\leq w\\leq u.\n",
+    "$$\n",
+    "For practical reasons relating to continuity and differentiability, this is not desirable to work with. <!-- TODO add a derivation, perhaps in a footnote, explaining how this emerges? -->\n",
+    "\n",
+    "### Quadratic Uncertainty Sets\n",
+    "\n",
+    "Alternatively, we could also define it using a quadratic uncertainty set\n",
     "$$\n",
-    "or with a box uncertainty set, in which case our objective has an additional absolute value term as in\n",
+    "    U_{\\mu} := \\left\\lbrace \\mu :\\ {(\\mu-\\bar{\\mu})}^{T}\\Omega^{-1}(\\mu-\\bar{\\mu}) \\leq \\kappa^2 \\right\\rbrace\n",
     "$$\n",
-    "    \\min_w \\frac{1}{2}w^{T}\\Sigma w - \\lambda w^{T}\\mu - \\kappa\\|\\Omega^{\\frac{1}{2}} w\\|\\ \\text{ subject to }\\ Mw = \\begin{bmatrix} 0.5 \\\\ 0.5\\end{bmatrix},\\ l\\leq w\\leq u,\n",
+    "which in effect bounds the uncertainty in a ball about $\\bar{\\mu}$ with parameter $\\kappa$.² This means that our lower level problem $\\max_{\\mu\\in U_{\\mu}} w^{T}\\mu$ becomes\n",
     "$$\n",
+    "    \\max_{\\mu} w^{T}\\mu\\quad \\text{subject to}\\quad {(\\mu-\\bar{\\mu})}^{T}\\Omega^{-1}(\\mu-\\bar{\\mu}) \\leq \\kappa^2.\n",
+    "$$\n",
+    "<!-- BUG this needs reframing in terms of minimization, since the maximisation problem has different KKTS and will be concave rather than convex -->\n",
+    "This is convex, since $\\Omega$ is a positive definite matrix and $w^{T}\\mu$ is an affine function. If we define a Lagrangian multiplier $\\rho\\in\\mathbb{R}$, the KKT conditions of this problem are:\n",
+    "\\begin{align}\n",
+    "    \\nabla_{\\mu}L(\\mu, \\rho) = 0 \\quad&\\Rightarrow\\quad \\nabla_{\\mu}\\left( w^{T}\\mu - \\rho\\big( {(\\mu-\\bar{\\mu})}^{T}\\Omega^{-1}(\\mu-\\bar{\\mu}) \\big) \\right) = 0, \\\\\n",
+    "    c(\\mu) \\geq 0                \\quad&\\Rightarrow\\quad \\kappa^2 - {(\\mu-\\bar{\\mu})}^{T}\\Omega^{-1}(\\mu-\\bar{\\mu}) \\geq 0, \\\\\n",
+    "    \\rho \\geq 0                  \\quad&\\Rightarrow\\quad \\rho\\geq0, \\\\\n",
+    "    \\rho c(\\mu) = 0              \\quad&\\Rightarrow\\quad \\rho\\left(\\kappa^2 - {(\\mu-\\bar{\\mu})}^{T}\\Omega^{-1}(\\mu-\\bar{\\mu}) \\right) = 0.\n",
+    "\\end{align}\n",
+    "<!-- BUG I think there's a sign error somewhere in this derivation, either in the objective function part of the Lagrangian (since its maximization, not minimization) or in the inequality constraint -->\n",
+    "From (1) we have that\n",
+    "\\begin{align*}\n",
+    "    w - \\rho 2\\Omega^{-1}(\\mu-\\bar{\\mu}) = 0 \\quad&\\Rightarrow\\quad \\rho2\\Omega^{-1}(\\mu-\\bar{\\mu}) = w \\\\\n",
+    "                                                  &\\Rightarrow\\quad \\mu - \\bar{\\mu} = \\frac{1}{2\\rho}\\Omega w \\qquad (5)\n",
+    "\\end{align*}\n",
+    "which when substituted into (4) gives\n",
+    "\\begin{align*}\n",
+    "    &\\phantom{\\Rightarrow}\\quad \\rho\\left( \\kappa^2 - {(\\mu-\\bar{\\mu})}^{T}\\Omega^{-1}(\\mu-\\bar{\\mu}) \\right) = 0 \\\\\n",
+    "    &\\Rightarrow\\quad \\frac{1}{4\\rho}\\left( \\rho\\kappa^2 - w^{T}\\Omega\\Omega^{-1}\\Omega w \\right) = 0 \\\\\n",
+    "    &\\Rightarrow\\quad w^{T}\\Omega w = 4\\rho^2\\kappa^2 \\\\\n",
+    "    &\\Rightarrow\\quad \\sqrt{w^{T}\\Omega w} = 2\\rho\\kappa\\quad \\text{(since $\\rho,\\kappa\\geq0$)} \\\\\n",
+    "    &\\Rightarrow\\quad \\frac{1}{2\\rho} = \\frac{\\kappa}{\\sqrt{w^{T}\\Omega w}}.\n",
+    "\\end{align*}\n",
+    "Substituting this back into (5), we find that\n",
+    "$$\n",
+    "    \\mu - \\bar{\\mu} = \\frac{\\kappa}{\\sqrt{w^{T}\\Omega w}}\\Omega w \\quad\\Rightarrow\\quad \\mu = \\frac{\\kappa}{\\sqrt{w^{T}\\Omega w}}\\Omega w + \\bar{\\mu}\n",
+    "$$\n",
+    "is the solution to the inner problem. Substituting this back into the outer problem gives the problem as\n",
+    "$$\n",
+    "    \\min_w \\frac{1}{2}w^{T}\\Sigma w - \\lambda w^{T}\\left( \\frac{\\kappa}{\\sqrt{w^{T}\\Omega w}}\\Omega w + \\bar{\\mu} \\right)\\ \\text{ subject to }\\ Mw = \\begin{bmatrix} 0.5 \\\\ 0.5\\end{bmatrix},\\ l\\leq w\\leq u\n",
+    "$$\n",
+    "which simplifies down to\n",
+    "$$\n",
+    "    \\min_w \\frac{1}{2}w^{T}\\Sigma w - \\lambda\\frac{\\kappa}{\\sqrt{w^{T}\\Omega w}}w^{T}\\Omega w - \\lambda w^{T}\\bar{\\mu}\\ \\text{ subject to }\\ Mw = \\begin{bmatrix} 0.5 \\\\ 0.5\\end{bmatrix},\\ l\\leq w\\leq u.\n",
+    "$$\n",
+    "Thus, after rationalizing the denominator, here our objective gains an additional square root term and becomes\n",
+    "$$\n",
+    "    \\min_w \\frac{1}{2}w^{T}\\Sigma w - \\lambda w^{T}\\mu - {\\color{red}\\lambda}\\kappa\\sqrt{w^{T}\\Omega w}\\quad \\text{ subject to }\\ Mw = \\begin{bmatrix} 0.5 \\\\ 0.5\\end{bmatrix},\\ l\\leq w\\leq u,\n",
+    "$$  <!-- BUG the first time I worked through this I didn't have a $\\lambda$ where there's now one marked in red. I need to go back through and check how that impacts the rest of the work which follows. -->\n",
     "where $\\kappa\\in\\mathbb{R}$ is our robust optimization parameters. For practical reasons relating to continuity and differentiability, the quadratic uncertainty set is far more favourable to work with.\n",
     "\n",
-    "This is obviously no longer a quadratic problem, so `qpsolvers` is no longer a viable tool. We will instead now need to work with the Gurobi API directly.\n",
     "\n",
-    "<!-- TODO this cell skips over the detail about how to go from the bilevel robust problem to the single level robust problem theory wise, it would be worth delving into that in more detail. -->"
+    "This is obviously no longer a quadratic problem, so `qpsolvers` is no longer a viable tool. We will instead now need to work with the Gurobi API directly."
    ]
   },
   {
@@ -266,7 +321,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 7,
+   "execution_count": 33,
    "metadata": {},
    "outputs": [
     {
@@ -374,7 +429,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 8,
+   "execution_count": 34,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -384,7 +439,7 @@
     "    [0, 0, 1/8]\n",
     "])\n",
     "\n",
-    "kappa = 0.5"
+    "kappa = 0"
    ]
   },
   {
@@ -396,7 +451,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 9,
+   "execution_count": 35,
    "metadata": {},
    "outputs": [
     {
@@ -409,7 +464,7 @@
       "Thread count: 4 physical cores, 8 logical processors, using up to 8 threads\n",
       "\n",
       "Optimize a model with 23 rows, 4 columns and 25 nonzeros\n",
-      "Model fingerprint: 0x541bef89\n",
+      "Model fingerprint: 0x560dd60a\n",
       "Model has 3 quadratic objective terms\n",
       "Model has 3 quadratic constraints\n",
       "Coefficient statistics:\n",
@@ -433,19 +488,19 @@
       "\n",
       "                  Objective                Residual\n",
       "Iter       Primal          Dual         Primal    Dual     Compl     Time\n",
-      "   0  -8.68156934e-01 -8.68156934e-01  1.57e+00 7.38e-01  6.17e-01     0s\n",
-      "   1  -9.99130241e-01 -1.52016137e+00  1.66e-01 8.12e-07  8.63e-02     0s\n",
-      "   2  -8.04900486e-01 -9.40830018e-01  1.82e-07 1.07e-08  1.13e-02     0s\n",
-      "   3  -8.32538184e-01 -8.51116759e-01  3.42e-13 1.18e-14  1.55e-03     0s\n",
-      "   4  -8.36522462e-01 -8.36788831e-01  1.55e-14 4.44e-16  2.22e-05     0s\n",
-      "   5  -8.36548538e-01 -8.36556228e-01  6.67e-13 4.47e-14  6.41e-07     0s\n",
-      "   6  -8.36549617e-01 -8.36550259e-01  1.09e-11 5.45e-14  5.35e-08     0s\n",
+      "   0  -8.05656934e-01 -8.05656934e-01  1.56e+00 7.37e-01  6.23e-01     0s\n",
+      "   1  -9.03284336e-01 -1.45365890e+00  1.34e-01 8.10e-07  8.06e-02     0s\n",
+      "   2  -7.88721177e-01 -8.90212424e-01  1.48e-07 2.71e-08  8.46e-03     0s\n",
+      "   3  -8.05879135e-01 -8.18083147e-01  1.62e-13 2.98e-14  1.02e-03     0s\n",
+      "   4  -8.11749336e-01 -8.12648517e-01  8.62e-14 2.85e-16  7.49e-05     0s\n",
+      "   5  -8.12482991e-01 -8.12550118e-01  1.82e-13 1.68e-14  5.59e-06     0s\n",
+      "   6  -8.12499963e-01 -8.12500086e-01  9.57e-13 2.22e-15  1.02e-08     0s\n",
       "\n",
-      "Barrier solved model in 6 iterations and 0.02 seconds (0.00 work units)\n",
-      "Optimal objective -8.36549617e-01\n",
+      "Barrier solved model in 6 iterations and 0.03 seconds (0.00 work units)\n",
+      "Optimal objective -8.12499963e-01\n",
       "\n",
-      "w = [0.50000 0.35282 0.14718],\n",
-      "z = 0.052036865323801265.\n"
+      "w = [0.50000 0.37500 0.12500],\n",
+      "z = 0.027562890578567638.\n"
      ]
     }
    ],
@@ -459,7 +514,7 @@
     "\n",
     "# setup the robust objective function\n",
     "model.setObjective(\n",
-    "    0.5*w@(relationship_matrix@w) - lam*w.transpose()@expected_breeding_values - kappa*z,\n",
+    "    0.5*w@(relationship_matrix@w) - lam*w.transpose()@expected_breeding_values - lam*kappa*z,\n",
     "GRB.MINIMIZE)\n",
     "\n",
     "# add quadratic uncertainty constraint\n",
@@ -518,7 +573,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 10,
+   "execution_count": 36,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -569,7 +624,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 11,
+   "execution_count": 37,
    "metadata": {},
    "outputs": [
     {
@@ -666,7 +721,7 @@
       "Cutting planes:\n",
       "  RLT: 6\n",
       "\n",
-      "Explored 1 nodes (1828 simplex iterations) in 0.34 seconds (0.10 work units)\n",
+      "Explored 1 nodes (1828 simplex iterations) in 0.19 seconds (0.10 work units)\n",
       "Thread count was 8 (of 8 available processors)\n",
       "\n",
       "Solution count 1: -0.976358 \n",
@@ -779,7 +834,20 @@
    "source": [
     "## Footnotes\n",
     "\n",
-    "1. In reality, $\\Omega$ is also an estimated variable but we can ignore this for now to avoid going down a rabbit hole of uncertainties."
+    "1. In reality, $\\Omega$ is also an estimated variable but we can ignore this for now to avoid going down a rabbit hole of uncertainties.\n",
+    "\n",
+    "2. By \"ball\" we do not just mean a sphere in the typical geometric sense, but a ball in the sense of metric spaces. If we recognise that in our definition we have the transpose of a vector, multiplied by a positive definite matrix, multiplied by the same vector, we can see that\n",
+    "   \\begin{align*}\n",
+    "      {(\\mu-\\bar{\\mu})}^{T}\\Omega^{-1}(\\mu-\\bar{\\mu}) &=: x^{T}Ax \\\\\n",
+    "                                                      &= x^{T}L^{T}Lx \\quad \\text{(using Cholesky factorisation)} \\\\\n",
+    "                                                      &= {(Lx)}^{T}{(Lx)} \\\\\n",
+    "                                                      &= {\\|Lx\\|}_2^2 \\\\\n",
+    "                                                      &= {\\|x\\|}_{L^{-1}}^2 \\leq \\kappa^2.  \n",
+    "   \\end{align*}\n",
+    "   <!-- TODO I got up to the four line by myself but hadn't come across the last line before Julian suggested it, so need to make sure I'm comfortable with how these special matrix norms work. -->\n",
+    "   In other words, in the norm of $\\Omega$'s Cholesky factor our uncertainty set is a ball of radius $\\kappa$ centred on $\\bar{\\mu}$.\n",
+    "\n",
+    "3. Some third footnote which has yet to be written. When is, increment and retain. "
    ]
   }
  ],