Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add more complete derivations #1

Merged
merged 5 commits into from
Apr 19, 2024
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion Scratch/numerical-error.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -282,7 +282,7 @@
"\n",
"# setup the robust objective function\n",
"model.setObjective(\n",
" 0.5*w@(relationship_matrix@w) - lam*w.transpose()@expected_breeding_values - kappa*z,\n",
" 0.5*w@(relationship_matrix@w) - lam*w.transpose()@expected_breeding_values - lam*kappa*z,\n",
"GRB.MINIMIZE)\n",
"\n",
"# add quadratic uncertainty constraint\n",
Expand Down
138 changes: 103 additions & 35 deletions robust-genetics.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
Expand All @@ -32,7 +32,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 28,
"metadata": {},
"outputs": [],
"source": [
Expand Down Expand Up @@ -113,7 +113,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
Expand Down Expand Up @@ -144,7 +144,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 30,
"metadata": {},
"outputs": [],
"source": [
Expand All @@ -167,15 +167,13 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 31,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Set parameter Username\n",
"Academic license - for non-commercial use only - expires 2025-02-26\n",
"QP solution: w = [0.50000 0.37500 0.12500]\n"
]
}
Expand Down Expand Up @@ -207,7 +205,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 32,
"metadata": {},
"outputs": [
{
Expand Down Expand Up @@ -240,19 +238,76 @@
"\n",
"A wrinkle to this is that we don't typically our problem variables with certainty. There _are_ ways to define $\\Sigma$ based on relationships within the cohort (which are known and discussed more later) but $\\mu$ is an estimated variable. In particular we say $\\mu$ has a univariate normal distribution, $\\mu\\sim N(\\bar{\\mu}, \\Omega)$. This means we must turn to optimization tools which can address this uncertainty.¹\n",
"\n",
"Robust optimization is one such tool in which we adjust the objective function to model the inherent uncertainty in the problem. We may either do this with a quadratic uncertainty set, in which case our objective has an additional square-root term as in\n",
"Robust optimization is one such tool in which we adjust the objective function to model the inherent uncertainty in the problem. Within this framework, the problem would be the bilevel optimization problem\n",
"$$\n",
" \\min_w \\frac{1}{2}w^{T}\\Sigma w - \\lambda w^{T}\\mu - \\kappa\\sqrt{w^{T}\\Omega w}\\ \\text{ subject to }\\ Mw = \\begin{bmatrix} 0.5 \\\\ 0.5\\end{bmatrix},\\ l\\leq w\\leq u,\n",
" \\min_w \\frac{1}{2}w^{T}\\Sigma w - \\lambda\\left( \\max_{\\mu\\in U_{\\mu}} w^{T}\\mu \\right)\\ \\text{ subject to }\\ Mw = \\begin{bmatrix} 0.5 \\\\ 0.5\\end{bmatrix},\\ l\\leq w\\leq u.\n",
"$$\n",
"The result of the inner problem determines the outer problem in a follower-leader setup. However, the inner problem is relatively simple and can be solved explicitly based on our choice of $U_{\\mu}$, representing an uncertainty set around the value of $\\mu$.\n",
"\n",
"### Box Uncertainty Sets\n",
"\n",
"We could define this using a box uncertainty set\n",
"$$\n",
" U_{\\mu} := \\left\\lbrace \\mu :\\ | \\mu_i - \\bar{\\mu}_i| \\leq\\xi_i,\\ \\forall i \\right\\rbrace\n",
"$$\n",
"where $\\xi$ is some upper bound on the uncertainty value. In this case our objective function gains an additional absolute value term and becomes\n",
"$$\n",
" \\min_w \\frac{1}{2}w^{T}\\Sigma w - \\lambda w^{T}\\mu - \\kappa\\|\\Omega^{\\frac{1}{2}} w\\|\\ \\text{ subject to }\\ Mw = \\begin{bmatrix} 0.5 \\\\ 0.5\\end{bmatrix},\\ l\\leq w\\leq u.\n",
"$$\n",
"For practical reasons relating to continuity and differentiability, this is not desirable to work with. <!-- TODO add a derivation, perhaps in a footnote, explaining how this emerges? -->\n",
"\n",
"### Quadratic Uncertainty Sets\n",
"\n",
"Alternatively, we could also define it using a quadratic uncertainty set\n",
"$$\n",
"or with a box uncertainty set, in which case our objective has an additional absolute value term as in\n",
" U_{\\mu} := \\left\\lbrace \\mu :\\ {(\\mu-\\bar{\\mu})}^{T}\\Omega^{-1}(\\mu-\\bar{\\mu}) \\leq \\kappa^2 \\right\\rbrace\n",
"$$\n",
" \\min_w \\frac{1}{2}w^{T}\\Sigma w - \\lambda w^{T}\\mu - \\kappa\\|\\Omega^{\\frac{1}{2}} w\\|\\ \\text{ subject to }\\ Mw = \\begin{bmatrix} 0.5 \\\\ 0.5\\end{bmatrix},\\ l\\leq w\\leq u,\n",
"which in effect bounds the uncertainty in a ball about $\\bar{\\mu}$ with parameter $\\kappa$.² This means that our lower level problem $\\max_{\\mu\\in U_{\\mu}} w^{T}\\mu$ becomes\n",
"$$\n",
" \\max_{\\mu} w^{T}\\mu\\quad \\text{subject to}\\quad {(\\mu-\\bar{\\mu})}^{T}\\Omega^{-1}(\\mu-\\bar{\\mu}) \\leq \\kappa^2.\n",
"$$\n",
"<!-- BUG this needs reframing in terms of minimization, since the maximisation problem has different KKTS and will be concave rather than convex -->\n",
"This is convex, since $\\Omega$ is a positive definite matrix and $w^{T}\\mu$ is an affine function. If we define a Lagrangian multiplier $\\rho\\in\\mathbb{R}$, the KKT conditions of this problem are:\n",
"\\begin{align}\n",
" \\nabla_{\\mu}L(\\mu, \\rho) = 0 \\quad&\\Rightarrow\\quad \\nabla_{\\mu}\\left( w^{T}\\mu - \\rho\\big( {(\\mu-\\bar{\\mu})}^{T}\\Omega^{-1}(\\mu-\\bar{\\mu}) \\big) \\right) = 0, \\\\\n",
" c(\\mu) \\geq 0 \\quad&\\Rightarrow\\quad \\kappa^2 - {(\\mu-\\bar{\\mu})}^{T}\\Omega^{-1}(\\mu-\\bar{\\mu}) \\geq 0, \\\\\n",
" \\rho \\geq 0 \\quad&\\Rightarrow\\quad \\rho\\geq0, \\\\\n",
" \\rho c(\\mu) = 0 \\quad&\\Rightarrow\\quad \\rho\\left(\\kappa^2 - {(\\mu-\\bar{\\mu})}^{T}\\Omega^{-1}(\\mu-\\bar{\\mu}) \\right) = 0.\n",
"\\end{align}\n",
"<!-- BUG I think there's a sign error somewhere in this derivation, either in the objective function part of the Lagrangian (since its maximization, not minimization) or in the inequality constraint -->\n",
"From (1) we have that\n",
"\\begin{align*}\n",
" w - \\rho 2\\Omega^{-1}(\\mu-\\bar{\\mu}) = 0 \\quad&\\Rightarrow\\quad \\rho2\\Omega^{-1}(\\mu-\\bar{\\mu}) = w \\\\\n",
" &\\Rightarrow\\quad \\mu - \\bar{\\mu} = \\frac{1}{2\\rho}\\Omega w \\qquad (5)\n",
"\\end{align*}\n",
"which when substituted into (4) gives\n",
"\\begin{align*}\n",
" &\\phantom{\\Rightarrow}\\quad \\rho\\left( \\kappa^2 - {(\\mu-\\bar{\\mu})}^{T}\\Omega^{-1}(\\mu-\\bar{\\mu}) \\right) = 0 \\\\\n",
" &\\Rightarrow\\quad \\frac{1}{4\\rho}\\left( \\rho\\kappa^2 - w^{T}\\Omega\\Omega^{-1}\\Omega w \\right) = 0 \\\\\n",
" &\\Rightarrow\\quad w^{T}\\Omega w = 4\\rho^2\\kappa^2 \\\\\n",
" &\\Rightarrow\\quad \\sqrt{w^{T}\\Omega w} = 2\\rho\\kappa\\quad \\text{(since $\\rho,\\kappa\\geq0$)} \\\\\n",
" &\\Rightarrow\\quad \\frac{1}{2\\rho} = \\frac{\\kappa}{\\sqrt{w^{T}\\Omega w}}.\n",
"\\end{align*}\n",
"Substituting this back into (5), we find that\n",
"$$\n",
" \\mu - \\bar{\\mu} = \\frac{\\kappa}{\\sqrt{w^{T}\\Omega w}}\\Omega w \\quad\\Rightarrow\\quad \\mu = \\frac{\\kappa}{\\sqrt{w^{T}\\Omega w}}\\Omega w + \\bar{\\mu}\n",
"$$\n",
"is the solution to the inner problem. Substituting this back into the outer problem gives the problem as\n",
"$$\n",
" \\min_w \\frac{1}{2}w^{T}\\Sigma w - \\lambda w^{T}\\left( \\frac{\\kappa}{\\sqrt{w^{T}\\Omega w}}\\Omega w + \\bar{\\mu} \\right)\\ \\text{ subject to }\\ Mw = \\begin{bmatrix} 0.5 \\\\ 0.5\\end{bmatrix},\\ l\\leq w\\leq u\n",
"$$\n",
"which simplifies down to\n",
"$$\n",
" \\min_w \\frac{1}{2}w^{T}\\Sigma w - \\lambda\\frac{\\kappa}{\\sqrt{w^{T}\\Omega w}}w^{T}\\Omega w - \\lambda w^{T}\\bar{\\mu}\\ \\text{ subject to }\\ Mw = \\begin{bmatrix} 0.5 \\\\ 0.5\\end{bmatrix},\\ l\\leq w\\leq u.\n",
"$$\n",
"Thus, after rationalizing the denominator, here our objective gains an additional square root term and becomes\n",
"$$\n",
" \\min_w \\frac{1}{2}w^{T}\\Sigma w - \\lambda w^{T}\\mu - {\\color{red}\\lambda}\\kappa\\sqrt{w^{T}\\Omega w}\\quad \\text{ subject to }\\ Mw = \\begin{bmatrix} 0.5 \\\\ 0.5\\end{bmatrix},\\ l\\leq w\\leq u,\n",
"$$ <!-- BUG the first time I worked through this I didn't have a $\\lambda$ where there's now one marked in red. I need to go back through and check how that impacts the rest of the work which follows. -->\n",
"where $\\kappa\\in\\mathbb{R}$ is our robust optimization parameters. For practical reasons relating to continuity and differentiability, the quadratic uncertainty set is far more favourable to work with.\n",
"\n",
"This is obviously no longer a quadratic problem, so `qpsolvers` is no longer a viable tool. We will instead now need to work with the Gurobi API directly.\n",
"\n",
"<!-- TODO this cell skips over the detail about how to go from the bilevel robust problem to the single level robust problem theory wise, it would be worth delving into that in more detail. -->"
"This is obviously no longer a quadratic problem, so `qpsolvers` is no longer a viable tool. We will instead now need to work with the Gurobi API directly."
]
},
{
Expand All @@ -266,7 +321,7 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 33,
"metadata": {},
"outputs": [
{
Expand Down Expand Up @@ -374,7 +429,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 34,
"metadata": {},
"outputs": [],
"source": [
Expand All @@ -384,7 +439,7 @@
" [0, 0, 1/8]\n",
"])\n",
"\n",
"kappa = 0.5"
"kappa = 0"
]
},
{
Expand All @@ -396,7 +451,7 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 35,
"metadata": {},
"outputs": [
{
Expand All @@ -409,7 +464,7 @@
"Thread count: 4 physical cores, 8 logical processors, using up to 8 threads\n",
"\n",
"Optimize a model with 23 rows, 4 columns and 25 nonzeros\n",
"Model fingerprint: 0x541bef89\n",
"Model fingerprint: 0x560dd60a\n",
"Model has 3 quadratic objective terms\n",
"Model has 3 quadratic constraints\n",
"Coefficient statistics:\n",
Expand All @@ -433,19 +488,19 @@
"\n",
" Objective Residual\n",
"Iter Primal Dual Primal Dual Compl Time\n",
" 0 -8.68156934e-01 -8.68156934e-01 1.57e+00 7.38e-01 6.17e-01 0s\n",
" 1 -9.99130241e-01 -1.52016137e+00 1.66e-01 8.12e-07 8.63e-02 0s\n",
" 2 -8.04900486e-01 -9.40830018e-01 1.82e-07 1.07e-08 1.13e-02 0s\n",
" 3 -8.32538184e-01 -8.51116759e-01 3.42e-13 1.18e-14 1.55e-03 0s\n",
" 4 -8.36522462e-01 -8.36788831e-01 1.55e-14 4.44e-16 2.22e-05 0s\n",
" 5 -8.36548538e-01 -8.36556228e-01 6.67e-13 4.47e-14 6.41e-07 0s\n",
" 6 -8.36549617e-01 -8.36550259e-01 1.09e-11 5.45e-14 5.35e-08 0s\n",
" 0 -8.05656934e-01 -8.05656934e-01 1.56e+00 7.37e-01 6.23e-01 0s\n",
" 1 -9.03284336e-01 -1.45365890e+00 1.34e-01 8.10e-07 8.06e-02 0s\n",
" 2 -7.88721177e-01 -8.90212424e-01 1.48e-07 2.71e-08 8.46e-03 0s\n",
" 3 -8.05879135e-01 -8.18083147e-01 1.62e-13 2.98e-14 1.02e-03 0s\n",
" 4 -8.11749336e-01 -8.12648517e-01 8.62e-14 2.85e-16 7.49e-05 0s\n",
" 5 -8.12482991e-01 -8.12550118e-01 1.82e-13 1.68e-14 5.59e-06 0s\n",
" 6 -8.12499963e-01 -8.12500086e-01 9.57e-13 2.22e-15 1.02e-08 0s\n",
"\n",
"Barrier solved model in 6 iterations and 0.02 seconds (0.00 work units)\n",
"Optimal objective -8.36549617e-01\n",
"Barrier solved model in 6 iterations and 0.03 seconds (0.00 work units)\n",
"Optimal objective -8.12499963e-01\n",
"\n",
"w = [0.50000 0.35282 0.14718],\n",
"z = 0.052036865323801265.\n"
"w = [0.50000 0.37500 0.12500],\n",
"z = 0.027562890578567638.\n"
]
}
],
Expand All @@ -459,7 +514,7 @@
"\n",
"# setup the robust objective function\n",
"model.setObjective(\n",
" 0.5*w@(relationship_matrix@w) - lam*w.transpose()@expected_breeding_values - kappa*z,\n",
" 0.5*w@(relationship_matrix@w) - lam*w.transpose()@expected_breeding_values - lam*kappa*z,\n",
"GRB.MINIMIZE)\n",
"\n",
"# add quadratic uncertainty constraint\n",
Expand Down Expand Up @@ -518,7 +573,7 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 36,
"metadata": {},
"outputs": [],
"source": [
Expand Down Expand Up @@ -569,7 +624,7 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 37,
"metadata": {},
"outputs": [
{
Expand Down Expand Up @@ -666,7 +721,7 @@
"Cutting planes:\n",
" RLT: 6\n",
"\n",
"Explored 1 nodes (1828 simplex iterations) in 0.34 seconds (0.10 work units)\n",
"Explored 1 nodes (1828 simplex iterations) in 0.19 seconds (0.10 work units)\n",
"Thread count was 8 (of 8 available processors)\n",
"\n",
"Solution count 1: -0.976358 \n",
Expand Down Expand Up @@ -779,7 +834,20 @@
"source": [
"## Footnotes\n",
"\n",
"1. In reality, $\\Omega$ is also an estimated variable but we can ignore this for now to avoid going down a rabbit hole of uncertainties."
"1. In reality, $\\Omega$ is also an estimated variable but we can ignore this for now to avoid going down a rabbit hole of uncertainties.\n",
"\n",
"2. By \"ball\" we do not just mean a sphere in the typical geometric sense, but a ball in the sense of metric spaces. If we recognise that in our definition we have the transpose of a vector, multiplied by a positive definite matrix, multiplied by the same vector, we can see that\n",
" \\begin{align*}\n",
" {(\\mu-\\bar{\\mu})}^{T}\\Omega^{-1}(\\mu-\\bar{\\mu}) &=: x^{T}Ax \\\\\n",
" &= x^{T}L^{T}Lx \\quad \\text{(using Cholesky factorisation)} \\\\\n",
" &= {(Lx)}^{T}{(Lx)} \\\\\n",
" &= {\\|Lx\\|}_2^2 \\\\\n",
" &= {\\|x\\|}_{L^{-1}}^2 \\leq \\kappa^2. \n",
" \\end{align*}\n",
" <!-- TODO I got up to the four line by myself but hadn't come across the last line before Julian suggested it, so need to make sure I'm comfortable with how these special matrix norms work. -->\n",
" In other words, in the norm of $\\Omega$'s Cholesky factor our uncertainty set is a ball of radius $\\kappa$ centred on $\\bar{\\mu}$.\n",
"\n",
"3. Some third footnote which has yet to be written. When is, increment and retain. "
]
}
],
Expand Down