Commit 9c620df

Everything works for IRWLS but is all scripty, about to do code cleanup

nrummel committed Jun 10, 2024
1 parent d52e2fc commit 9c620df

Showing 10 changed files with 764 additions and 219 deletions.
2 changes: 2 additions & 0 deletions Project.toml
@@ -5,6 +5,7 @@ version = "1.0.0-DEV"

[deps]
BSON = "fbb218c0-5317-5bc6-957e-2ee96dd4b1f0"
BlockDiagonals = "0a1fb500-61f7-11e9-3c65-f5ef3456f9f0"
DifferentialEquations = "0c46a032-eb83-5123-abaf-570d42b7fbaa"
FFTW = "7a1cc6ca-52ef-59f5-83cd-3a7055c09341"
ImageFiltering = "6a3955dd-da59-5b1f-98d4-e7296123deb5"
@@ -13,6 +14,7 @@ LaTeXStrings = "b964fa9f-0449-5b57-a5c2-d3ea65f4040f"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
Logging = "56ddb016-857b-54e1-b83d-db4d58db5568"
MAT = "23992714-dd62-5051-b70f-ba57cb901cac"
MKL = "33e6dc65-8f57-5167-99aa-e5a354878fb2"
ModelingToolkit = "961ee093-0014-501f-94e3-6117800e7a78"
PlotlyJS = "f0f68f2c-4968-5e81-91da-67840de0976a"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
100 changes: 56 additions & 44 deletions README.md
@@ -30,7 +30,7 @@ The following estimation problem is solved in a variety of ways:
$$w = \underset{\nu}{\operatorname{argmin}} \tfrac{1}{2}\|\Theta \nu - \dot{\tilde{u}}\|_2^2 +\lambda\|\nu\|_*$$
The $\|\cdot\|_*$ is a regularizer on $w$ chosen to promote sparse solutions; in practice this can be the $1$ norm or the $0$ norm, solved via LASSO or orthogonal matching pursuit respectively.
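As a hedged sketch of the $0$-norm flavor, sequential thresholded least squares (the solver popularized by SINDy) fits in a few lines; `Θ`, `du`, and the threshold `λ` are assumed given:
```julia
# Sketch: sequential thresholded least squares (STLSQ) for
# w = argmin ½‖Θν - u̇‖² with hard thresholding at λ.
using LinearAlgebra

function stlsq(Θ::AbstractMatrix, du::AbstractVector, λ::Real; maxiter::Int=10)
    w = Θ \ du                       # initial least-squares fit
    for _ in 1:maxiter
        small = abs.(w) .< λ         # coefficients to prune
        w[small] .= 0
        big = .!small
        any(big) || break
        w[big] = Θ[:, big] \ du      # refit on the surviving terms
    end
    return w
end
```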
### WSINDy
This extends the SINDy framework, but instead of working with the strong form we have
$$-\langle \dot{\Phi},u\rangle=\langle\Phi,\Theta\rangle w$$
where $\langle \Phi, \cdot \rangle_k \approx \int_{t_1}^{t_J} \varphi_k(\tau) (\cdot(\tau)) d\tau$ where $\varphi_k \in C_c^\infty[t_1,t_J]$. These are the familiar test functions for the weak form of the differential equation. The boon here is that the derivative can be passed to the test functions, forgoing approximating the derivative of the data. This, along with other properties intrinsic to the weak form, makes it more robust to noisy data.
Again this is posed as a regularized least squares
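For intuition, here is a small numerical check (illustrative, not repository code) that the derivative really can be passed onto a compactly supported test function, $\langle \dot{\varphi}, u\rangle \approx -\langle \varphi, \dot{u}\rangle$:
```julia
# Check ⟨φ̇,u⟩ ≈ -⟨φ,u̇⟩ for a C∞ bump function φ supported in (0,1),
# using trapezoid-rule quadrature. Everything here is illustrative.
t  = range(0, 1; length=201)
Δt = step(t)
u  = sin.(2π .* t)                    # data
du = 2π .* cos.(2π .* t)              # its true derivative

φ(s) = abs(2s - 1) < 1 ? exp(-1 / (1 - (2s - 1)^2)) : 0.0
φt = φ.(t)
# derivative of the bump by centered differences (enough for a check)
φdt = [0.0; (φt[3:end] .- φt[1:end-2]) ./ (2Δt); 0.0]

trap(f) = Δt * (sum(f) - (f[1] + f[end]) / 2)
lhs = trap(φdt .* u)     # ⟨φ̇,u⟩
rhs = -trap(φt .* du)    # -⟨φ,u̇⟩
@show lhs rhs            # the two agree to quadrature accuracy
```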
@@ -42,7 +42,7 @@ References:
### WENDy
This algorithm works from a statistical framework: instead of looking at the squared error of the coefficients, we isolate the error related to noise and build an estimator based on the asymptotic distribution of the residual. After some derivation, and assuming the integration error and model mismatch error are negligible, we find
$$S_w^{-1/2}(\langle\Phi,\Theta\rangle w +\langle \dot{\Phi},u\rangle) \stackrel{asymp}{\sim} N(0, I)$$
where $L_w = \underbrace{ \mathbb{I}_D \circ \dot{\Phi}}_{\stackrel{\Delta}{=} L_0} + \underbrace{\nabla_u \Theta \circ \Phi}_{\stackrel{\Delta}{=} L_1} \times_3 w$
and $S_w = L_wL_w^T$. $S_w^{1/2}$ is the Cholesky factor of $S_w$.
Looking at the negative log-likelihood
$$-\mathcal{L}(w;u) \stackrel{asymp}{=} (Gw-b)^TS_w^{-1}(Gw-b) + \log(\det(S_w))$$
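A minimal sketch of evaluating this objective, assuming callables `G(w)` and `Lw(w)` with $S_w = L_w L_w^T$ are already in hand (names hypothetical):
```julia
# Sketch: WENDy negative log-likelihood (up to a constant), assuming
# G(w) and a factor Lw(w) with S_w = Lw*Lw' are given callables.
using LinearAlgebra

function wendy_nll(w, G::Function, b::AbstractVector, Lw::Function)
    r = G(w) - b
    S = Symmetric(Lw(w) * Lw(w)')
    C = cholesky(S)                 # S_w^{1/2}; requires S_w to be SPD
    z = C.L \ r                     # whitened residual S_w^{-1/2} r
    return dot(z, z) + logdet(C)    # (Gw-b)ᵀ S⁻¹ (Gw-b) + log det S_w
end
```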
@@ -93,23 +93,42 @@ Notice that while $b$ is linear in $u$ by the nature of the inner product $G$ is
$$\begin{align*}
r(w) &= G(w) - b \\
&= G(w)+ G^*(w) - G^*(w) +G^*(w^*) -G^*(w^*) - b^* - b^\epsilon\\
&= \underbrace{G(w) - G^*(w)}_{e^\Theta} + \underbrace{G^*(w) -G^*(w^*)}_{r^*} + \underbrace{G^*(w^*) - b^*}_{e^{\text{int}}} -b^\epsilon
\end{align*}$$
Investigating the potential sources of error:
- $e^\text{int}$ is due to numerical integration because the only source of error is building $G^*(w),b^*$ with perfect data that satisfies the ODE.
- $r^*$ is the difference of the RHS of the weak form evaluated at the true weights vs the approximated ones. $$\begin{align*}
\begin{cases}\|r^*\| = \|\nabla_w G^*\,(w-w^*)\| ,& \text{ when $G$ is linear in parameters}\\
\|r^*\|\le \|\nabla_w G^*\|\|w-w^*\| ,& \text{ when $G$ is non-linear in parameters}
\end{cases}
\end{align*}$$As $w\rightarrow w^*$ (i.e. the method is convergent), $r^* \rightarrow 0$.

We can still say that, as the number of points gets large and the method converges,
$$r(w) \rightarrow e^\Theta -b^\epsilon$$
Inspecting this quantity, we see that we need to linearize $G$ about the data $u$:
$$\begin{align*}
e^\Theta-b^\epsilon &= G(u,w) - G(u^*,w) + \langle \dot{\Phi}, \epsilon \rangle \\
&= G(u, w) + \langle \dot{\Phi}, \epsilon \rangle \\
&\quad - \Big(G(u,w) - \langle \nabla_u G(u,w), \epsilon \rangle + H(u^*,w,\epsilon) \Big) \\
&\approx \langle \nabla_u G(u,w) + \dot{\Phi}, \epsilon\rangle
\end{align*}$$
```ad-failure
title: Asymptotic Distribution Derivation
Assuming that $w \rightarrow w^*$ and $M \gg 1$, then we have
$$\begin{align*}
\mathbb{E}[r(w^*)] &= \mathbb{E}[e^\Theta -b^\epsilon]\\
&= \Phi \mathbb{E}[\langle \nabla_u F(w,u), \epsilon \rangle] + \dot{\Phi} \mathbb{E}[\epsilon ] \\
&= \Phi \mathbb{E}[\nabla_u F(w,u)\times_3 \epsilon] \\
\mathbb{Var}[r(w)] &= \mathbb{E}[r(w)r(w)^T ] \\
&\approx \Phi \mathbb{E}[\nabla_u F(w,u)\times_3 \epsilon \;\epsilon^T \times_3 \nabla_u F(w,u)]\Phi^T\\
&+ 2 \Phi \mathbb{E}[\nabla_u F(w,u)\times_3 \epsilon \;\epsilon^T ]\dot{\Phi}^T\\
&+ \dot{\Phi} \underbrace{\mathbb{E}[ \epsilon \;\epsilon^T ]}_{\sigma^2 I_{D\times (M+1)}}\dot{\Phi}^T\\
S_w^{-1/2} (e^\Theta - b^\epsilon) &\stackrel{asymp}{\sim} N(0, I)\\
\end{align*}$$
But now $L_w = \underbrace{\mathbb{I}_D \circ \dot{\Phi}}_{\stackrel{\Delta}{=} L_0} + \underbrace{\nabla_u F(u,w) \circ \Phi}_{\stackrel{\Delta}{=} L_1(w)}$.
```
Notice that $\epsilon_m \stackrel{iid}{\sim} N(0,\sigma^2) \Leftrightarrow \epsilon \sim N(0, \sigma^2 I)$. Now, if $|\epsilon| \ll 1$, then $H = O(\epsilon^2)$ can be thrown out, and we have a linear function of a multivariate Gaussian, which is itself multivariate Gaussian:
$$S_w^{-1/2} (e^\Theta - b^\epsilon) \stackrel{asymp}{\sim} N(0, I)$$
But now $L_w = \underbrace{\mathbb{I}_D \circ \dot{\Phi}}_{\stackrel{\Delta}{=} L_0} + \underbrace{\nabla_u F(u,w) \circ \Phi}_{\stackrel{\Delta}{=} L_1(w)}$,
$S_w = L_wL_w^T$, and $S_w^{1/2}$ is its Cholesky factor. This now has a negative log-likelihood:
$$-\mathcal{L}(w; u) = \underbrace{(G(w)-b)^TS_w^{-1}(G(w)-b)}_{f(w;u)} + \log(\det(S_w))$$
Neglecting the $\log\det$ term, which in general contributes little, we obtain
$$\begin{align*}
@@ -137,16 +156,32 @@ $$\begin{align*}
$$G(w) = \langle \Phi,F(u,w)\rangle$$
So evaluating the cost function requires computing the inner product with the test functions (see the sketch below).
```
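As a sketch (hypothetical names), with `V` holding the quadrature-weighted test function values this inner product is a single matrix product:
```julia
# Sketch: G(w) = ⟨Φ, F(u,w)⟩ with a quadrature-weighted test function
# matrix V (K×(M+1)) and data U ((M+1)×D). Names are hypothetical.
function evalG(w, V::AbstractMatrix, U::AbstractMatrix, F::Function)
    # evaluate the RHS at every time point, then integrate in time
    FU = reduce(vcat, (F(w, U[m, :])' for m in 1:size(U, 1)))  # (M+1)×D
    return V * FU                                              # K×D
end
```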
### Fleshing out $L_w$
Recall that
$$\begin{align*}
F&: \mathbb{R}^{J} \times \mathbb{R}^{D} \rightarrow \mathbb{R}^D\\
F(w,u) &= \begin{bmatrix}f_1(w,u)\\
\vdots \\
f_D(w,u)\end{bmatrix}\\
\nabla_uF &: \mathbb{R}^{J} \times \mathbb{R}^{D} \rightarrow \mathbb{R}^{D\times D}\\
\nabla_uF &= \begin{bmatrix} \partial_{u_1} f_1(w,u) & \cdots & \partial_{u_D} f_1(w,u)\\
\vdots & \ddots & \vdots\\
\partial_{u_1} f_D(w,u) & \cdots & \partial_{u_D}f_D(w,u) \end{bmatrix}\\
G&:\mathbb{R}^{J} \times \mathbb{R}^{D} \rightarrow \mathbb{R}^{K \times D}\\
G(w,u) &= \langle \Phi, F(w,u)\rangle = \Phi^T F(w,u)\\
\nabla_u G&:\mathbb{R}^{J} \times \mathbb{R}^{D} \rightarrow \mathbb{R}^{K \times D \times D}\\
\nabla_u G(w,u) &= \langle\Phi, \nabla_uF(w,u)\rangle
\end{align*}
$$
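Putting these shapes together, here is a sketch of assembling the $L_w$ factor and $S_w = L_w L_w^T$; the flattening/layout conventions and all names are my assumptions, not repository code:
```julia
# Sketch: assemble L_w = L0 + L1(w) with L0 = 𝕀_D ∘ Φ̇ and
# L1(w) = ∇_u F ∘ Φ, flattened to a (K·D) × (D·(M+1)) matrix.
# Φ, Φdot are K×(M+1); ∇uF(w,u) returns the D×D Jacobian at one point.
function buildLw(w, Φ, Φdot, U, ∇uF)
    K, Mp1 = size(Φ)
    D = size(U, 2)
    L = zeros(K * D, D * Mp1)
    for m in 1:Mp1
        Jm = ∇uF(w, U[m, :])                        # D×D Jacobian at t_m
        for i in 1:D, j in 1:D
            rows = (i-1)*K+1 : i*K
            col  = (j-1)*Mp1 + m
            L[rows, col] .= Φ[:, m] .* Jm[i, j]     # the L1(w) piece
            i == j && (L[rows, col] .+= Φdot[:, m]) # the L0 piece
        end
    end
    return L                                        # S_w = L * L'
end
```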

### Picking an initial $w_0$
Two obvious options:
- We can use the Maclaurin (Taylor) series of $F$ about $w=0$ and solve the resulting linear least squares problem (see the sketch after this list). $$\begin{align*}
F(u,w) &\approx F(u, 0) + \langle\nabla_w F(u,0), w\rangle\\
r(w) &\approx \underbrace{\langle \Phi, u\rangle - F(u, 0)}_y - \underbrace{\langle\nabla_w F(u,0), w\rangle}_{Aw}\\
\|r(w)\|_2 &\approx \|Aw - y\|_2 \\
w_0 &= \underset{\nu}{\operatorname{argmin}} \|A\nu - y\|_2\\
&= (A^TA)^{-1}A^Ty
\end{align*}$$
- Solve the non-linear least squares problem neglecting the covariance. This is equivalent to the "WSINDy" $w_0$. We can do this with any local minimization method, either derivative free or using the gradient of $G$: $$\begin{align*}
&\min_{w} \tfrac{1}{2} \|G(w)-b\|_2^2\\
G(w) &= \Phi F(w)\\
\nabla_w G(w) &= \Phi \nabla_w F(w)
\end{align*}$$
- Randomly pick $w_0$ from a desired domain or prior distribution? It seems reasonable to allow the use of prior knowledge or constraints on the parameter values.
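A minimal sketch of the first option, assuming callables `F(w,u)` and `∇wF(w,u)` (returning the $D\times J$ parameter Jacobian) plus quadrature-weighted test function matrices `V` and `Vdot`; all names and the sign convention for $b$ are assumptions:
```julia
# Sketch: initial w₀ by linearizing F about w = 0 and solving the
# resulting linear least squares problem. Assumes the residual is
# r(w) = V*F(u,w) + Vdot*U, so the target y absorbs both b and F(u,0).
# Hypothetical names throughout.
using LinearAlgebra

function initial_w(V, Vdot, U, F, ∇wF, J)
    Mp1, D = size(U)
    w0 = zeros(J)
    F0 = reduce(hcat, (F(w0, U[m, :]) for m in 1:Mp1))'   # (M+1)×D
    y  = vec(-Vdot * U .- V * F0)                         # K·D target
    Jac = [∇wF(w0, U[m, :]) for m in 1:Mp1]               # D×J each
    # stack ⟨Φ, ∇_w F(u,0)⟩ component by component
    A = reduce(vcat, (V * reduce(hcat, (Jac[m][d, :] for m in 1:Mp1))'
                      for d in 1:D))                      # K·D × J
    return A \ y                                          # least squares w₀
end
```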
## Test Problems
### FitzHugh–Nagumo Equations
$$\begin{align*}
\dot{V} &= c\left(V - \tfrac{V^3}{3} + R\right)\\
\dot{R} &= -\tfrac{1}{c}\left(V - a + bR\right)
\end{align*}$$
@@ -210,27 +245,4 @@ title: Outstanding Questions
- _Continuously Stirred Tank Reactor Equations_ from [[Ramsay et al. - 2007 - Parameter Estimation for Differential Equations a.pdf#page=4|Ramsay]]
```
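A sketch of how this system could be written with ModelingToolkit, in the same style as the repo's `LOGISTIC_GROWTH_MODEL`; the parameterization follows the Ramsay et al. reference and the names are assumptions:
```julia
using ModelingToolkit

@parameters a b c
@variables t V(t) R(t)
Dt = Differential(t)

# FitzHugh–Nagumo in the Ramsay et al. (2007) parameterization
eqs = [Dt(V) ~ c * (V - V^3 / 3 + R),
       Dt(R) ~ -(1 / c) * (V - a + b * R)]

@named fhn_model = ODESystem(eqs, t)
```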

## Identifiability of Parameters
[Julia Implementation](https://docs.sciml.ai/ModelingToolkit/stable/tutorials/parameter_identifiability/)

## Questions for Dan
## Computing $V,V^\prime$
- Why are there different coefficients for trap rule when we move $V$ to $V^\prime$?
$$\begin{align*}
\Phi &= \begin{bmatrix}\phi_1 & \cdots & \phi_K \end{bmatrix}\\
\mathcal{Q} &= \operatorname{diag}(\tfrac{\Delta t}{2},\Delta t,\cdots ,\Delta t,\tfrac{\Delta t}{2})\\
\text{In paper} &\rightarrow \begin{cases}
V = \Phi \mathcal{Q}\\
V^\prime = -\dot{\Phi} \mathcal{Q}
\end{cases}\\
\text{In code} &\rightarrow \begin{cases}
V = \Phi \mathcal{Q}\\
V^\prime = -\frac{1}{m_t\,\Delta t} \dot{\Phi} \mathcal{Q}
\end{cases}
\end{align*}$$
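A direct numeric comparison of the two conventions is easy to script (illustrative only; the bump shape and sizes are placeholders):
```julia
# Build V = ΦQ and V′ = -Φ̇Q with trapezoid-rule weights, then apply
# the code's extra 1/(m_t·Δt) scaling for comparison.
using LinearAlgebra

K, Mp1 = 3, 101
t  = range(0, 1; length=Mp1)
Δt = step(t)
mt = 10                                          # test function radius

φ(s)  = abs(s) < 1 ? exp(-1 / (1 - s^2)) : 0.0
dφ(s) = abs(s) < 1 ? -2s / (1 - s^2)^2 * φ(s) : 0.0

centers = range(0.25, 0.75; length=K)
Φ    = [φ((τ - c) / (mt * Δt))              for c in centers, τ in t]
Φdot = [dφ((τ - c) / (mt * Δt)) / (mt * Δt) for c in centers, τ in t]

Q = Diagonal([Δt/2; fill(Δt, Mp1 - 2); Δt/2])    # trapezoid rule
V        = Φ * Q
Vp_paper = -Φdot * Q
Vp_code  = -(1 / (mt * Δt)) * Φdot * Q           # the code's extra scaling
```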
### SVD reduction of $V,V^\prime$

### Bugs in Code
- Error in `fdcoeffF.m`
- Index bug in `_getTestFunctionWeights`
- Index bug in `VVp_svd`: fixing symmetry?
20 changes: 10 additions & 10 deletions src/NLPMLE.jl
@@ -20,23 +20,23 @@ include("testFunctions.jl")

function runAlgo(;
exampleFile::String=joinpath(@__DIR__, "../data/LogisticGrowth.mat"),
mdl::ODESystem=LOGISTIC_GROWTH_MODEL,
noise_ratio::Real=0.05,
time_subsample_rate::Int=1,
mt_params::AbstractVector = 2 .^(0:3),
K_max::Real=5e3,
radMeth::RadMethod=MtminRadMethod(),
pruneMeth::TestFunctionPruningMethod=SingularValuePruningMethod(UniformDiscritizationMethod()),
seed::Real=1.0,
ϕ::Function=ExponentialTestFun(),
toggle_VVp_svd::Union{Int, Nothing}=nothing
)
BSON.@load exampleFile t u
Mp1, D = size(u)
@assert Mp1 == length(t) "Number of time points should match dependent variable array"
@assert mod(time_subsample_rate,2) == 0 || time_subsample_rate == 1 "Subsample rate should be divisible by 2"
## Subsample the data
tobs = t[1:time_subsample_rate:end]
uobs = u[:,1:time_subsample_rate:end]
Mp1, D = size(uobs)
num_rad = length(mt_params)
## Add noise
Random.seed!(seed)
uobs = generateNoise(uobs, noise_ratio)
@@ -47,7 +47,7 @@ function runAlgo(;
mt_max = max(floor((M-1)/2)-K_min, 1)
mt_min = rad_select(tobs,xobs,ϕ,mt_max)

mt = zeros(num_rad, D)
for (n,mtp) in enumerate(mt_params), d in 1:D
mt[n,d] = radMeth(xobs, tobs, ϕ, mtp, mt_min, mt_max)
end
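## Hypothetical call exercising the renamed keywords (values illustrative):
# runAlgo(;
#     noise_ratio = 0.05,
#     time_subsample_rate = 2,
#     seed = 2.0,
#     toggle_VVp_svd = nothing,
# )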
86 changes: 41 additions & 45 deletions src/computeGradients.jl
@@ -7,29 +7,9 @@ end
function getRHS(mdl)
w = parameters(mdl)
u = unknowns(mdl)
rhs_sym = _getRHS_sym(mdl)
_rhs,_rhs! = build_function(rhs_sym, w,u; expression=false)
return _rhs,_rhs!
end

function _getParameterJacobian_sym(mdl)
@@ -41,27 +21,43 @@ end
function getParameterJacobian(mdl)
w = parameters(mdl)
u = unknowns(mdl)
jac_sym = _getParameterJacobian_sym(mdl)
jac,jac! = build_function(jac_sym, w,u; expression=false)
return jac, jac!
end

function _getJacobian_sym(mdl)
u = unknowns(mdl)
rhs_sym = _getRHS_sym(mdl)
return jacobian(rhs_sym, u)
end

function getJacobian(mdl)
w = parameters(mdl)
u = unknowns(mdl)
jac_sym = _getJacobian_sym(mdl)
jac,jac! = build_function(jac_sym, w,u; expression=false)
return jac, jac!
end
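## Sketch: using the (out-of-place, in-place) pair returned by
## build_function; `mdl` is assumed to be an ODESystem.
# rhs, rhs! = getRHS(mdl)
# w  = ones(length(parameters(mdl)))
# u  = ones(length(unknowns(mdl)))
# du = rhs(w, u)           # allocates the result
# buf = similar(du)
# rhs!(buf, w, u)          # fills buf in place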
## Maybe put size checks in these functions?
# J = length(w)
# D = length(u)
# function f(w::SVector{J,<:Real},u::SVector{D,<:Real}, ::Val{J}, ::Val{D}) where {J,D}
# return _f(w,u)
# end
# function jac(w::AbstractVector{<:Real},u::AbstractVector{<:Real})
# try
# return f(SA[w...],SA[u...],Val(J),Val(D))
# catch err
# if length(w) != J
# @error "length(w) = $(length(w)) != $J"
# Base.throw_checksize_error(w, (J,))
# elseif length(u) != D
# @error "length(u) = $(length(u)) != $D"
# Base.throw_checksize_error(u, (D,))
# else
# throw(err)
# end
# end
# end
