Use Manopt as the optimization package #105

Open
wants to merge 23 commits into
base: master

pbrehmer
Collaborator

This PR will switch out OptimKit for Manopt to enable more advanced optimization features and better control over optimization settings and outputs.

Since we haven't yet defined a PEPS manifold using the ManifoldsBase.jl interface, we will resort to vectorizing the PEPS using to_vec and optimizing the resulting vector. This should not incur significant overhead.
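To illustrate the vectorization idea, here is a minimal sketch that assumes a PEPS can be treated as a plain collection of dense real arrays. The names flatten and unflatten are hypothetical and this is not the to_vec used in the PR; handling of complex entries and symmetric tensors is deliberately left out.

```julia
# Hypothetical stand-in for a PEPS: a collection of dense real arrays.
# `flatten` mimics the to_vec idea: it returns a flat vector plus a closure
# that rebuilds the original structure from any vector of the same length.
function flatten(tensors)
    shapes = map(size, tensors)
    lengths = map(length, tensors)
    flat = reduce(vcat, map(vec, tensors))
    offsets = cumsum([0; lengths])
    function unflatten(v::AbstractVector)
        return [
            reshape(v[(offsets[i] + 1):offsets[i + 1]], shapes[i]) for
            i in eachindex(shapes)
        ]
    end
    return flat, unflatten
end

tensors = [rand(2, 3), rand(4)]
v, back = flatten(tensors)   # v is what an optimizer on Euclidean space would see
restored = back(v)           # same shapes and entries as `tensors`
```

The optimizer only ever touches the flat vector, and the closure is called inside the cost function to get back to the structured object.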

@pbrehmer pbrehmer marked this pull request as draft December 13, 2024 15:26
@pbrehmer
Collaborator Author

This hopefully shouldn't be too much work. I will try to finish this up at the beginning of next week.


codecov bot commented Dec 17, 2024

Codecov Report

Attention: Patch coverage is 16.15385% with 109 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/algorithms/optimization/peps_optimization.jl | 0.00% | 54 Missing ⚠️ |
| src/algorithms/optimization/manopt.jl | 13.33% | 26 Missing ⚠️ |
| src/algorithms/ctmrg/sequential.jl | 0.00% | 12 Missing ⚠️ |
| ...rithms/optimization/fixed_point_differentiation.jl | 0.00% | 8 Missing ⚠️ |
| src/environments/ctmrg_environments.jl | 0.00% | 8 Missing ⚠️ |
| src/states/infinitepeps.jl | 0.00% | 1 Missing ⚠️ |

| Files with missing lines | Coverage Δ |
|---|---|
| src/PEPSKit.jl | 87.50% <ø> (ø) |
| src/algorithms/ctmrg/ctmrg.jl | 70.90% <100.00%> (-16.37%) ⬇️ |
| src/algorithms/ctmrg/simultaneous.jl | 88.23% <100.00%> (-10.10%) ⬇️ |
| src/algorithms/toolbox.jl | 18.05% <ø> (-79.39%) ⬇️ |
| src/utility/symmetrization.jl | 82.29% <ø> (-1.83%) ⬇️ |
| src/utility/util.jl | 20.43% <100.00%> (-31.70%) ⬇️ |
| src/states/infinitepeps.jl | 36.00% <0.00%> (-33.39%) ⬇️ |
| ...rithms/optimization/fixed_point_differentiation.jl | 3.84% <0.00%> (ø) |
| src/environments/ctmrg_environments.jl | 24.35% <0.00%> (-37.35%) ⬇️ |
| src/algorithms/ctmrg/sequential.jl | 0.00% <0.00%> (-98.28%) ⬇️ |

... and 2 more

... and 18 files with indirect coverage changes

@pbrehmer
Collaborator Author

I got fixedpoint to optimize with Manopt now. However, the linesearching (or something else) still seems off: the optimization is slow and doesn't really converge at all. I'm not yet sure where the problem lies. Also, we can't directly compare against OptimKit since Manopt doesn't have the Hager-Zhang linesearch algorithm.

One annoying thing was that Manopt doesn't support supplying a function which provides both the cost and the gradient for quasi_Newton such that we have to resort to a workaround.

While Manopt seems more powerful and versatile for sure, it does take a bit of love to make it work, it seems...

@lkdvos
Member

lkdvos commented Dec 19, 2024

I glanced over everything, but I have to admit that I can't immediately see what's going wrong. Did you by any chance try the simplest option without caching? That should be slow, but still work.

> One annoying thing was that Manopt doesn't support supplying a function which provides both the cost and the gradient for quasi_Newton such that we have to resort to a workaround.

At first glance, that shouldn't be a big problem, right? Simply specifying both should be possible, and we can then avoid the recalculation with some kind of cache afterwards (I would first focus on getting it to work, though).

> Also, we can't directly compare against OptimKit since Manopt doesn't have the Hager-Zhang linesearch algorithm.

I think it does? From what I gather, you can use any linesearch in LineSearches.jl, which does contain Hager-Zhang: https://julianlsolvers.github.io/LineSearches.jl/latest/reference/linesearch.html#LineSearches.HagerZhang
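A rough sketch of how this might look, assuming a recent Manopt.jl with its LineSearches.jl extension, where Manopt.LineSearchesStepsize wraps a LineSearches.jl algorithm as a stepsize. The toy quadratic cost on Euclidean(2) is an assumption standing in for the vectorized PEPS problem; the exact keyword names may differ between Manopt versions.

```julia
using Manopt, Manifolds, LineSearches

# Toy problem on flat space (assumption: stands in for the vectorized PEPS).
M = Euclidean(2)
f(M, p) = (p[1] - 1)^2 + 10 * (p[2] + 2)^2
grad_f(M, p) = [2 * (p[1] - 1), 20 * (p[2] + 2)]

# Wrap the LineSearches.jl Hager-Zhang search as a Manopt stepsize.
hz = Manopt.LineSearchesStepsize(M, LineSearches.HagerZhang())

p_opt = quasi_Newton(
    M, f, grad_f, [0.0, 0.0];
    stepsize=hz,
    stopping_criterion=StopAfterIteration(200) | StopWhenGradientNormLess(1e-8),
)
```

Other LineSearches.jl algorithms can presumably be swapped in the same way by changing the second argument of the wrapper.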

@pbrehmer
Collaborator Author

> Simply specifying both should be possible, and we can then avoid the recalculation with some kind of cache afterwards

That's what I do at the moment, and it works in principle. The slightly ugly part is that, in order to record things from the cache during optimization, the cost function has to be the functor struct so that it can be accessed via get_objective, while the gradient is a separate function defined on the cache.
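The cache pattern described here could look roughly like the following sketch. All names (CostGradCache, update!, gradient_function as written below) are hypothetical illustrations and not the PR's actual implementation; the only assumption about Manopt is that cost and gradient are called with the signature (M, p).

```julia
# Hypothetical cache: evaluates cost and gradient together once per point
# and serves both the cost functor and the gradient closure from storage.
mutable struct CostGradCache{F}
    fg::F                  # p -> (cost, gradient), e.g. a withgradient wrapper
    p::Vector{Float64}     # last point that was evaluated
    cost::Float64
    grad::Vector{Float64}
    evaluations::Int       # number of calls to fg, handy for recording
end
CostGradCache(fg, n::Int) = CostGradCache(fg, fill(NaN, n), NaN, fill(NaN, n), 0)

function update!(c::CostGradCache, p)
    if p != c.p            # the NaN-initialized point never compares equal
        c.cost, c.grad = c.fg(p)
        c.p = copy(p)
        c.evaluations += 1
    end
    return c
end

# Cost as a functor, so the solver's objective carries the cache with it.
(c::CostGradCache)(M, p) = update!(c, p).cost
# Gradient as a separate closure over the same cache.
gradient_function(c::CostGradCache) = (M, p) -> copy(update!(c, p).grad)

fg(p) = (sum(abs2, p), 2 .* p)     # toy cost and gradient evaluated together
cache = CostGradCache(fg, 2)
grad = gradient_function(cache)
p = [1.0, 2.0]
cost_value = cache(nothing, p)     # triggers the single joint evaluation
grad_value = grad(nothing, p)      # served from the cache, no recomputation
```

Because the cost is the struct itself, anything stored in it stays reachable from the solver's objective, which is the property the recording relies on.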

> From what I gather, you can use any linesearch in LineSearches.jl

Oh, I wasn't aware of that; that is really neat. Then I'll try Hager-Zhang and see if that solves the problems I currently have with the optimization.

@pbrehmer
Collaborator Author

Somehow this is a tough nut to crack. I was trying things out today but with no luck: even the simplest option without caching won't optimize. I think the next thing I'll try is to go back to OptimKit but with the current cost/gradient/cache setup, just to rule out that the problem lies there rather than in the optimization/linesearch parameters.

There's also a really weird Zygote error (independent of Manopt, I believe) which only comes up the first time I execute fixedpoint after precompilation; it runs properly the second time and after that. Somehow Zygote can't handle that we now return the truncation error and condition number in the NamedTuple that is returned by ctmrg_iteration and leading_boundary.

ERROR: type NamedTuple has no field truncation_error
Stacktrace:
  [1] getproperty
    @ ./Base.jl:49 [inlined]
  [2] macro expansion
    @ ~/.julia/packages/Zygote/nyzjS/src/lib/lib.jl:326 [inlined]
  [3] (::Zygote.Jnew{…})(Δ::@NamedTuple{…})
    @ Zygote ~/.julia/packages/Zygote/nyzjS/src/lib/lib.jl:320
  [4] (::Zygote.var"#2220#back#318"{…})(Δ::@NamedTuple{…})
    @ Zygote ~/.julia/packages/ZygoteRules/M4xmc/src/adjoint.jl:72
  [5] NamedTuple
    @ ./boot.jl:727 [inlined]
  [6] (::Zygote.Pullback{Tuple{…}, Tuple{…}})(Δ::@NamedTuple{err::Float64, U::Nothing, S::TrivialTensorMap{…}, V::Nothing})
    @ Zygote ~/.julia/packages/Zygote/nyzjS/src/compiler/interface2.jl:0
  [7] simultaneous_projectors
    @ ~/repos/PEPSKit.jl/src/algorithms/ctmrg/simultaneous.jl:97 [inlined]
  [8] (::Zygote.Pullback{Tuple{…}, Any})(Δ::Tuple{Tuple{…}, Nothing})
    @ Zygote ~/.julia/packages/Zygote/nyzjS/src/compiler/interface2.jl:0
  [9] ctmrg_iteration
    @ ~/repos/PEPSKit.jl/src/algorithms/ctmrg/simultaneous.jl:37 [inlined]
 [10] (::Zygote.Pullback{Tuple{…}, Tuple{…}})(Δ::Tuple{CTMRGEnv{…}, Nothing})
    @ Zygote ~/.julia/packages/Zygote/nyzjS/src/compiler/interface2.jl:0
 [11] f
    @ ~/repos/PEPSKit.jl/src/algorithms/peps_opt.jl:469 [inlined]
 [12] (::Zygote.Pullback{Tuple{…}, Any})(Δ::CTMRGEnv{TrivialTensorMap{…}, TrivialTensorMap{…}})
    @ Zygote ~/.julia/packages/Zygote/nyzjS/src/compiler/interface2.jl:0
 [13] (::Zygote.var"#ad_pullback#61"{Tuple{…}, Zygote.Pullback{…}})(Δ::CTMRGEnv{TrivialTensorMap{…}, TrivialTensorMap{…}})
    @ Zygote ~/.julia/packages/Zygote/nyzjS/src/compiler/chainrules.jl:264
 [14] (::PEPSKit.var"#∂f∂x#405"{…})(x::CTMRGEnv{…})
    @ PEPSKit ~/repos/PEPSKit.jl/src/algorithms/peps_opt.jl:474
 [15] apply(f::PEPSKit.var"#∂f∂x#405"{…}, x::CTMRGEnv{…})
    @ KrylovKit ~/.julia/packages/KrylovKit/xccMN/src/apply.jl:2
 [16] apply(operator::Function, x::CTMRGEnv{TrivialTensorMap{…}, TrivialTensorMap{…}}, α₀::ComplexF64, α₁::ComplexF64)
    @ KrylovKit ~/.julia/packages/KrylovKit/xccMN/src/apply.jl:5
 [17] linsolve(operator::Function, b::CTMRGEnv{…}, x₀::CTMRGEnv{…}, alg::KrylovKit.BiCGStab{…}, a₀::Int64, a₁::Int64; alg_rrule::KrylovKit.BiCGStab{…})
    @ KrylovKit ~/.julia/packages/KrylovKit/xccMN/src/linsolve/bicgstab.jl:165
 [18] linsolve(operator::Function, b::CTMRGEnv{…}, x₀::CTMRGEnv{…}, alg::KrylovKit.BiCGStab{…}, a₀::Int64, a₁::Int64)
    @ KrylovKit ~/.julia/packages/KrylovKit/xccMN/src/linsolve/bicgstab.jl:1
 [19] fpgrad(∂F∂x::CTMRGEnv{…}, ∂f∂x::Function, ∂f∂A::PEPSKit.var"#∂f∂A#404"{…}, y₀::CTMRGEnv{…}, alg::LinSolver{…})
    @ PEPSKit ~/repos/PEPSKit.jl/src/algorithms/peps_opt.jl:548
 [20] leading_boundary_fixed_pullback
    @ ~/repos/PEPSKit.jl/src/algorithms/peps_opt.jl:475 [inlined]
 [21] #37
    @ ~/repos/PEPSKit.jl/src/utility/hook_pullback.jl:34 [inlined]
 [22] ZBack
    @ ~/.julia/packages/Zygote/nyzjS/src/compiler/chainrules.jl:212 [inlined]
 [23] (::Zygote.var"#kw_zpullback#56"{PEPSKit.var"#37#38"{…}})(dy::Tuple{CTMRGEnv{…}, Nothing})
    @ Zygote ~/.julia/packages/Zygote/nyzjS/src/compiler/chainrules.jl:238
 [24] #393
    @ ~/repos/PEPSKit.jl/src/algorithms/peps_opt.jl:282 [inlined]
 [25] (::Zygote.Pullback{Tuple{PEPSKit.var"#393#395"{…}, InfinitePEPS{…}}, Any})(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/nyzjS/src/compiler/interface2.jl:0
 [26] (::Zygote.var"#78#79"{Zygote.Pullback{Tuple{PEPSKit.var"#393#395"{…}, InfinitePEPS{…}}, Any}})(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/nyzjS/src/compiler/interface.jl:91
 [27] withgradient(f::Function, args::InfinitePEPS{TrivialTensorMap{ComplexSpace, 1, 4, Matrix{ComplexF64}}})
    @ Zygote ~/.julia/packages/Zygote/nyzjS/src/compiler/interface.jl:213
 [28] cost_and_grad!(cache::PEPSKit.PEPSCostFunctionCache{Float64}, peps_vec::Vector{Float64})
    @ PEPSKit ~/repos/PEPSKit.jl/src/algorithms/peps_opt.jl:281
 [29] gradient_function
    @ ~/repos/PEPSKit.jl/src/algorithms/peps_opt.jl:335 [inlined]
 [30] get_gradient
    @ ~/.julia/packages/Manopt/07ePo/src/plans/gradient_plan.jl:171 [inlined]
 [31] get_gradient
    @ ~/.julia/packages/Manopt/07ePo/src/plans/gradient_plan.jl:142 [inlined]
 [32] quasi_Newton!(M::Euclidean{…}, mgo::ManifoldGradientObjective{…}, p::Vector{…}; cautious_update::Bool, cautious_function::Function, debug::Vector{…}, retraction_method::ExponentialRetraction, vector_transport_method::ParallelTransport, basis::DefaultOrthonormalBasis{…}, direction_update::InverseBFGS, memory_size::Int64, project!::typeof(copyto!), initial_operator::Matrix{…}, initial_scale::Float64, stepsize::Manopt.WolfePowellLinesearchStepsize{…}, stopping_criterion::StopWhenAny{…}, kwargs::@Kwargs{…})
    @ Manopt ~/.julia/packages/Manopt/07ePo/src/solvers/quasi_Newton.jl:328
 [33] #quasi_Newton#987
    @ ~/.julia/packages/Manopt/07ePo/src/solvers/quasi_Newton.jl:252 [inlined]
 [34] quasi_Newton
    @ ~/.julia/packages/Manopt/07ePo/src/solvers/quasi_Newton.jl:248 [inlined]
 [35] quasi_Newton(M::Euclidean{…}, f::PEPSKit.PEPSCostFunctionCache{…}, grad_f::PEPSKit.var"#gradient_function#397"{…}, p::Vector{…}; evaluation::AllocatingEvaluation, kwargs::@Kwargs{…})
    @ Manopt ~/.julia/packages/Manopt/07ePo/src/solvers/quasi_Newton.jl:245
 [36] fixedpoint(peps₀::InfinitePEPS{…}, operator::LocalOperator{…}, alg::PEPSOptimize{…}, env₀::CTMRGEnv{…})
    @ PEPSKit ~/repos/PEPSKit.jl/src/algorithms/peps_opt.jl:400
 [37] top-level scope
    @ ~/repos/PEPSKit.jl/test/manopt.jl:45
Some type information was truncated. Use `show(err)` to see complete types.

Any ideas on this, @lkdvos?

@pbrehmer
Collaborator Author

pbrehmer commented Dec 20, 2024

Update: it was a typo in the withgradient block which led to a wrong gradient ¯\_(ツ)_/¯
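One generic way to catch a wrong gradient like this is a finite-difference check along a direction. The sketch below is a plain-Julia illustration with hypothetical helper names (fd_directional, check_gradient) and a toy cost; it is not the check used in the PEPSKit test suite.

```julia
using LinearAlgebra

# Central finite-difference approximation of the directional derivative
# of f at p along direction d.
fd_directional(f, p, d; h=1e-6) = (f(p .+ h .* d) - f(p .- h .* d)) / (2h)

# Compare an analytic gradient against finite differences along d.
function check_gradient(f, grad, p, d; h=1e-6, rtol=1e-5)
    analytic = dot(grad(p), d)
    numeric = fd_directional(f, p, d; h=h)
    return isapprox(analytic, numeric; rtol=rtol)
end

f(p) = sum(abs2, p) + p[1] * p[2]
good_grad(p) = [2p[1] + p[2], 2p[2] + p[1]]
bad_grad(p) = [2p[1] + p[2], 2p[2] - p[1]]   # deliberate sign typo

p, d = [0.3, -1.2], [1.0, 0.5]
good_ok = check_gradient(f, good_grad, p, d)
bad_ok = check_gradient(f, bad_grad, p, d)
```

Running such a check on a small problem before handing the gradient to an optimizer separates gradient bugs from linesearch or solver issues.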

@lkdvos
Member

lkdvos commented Dec 20, 2024

Do you know if Manopt might be doing some in place copying which is overwriting some tensors? This is somehow the only thing I can think of, because otherwise it should really be doing the same thing. It might be worth checking gradient descent first, since that should be slow but very simple.

@pbrehmer
Collaborator Author

pbrehmer commented Dec 20, 2024

> Do you know if Manopt might be doing some in place copying which is overwriting some tensors? This is somehow the only thing I can think of, because otherwise it should really be doing the same thing.

As far as I can tell, it doesn't do that by default; all in-place functionality needs to be enabled explicitly. But after fixing that typo, the optimization now works and seems to be consistent with OptimKit!

When comparing Manopt and OptimKit, I took a closer look at the available linesearch algorithms and found that the Hager-Zhang from LineSearches behaves slightly differently from the one in OptimKit. The reason is that the OptimKit implementation has a kwarg acceptfirst, which is set to true for LBFGS, so that the initial guess for the step size is accepted directly if all necessary conditions are met, meaning that no real linesearch is performed. In practice, the step size is initialized to 1.0, and it seems that in most optimization iterations this is immediately accepted. The LineSearches implementation, on the other hand, doesn't have this acceptfirst and actually performs a linesearch, which results in larger step sizes that somehow lead to less efficient optimizations in the simple examples/tests I checked.
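The difference can be illustrated with a deliberately simplified toy linesearch. This is not the Hager-Zhang algorithm and not OptimKit's or LineSearches' code; the function toy_linesearch, its grid refinement, and the constants c1 and c2 are assumptions made purely to show what an acceptfirst-style switch changes.

```julia
# Toy linesearch along direction d. With acceptfirst=true the initial step is
# returned as soon as it satisfies the strong Wolfe conditions; otherwise the
# step is always refined. Returns the step and the number of trial steps.
function toy_linesearch(f, df, p, d; α0=1.0, c1=1e-4, c2=0.9, acceptfirst=true)
    ϕ(α) = f(p .+ α .* d)                # cost along the search direction
    dϕ(α) = sum(df(p .+ α .* d) .* d)    # directional derivative along d
    ϕ0, dϕ0 = ϕ(0.0), dϕ(0.0)
    wolfe(α) = ϕ(α) <= ϕ0 + c1 * α * dϕ0 && abs(dϕ(α)) <= c2 * abs(dϕ0)
    if acceptfirst && wolfe(α0)
        return α0, 1                     # accepted without any searching
    end
    # Crude refinement: scan a grid of steps and keep the best admissible one.
    candidates = α0 .* (2.0 .^ (-4:0.5:4))
    admissible = filter(wolfe, candidates)
    isempty(admissible) && return α0, length(candidates)
    return argmin(ϕ, admissible), length(candidates)
end

f(p) = 0.25 * sum(abs2, p)
df(p) = 0.5 .* p
p = [2.0, -4.0]
d = -df(p)                               # steepest-descent direction

α_first, n_first = toy_linesearch(f, df, p, d; acceptfirst=true)
α_search, n_search = toy_linesearch(f, df, p, d; acceptfirst=false)
```

In this toy problem the unit step already satisfies the conditions and is taken directly with acceptfirst=true, while the searching variant settles on a larger step. The example does not reproduce the efficiency difference seen in the PEPS tests; it only shows why the two variants can produce different step sizes.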

Long story short: I will need to look for a good default linesearch algorithm, and from the first tests it seems that AdaptiveWNGradient works pretty well.

@pbrehmer
Collaborator Author

pbrehmer commented Dec 20, 2024

Okay, the PR is mostly finished, I would say, but there are a few things that still need to be resolved:

  • Fix the truncation_error error
  • Fix the SpaceMismatch error in the gradients.jl test
  • Accelerate the pwave.jl test

I'm not sure if I'll get that in before Christmas, but otherwise I'll finish it up next year :-)

@pbrehmer
Collaborator Author

pbrehmer commented Jan 17, 2025

It seems that the vectorization is a bit slower for fermionic tensors, as in the pwave.jl test, which is one reason why it is still relatively slow. However, I think the difference from the OptimKit version is probably still negligible in real applications.

@lkdvos I could probably use some help with the truncation_error that only occasionally comes up, if you have time. It seems like a weird Zygote thing, perhaps. Hopefully, I can finish this up next week.

@pbrehmer pbrehmer marked this pull request as ready for review January 17, 2025 17:42