
Calling `leastsq` via generic interface


The least-squares algorithms from MINPACK wrapped by scipy.optimize.leastsq are very well designed and efficient. It's quite hard to match their average performance (in terms of nfev), as they contain a lot of clever little tricks. So it's desirable to allow calling them from our general interface least_squares; let's call this method 'lm' (Levenberg-Marquardt). But this comes at the cost of some incompatibility and inconsistency.

My idea is to unify the algorithms as much as possible. I suggest exposing all common parameters in the interface, making the use of options almost unnecessary:

least_squares(fun, x0, bounds=(None, None), method=None, jac='2-point',
              ftol=EPS**0.5, xtol=EPS**0.5, gtol=EPS**0.5, max_nfev=None,
              scaling=1.0, diff_step=None, args=(), kwargs=None, options=None)

Nikolay: When we discussed approx_derivative we agreed that using {} as the default is fine, although I advocated for None :) Overall I don't mind either variant. Let's settle on None.
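
For concreteness, here is a sketch of what a call would look like under this signature; the exponential-decay residuals and data are made up purely for illustration:

    import numpy as np
    from scipy.optimize import least_squares

    def fun(x, t, y):
        # Residuals of the model y ~ x[0] * exp(-x[1] * t).
        return x[0] * np.exp(-x[1] * t) - y

    t = np.linspace(0, 5, 50)
    y = 2.0 * np.exp(-1.3 * t)

    # method='lm' dispatches to the MINPACK wrapper discussed below.
    res = least_squares(fun, np.array([1.0, 1.0]), method='lm', args=(t, y))
    print(res.x, res.status)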

Here are some explanations:

  1. The signature of fun is fun(x, *args, **kwargs). Ditto for jac, when it's a callable.
  2. method=None assumes auto-selection of the appropriate method among ['trf', 'dogbox', 'lm']. Default is 'trf'. Nikolay: I thought of more intelligent method selection. For now we can indeed set method='trf', but I suggest changing it to method='auto' later.
  3. jac='2-point' means a 2-point numerical approximation of the Jacobian; another option is '3-point'. For leastsq, '3-point' and '2-point' will work the same way (perhaps with a warning about that). Allowed values are '2-point', '3-point' or a callable. (A finite-difference sketch follows this list.)
  4. max_nfev=None means each algorithm uses its own default value.
  5. scaling=1.0 determines variable scaling. MINPACK calls this parameter diag, and it is the reciprocal of my scaling, i.e. diag = 1 / scaling. I think it's a nice unifying approach; we just need to compute diag before calling MINPACK (see the translation sketch after this list), and scaling='auto' is equivalent to diag=None. Note that the default is 1.0 (no scaling, as appropriate for bounded problems), unlike leastsq where the default is to scale. Q: invert scaling for consistency with MINPACK? Up to you. Nikolay: Yes, let's change it; I came to the same conclusion. Can't we improve the scaling strategy? For example, scaling=None means using a default strategy (Jacobian scaling for unbounded problems, 1.0 for bounded), and scaling='jac' forces Jacobian scaling. What do you think?
  6. diff_step=None determines step size selection for numerical differentiation; None means the default step selection. If we agree on the "Press rule" for computing the actual step, then diff_step is a relative step size. MINPACK uses epsfcn and determines the step as sqrt(epsfcn) * x, so we just set epsfcn = diff_step**2 when calling MINPACK.
  7. args, kwargs - additional arguments passed to fun and jac. Raise a warning that leastsq doesn't support kwargs.
  8. options has limited usage, to allow passing factor and col_deriv to leastsq. I'm considering disabling col_deriv=1, as it would only complicate the documentation without any benefit (just compute the Jacobian in the normal way, please!). Not sure how to handle options already present among the parameters; it will cause an error. Probably we can rely on this error and that's all. **options are passed directly to the underlying solver; unknown options or duplicate entries raise TypeError.
  9. bounds != (None, None) with method='lm' raises a ValueError. Nikolay: Sure, that's also part of a bigger method selection logic (which isn't defined yet).
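
To illustrate point 3, a minimal sketch of what the '2-point' and '3-point' Jacobian approximations compute; these helpers are hypothetical, not the actual implementation:

    import numpy as np

    def jac_2point(fun, x, f0, rel_step):
        # Forward difference: J[:, j] ~ (f(x + h*e_j) - f(x)) / h.
        J = np.empty((f0.size, x.size))
        for j in range(x.size):
            h = rel_step * max(1.0, abs(x[j]))
            x_h = x.copy()
            x_h[j] += h
            J[:, j] = (fun(x_h) - f0) / h
        return J

    def jac_3point(fun, x, rel_step):
        # Central difference: J[:, j] ~ (f(x + h*e_j) - f(x - h*e_j)) / (2h).
        f0 = fun(x)
        J = np.empty((f0.size, x.size))
        for j in range(x.size):
            h = rel_step * max(1.0, abs(x[j]))
            x_p, x_m = x.copy(), x.copy()
            x_p[j] += h
            x_m[j] -= h
            J[:, j] = (fun(x_p) - fun(x_m)) / (2 * h)
        return J

leastsq always uses its internal forward-difference scheme, hence the warning for '3-point'.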

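Points 5 and 6 then reduce to a small translation layer in front of leastsq. A sketch, with a hypothetical helper name:

    import numpy as np
    from scipy.optimize import leastsq

    EPS = np.finfo(float).eps

    def call_lm(fun, x0, ftol=EPS**0.5, xtol=EPS**0.5, gtol=EPS**0.5,
                max_nfev=None, scaling=1.0, diff_step=None, args=()):
        if isinstance(scaling, str):  # scaling='auto'
            diag = None  # let MINPACK compute diag itself
        else:
            # MINPACK's diag is the reciprocal of our scaling.
            diag = np.broadcast_to(1.0 / np.asarray(scaling, float),
                                   np.shape(x0)).copy()
        # MINPACK takes the step as sqrt(epsfcn) * x, so a relative step
        # diff_step corresponds to epsfcn = diff_step**2.
        epsfcn = None if diff_step is None else diff_step**2
        return leastsq(fun, x0, args=args, full_output=True,
                       ftol=ftol, xtol=xtol, gtol=gtol,
                       maxfev=0 if max_nfev is None else max_nfev,
                       epsfcn=epsfcn, diag=diag)
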
Another point to consider is what least_squares returns. Obviously, we want it to return the same OptimizeResult object for all methods. Let's go through all fields we currently return and see whether we can return them in the case of method='lm':

  1. x - returned by MINPACK.
  2. fun (residuals) - returned by MINPACK.
  3. jac - we can't compute it from what MINPACK returns (the Q matrix from the QR factorization is missing). I think it's fine to just call jac(x) one more time.
  4. obj_value - no problem to compute.
  5. optimality - easy to compute given jac and fun.
  6. active_mask - all zeros for unbounded problem.
  7. nfev, njev - returned by MINPACK.
  8. nit - not returned by MINPACK, set to -1 or None.
  9. status - returned by MINPACK. We should translate it to one of the standard statuses returned by least_squares; it's not hard to do (see the repacking sketch after this list).
  10. success - derived from status. I think we can drop success completely (or keep it, not a big deal).
  11. message - again, return one of the standard unified messages based on status.
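
Putting this together, a sketch of how the leastsq output could be repacked into an OptimizeResult; the status codes and messages are illustrative rather than final, and obj_value is assumed to be 0.5 * ||fun||**2:

    import numpy as np
    from scipy.optimize import OptimizeResult, leastsq

    # Illustrative translation of MINPACK's ier codes; ier = 1..4 all
    # indicate successful termination, 5 means maxfev was exceeded.
    STATUS = {
        1: (2, "ftol termination condition is satisfied."),
        2: (3, "xtol termination condition is satisfied."),
        3: (4, "Both ftol and xtol termination conditions are satisfied."),
        4: (1, "gtol termination condition is satisfied."),
        5: (0, "The maximum number of function evaluations is exceeded."),
    }

    def wrap_leastsq(fun, jac, x0, args=()):
        x, cov_x, info, mesg, ier = leastsq(fun, x0, args=args,
                                            full_output=True)
        f = info['fvec']
        J = jac(x, *args)  # re-evaluate: Q from the QR factorization is lost
        status, message = STATUS.get(ier, (-1, mesg))
        return OptimizeResult(
            x=x, fun=f, jac=J,
            obj_value=0.5 * np.dot(f, f),
            optimality=np.linalg.norm(J.T.dot(f), np.inf),
            active_mask=np.zeros_like(x, dtype=int),  # no bounds with 'lm'
            nfev=info['nfev'], njev=None,  # njev is not in leastsq's infodict
            nit=None, status=status, success=status > 0, message=message)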

leastsq also returns the inverse of J.T.dot(J), which is sort of a covariance matrix of the estimated parameters, assuming the residuals have an identity covariance matrix (I need to think about it more, maybe it's not useful at all). It's not connected to the optimization algorithms and is not always computable (consider big sparse matrices, which will hopefully appear here soon). So we can probably add it to OptimizeResult for MINPACK only (or not); we need a sensible name for it.
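
For reference, a sketch of how that matrix yields a parameter covariance estimate, assuming the residuals have covariance sigma**2 * I (helper name hypothetical):

    import numpy as np

    def parameter_covariance(J, f):
        # cov(x) ~ sigma**2 * inv(J^T J); pinv guards against rank deficiency.
        m, n = J.shape
        sigma2 = f.dot(f) / (m - n)  # residual variance estimate
        return sigma2 * np.linalg.pinv(J.T.dot(J))

Note that leastsq's cov_x is the unscaled inv(J^T J); curve_fit multiplies it by the residual variance in exactly this way.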
