Calling `leastsq` via generic interface
The least-squares algorithms from MINPACK wrapped by `scipy.optimize.leastsq` are very well designed and efficient. It's quite hard to match their average performance (in terms of `nfev`), as they contain a lot of small clever tricks. So it's desirable to allow calling them from our general interface `least_squares`; let's call this method `'lm'` (Levenberg-Marquardt). But this comes at the cost of some incompatibility and inconsistency.
My idea is to unify the algorithms as much as possible. I suggest exposing all common parameters in the interface and making usage of `options` almost unnecessary:
```python
least_squares(fun, x0, bounds=(None, None), method=None, jac='2-point',
              ftol=EPS**0.5, xtol=EPS**0.5, gtol=EPS**0.5, max_nfev=None,
              scaling=1.0, diff_step=None, args=(), kwargs=None, options=None)
```
Nikolay: When we discussed `approx_derivative` we agreed that usage of `{}` as the default is fine, although I advocated using `None` :) Overall I don't mind either variant. Let's settle on `None`.
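For illustration, here is a hypothetical call through the proposed interface. This is a sketch of the design under discussion, not a released API; the fitting problem itself is made up:

```python
import numpy as np
from scipy.optimize import least_squares  # the proposed interface

# Residuals of an exponential decay model against noisy data.
def fun(x, t, y):
    return x[0] * np.exp(-x[1] * t) - y

t = np.linspace(0, 10, 50)
y = 2.0 * np.exp(-0.5 * t) + 0.01 * np.random.randn(50)

# method='lm' dispatches to the MINPACK wrapper;
# jac='2-point' requests a forward-difference Jacobian approximation.
res = least_squares(fun, np.array([1.0, 1.0]), method='lm',
                    jac='2-point', args=(t, y))
print(res.x, res.status)
```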
Here are some explanations:
- The signature of `fun` is `fun(x, *args, **kwargs)`. Ditto for `jac`, when it's a callable.
- `method=None` assumes auto-selection of the appropriate method among `['trf', 'dogbox', 'lm']`. Default is `'trf'`. Nikolay: I thought of more intelligent method selection. Now we can indeed set `method='trf'`, but I suggest changing it to `method='auto'` later.
- `jac='2-point'` requests a 2-point numerical approximation of the Jacobian; another option is `'3-point'`. For `leastsq`, `'3-point'` and `'2-point'` will work the same way (perhaps with a warning about that). Allowed values are `'2-point'`, `'3-point'` or a callable.
- `max_nfev=None` selects the default value for each algorithm.
- `scaling=1.0` determines variable scaling. MINPACK calls this parameter `diag`, and it is the reciprocal of my `scaling`, i.e. `diag = 1 / scaling`. I think it's a nice unifying approach; we just need to compute `diag` before calling MINPACK (see the translation sketch after this list). And `scaling='auto'` is equivalent to `diag=None`. Note that the default is 1.0 (no scaling, as appropriate for bounded problems), unlike `leastsq` where the default is to scale. Q: invert `scaling` for consistency with MINPACK? Up to you. Nikolay: Yes, let's change it, I came to the same conclusion. Can't we improve the scaling strategy? For example, `scaling=None` means using the default strategy (Jacobian scaling for unbounded problems, 1.0 for bounded), while `scaling='jac'` forces Jacobian scaling. What do you think?
- `diff_step=None` determines the step size selection for numerical differentiation; `None` means default step selection. If we agree on the "Press rule" for computing the actual step, then `diff_step` is a relative step size. MINPACK uses `epsfcn` and determines the step as `sqrt(epsfcn) * x`, so we just set `epsfcn = diff_step**2` when calling MINPACK (also shown in the sketch below).
- `args`, `kwargs` - parameters passed to `fun` and `jac`. Raise a warning that `leastsq` doesn't support `kwargs`.
- `options` has limited usage: it allows passing `factor` and `col_deriv` to `leastsq`. I'm considering disabling `col_deriv=1`, as it will only complicate the documentation without any benefit (just compute the Jacobian in the normal way, please!). I'm not sure how to handle options already present as parameters; it will cause an error. Probably we can rely on this error and that's all. `**options` are passed directly to the underlying solver; unknown options or duplicate entries raise TypeErrors.
- `bounds != (None, None)` together with `method='lm'` raises a ValueError (also shown in the sketch below). Nikolay: Sure, that's also a part of the bigger method selection logic (which isn't defined yet).
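To make the translation concrete, here is a minimal sketch of how the unified parameters could be mapped onto `leastsq` arguments. The wrapper name `_call_lm` is hypothetical, and the exact dispatch will depend on the final design:

```python
import numpy as np
from scipy.optimize import leastsq

EPS = np.finfo(float).eps

def _call_lm(fun, x0, bounds=(None, None), ftol=EPS**0.5, xtol=EPS**0.5,
             gtol=EPS**0.5, max_nfev=None, scaling=1.0, diff_step=None,
             args=(), kwargs=None, options=None):
    # method='lm' supports neither bounds nor kwargs.
    if bounds != (None, None):
        raise ValueError("method='lm' doesn't support bounds.")
    if kwargs:
        raise ValueError("method='lm' doesn't support kwargs.")

    # MINPACK's diag is the reciprocal of scaling; 'auto' maps to diag=None.
    if isinstance(scaling, str) and scaling == 'auto':
        diag = None
    else:
        diag = np.ones_like(np.asarray(x0, dtype=float)) / scaling

    # MINPACK determines the step as sqrt(epsfcn) * x, so a relative
    # diff_step translates to epsfcn = diff_step**2.
    epsfcn = None if diff_step is None else diff_step ** 2

    # maxfev=0 tells MINPACK to use its own default limit.
    maxfev = 0 if max_nfev is None else max_nfev

    # factor (and, if kept, col_deriv) would arrive through options;
    # a duplicate of a named parameter raises TypeError right here.
    options = {} if options is None else options
    return leastsq(fun, x0, args=args, full_output=True, ftol=ftol,
                   xtol=xtol, gtol=gtol, maxfev=maxfev, epsfcn=epsfcn,
                   diag=diag, **options)
```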
Another point to consider is what `least_squares` returns. Obviously, we want it to return the same `OptimizeResult` object for all methods. Let's consider all the fields we currently return and whether we can return them in the case of `method='lm'`:
- `x` - returned by MINPACK.
- `fun` (residuals) - returned by MINPACK.
- `jac` - we can't compute it from what MINPACK returns (the Q matrix from the QR factorization is missing). I think it's fine to just call `jac(x)` one more time.
- `obj_value` - no problem to compute.
- `optimality` - having `jac` and `fun`, it's easy to compute.
- `active_mask` - all zeros for unbounded problems.
- `nfev`, `njev` - returned by MINPACK.
- `nit` - not returned by MINPACK; set it to -1 or None.
- `status` - returned by MINPACK. We should translate it to one of the standard statuses returned by `least_squares`; it's not hard to do (see the sketch after this list).
- `success` - derived from status. I think we can drop `success` completely (or keep it, not a big deal).
- `message` - again, return one of the standard unified messages based on status.
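A minimal sketch of packing the MINPACK output into a unified `OptimizeResult`, assuming the field names above. The `ier`-to-status table and the messages are illustrative; the real mapping would be fixed during implementation:

```python
import numpy as np
from scipy.optimize import OptimizeResult

# Illustrative translation of MINPACK's ier codes into unified statuses.
STATUS_FROM_IER = {1: 2, 2: 3, 3: 4, 4: 1, 5: 0}
MESSAGES = {
    0: "The maximum number of function evaluations is exceeded.",
    1: "`gtol` termination condition is satisfied.",
    2: "`ftol` termination condition is satisfied.",
    3: "`xtol` termination condition is satisfied.",
    4: "Both `ftol` and `xtol` termination conditions are satisfied.",
}

def _wrap_lm_result(x, f, info, ier, jac):
    J = jac(x)                      # recompute: Q from the QR is unavailable
    g = J.T.dot(f)                  # gradient of 0.5 * ||f||**2
    status = STATUS_FROM_IER.get(ier, -1)
    return OptimizeResult(
        x=x, fun=f, jac=J,
        obj_value=0.5 * np.dot(f, f),
        optimality=np.linalg.norm(g, ord=np.inf),
        active_mask=np.zeros_like(x),   # no bounds with 'lm'
        nfev=info['nfev'],
        njev=info.get('njev'),          # absent for numerical Jacobians
        nit=None,                       # MINPACK doesn't report iterations
        status=status,
        success=status > 0,
        message=MESSAGES.get(status, "Unknown termination reason."),
    )
```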
`leastsq` also returns the inverse of `J.T.dot(J)`, which is sort of a covariance matrix of the estimated parameters, assuming the residuals have an identity covariance matrix (I need to think about it more, maybe it's not useful at all). It's not connected to the optimization algorithms and is not always computable (consider big sparse matrices, which will hopefully appear here soon). So we can probably add this to `OptimizeResult` for MINPACK only (or not); we need some sensible name for it.
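For reference, a sketch of how such a matrix could be computed from the Jacobian alone. The scaling by residual variance is a common refinement (as done in `curve_fit`), not something `leastsq` itself returns:

```python
import numpy as np

def cov_from_jac(J, f):
    """Covariance estimate inv(J.T @ J), scaled by the residual
    variance s**2 = ||f||**2 / (m - n) when m > n."""
    m, n = J.shape
    cov = np.linalg.pinv(J.T.dot(J))   # pinv tolerates rank deficiency
    if m > n:
        cov = cov * np.dot(f, f) / (m - n)
    return cov
```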