QuasiNewton (BFGS, LBFGS)¶
This module provides an implementation of QuasiNewton methods (BFGS, sBFGS and lBFGS).
The Taylor expansion up to second order of a function \(f(\theta_t)\) allows a local quadratic approximiation of \(f(\theta_t + d_t)\):
\[f(\theta_t + d_t) \approx f(\theta_t) + d_t^Tf'(\theta_t) + \frac{1}{2}d_t^TH_td_t\]
where the symmetric positive definite matrix \(H_t\) is the Hessian at \(\theta_t\). The minimizer \(d_t\) of this convex quadratic model is:
\[d_t = H^{1}f'(\theta_t).\]
For large scale problems both computing/storing the Hessian and solving the above linear system is computationally demanding. Instead of recomputing the Hessian from scratch at every iteration, quasiNewton methods utilize successive measurements of the gradient to build a sufficiently good quadratic model of the objective function. The above formula is then applied to yield a direction \(d_t\). The update done is then of the form
\[\theta_{t+1} = \alpha_t d_t + \theta_t\]
where \(\alpha_t\) is obtained with a line search.
Note
The classes presented here are not working with gnumpy.

class
climin.bfgs.
Bfgs
(wrt, f, fprime, initial_inv_hessian=None, line_search=None, args=None)¶ BFGS (BroydenFletcherGoldfarbShanno) is one of the most wellknwon quasiNewton methods. The main idea is to iteratively construct an approximate inverse Hessian \(B^{1}_t\) by a rank2 update:
\[B^{1}_{t+1} = B^{1}_t + (1 + \frac{y_t^TB^{1}_ty_t}{y_t^Ts_t})\frac{s_ts_t^T}{s_t^Ty_t}  \frac{s_ty_t^TB^{1}_t + B^{1}_ty_ts_t^T}{s_t^Ty_t},\]where \(y_t = f(\theta_{t+1})  f(\theta_{t})\) and \(s_t = \theta_{t+1}  \theta_t\).
The storage requirements for BFGS scale quadratically with the number of variables. For detailed derivations, see [nocedal2006a], chapter 6.
[nocedal2006a] (1, 2) Nocedal, J. and Wright, S. (2006), Numerical Optimization, 2nd edition, Springer. Attributes
wrt (array_like) Current solution to the problem. Can be given as a first argument to .f
and.fprime
.f (Callable) The object function. fprime (Callable) First derivative of the objective function. Returns an array of the same shape as .wrt
.initial_inv_hessian (array_like) The initial estimate of the approximiate Hessian. line_search (LineSearch object.) Line search object to perform line searches with. args (iterable) Iterator over arguments which fprime
will be called with.Methods

__init__
(wrt, f, fprime, initial_inv_hessian=None, line_search=None, args=None)¶ Create a BFGS object.
Parameters: wrt : array_like
Array that represents the solution. Will be operated upon in place.
f
andfprime
should accept this array as a first argument.f : callable
The objective function.
fprime : callable
initial_inv_hessian : array_like
The initial estimate of the approximiate Hessian.
line_search : LineSearch object.
Line search object to perform line searches with.
args : iterable
Iterator over arguments which
fprime
will be called with.


class
climin.bfgs.
Lbfgs
(wrt, f, fprime, initial_hessian_diag=1, n_factors=10, line_search=None, args=None)¶ lBFGS (limitedmemory BFGS) is a limited memory variation of the wellknown BFGS algorithm. The storage requirement for BFGS scale quadratically with the number of variables, and thus it tends to be used only for smaller problems. Limitedmemory BFGS reduces the storage by only using the \(l\) latest updates (factors) in computing the approximate Hessian inverse and representing this approximation only implicitly. More specifically, it stores the last \(l\) BFGS update vectors \(y_t\) and \(s_t\) and uses these to implicitly perform the matrix operations of BFGS (see [nocedal2006a]).
Note
In order to handle simple box constraints, consider
scipy.optimize.fmin_l_bfgs_b
.Attributes
wrt (array_like) Current solution to the problem. Can be given as a first argument to .f
and.fprime
.f (Callable) The object function. fprime (Callable) First derivative of the objective function. Returns an array of the same shape as .wrt
.initial_hessian_diag (array_like) The initial estimate of the diagonal of the Hessian. n_factors (int) The number of factors that should be used to implicitly represent the inverse Hessian. line_search (LineSearch object.) Line search object to perform line searches with. args (iterable) Iterator over arguments which fprime
will be called with.Methods

__init__
(wrt, f, fprime, initial_hessian_diag=1, n_factors=10, line_search=None, args=None)¶ Create an Lbfgs object.
Attributes
wrt (array_like) Current solution to the problem. Can be given as a first argument to .f
and.fprime
.f (Callable) The object function. fprime (Callable) First derivative of the objective function. Returns an array of the same shape as .wrt
.initial_hessian_diag (array_like) The initial estimate of the diagonal of the Hessian. n_factors (int) The number of factors that should be used to implicitly represent the inverse Hessian. line_search (LineSearch object.) Line search object to perform line searches with. args (iterable) Iterator over arguments which fprime
will be called with.
