# Quasi-Newton (BFGS, L-BFGS)¶

This module provides an implementation of Quasi-Newton methods (BFGS, sBFGS and l-BFGS).

The Taylor expansion up to second order of a function $$f(\theta_t)$$ allows a local quadratic approximiation of $$f(\theta_t + d_t)$$:

$f(\theta_t + d_t) \approx f(\theta_t) + d_t^Tf'(\theta_t) + \frac{1}{2}d_t^TH_td_t$

where the symmetric positive definite matrix $$H_t$$ is the Hessian at $$\theta_t$$. The minimizer $$d_t$$ of this convex quadratic model is:

$d_t = -H^{-1}f'(\theta_t).$

For large scale problems both computing/storing the Hessian and solving the above linear system is computationally demanding. Instead of recomputing the Hessian from scratch at every iteration, quasi-Newton methods utilize successive measurements of the gradient to build a sufficiently good quadratic model of the objective function. The above formula is then applied to yield a direction $$d_t$$. The update done is then of the form

$\theta_{t+1} = \alpha_t d_t + \theta_t$

where $$\alpha_t$$ is obtained with a line search.

Note

The classes presented here are not working with gnumpy.

class climin.bfgs.Bfgs(wrt, f, fprime, initial_inv_hessian=None, line_search=None, args=None)

BFGS (Broyden-Fletcher-Goldfarb-Shanno) is one of the most well-knwon quasi-Newton methods. The main idea is to iteratively construct an approximate inverse Hessian $$B^{-1}_t$$ by a rank-2 update:

$B^{-1}_{t+1} = B^{-1}_t + (1 + \frac{y_t^TB^{-1}_ty_t}{y_t^Ts_t})\frac{s_ts_t^T}{s_t^Ty_t} - \frac{s_ty_t^TB^{-1}_t + B^{-1}_ty_ts_t^T}{s_t^Ty_t},$

where $$y_t = f(\theta_{t+1}) - f(\theta_{t})$$ and $$s_t = \theta_{t+1} - \theta_t$$.

The storage requirements for BFGS scale quadratically with the number of variables. For detailed derivations, see [nocedal2006a], chapter 6.

 [nocedal2006a] (1, 2) Nocedal, J. and Wright, S. (2006), Numerical Optimization, 2nd edition, Springer.

Attributes

 wrt (array_like) Current solution to the problem. Can be given as a first argument to .f and .fprime. f (Callable) The object function. fprime (Callable) First derivative of the objective function. Returns an array of the same shape as .wrt. initial_inv_hessian (array_like) The initial estimate of the approximiate Hessian. line_search (LineSearch object.) Line search object to perform line searches with. args (iterable) Iterator over arguments which fprime will be called with.

Methods

__init__(wrt, f, fprime, initial_inv_hessian=None, line_search=None, args=None)

Create a BFGS object.

Parameters: wrt : array_like Array that represents the solution. Will be operated upon in place. f and fprime should accept this array as a first argument. f : callable The objective function. fprime : callable Callable that given a solution vector as first parameter and *args and **kwargs drawn from the iterations args returns a search direction, such as a gradient. initial_inv_hessian : array_like The initial estimate of the approximiate Hessian. line_search : LineSearch object. Line search object to perform line searches with. args : iterable Iterator over arguments which fprime will be called with.
class climin.bfgs.Lbfgs(wrt, f, fprime, initial_hessian_diag=1, n_factors=10, line_search=None, args=None)

l-BFGS (limited-memory BFGS) is a limited memory variation of the well-known BFGS algorithm. The storage requirement for BFGS scale quadratically with the number of variables, and thus it tends to be used only for smaller problems. Limited-memory BFGS reduces the storage by only using the $$l$$ latest updates (factors) in computing the approximate Hessian inverse and representing this approximation only implicitly. More specifically, it stores the last $$l$$ BFGS update vectors $$y_t$$ and $$s_t$$ and uses these to implicitly perform the matrix operations of BFGS (see [nocedal2006a]).

Note

In order to handle simple box constraints, consider scipy.optimize.fmin_l_bfgs_b.

Attributes

 wrt (array_like) Current solution to the problem. Can be given as a first argument to .f and .fprime. f (Callable) The object function. fprime (Callable) First derivative of the objective function. Returns an array of the same shape as .wrt. initial_hessian_diag (array_like) The initial estimate of the diagonal of the Hessian. n_factors (int) The number of factors that should be used to implicitly represent the inverse Hessian. line_search (LineSearch object.) Line search object to perform line searches with. args (iterable) Iterator over arguments which fprime will be called with.

Methods

__init__(wrt, f, fprime, initial_hessian_diag=1, n_factors=10, line_search=None, args=None)

Create an Lbfgs object.

Attributes

 wrt (array_like) Current solution to the problem. Can be given as a first argument to .f and .fprime. f (Callable) The object function. fprime (Callable) First derivative of the objective function. Returns an array of the same shape as .wrt. initial_hessian_diag (array_like) The initial estimate of the diagonal of the Hessian. n_factors (int) The number of factors that should be used to implicitly represent the inverse Hessian. line_search (LineSearch object.) Line search object to perform line searches with. args (iterable) Iterator over arguments which fprime will be called with.