Quasi-Newton (BFGS, L-BFGS)

This module provides an implementation of Quasi-Newton methods (BFGS, sBFGS and l-BFGS).

The Taylor expansion up to second order of a function \(f(\theta_t)\) allows a local quadratic approximiation of \(f(\theta_t + d_t)\):

\[f(\theta_t + d_t) \approx f(\theta_t) + d_t^Tf'(\theta_t) + \frac{1}{2}d_t^TH_td_t\]

where the symmetric positive definite matrix \(H_t\) is the Hessian at \(\theta_t\). The minimizer \(d_t\) of this convex quadratic model is:

\[d_t = -H^{-1}f'(\theta_t).\]

For large scale problems both computing/storing the Hessian and solving the above linear system is computationally demanding. Instead of recomputing the Hessian from scratch at every iteration, quasi-Newton methods utilize successive measurements of the gradient to build a sufficiently good quadratic model of the objective function. The above formula is then applied to yield a direction \(d_t\). The update done is then of the form

\[\theta_{t+1} = \alpha_t d_t + \theta_t\]

where \(\alpha_t\) is obtained with a line search.

Note

The classes presented here are not working with gnumpy.

class climin.bfgs.Bfgs(wrt, f, fprime, initial_inv_hessian=None, line_search=None, args=None)

BFGS (Broyden-Fletcher-Goldfarb-Shanno) is one of the most well-knwon quasi-Newton methods. The main idea is to iteratively construct an approximate inverse Hessian \(B^{-1}_t\) by a rank-2 update:

\[B^{-1}_{t+1} = B^{-1}_t + (1 + \frac{y_t^TB^{-1}_ty_t}{y_t^Ts_t})\frac{s_ts_t^T}{s_t^Ty_t} - \frac{s_ty_t^TB^{-1}_t + B^{-1}_ty_ts_t^T}{s_t^Ty_t},\]

where \(y_t = f(\theta_{t+1}) - f(\theta_{t})\) and \(s_t = \theta_{t+1} - \theta_t\).

The storage requirements for BFGS scale quadratically with the number of variables. For detailed derivations, see [nocedal2006a], chapter 6.

[nocedal2006a](1, 2) Nocedal, J. and Wright, S. (2006), Numerical Optimization, 2nd edition, Springer.

Attributes

wrt (array_like) Current solution to the problem. Can be given as a first argument to .f and .fprime.
f (Callable) The object function.
fprime (Callable) First derivative of the objective function. Returns an array of the same shape as .wrt.
initial_inv_hessian (array_like) The initial estimate of the approximiate Hessian.
line_search (LineSearch object.) Line search object to perform line searches with.
args (iterable) Iterator over arguments which fprime will be called with.

Methods

__init__(wrt, f, fprime, initial_inv_hessian=None, line_search=None, args=None)

Create a BFGS object.

Parameters:

wrt : array_like

Array that represents the solution. Will be operated upon in place. f and fprime should accept this array as a first argument.

f : callable

The objective function.

fprime : callable

Callable that given a solution vector as first parameter and *args and **kwargs drawn from the iterations args returns a search direction, such as a gradient.

initial_inv_hessian : array_like

The initial estimate of the approximiate Hessian.

line_search : LineSearch object.

Line search object to perform line searches with.

args : iterable

Iterator over arguments which fprime will be called with.

class climin.bfgs.Lbfgs(wrt, f, fprime, initial_hessian_diag=1, n_factors=10, line_search=None, args=None)

l-BFGS (limited-memory BFGS) is a limited memory variation of the well-known BFGS algorithm. The storage requirement for BFGS scale quadratically with the number of variables, and thus it tends to be used only for smaller problems. Limited-memory BFGS reduces the storage by only using the \(l\) latest updates (factors) in computing the approximate Hessian inverse and representing this approximation only implicitly. More specifically, it stores the last \(l\) BFGS update vectors \(y_t\) and \(s_t\) and uses these to implicitly perform the matrix operations of BFGS (see [nocedal2006a]).

Note

In order to handle simple box constraints, consider scipy.optimize.fmin_l_bfgs_b.

Attributes

wrt (array_like) Current solution to the problem. Can be given as a first argument to .f and .fprime.
f (Callable) The object function.
fprime (Callable) First derivative of the objective function. Returns an array of the same shape as .wrt.
initial_hessian_diag (array_like) The initial estimate of the diagonal of the Hessian.
n_factors (int) The number of factors that should be used to implicitly represent the inverse Hessian.
line_search (LineSearch object.) Line search object to perform line searches with.
args (iterable) Iterator over arguments which fprime will be called with.

Methods

__init__(wrt, f, fprime, initial_hessian_diag=1, n_factors=10, line_search=None, args=None)

Create an Lbfgs object.

Attributes

wrt (array_like) Current solution to the problem. Can be given as a first argument to .f and .fprime.
f (Callable) The object function.
fprime (Callable) First derivative of the objective function. Returns an array of the same shape as .wrt.
initial_hessian_diag (array_like) The initial estimate of the diagonal of the Hessian.
n_factors (int) The number of factors that should be used to implicitly represent the inverse Hessian.
line_search (LineSearch object.) Line search object to perform line searches with.
args (iterable) Iterator over arguments which fprime will be called with.