Quasi-Newton (BFGS, L-BFGS)¶
This module provides an implementation of Quasi-Newton methods (BFGS, sBFGS and l-BFGS).
The Taylor expansion up to second order of a function \(f(\theta_t)\) allows a local quadratic approximiation of \(f(\theta_t + d_t)\):
\[f(\theta_t + d_t) \approx f(\theta_t) + d_t^Tf'(\theta_t) + \frac{1}{2}d_t^TH_td_t\]
where the symmetric positive definite matrix \(H_t\) is the Hessian at \(\theta_t\). The minimizer \(d_t\) of this convex quadratic model is:
\[d_t = -H^{-1}f'(\theta_t).\]
For large scale problems both computing/storing the Hessian and solving the above linear system is computationally demanding. Instead of recomputing the Hessian from scratch at every iteration, quasi-Newton methods utilize successive measurements of the gradient to build a sufficiently good quadratic model of the objective function. The above formula is then applied to yield a direction \(d_t\). The update done is then of the form
\[\theta_{t+1} = \alpha_t d_t + \theta_t\]
where \(\alpha_t\) is obtained with a line search.
Note
The classes presented here are not working with gnumpy.
-
class
climin.bfgs.
Bfgs
(wrt, f, fprime, initial_inv_hessian=None, line_search=None, args=None)¶ BFGS (Broyden-Fletcher-Goldfarb-Shanno) is one of the most well-knwon quasi-Newton methods. The main idea is to iteratively construct an approximate inverse Hessian \(B^{-1}_t\) by a rank-2 update:
\[B^{-1}_{t+1} = B^{-1}_t + (1 + \frac{y_t^TB^{-1}_ty_t}{y_t^Ts_t})\frac{s_ts_t^T}{s_t^Ty_t} - \frac{s_ty_t^TB^{-1}_t + B^{-1}_ty_ts_t^T}{s_t^Ty_t},\]where \(y_t = f(\theta_{t+1}) - f(\theta_{t})\) and \(s_t = \theta_{t+1} - \theta_t\).
The storage requirements for BFGS scale quadratically with the number of variables. For detailed derivations, see [nocedal2006a], chapter 6.
[nocedal2006a] (1, 2) Nocedal, J. and Wright, S. (2006), Numerical Optimization, 2nd edition, Springer. Attributes
wrt (array_like) Current solution to the problem. Can be given as a first argument to .f
and.fprime
.f (Callable) The object function. fprime (Callable) First derivative of the objective function. Returns an array of the same shape as .wrt
.initial_inv_hessian (array_like) The initial estimate of the approximiate Hessian. line_search (LineSearch object.) Line search object to perform line searches with. args (iterable) Iterator over arguments which fprime
will be called with.Methods
-
__init__
(wrt, f, fprime, initial_inv_hessian=None, line_search=None, args=None)¶ Create a BFGS object.
Parameters: wrt : array_like
Array that represents the solution. Will be operated upon in place.
f
andfprime
should accept this array as a first argument.f : callable
The objective function.
fprime : callable
initial_inv_hessian : array_like
The initial estimate of the approximiate Hessian.
line_search : LineSearch object.
Line search object to perform line searches with.
args : iterable
Iterator over arguments which
fprime
will be called with.
-
-
class
climin.bfgs.
Lbfgs
(wrt, f, fprime, initial_hessian_diag=1, n_factors=10, line_search=None, args=None)¶ l-BFGS (limited-memory BFGS) is a limited memory variation of the well-known BFGS algorithm. The storage requirement for BFGS scale quadratically with the number of variables, and thus it tends to be used only for smaller problems. Limited-memory BFGS reduces the storage by only using the \(l\) latest updates (factors) in computing the approximate Hessian inverse and representing this approximation only implicitly. More specifically, it stores the last \(l\) BFGS update vectors \(y_t\) and \(s_t\) and uses these to implicitly perform the matrix operations of BFGS (see [nocedal2006a]).
Note
In order to handle simple box constraints, consider
scipy.optimize.fmin_l_bfgs_b
.Attributes
wrt (array_like) Current solution to the problem. Can be given as a first argument to .f
and.fprime
.f (Callable) The object function. fprime (Callable) First derivative of the objective function. Returns an array of the same shape as .wrt
.initial_hessian_diag (array_like) The initial estimate of the diagonal of the Hessian. n_factors (int) The number of factors that should be used to implicitly represent the inverse Hessian. line_search (LineSearch object.) Line search object to perform line searches with. args (iterable) Iterator over arguments which fprime
will be called with.Methods
-
__init__
(wrt, f, fprime, initial_hessian_diag=1, n_factors=10, line_search=None, args=None)¶ Create an Lbfgs object.
Attributes
wrt (array_like) Current solution to the problem. Can be given as a first argument to .f
and.fprime
.f (Callable) The object function. fprime (Callable) First derivative of the objective function. Returns an array of the same shape as .wrt
.initial_hessian_diag (array_like) The initial estimate of the diagonal of the Hessian. n_factors (int) The number of factors that should be used to implicitly represent the inverse Hessian. line_search (LineSearch object.) Line search object to perform line searches with. args (iterable) Iterator over arguments which fprime
will be called with.
-