Resilient Propagation

This module contains the Resilient propagation optimizer.

class climin.rprop.Rprop(wrt, fprime, step_shrink=0.5, step_grow=1.2, min_step=1e-06, max_step=1, changes_max=0.1, args=None)

Rprop optimizer.

Resilient propagation is an optimizer that was originally tailored towards neural networks. It can, however, be safely applied to all kinds of optimization problems. The idea is to maintain a parameter-specific step rate that is adapted according to sign changes of the derivative of the objective function.

To be more precise, given the derivative of the loss with respect to the parameters, \(f'(\theta_t)\), at time step \(t\), the \(i\)th component of the vector of step rates \(\alpha\) is updated as

\[\begin{split}\alpha_i \leftarrow \begin{cases} \alpha_i \cdot \eta_{\text{grow}} & \text{if}~ f'(\theta_t)_i \cdot f'(\theta_{t-1})_i > 0, \\ \alpha_i \cdot \eta_{\text{shrink}} & \text{if}~ f'(\theta_t)_i \cdot f'(\theta_{t-1})_i < 0, \\ \alpha_i & \text{otherwise}, \end{cases}\end{split}\]

where \(0 < \eta_{\text{shrink}} < 1 < \eta_{\text{grow}}\) specifies the shrink and growth rates of the step rates. Typically, we will threshold the step rates at minimum and maximum values.
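
For illustration, this adaptation rule can be written down directly with NumPy. The following is a minimal sketch, not climin's implementation; the helper adapt_step_rates and its signature are made up for this example:

    import numpy as np

    def adapt_step_rates(grad, prev_grad, step_rates,
                         step_shrink=0.5, step_grow=1.2,
                         min_step=1e-6, max_step=1.0):
        # A positive product means the gradient signs agree; a negative
        # product means the sign flipped since the last step.
        agreement = grad * prev_grad
        step_rates = np.where(agreement > 0, step_rates * step_grow, step_rates)
        step_rates = np.where(agreement < 0, step_rates * step_shrink, step_rates)
        # Threshold the step rates at the minimum and maximum values.
        return np.clip(step_rates, min_step, max_step)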

The parameters are then adapted according to the sign of the error gradient:

\[\theta_{t+1} = \theta_t - \alpha \odot \text{sgn}(f'(\theta_t)),\]

where \(\odot\) denotes elementwise multiplication.

This results in a method that is quite robust, since only the sign of the gradient is used and not its magnitude. On the other hand, it is sensitive to stochastic objectives, since stochasticity can lead to unreliable estimates of the sign of the gradient.
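
Reusing the hypothetical adapt_step_rates helper sketched above, one full Rprop iteration then looks roughly as follows:

    def rprop_step(theta, grad, prev_grad, step_rates):
        # Adapt the per-parameter step rates from the sign agreement
        # of the current and previous gradients.
        step_rates = adapt_step_rates(grad, prev_grad, step_rates)
        # Move each parameter against the sign of its gradient
        # component; the gradient's magnitude is ignored entirely.
        return theta - step_rates * np.sign(grad), step_rates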

Note

Works with gnumpy.

[riedmiller1992rprop] M. Riedmiller and H. Braun: Rprop - A Fast Adaptive Learning Algorithm. Proceedings of the International Symposium on Computer and Information Science VII, 1992.

Attributes

wrt (array_like) Current solution to the problem. Can be given as a first argument to .fprime.
fprime (Callable) First derivative of the objective function. Returns an array of the same shape as .wrt.
step_shrink (float) Factor by which a step rate is shrunk when the sign of the corresponding gradient component flips between consecutive iterations.
step_grow (float) Factor by which a step rate is grown when the sign of the corresponding gradient component stays the same between consecutive iterations.
min_step (float) Minimum step rate.
max_step (float) Maximum step rate.

Methods

__init__(wrt, fprime, step_shrink=0.5, step_grow=1.2, min_step=1e-06, max_step=1, changes_max=0.1, args=None)

Create an Rprop object.

Parameters:

wrt : array_like

Current solution to the problem. Can be given as a first argument to .fprime.

fprime : Callable

First derivative of the objective function. Returns an array of the same shape as .wrt.

step_shrink : float

Factor by which a step rate is shrunk when the sign of the corresponding gradient component flips between consecutive iterations.

step_grow : float

Factor by which a step rate is grown when the sign of the corresponding gradient component stays the same between consecutive iterations.

min_step : float

Minimum step rate.

max_step : float

Maximum step rate.

args : iterable

Iterator over arguments that .fprime will be called with.
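
A short usage sketch, assuming the usual climin protocol in which an optimizer is an iterable that updates .wrt in place and yields an info dictionary per iteration (the toy objective and stopping criterion are made up for this example):

    import numpy as np
    from climin.rprop import Rprop

    # Toy objective f(x) = ||x - 1||^2 with gradient 2 * (x - 1).
    wrt = np.zeros(10)
    fprime = lambda x: 2 * (x - 1)

    opt = Rprop(wrt, fprime, step_shrink=0.5, step_grow=1.2,
                min_step=1e-6, max_step=1)

    for info in opt:
        if info['n_iter'] >= 100:
            break

    # wrt has been updated in place and should now be close to 1.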