Dealing with parameters

Parameters in climin are a single, long, one-dimensional array. This might seem like a restriction at first, yet it makes things easier in other places. Consider a model involving complicated array dimensionalities; now consider what higher-order derivatives of those might look like. Yes, that’s right: a pretty messy thing. Furthermore, letting parameters occupy a consecutive region of memory has advantages from an implementation point of view. We can more easily write them to disk, randomize their contents, and so on.

Creating parameter sets

Creating parameter arrays need not be tedious, though. Climin comes with a convenience function, climin.util.empty_with_views, which does most of the work. You just need to feed it the shapes of the parameters you are interested in.

Let us use logistic regression from the Tutorial and see where it comes in handy. First, we will create a parameter array and the various views according to a template:

import numpy as np
import climin.util

tmpl = [(784, 10), 10]          # w is a matrix and b a vector
flat, (w, b) = climin.util.empty_with_views(tmpl)

Now, flat is a one-dimensional array, while w and b are a two-dimensional and a one-dimensional array respectively. They share memory with flat, so any change we make to w or b will be reflected in flat and vice versa. In order for a predict function to get the parameters out of the flat array, there is climin.util.shaped_from_flat, which does the same job as empty_with_views except that it receives flat instead of creating it. In fact, empty_with_views uses shaped_from_flat internally.
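To make the memory sharing concrete, here is a minimal NumPy-only sketch. The helper flat_with_views is a hypothetical stand-in written for this illustration; it is not climin's implementation and only assumes that views are slices of the flat array reshaped to the requested shapes.

```python
import numpy as np

def flat_with_views(shapes):
    """Hypothetical stand-in: one flat array plus reshaped views on it."""
    # Accept plain integers as shorthand for one-dimensional shapes.
    shapes = [(s,) if isinstance(s, int) else tuple(s) for s in shapes]
    sizes = [int(np.prod(s)) for s in shapes]
    flat = np.empty(sum(sizes))
    views, start = [], 0
    for shape, size in zip(shapes, sizes):
        # Slicing and reshaping a contiguous array yields a view, not a copy.
        views.append(flat[start:start + size].reshape(shape))
        start += size
    return flat, views

flat, (w, b) = flat_with_views([(784, 10), 10])
flat[...] = 0.0      # overwrite the uninitialized contents
w[0, 0] = 1.0        # a write through a view ...
b[-1] = 2.0
first, last = flat[0], flat[-1]   # ... shows up in the flat array
```

Here flat has 784 * 10 + 10 = 7850 entries, and the writes to w and b land in the first and last entry of flat respectively.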

Let’s adapt the predict function to use w and b instead:

def predict(parameters, inpt):
    w, b = climin.util.shaped_from_flat(parameters, tmpl)
    before_softmax = np.dot(inpt, w) + b
    softmaxed = np.exp(before_softmax - before_softmax.max(axis=1)[:, np.newaxis])
    return softmaxed / softmaxed.sum(axis=1)[:, np.newaxis]

This might seem like overkill for logistic regression, but it becomes invaluable when complicated models with many different parameters are used.

Calculating derivatives in place

When calculating derivatives, you can make use of this as well. This is important because climin expects derivatives to be flat too, aligned with the parameter array:

def f_d_loss_wrt_pars(parameters, inpt, targets):
    p = predict(parameters, inpt)
    d_flat, (d_w, d_b) = climin.util.empty_with_views(tmpl)
    d_w[...] = np.dot(inpt.T, p - targets) / inpt.shape[0]
    d_b[...] = (p - targets).mean(axis=0)
    return d_flat

What are we doing here? First, we get ourselves a new array and preshaped views on it in the same way as the parameters. Then we overwrite the views in place with the derivatives and finally return the flat array as a result. The in place assignment is important. If we did it using d_w = ..., Python would just reassign the name and the changes would not turn up in d_flat.
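The difference between in-place assignment and rebinding a name can be seen in a small NumPy-only sketch. The arrays below are built by hand for illustration and merely mimic the layout of a flat array with two views:

```python
import numpy as np

d_flat = np.zeros(6)
d_w = d_flat[:4].reshape(2, 2)   # view on the first four entries
d_b = d_flat[4:]                 # view on the last two entries

d_w[...] = np.ones((2, 2))       # in place: writes through to d_flat
d_b = np.ones(2)                 # rebinding: d_b now names a new array

in_place_part = d_flat[:4].tolist()   # updated by the in-place write
rebound_part = d_flat[4:].tolist()    # untouched by the rebinding
```

After running this, the first four entries of d_flat are ones, while the last two are still zeros because the second assignment never touched d_flat.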

As a further note, np.dot supports an extra argument, out, which specifies where to write the result. To save memory, we could perform the following instead:

np.dot(inpt.T, p - targets, out=d_w)
d_w /= inpt.shape[0]
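NumPy requires the out array to have the right shape and dtype and to be C-contiguous. A view obtained by slicing a flat array and reshaping it satisfies this for float64 data. The following NumPy-only sketch uses small made-up shapes, with diff standing in for p - targets, and checks that writing through out fills the flat derivative array:

```python
import numpy as np

rng = np.random.default_rng(0)
inpt = rng.normal(size=(5, 3))   # 5 samples, 3 features (made-up sizes)
diff = rng.normal(size=(5, 2))   # stands in for p - targets, 2 classes

d_flat = np.empty(3 * 2 + 2)
d_w = d_flat[:6].reshape(3, 2)   # C-contiguous view for the weights
d_b = d_flat[6:]                 # view for the biases

np.dot(inpt.T, diff, out=d_w)    # result is written straight into d_flat
d_w /= inpt.shape[0]             # in-place division keeps the view intact
d_b[...] = diff.mean(axis=0)

# Reference computation that allocates a temporary array.
expected_w = np.dot(inpt.T, diff) / inpt.shape[0]
matches = np.allclose(d_flat[:6].reshape(3, 2), expected_w)
```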

Initializing parameters

Initializing parameters with empty values is asking for trouble. You probably want to populate an array with random numbers or zeros. Of course it is easy to do so by hand:

flat[...] = np.random.normal(0, 0.1, flat.shape)

We found this quite tedious to write, though, especially once flat becomes a field of a nested object. Thus, we have a shorthand in the initialize module which does exactly that:

import climin.initialize
climin.initialize.randomize_normal(flat, 0, 0.1)

There are more functions to do similar things. Check out Initialization of Parameters.
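Because initialization happens in place on the flat array, all views see the new values immediately, and individual views can still be overridden afterwards. A NumPy-only sketch follows; fill_normal is a hypothetical helper written for this illustration, not a climin function, and the sizes are made up:

```python
import numpy as np

def fill_normal(arr, loc, scale, rng):
    """Hypothetical helper: overwrite arr in place with Gaussian samples."""
    arr[...] = rng.normal(loc, scale, arr.shape)

rng = np.random.default_rng(42)
flat = np.empty(3 * 2 + 2)
w = flat[:6].reshape(3, 2)       # view for a small weight matrix
b = flat[6:]                     # view for the biases

fill_normal(flat, 0, 0.1, rng)   # every view now holds random values
b[...] = 0                       # then reset only the biases to zero
```

A common choice, sketched here, is to draw the weights randomly and start the biases at zero.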