isanet.optimizer.LBFGS

LBFGS Module. This module provides the LBFGS class. With this optimizer, backpropagation computes the gradient of the following objective function (Loss):

Loss = 1/N sum_i (y_i - y_i'(w))^2 + kernel_regularizer*||w||^2

So the quantities monitored in the iteration log will be:

loss        = loss_mse_reg
val_loss    = val_loss_mse_reg
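As a rough illustration of the objective described above, the regularized loss could be evaluated as in the sketch below. The function name `mse_reg_loss` and its arguments are hypothetical and not part of isanet; the sketch also assumes that squared errors are summed over all outputs and that the penalty covers every array in the weight list.

```python
import numpy as np

def mse_reg_loss(y_true, y_pred, weights, kernel_regularizer):
    """Mean squared error plus an L2 penalty on all weights (illustrative only)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n_samples = y_true.shape[0]
    # Data term: 1/N * sum_i (y_i - y_i')^2
    mse = np.sum((y_true - y_pred) ** 2) / n_samples
    # Penalty term: kernel_regularizer * ||w||^2, summed over every layer
    l2 = sum(np.sum(np.asarray(w, dtype=float) ** 2) for w in weights)
    return mse + kernel_regularizer * l2
```

The same function applied to the validation set would give the quantity logged as val_loss under these assumptions.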

Update rule for parameter w with gradient g:

d = - self.__compute_search_dir(g, H0, self.__s, self.__y)
alpha = line_search_strong_wolfe
w += alpha*d
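The search direction in the update rule is typically obtained with the two-loop recursion described by Nocedal and Wright. The sketch below is a minimal NumPy version under stated assumptions: `two_loop_direction` is a hypothetical name, it returns the direction d directly (sign included) rather than mirroring the private `__compute_search_dir` method, the weights and gradients are flattened into 1-D arrays, and H0 is taken as the commonly used scaled identity.

```python
import numpy as np

def two_loop_direction(g, s_list, y_list):
    """Return d = -H g using the L-BFGS two-loop recursion.

    s_list[i] = w_{i+1} - w_i and y_list[i] = g_{i+1} - g_i are 1-D NumPy
    arrays, stored oldest first (at most m pairs are kept by the caller).
    """
    q = np.array(g, dtype=float)  # working copy of the gradient
    if not s_list:
        return -q  # no curvature pairs yet: steepest descent
    rhos = [1.0 / np.dot(y, s) for s, y in zip(s_list, y_list)]
    alphas = []
    # First loop: from the newest pair to the oldest.
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        a = rho * np.dot(s, q)
        alphas.append(a)
        q -= a * y
    # Assumed initial Hessian approximation: H0 = gamma * I.
    gamma = np.dot(s_list[-1], y_list[-1]) / np.dot(y_list[-1], y_list[-1])
    r = gamma * q
    # Second loop: from the oldest pair to the newest.
    for s, y, rho, a in zip(s_list, y_list, rhos, reversed(alphas)):
        b = rho * np.dot(y, r)
        r += (a - b) * s
    return -r
```

With an empty history the sketch falls back to the negative gradient, which is one plausible way to handle the first iteration.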

Note

For more details on the implementation, refer to Nocedal and Wright, ‘Numerical Optimization’, 1999, pp. 177-179.

class isanet.optimizer.LBFGS.LBFGS(m=3, c1=0.0001, c2=0.9, ln_maxiter=10, tol=None, n_iter_no_change=None, norm_g_eps=None, l_eps=None, debug=False)

Bases: isanet.optimizer.optimizer.Optimizer

Limited-memory BFGS (L-BFGS)

Parameters
  • m (integer, default=3) – The Hessian approximation will keep the curvature information from the ‘m’ most recent iterations.

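  • c1 (float, default=1e-4) – Parameter for the Armijo-Wolfe line search: the constant of the sufficient decrease (Armijo) condition.

  • c2 (float, default=0.9) – Parameter for the Armijo-Wolfe line search: the constant of the curvature condition.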

  • ln_maxiter (integer, default=10) – Maximum number of iterations of the Line Search.

  • tol (float, optional) – Tolerance for the optimization. When the training loss does not improve by at least tol for ‘n_iter_no_change’ consecutive iterations, convergence is considered to be reached and training stops.

  • n_iter_no_change (integer, optional) – Maximum number of consecutive iterations in which the improvement of the training loss may stay below tol.

  • norm_g_eps (float, optional) – Threshold that is used to decide whether to stop the fitting of the model (it stops if the norm of the gradient reaches ‘norm_g_eps’).

  • l_eps (float, optional) – Threshold that is used to decide whether to stop the fitting of the model (it stops if the loss function reaches ‘l_eps’).

  • debug (boolean, default=False) – If True, allows you to perform iterations one at a time, pressing the Enter key.
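To make the role of c1 and c2 concrete, the sketch below checks the strong Wolfe conditions for a candidate step size. It is an illustration only: `satisfies_strong_wolfe` is a hypothetical helper, not an isanet function, and it takes the one-dimensional function phi(alpha) = Loss(w + alpha*d) and its derivative along d as precomputed numbers.

```python
def satisfies_strong_wolfe(phi0, dphi0, phi_alpha, dphi_alpha, alpha,
                           c1=1e-4, c2=0.9):
    """Check the strong Wolfe conditions for the step size alpha.

    phi0, dphi0: loss and directional derivative at alpha = 0.
    phi_alpha, dphi_alpha: the same quantities at the candidate alpha.
    """
    # Sufficient decrease (Armijo) condition, controlled by c1.
    sufficient_decrease = phi_alpha <= phi0 + c1 * alpha * dphi0
    # Strong curvature condition, controlled by c2.
    curvature = abs(dphi_alpha) <= c2 * abs(dphi0)
    return sufficient_decrease and curvature
```

For example, with phi(alpha) = (alpha - 1)^2 the step alpha = 1 passes both conditions, alpha = 2 fails the sufficient decrease test, and alpha = 0.01 fails the curvature test.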

history

Stores diagnostic values for each iteration.

Dictionary’s keys:
alpha

Step size chosen by the line search.

norm_g

Gradient norm.

ls_conv

Specifies whether the line search was able to find an alpha.

ls_it

Number of iterations of the line search.

ls_time

Computational time of the line search (includes the computational time of the zoom method, if used).

zoom_used

Specifies whether the zoom method has been used.

zoom_conv

Specifies whether the zoom method was able to find an alpha.

zoom_it

Number of iterations of the zoom method.

Type

dict

backpropagation(model, weights, X, Y)

Computes the derivative of 1/n sum_i (y_i - y_i’)^2 + lambda*||weights||^2.

Parameters
  • model (isanet.model.MLP) – Specify the Multilayer Perceptron object to optimize

  • weights (list) – List of arrays, the ith array represents all the weights of each neuron in the ith layer.

  • X (array-like of shape (n_samples, n_features)) – The input data.

  • Y (array-like of shape (n_samples, n_output)) – The target values.

Returns

A list containing the gradient norm for each layer, to be used in the delta rule. Each index in the list corresponds to a layer, ordered from the first hidden layer to the output layer:

E.g. 0 -> first hidden layer, ..., n+1 -> output layer
where n is the number of hidden layers in the net.

Return type

list
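As a reduced illustration of the derivative computed by this method, the sketch below evaluates the same regularized loss and its gradient for a single linear layer, where the gradient has a closed form. This is an assumption-laden simplification: `loss_and_grad` is a hypothetical function, there are no hidden layers, activations, or biases, and the actual method returns one entry per layer of an MLP.

```python
import numpy as np

def loss_and_grad(W, X, Y, lam):
    """Loss 1/n * sum (y - XW)^2 + lam*||W||^2 and its gradient w.r.t. W."""
    n = X.shape[0]
    residual = X @ W - Y
    loss = np.sum(residual ** 2) / n + lam * np.sum(W ** 2)
    # d/dW of the data term is 2/n * X^T (XW - Y); the penalty adds 2*lam*W.
    grad = 2.0 * X.T @ residual / n + 2.0 * lam * W
    return loss, grad
```

Comparing the analytic gradient with central finite differences is a simple way to validate this kind of computation before handing it to the optimizer.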

forward(weights, X)

Uses the weights passed to the function to make the Feed-Forward step.

Parameters
  • weights (list) – List of arrays, the ith array represents all the weights of each neuron in the ith layer.

  • X (array-like of shape (n_samples, n_features)) – The input data.

Returns

Output of all neurons for input X.

Return type

array-like
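A feed-forward pass over a list of weight arrays can be sketched as follows. The layout is an assumption made for illustration, not the documented isanet format: each array is taken to have shape (n_inputs + 1, n_units) with the bias in the last row, and the same activation is applied to every layer.

```python
import numpy as np

def forward(weights, X, activation=np.tanh):
    """Propagate X through every layer and return the output of the last one."""
    out = np.asarray(X, dtype=float)
    for W in weights:
        # Append a column of ones so the last row of W acts as the bias.
        ones = np.ones((out.shape[0], 1))
        out = activation(np.hstack([out, ones]) @ W)
    return out
```

Passing the weights explicitly, instead of reading them from the model, lets a line search evaluate the loss at trial points such as w + alpha*d without modifying the model.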

get_batch(X_train, Y_train, batch_size)
Parameters
  • X_train (array-like of shape (n_samples, n_features)) – The input data.

  • Y_train (array-like of shape (n_samples, n_output)) – The target values.

  • batch_size (integer) – Size of minibatches for the optimizer.

Returns

Each key of the dictionary is an integer from 0 to number_of_batch - 1 and identifies a batch. Each value is a dictionary with two keys, ‘batch_x_train’ and ‘batch_y_train’, which refer to the portion of the data and of the targets, respectively, used for training.

Return type

dict of dict
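The returned structure can be illustrated with the sketch below. It is not the library's implementation: it assumes consecutive, unshuffled slices and allows the last batch to be smaller when n_samples is not a multiple of batch_size.

```python
import numpy as np

def get_batch(X_train, Y_train, batch_size):
    """Split the data into minibatches keyed 0 .. number_of_batch - 1."""
    X_train = np.asarray(X_train)
    Y_train = np.asarray(Y_train)
    n_samples = X_train.shape[0]
    batches = {}
    for k, start in enumerate(range(0, n_samples, batch_size)):
        stop = start + batch_size
        batches[k] = {
            "batch_x_train": X_train[start:stop],
            "batch_y_train": Y_train[start:stop],
        }
    return batches
```

With 5 samples and batch_size=2, this sketch produces three batches with keys 0, 1 and 2, the last one holding a single sample.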

optimize(model, epochs, X_train, Y_train, validation_data=None, batch_size=None, es=None, verbose=0)
Parameters
  • model (isanet.model.MLP) – Specify the Multilayer Perceptron object to optimize.

  • epochs (integer) – Maximum number of epochs.

  • X_train (array-like of shape (n_samples, n_features)) – The input data.

  • Y_train (array-like of shape (n_samples, n_output)) – The target values.

  • validation_data (list of arrays-like, [X_val, Y_val], optional) – Validation set.

  • batch_size (integer, optional) – Size of minibatches for the optimizer. When set to None, the optimizer performs a full-batch update.

  • es (isanet.callbacks.EarlyStopping, optional) – When set to None, training ends only when the maximum number of epochs is reached. Otherwise, the EarlyStopping object passed stops training if the model keeps overfitting for a number of consecutive iterations. See the docs in the optimizer module for the EarlyStopping class.

  • verbose (integer, default=0) – Controls the verbosity: the higher, the more messages.

Returns

Return type

integer

step(model, X, Y, verbose)

Implements the LBFGS step update method.

Parameters
  • model (isanet.model.MLP) – Specify the Multilayer Perceptron object to optimize.

  • X (array-like of shape (n_samples, n_features)) – The input data.

  • Y (array-like of shape (n_samples, n_output)) – The target values.

  • verbose (integer, default=0) – Controls the verbosity: the higher, the more messages.

Returns

The gradient norm.

Return type

float
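The gradient norm returned by step is the natural input for the norm_g_eps test described in the class parameters. The sketch below shows one way the four stopping thresholds could be combined; `should_stop` and `loss_history` are hypothetical names, and the exact order and logic used by the library may differ.

```python
def should_stop(loss_history, norm_g, tol=None, n_iter_no_change=None,
                norm_g_eps=None, l_eps=None):
    """Return True when any enabled stopping criterion is met."""
    # Gradient norm small enough.
    if norm_g_eps is not None and norm_g <= norm_g_eps:
        return True
    # Loss small enough.
    if l_eps is not None and loss_history and loss_history[-1] <= l_eps:
        return True
    # No improvement of at least tol over the last n_iter_no_change iterations.
    if (tol is not None and n_iter_no_change is not None
            and len(loss_history) > n_iter_no_change):
        recent = loss_history[-(n_iter_no_change + 1):]
        improvements = [prev - cur for prev, cur in zip(recent, recent[1:])]
        return all(imp < tol for imp in improvements)
    return False
```

Leaving a threshold as None disables the corresponding check, which matches the optional nature of these parameters in the class signature.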