isanet.optimizer.SGD

Stochastic Gradient Descent (SGD) module. This module provides the SGD class. In this case, backpropagation computes the gradient of the following objective function (loss):

Loss = 1/2 * sum_i (y_i - y_i')^2

So the quantities that will be monitored in the iteration log are:

loss        = loss_mse
val_loss    = val_loss_mse

Gradient descent (with momentum) optimizer. Update rule for parameter w with gradient g when momentum is 0:

w = w - learning_rate * g  - kernel_regularizer*w

Update rule when momentum is larger than 0:

velocity = momentum * velocity - learning_rate * g
w = w + velocity - kernel_regularizer*w

When nesterov=True, this rule becomes:

g = g(w + sigma * velocity)
velocity = momentum * velocity - learning_rate * g
w = w + velocity - kernel_regularizer * w
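
As an illustrative sketch only (not the library's internal implementation), the three update rules above can be written in NumPy as follows; grad stands for a function returning the gradient of the loss at a given point:

import numpy as np

def sgd_update(w, grad, velocity, learning_rate, momentum=0.0,
               nesterov=False, sigma=None, kernel_regularizer=0.0):
    # Illustrative sketch of the update rules above, not isanet's internal code.
    # `w` and `velocity` are NumPy arrays, `grad` is a callable returning the
    # gradient of the loss at a given point.
    if momentum == 0.0:
        # Plain gradient descent with weight decay.
        return w - learning_rate * grad(w) - kernel_regularizer * w, velocity
    if nesterov:
        # Gradient evaluated at the look-ahead point w + sigma * velocity.
        g = grad(w + sigma * velocity)
    else:
        g = grad(w)
    velocity = momentum * velocity - learning_rate * g
    w = w + velocity - kernel_regularizer * w
    return w, velocity
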
class isanet.optimizer.SGD.SGD(learning_rate=0.1, momentum=0, nesterov=False, sigma=None, tol=None, n_iter_no_change=None, norm_g_eps=None, l_eps=None, debug=False)

Bases: isanet.optimizer.optimizer.Optimizer

Stochastic Gradient Descent (SGD)

Parameters
  • learning_rate (float, default=0.1) – Learning rate used for the weight updates (delta rule).

  • momentum (float, default=0) – Momentum for gradient descent update.

  • nesterov (boolean, default=False) – Whether to use Nesterov’s momentum.

  • sigma (float, default=None) – Parameter of the super-accelerated Nesterov momentum. If ‘nesterov’ is True and ‘sigma’ is equal to ‘momentum’, this reduces to the standard Nesterov momentum; if ‘sigma’ differs from ‘momentum’, the super-accelerated Nesterov momentum is used.

  • tol (float, default=None) – Tolerance for the optimization. When the training loss is not improving by at least tol for ‘n_iter_no_change’ consecutive iterations, convergence is considered to be reached and training stops.

  • n_iter_no_change (integer, default=None) – Maximum number of consecutive epochs with a loss improvement not greater than tol.

  • norm_g_eps (float, optional) – Threshold used to decide whether to stop fitting the model: training stops when the norm of the gradient reaches ‘norm_g_eps’.

  • l_eps (float, optional) – Threshold used to decide whether to stop fitting the model: training stops when the loss function reaches ‘l_eps’.

  • debug (boolean, default=False) – If True, iterations are performed one at a time, advancing by pressing the Enter key.
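
A minimal usage sketch of the constructor; the import path is assumed from the fully qualified class name above:

from isanet.optimizer.SGD import SGD   # import path assumed from the class name above

# Plain SGD with a fixed learning rate.
optimizer = SGD(learning_rate=0.1)

# SGD with Nesterov momentum and additional stopping criteria.
optimizer = SGD(learning_rate=0.05,
                momentum=0.9,
                nesterov=True,
                sigma=0.9,             # sigma equal to momentum -> standard Nesterov
                tol=1e-6,
                n_iter_no_change=10,
                norm_g_eps=1e-8,
                l_eps=1e-10)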

history

Saves some values of interest for each iteration.

Dictionary’s keys:
norm_g

Gradient norm.

Type

dict
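
For example, after training, the recorded gradient norms can be inspected (a sketch, assuming ‘norm_g’ maps to one value per iteration):

# Sketch: inspect the gradient norm recorded at each iteration.
norms = optimizer.history["norm_g"]
print("final gradient norm:", norms[-1])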

backpropagation(model, weights, X, Y)

Computes the derivative of the loss 1/2 * sum_i (y_i - y_i')^2 with respect to the weights.

Parameters
  • model (isanet.model.MLP) – Specify the Multilayer Perceptron object to optimize.

  • weights (list) – List of arrays, the ith array represents all the weights of each neuron in the ith layer.

  • X (array-like of shape (n_samples, n_features)) – The input data.

  • Y (array-like of shape (n_samples, n_output)) – The target values.

Returns

A list containing the gradients for each layer, to be used in the delta rule. Each index in the list corresponds to the ith layer (from the first hidden layer to the output layer):

E.g. 0 -> first hidden layer, ..., n -> output layer,
where n is the number of hidden layers in the net.

Return type

list
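
A sketch of a direct call; the ‘model.weights’ attribute used here is an assumption, any list of per-layer weight arrays can be passed:

# Sketch: compute the per-layer gradients for the current weights.
# `model.weights` is assumed to expose the list of per-layer weight arrays.
grads = optimizer.backpropagation(model, model.weights, X_train, Y_train)
first_hidden_grad = grads[0]    # gradient of the first hidden layer
output_grad = grads[-1]         # gradient of the output layer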

forward(weights, X)

Uses the weights passed to the function to perform the feed-forward step.

Parameters
  • weights (list) – List of arrays, the ith array represents all the weights of each neuron in the ith layer.

  • X (array-like of shape (n_samples, n_features)) – The input data.

Returns

Output of all neurons for input X.

Return type

array-like

get_batch(X_train, Y_train, batch_size)
Parameters
  • X_train (array-like of shape (n_samples, n_features)) – The input data.

  • Y_train (array-like of shape (n_samples, n_output)) – The target values.

  • batch_size (integer) – Size of minibatches for the optimizer.

Returns

Each key of the dictionary is an integer value from 0 to number_of_batch - 1 and defines a batch. Each element is a dictionary with two keys, ‘batch_x_train’ and ‘batch_y_train’, which refer to the portion of data and targets, respectively, used for training.

Return type

dict of dict
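
A sketch of how the returned structure can be iterated, using the documented keys:

# Sketch: iterate over the mini-batches returned by get_batch.
batches = optimizer.get_batch(X_train, Y_train, batch_size=32)
for i in range(len(batches)):
    x_batch = batches[i]["batch_x_train"]
    y_batch = batches[i]["batch_y_train"]
    # ... use (x_batch, y_batch) for one update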

optimize(model, epochs, X_train, Y_train, validation_data=None, batch_size=None, es=None, verbose=0)
Parameters
  • model (isanet.model.MLP) – Specify the Multilayer Perceptron object to optimize.

  • epochs (integer) – Maximum number of epochs.

  • X_train (array-like of shape (n_samples, n_features)) – The input data.

  • Y_train (array-like of shape (n_samples, n_output)) – The target values.

  • validation_data (list of arrays-like, [X_val, Y_val], optional) – Validation set.

  • batch_size (integer, optional) – Size of minibatches for the optimizer. When set to None, the optimizer will perform a full-batch update.

  • es (isanet.callbacks.EarlyStopping, optional) – When set to None, only the number of epochs is used to end training. Otherwise, an EarlyStopping object is passed and training stops if the model keeps overfitting for a number of consecutive iterations. See the documentation of the EarlyStopping class.

  • verbose (integer, default=0) – Controls the verbosity: the higher, the more messages.

Returns

Return type

integer
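
A minimal end-to-end sketch of a call to optimize; the construction of the MLP is only indicative, since the isanet.model.MLP interface is not documented here:

from isanet.model import MLP              # class name taken from the parameter type above
from isanet.optimizer.SGD import SGD

model = MLP()                             # network construction is assumed, not documented here
optimizer = SGD(learning_rate=0.1, momentum=0.9)

# Full-batch training (batch_size=None) with an optional validation set.
optimizer.optimize(model,
                   epochs=500,
                   X_train=X_train,
                   Y_train=Y_train,
                   validation_data=[X_val, Y_val],
                   batch_size=None,
                   verbose=1)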

step(model, X, Y, verbose)

Implements the SGD step update method.

Parameters
  • model (isanet.model.MLP) – Specify the Multilayer Perceptron object to optimize.

  • X (array-like of shape (n_samples, n_features)) – The input data.

  • Y (array-like of shape (n_samples, n_output)) – The target values.

  • verbose (integer, default=0) – Controls the verbosity: the higher, the more messages.

Returns

The gradient norm.

Return type

float
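
A sketch of a manual training loop built on step, using the returned gradient norm as a stopping check:

# Sketch: manual epoch loop using the documented step() signature.
for epoch in range(100):
    norm_g = optimizer.step(model, X_train, Y_train, verbose=0)
    if norm_g < 1e-8:          # stop once the gradient norm is small enough
        break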