isanet.optimizer.SGD
Stochastic Gradient Descent (SGD) module. This module provides the SGD class. Here, backpropagation computes the gradient of the following objective function (loss):
Loss = 1/2 sum_i (y_i - y_i')^2
The quantities monitored in the iteration log are therefore:
loss = loss_mse
val_loss = val_loss_mse
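As a minimal sketch of the objective above, the following NumPy snippet evaluates 1/2 sum_i (y_i - y_i')^2 and its derivative with respect to the predictions. The function names `half_sse` and `half_sse_grad` are hypothetical and are not part of isanet.

```python
import numpy as np

def half_sse(y_true, y_pred):
    """Return 1/2 * sum of squared errors over all samples and outputs."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return 0.5 * np.sum(diff ** 2)

def half_sse_grad(y_true, y_pred):
    """Derivative of half_sse with respect to y_pred."""
    return np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)

y_true = np.array([[1.0], [0.0]])
y_pred = np.array([[0.5], [0.5]])
loss_value = half_sse(y_true, y_pred)       # 0.5 * (0.25 + 0.25) = 0.25
loss_grad = half_sse_grad(y_true, y_pred)   # [[-0.5], [0.5]]
```

The factor 1/2 cancels the exponent when differentiating, so the derivative with respect to each prediction is simply the prediction error.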
Gradient descent (with momentum) optimizer. Update rule for parameter w with gradient g when momentum is 0:
w = w - learning_rate * g - kernel_regularizer*w
Update rule when momentum is larger than 0:
velocity = momentum * velocity - learning_rate * g
w = w + velocity - kernel_regularizer*w
When nesterov=True, this rule becomes:
g = g(w + sigma*velocity)
velocity = momentum * velocity - learning_rate * g
w = w - learning_rate * g - kernel_regularizer*w
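The three rules above can be transcribed literally into NumPy. This is only a sketch: `sgd_update` and `grad_fn` are hypothetical names, `kernel_regularizer` is treated as a plain scalar, and falling back to `sigma = momentum` when `sigma` is None is an assumption rather than documented behaviour.

```python
import numpy as np

def sgd_update(w, grad_fn, velocity, learning_rate=0.1, momentum=0.0,
               nesterov=False, sigma=None, kernel_regularizer=0.0):
    """Apply one update, following the rules exactly as written above.

    grad_fn is a hypothetical callable returning the gradient at a point.
    Returns the new (w, velocity) pair.
    """
    if nesterov:
        if sigma is None:
            sigma = momentum  # assumption: plain Nesterov when sigma is unset
        # Gradient evaluated at the look-ahead point w + sigma * velocity.
        g = grad_fn(w + sigma * velocity)
        velocity = momentum * velocity - learning_rate * g
        w = w - learning_rate * g - kernel_regularizer * w
    elif momentum > 0:
        g = grad_fn(w)
        velocity = momentum * velocity - learning_rate * g
        w = w + velocity - kernel_regularizer * w
    else:
        g = grad_fn(w)
        w = w - learning_rate * g - kernel_regularizer * w
    return w, velocity

def grad_fn(w):
    """Gradient of the toy objective 0.5 * ||w||^2."""
    return w

w0 = np.array([1.0, -2.0])
v0 = np.zeros_like(w0)

w_plain, _ = sgd_update(w0, grad_fn, v0, learning_rate=0.1)
w_mom, v_mom = sgd_update(w0, grad_fn, v0, learning_rate=0.1, momentum=0.9)
w_nag, v_nag = sgd_update(w0, grad_fn, np.array([0.5, 0.5]),
                          learning_rate=0.1, momentum=0.9, nesterov=True)
```

With zero initial velocity, the plain and momentum rules produce the same first step; they diverge only once the velocity has accumulated.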
-
class isanet.optimizer.SGD.SGD(learning_rate=0.1, momentum=0, nesterov=False, sigma=None, tol=None, n_iter_no_change=None, norm_g_eps=None, l_eps=None, debug=False)
Bases: isanet.optimizer.optimizer.Optimizer
Stochastic Gradient Descent (SGD)
- Parameters
learning_rate (float, default=0.1) – Learning rate schedule for weight updates (delta rule).
momentum (float, default=0) – Momentum for gradient descent update.
nesterov (boolean, default=False) – Whether to use Nesterov’s momentum.
sigma (float, default=None) – Parameter of the Super Accelerated Nesterov’s momentum. If ‘nesterov’ is True and ‘sigma’ is equal to ‘momentum’, the optimizer uses the simple Nesterov momentum. If ‘sigma’ differs from ‘momentum’, it uses the super accelerated Nesterov momentum.
tol (float, default=None) – Tolerance for the optimization. When the training loss does not improve by at least tol for ‘n_iter_no_change’ consecutive iterations, convergence is considered to be reached and training stops.
n_iter_no_change (integer, default=None) – Maximum number of consecutive epochs without an improvement greater than tol.
norm_g_eps (float, optional) – Threshold that is used to decide whether to stop the fitting of the model (it stops if the norm of the gradient reaches ‘norm_g_eps’).
l_eps (float, optional) – Threshold that is used to decide whether to stop the fitting of the model (it stops if the loss function reaches ‘l_eps’).
debug (boolean, default=False) – If True, allows you to perform iterations one at a time, pressing the Enter key.
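The stopping parameters above can be illustrated with a small sketch. The exact bookkeeping inside isanet is not shown in this page, so the logic below is an assumption: an iteration counts as "not improving" when its loss is not at least tol below the best loss seen so far, and a threshold is "reached" when the value is less than or equal to it. The names `first_stop_iteration` and `below_threshold` are hypothetical.

```python
def first_stop_iteration(losses, tol, n_iter_no_change):
    """Return the index at which training would stop, or None.

    Assumed rule: stop after n_iter_no_change consecutive iterations
    whose loss fails to beat the best previous loss by at least tol.
    """
    best = float("inf")
    stale = 0
    for i, loss in enumerate(losses):
        if loss > best - tol:
            stale += 1      # no sufficient improvement this iteration
        else:
            stale = 0       # improvement of at least tol: reset counter
        if loss < best:
            best = loss
        if stale >= n_iter_no_change:
            return i
    return None

def below_threshold(value, eps):
    """Assumed check for norm_g_eps / l_eps: disabled when eps is None."""
    return eps is not None and value <= eps

losses = [1.0, 0.5, 0.49995, 0.4999, 0.49985]
stop_at = first_stop_iteration(losses, tol=1e-3, n_iter_no_change=3)
never = first_stop_iteration([1.0, 0.5, 0.25], tol=1e-3, n_iter_no_change=3)
```

In this example the loss plateaus after the second iteration, so the counter reaches 3 at index 4, while the steadily decreasing sequence never triggers the stop.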
-
history
Stores values of interest for each iteration.
- Dictionary’s keys:
norm_g
Gradient norm.
- Type
dict
-
backpropagation(model, weights, X, Y)
Computes the derivative of 1/2 sum_i (y_i - y_i’)^2
- Parameters
model (isanet.model.MLP) – Specify the Multilayer Perceptron object to optimize
weights (list) – List of arrays, the ith array represents all the weights of each neuron in the ith layer.
X (array-like of shape (n_samples, n_features)) – The input data.
Y (array-like of shape (n_samples, n_output)) – The target values.
- Returns
List containing the gradients for each layer, to be used in the delta rule. Each index in the list refers to the ith layer, from the first hidden layer to the output layer.
E.g. 0 -> first hidden layer, ..., n -> output layer, where n is the number of hidden layers in the net.
- Return type
list
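To make the shape of the returned list concrete, here is a sketch of backpropagation for the loss 1/2 sum_i (y_i - y_i')^2. It is not the isanet implementation: the weight layout (bias stored in the first row of each array), the tanh hidden activation, and the linear output layer are all assumptions, and `forward_pass` and `backprop` are hypothetical names.

```python
import numpy as np

def forward_pass(weights, X):
    """Return the activations of every layer, the input included."""
    activations = [X]
    for i, W in enumerate(weights):
        prev = activations[-1]
        # Prepend a column of ones so the first row of W acts as the bias.
        a = np.hstack([np.ones((prev.shape[0], 1)), prev])
        z = a @ W
        is_output = i == len(weights) - 1
        activations.append(z if is_output else np.tanh(z))
    return activations

def backprop(weights, X, Y):
    """Gradient of 1/2 * sum (y - y')^2 with respect to each weight array."""
    activations = forward_pass(weights, X)
    grads = [None] * len(weights)
    delta = activations[-1] - Y  # dLoss/dz at the linear output layer
    for i in range(len(weights) - 1, -1, -1):
        a_prev = np.hstack([np.ones((X.shape[0], 1)), activations[i]])
        grads[i] = a_prev.T @ delta
        if i > 0:
            # Propagate through the weights (bias row skipped), then
            # multiply by the tanh derivative 1 - tanh(z)^2.
            delta = (delta @ weights[i][1:].T) * (1.0 - activations[i] ** 2)
    return grads

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
Y = rng.normal(size=(5, 2))
# One hidden layer with 4 units: index 0 is the hidden layer, 1 the output.
weights = [0.5 * rng.normal(size=(4, 4)), 0.5 * rng.normal(size=(5, 2))]
grads = backprop(weights, X, Y)
```

Each entry of `grads` has the same shape as the corresponding weight array, so it can be plugged directly into an update rule of the form w = w - learning_rate * g.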
-
forward(weights, X)
Uses the weights passed to the function to perform the feed-forward step.
- Parameters
weights (list) – List of arrays, the ith array represents all the weights of each neuron in the ith layer.
X (array-like of shape (n_samples, n_features)) – The input data.
- Returns
Output of all neurons for input X.
- Return type
array-like
-
get_batch(X_train, Y_train, batch_size)
- Parameters
X_train (array-like of shape (n_samples, n_features)) – The input data.
Y_train (array-like of shape (n_samples, n_output)) – The target values.
batch_size (integer) – Size of minibatches for the optimizer.
- Returns
Each key of the dictionary is an integer from 0 to number_of_batch - 1 and identifies a batch. Each value is a dictionary with two keys, ‘batch_x_train’ and ‘batch_y_train’, which refer to the portion of the data and of the targets, respectively, used for training.
- Return type
dict of dict
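The returned structure can be sketched as follows. Only the key names ‘batch_x_train’ and ‘batch_y_train’ come from the description above; the function name `make_batches`, the absence of shuffling, and the smaller final batch are assumptions made for illustration.

```python
import numpy as np

def make_batches(X_train, Y_train, batch_size):
    """Split the data into consecutive minibatches, as a dict of dicts."""
    n_samples = X_train.shape[0]
    batches = {}
    for index, start in enumerate(range(0, n_samples, batch_size)):
        stop = start + batch_size
        batches[index] = {
            "batch_x_train": X_train[start:stop],
            "batch_y_train": Y_train[start:stop],
        }
    return batches

X_train = np.arange(10).reshape(5, 2)
Y_train = np.arange(5).reshape(5, 1)
batches = make_batches(X_train, Y_train, batch_size=2)
```

With 5 samples and a batch size of 2, this sketch yields the keys 0, 1 and 2, the last batch holding the single remaining sample.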
-
optimize(model, epochs, X_train, Y_train, validation_data=None, batch_size=None, es=None, verbose=0)
- Parameters
model (isanet.model.MLP) – Specify the Multilayer Perceptron object to optimize.
epochs (integer) – Maximum number of epochs.
X_train (array-like of shape (n_samples, n_features)) – The input data.
Y_train (array-like of shape (n_samples, n_output)) – The target values.
validation_data (list of arrays-like, [X_val, Y_val], optional) – Validation set.
batch_size (integer, optional) – Size of minibatches for the optimizer. When set to None, the optimizer performs a full-batch update.
es (isanet.callbacks.EarlyStopping, optional) – When set to None, training ends only when the maximum number of epochs is reached. Otherwise, the EarlyStopping object passed stops training if the model starts overfitting for a number of consecutive iterations. See the docs in the optimizer module for the EarlyStopping class.
verbose (integer, default=0) – Controls the verbosity: the higher, the more messages.
- Returns
- Return type
integer
-
step(model, X, Y, verbose)
Implements the SGD step update method.
- Parameters
model (isanet.model.MLP) – Specify the Multilayer Perceptron object to optimize.
X (array-like of shape (n_samples, n_features)) – The input data.
Y (array-like of shape (n_samples, n_output)) – The target values.
verbose (integer, default=0) – Controls the verbosity: the higher, the more messages.
- Returns
The gradient norm.
- Return type
float
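The page does not say which norm is returned. As a sketch, assuming it is the Euclidean norm taken over the gradients of all layers together, it could be computed like this (`gradient_norm` is a hypothetical helper name):

```python
import numpy as np

def gradient_norm(grads):
    """Euclidean norm of a list of per-layer gradient arrays."""
    return float(np.sqrt(sum(np.sum(g ** 2) for g in grads)))

grads = [np.array([[3.0, 0.0]]), np.array([[0.0], [4.0]])]
norm_g = gradient_norm(grads)  # sqrt(9 + 16) = 5.0
```

A scalar of this kind is what a threshold such as ‘norm_g_eps’ would be compared against, and what the ‘norm_g’ entry of history would record at each iteration.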