The BPROP, RPROP, and QPROP algorithms differ in how the weight update Δw_n is computed at each iteration.
Standard Printing for the PROP Algorithms
When the PDETAIL option is not specified, a standard table is produced displaying the iteration number, the value of the objective function at that iteration, and the maximum absolute element of the gradient vector (its L∞ norm). This table is useful for assessing overall convergence behavior. However, unlike the table produced by the PDETAIL option, it gives no information about individual network weights.
In the case of sum of squared error, which results from specifying OBJECTIVE=DEV in the NETOPTS statement and ERROR=NORMAL in the TARGET statement, the error function serves as the objective function, and is given by

    E(w) = Σ_i (y_i − ŷ_i(w))²

where the sum runs over the training cases, y_i is the target value, and ŷ_i(w) is the network output for the weight vector w. Candidate network weight values w* that minimize the objective function satisfy the first-order condition

    ∇E(w*) = 0

Hence, a natural convergence criterion is

    ‖∇E(w_n)‖∞ < ε

for some small value ε. This is, in fact, the convergence criterion for all prop methods. The value of ε is set by the ABSGCONV= option in the NLOPTIONS statement. Note that the L∞ norm of a vector z is simply the maximum of the absolute values of the components of z.
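As a concrete sketch, this convergence test can be implemented as follows; the linear single-output model and the tolerance value are illustrative stand-ins, not PROC NEURAL internals:

```python
import numpy as np

def sse_objective(w, X, y):
    """Sum-of-squared-error objective E(w) = sum_i (y_i - yhat_i(w))^2,
    using a linear model yhat = Xw as a stand-in for a network output."""
    residual = y - X @ w
    return residual @ residual

def sse_gradient(w, X, y):
    """Gradient of E(w) for the linear stand-in model: -2 * X^T (y - Xw)."""
    return -2.0 * X.T @ (y - X @ w)

def converged(gradient, epsilon):
    """ABSGCONV-style test: the maximum absolute gradient element
    (the L-infinity norm of the gradient) falls below epsilon."""
    return np.max(np.abs(gradient)) < epsilon
```

For an exactly fitted model the gradient vanishes, so the Max Abs Gradient Element reported in the standard table drops below any reasonable tolerance.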
The standard table prints the following quantities:
• Iteration, n
• Objective, E(w_n), using the current network weights
• Max Abs Gradient Element, ‖∇E(w_n)‖∞
When the PDETAIL option is specified, this standard table is still printed. At each iteration, the detail lines for each of the weights are followed by the corresponding line of the standard table.
The Standard Backprop Algorithm
The standard backprop algorithm is gradient descent with momentum. At the nth iteration, the update is computed as

    Δw_n = −η ∇E(w_{n−1}) + μ Δw_{n−1}
    w_n = w_{n−1} + Δw_n

where η is the learning rate and μ is the momentum. For TECH=BPROP, the PDETAIL option in the TRAIN statement results in the following quantities being printed:
• Δw_{n−1} is labeled "Previous Change"
• ∇E(w_{n−1}) is labeled "Gradient"
• Δw_n is labeled "Current Change"
• w_{n−1} is labeled "Previous Weight"
• w_n is labeled "Current Weight"
The learning rate η and the momentum μ are printed at the beginning of the table. These quantities are set by the LEARN= and MOMENTUM= options, respectively.
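A minimal sketch of this update for a single weight follows; the learning-rate and momentum values used here are illustrative, not PROC NEURAL's defaults:

```python
def bprop_update(weight, prev_change, gradient, learn=0.1, momentum=0.9):
    """One gradient-descent-with-momentum step, as in TECH=BPROP:
    current change = -learn * gradient + momentum * previous change;
    the new weight is the previous weight plus the current change."""
    change = -learn * gradient + momentum * prev_change
    return weight + change, change

# Minimizing E(w) = w^2 (gradient 2w), starting from w = 1.0:
w, change = 1.0, 0.0
for _ in range(500):
    w, change = bprop_update(w, change, 2.0 * w)
```

With a heavy momentum the iterates oscillate around the minimum before settling, which is why the step-size and momentum choices interact with convergence speed.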
The RPROP Algorithm
The RPROP algorithm (Riedmiller and Braun 1993) is unusual as a descent algorithm in that it does not use the magnitude of the gradient in calculating the weight update. Instead, the signs of the current and previous gradient are used to determine a step size η_n for each weight at each iteration. To prevent oscillations and underflows, the step size is bounded:

    η_min ≤ η_n ≤ η_max

The value of η_max is set by the MAXLEARN= option, and the value of η_min is set by the MINLEARN= option. Note that these values are substantially different from the recommendations given in Schiffmann, Joost, and Werner (1994); the new values improved stability and convergence over a wide range of problems.
For each connection weight, the initial step size η_0 is given a small value. According to Schiffmann, Joost, and Werner (1994), results are typically not dependent on the exact value given. PROC NEURAL uses a default initial step size of 0.1 for all weights, which is set by the LEARN= option in the TRAIN statement.
At the nth iteration, the step size for each weight is adjusted by comparing the signs of the current and previous gradient elements, g_n and g_{n−1}:

    η_n = min(u · η_{n−1}, η_max)   if g_n · g_{n−1} > 0
    η_n = max(d · η_{n−1}, η_min)   if g_n · g_{n−1} < 0
    η_n = η_{n−1}                   otherwise

and the weight change moves against the sign of the gradient: Δw_n = −sign(g_n) · η_n. The factors u and d are the acceleration and deceleration, respectively. The values of these factors are set by the ACCELERATE= and DECELERATE= options in the TRAIN statement. The default value for ACCELERATE= is 1.2; for DECELERATE= the default value is 0.5.
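The adjustment rule for a single weight can be sketched as below; the step-size bounds are illustrative placeholders, while 1.2 and 0.5 match the documented ACCELERATE= and DECELERATE= defaults:

```python
def rprop_step(step, grad, prev_grad, accel=1.2, decel=0.5,
               min_step=1e-6, max_step=50.0):
    """One RPROP step-size adjustment and weight change for a single weight.
    Only the signs of the current and previous gradient are used; the
    magnitude of the gradient never enters the weight change."""
    if grad * prev_grad > 0:
        # gradient kept its sign: accelerate, but stay below the upper bound
        step = min(step * accel, max_step)
    elif grad * prev_grad < 0:
        # gradient changed sign (a minimum was overshot): decelerate
        step = max(step * decel, min_step)
    # move against the gradient by exactly the step size
    if grad > 0:
        change = -step
    elif grad < 0:
        change = step
    else:
        change = 0.0
    return step, change
```

Because the change is ±step regardless of how steep the error surface is, RPROP is insensitive to badly scaled gradients, which is the point of the design.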
For TECH=RPROP, the PDETAIL option in the TRAIN statement results in the following quantities being printed:
• η_{n−1} is labeled "Previous Step Size"
• g_{n−1} is labeled "Previous Gradient"
• g_n is labeled "Current Gradient"
• η_n is labeled "Current Step Size"
• Δw_n is labeled "Current Change"
• w_{n−1} is labeled "Previous Weight"
• w_n is labeled "Current Weight"
The Quickprop Algorithm
The Quickprop algorithm (Fahlman 1989) assumes that the error function behaves locally like a parabola, and uses the method of false position to find the minimum of the approximating quadratic. Variations are required to ensure that the change is not uphill and to handle cases where the gradient does not change between iterations (which would cause the method of false position to fail).
The quickprop algorithm uses a modified gradient g̃_n, related to the regular gradient g_n by the addition of a small weight-decay term (as in Fahlman's original implementation). At the nth iteration, the weight update is given by

    Δw_n = α_n Δw_{n−1} − β_n g̃_n

For initialization at n = 1, we set α_1 = 0 and β_1 = η, so the update step becomes a gradient descent:

    Δw_1 = −η g̃_1

At the second and subsequent iterations, α_n and β_n are computed as follows. Calculation of α_n first involves evaluating s_n, the numerical estimate of the second derivative:

    s_n = (g̃_n − g̃_{n−1}) / Δw_{n−1}

The false-position step toward the minimum of the approximating parabola is then Δw_n = −g̃_n / s_n; that is, α_n = g̃_n / (g̃_{n−1} − g̃_n) with β_n = 0.
This second derivative estimate can become large in absolute value, or can signal a move "up" the gradient away from a minimum. The following modifications are applied to account for these situations: the magnitude of α_n is bounded by κ, so that the current change never exceeds κ times the previous change, and when the estimated curvature indicates an uphill move, a gradient-descent step −η g̃_n is taken instead. The value of η is set by the LEARN= option in the TRAIN statement. The bound κ is set by the MAXMOMENTUM= option in the TRAIN statement.
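A sketch of the safeguarded false-position step for a single weight follows; for simplicity it uses the plain gradient rather than the modified gradient, and the learn and max_momentum values are illustrative, not necessarily PROC NEURAL's defaults:

```python
def qprop_update(weight, prev_change, grad, prev_grad,
                 learn=0.1, max_momentum=1.75):
    """One Quickprop-style step for a single weight: a false-position step on
    the local parabola, with safeguards in the spirit of Fahlman (1989)."""
    if prev_change == 0.0 or grad == prev_grad:
        # no usable curvature estimate: fall back to gradient descent
        change = -learn * grad
    else:
        # secant (false-position) estimate of the second derivative
        curvature = (grad - prev_grad) / prev_change
        if curvature <= 0.0:
            # the quadratic step would move uphill: use gradient descent
            change = -learn * grad
        else:
            # step toward the parabola's minimum, capped at
            # max_momentum times the previous change
            change = -grad / curvature
            limit = max_momentum * abs(prev_change)
            change = max(-limit, min(limit, change))
    return weight + change, change
```

On a truly quadratic error surface the uncapped step lands on the minimum in one move; the cap only slows the first few iterations.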
For TECH=QPROP, the PDETAIL option in the TRAIN statement results in the following quantities being printed:
• w_{n−1} is labeled "Previous Weight"
• g_n is labeled "Gradient"
• g̃_n is labeled "Modified Gradient"