The BPROP, RPROP, and QPROP algorithms differ in how the weight update Δw_n is computed at each iteration.
Standard Printing for the PROP Algorithms
When the PDETAIL option is not specified, a standard table is produced displaying the iteration number, the value of the objective function at that iteration, and the maximum absolute element of the gradient vector (its L∞ norm). This table is useful for assessing overall convergence behavior. However, unlike the table produced by the PDETAIL option, it gives no information about individual network weights.
In the case of sum of squared error, which results from specifying OBJECTIVE=DEV in the NETOPTS statement and ERROR=NORMAL in the TARGET statement, the error function serves as the objective function, and is given by

    E(w) = Σ_i (y_i − ŷ_i(w))²

where the sum runs over the training cases, y_i is the target value, and ŷ_i(w) is the network output for the weight vector w. Candidate network weight values w* that minimize the objective function satisfy the first-order condition

    ∇E(w*) = 0

Hence, a natural convergence criterion is

    ‖∇E(w_n)‖∞ < ε

for some small value ε. This is, in fact, the convergence criterion for all prop methods. The value of ε is set by the ABSGCONV= option in the NLOPTIONS statement. Note that the L∞ norm of a vector z is simply the maximum of the absolute values of the components of z.
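As a concrete sketch, this convergence test can be implemented as follows; the linear single-output model and the tolerance value are illustrative stand-ins, not PROC NEURAL internals:

```python
import numpy as np

def sse_objective(w, X, y):
    """Sum-of-squared-error objective E(w) = sum_i (y_i - yhat_i(w))^2,
    using a linear model yhat = Xw as a stand-in for a network output."""
    residual = y - X @ w
    return residual @ residual

def sse_gradient(w, X, y):
    """Gradient of E(w) for the linear stand-in model: -2 * X^T (y - Xw)."""
    return -2.0 * X.T @ (y - X @ w)

def converged(gradient, epsilon):
    """ABSGCONV-style test: the maximum absolute gradient element
    (the L-infinity norm of the gradient) falls below epsilon."""
    return np.max(np.abs(gradient)) < epsilon
```

For an exactly fitted model the gradient vanishes, so the Max Abs Gradient Element reported in the standard table drops below any reasonable tolerance.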
The standard table prints the following quantities:
• Iteration, n
• Objective, E(w_n), using the current network weights
• Max Abs Gradient Element, ‖∇E(w_n)‖∞
When the PDETAIL option is specified, this standard table is still printed. At each iteration, the detail lines for each of the weights are followed by the corresponding line of the standard table.
The Standard Backprop Algorithm
The standard backprop algorithm is gradient descent with momentum. At the nth iteration, the update is computed as

    Δw_n = −η ∇E(w_{n−1}) + μ Δw_{n−1}
    w_n = w_{n−1} + Δw_n

where η is the learning rate and μ is the momentum. For TECH=BPROP, the PDETAIL option in the TRAIN statement results in the following quantities being printed:
• Δw_{n−1} is labeled "Previous Change"
• ∇E(w_{n−1}) is labeled "Gradient"
• Δw_n is labeled "Current Change"
• w_{n−1} is labeled "Previous Weight"
• w_n is labeled "Current Weight"
The learning rate η and the momentum μ are printed at the beginning of the table. These quantities are set by the LEARN= and MOMENTUM= options, respectively.
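A minimal sketch of this update for a single weight follows; the learning-rate and momentum values used here are illustrative, not PROC NEURAL's defaults:

```python
def bprop_update(weight, prev_change, gradient, learn=0.1, momentum=0.9):
    """One gradient-descent-with-momentum step, as in TECH=BPROP:
    current change = -learn * gradient + momentum * previous change;
    the new weight is the previous weight plus the current change."""
    change = -learn * gradient + momentum * prev_change
    return weight + change, change

# Minimizing E(w) = w^2 (gradient 2w), starting from w = 1.0:
w, change = 1.0, 0.0
for _ in range(500):
    w, change = bprop_update(w, change, 2.0 * w)
```

With a heavy momentum the iterates oscillate around the minimum before settling, which is why the step-size and momentum choices interact with convergence speed.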
The RPROP Algorithm
The RPROP algorithm (Riedmiller and Braun 1993) is unusual as a descent algorithm in that it does not use the magnitude of the gradient in calculating the weight update. Instead, the signs of the current and previous gradient are used to determine a step size η_n for each weight at each iteration. To prevent oscillations and underflows, the step size is bounded:

    η_min ≤ η_n ≤ η_max

The value of η_max is set by the MAXLEARN= option, and the value of η_min is set by the MINLEARN= option. Note that these values are substantially different from the recommendations given in Schiffmann, Joost, and Werner (1994); the new values improved stability and convergence over a wide range of problems.
For each connection weight, the initial step size η_0 is given a small value. According to Schiffmann, Joost, and Werner (1994), results are typically not dependent on the exact value given. PROC NEURAL uses a default initial step size of 0.1 for all weights, which is set by the LEARN= option in the TRAIN statement.
At the nth iteration, the step size for each weight is adjusted by comparing the signs of the current and previous gradient elements, g_n and g_{n−1}:

    η_n = min(u · η_{n−1}, η_max)   if g_n · g_{n−1} > 0
    η_n = max(d · η_{n−1}, η_min)   if g_n · g_{n−1} < 0
    η_n = η_{n−1}                   otherwise

and the weight change moves against the sign of the gradient: Δw_n = −sign(g_n) · η_n. The factors u and d are the acceleration and deceleration, respectively. The values of these factors are set by the ACCELERATE= and DECELERATE= options in the TRAIN statement. The default value for ACCELERATE= is 1.2; for DECELERATE= the default value is 0.5.
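The adjustment rule for a single weight can be sketched as below; the step-size bounds are illustrative placeholders, while 1.2 and 0.5 match the documented ACCELERATE= and DECELERATE= defaults:

```python
def rprop_step(step, grad, prev_grad, accel=1.2, decel=0.5,
               min_step=1e-6, max_step=50.0):
    """One RPROP step-size adjustment and weight change for a single weight.
    Only the signs of the current and previous gradient are used; the
    magnitude of the gradient never enters the weight change."""
    if grad * prev_grad > 0:
        # gradient kept its sign: accelerate, but stay below the upper bound
        step = min(step * accel, max_step)
    elif grad * prev_grad < 0:
        # gradient changed sign (a minimum was overshot): decelerate
        step = max(step * decel, min_step)
    # move against the gradient by exactly the step size
    if grad > 0:
        change = -step
    elif grad < 0:
        change = step
    else:
        change = 0.0
    return step, change
```

Because the change is ±step regardless of how steep the error surface is, RPROP is insensitive to badly scaled gradients, which is the point of the design.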
For TECH=RPROP, the PDETAIL option in the TRAIN statement results in the following quantities being printed:
• η_{n−1} is labeled "Previous Step Size"
• g_{n−1} is labeled "Previous Gradient"
• g_n is labeled "Current Gradient"
• η_n is labeled "Current Step Size"
• Δw_n is labeled "Current Change"
• w_{n−1} is labeled "Previous Weight"
• w_n is labeled "Current Weight"
The Quickprop Algorithm
The Quickprop algorithm (Fahlman 1989) assumes that the error function behaves locally like a parabola, and uses the method of false position to find the minimum of the approximating quadratic. Variations are required to ensure that the change is not uphill and to handle cases where the gradient does not change between iterations (which would cause the method of false position to fail).
The quickprop algorithm uses a modified gradient g̃_n, related to the regular gradient g_n by the addition of a small weight-decay term (as in Fahlman's original implementation). At the nth iteration, the weight update is given by

    Δw_n = α_n Δw_{n−1} − β_n g̃_n

For initialization at n = 1, we set α_1 = 0 and β_1 = η, so the update step becomes a gradient descent:

    Δw_1 = −η g̃_1

At the second and subsequent iterations, α_n and β_n are computed as follows. Calculation of α_n first involves evaluating s_n, the numerical estimate of the second derivative:

    s_n = (g̃_n − g̃_{n−1}) / Δw_{n−1}

The false-position step toward the minimum of the approximating parabola is then Δw_n = −g̃_n / s_n; that is, α_n = g̃_n / (g̃_{n−1} − g̃_n) with β_n = 0.
This second derivative estimate can become large in absolute value, or can signal a move "up" the gradient away from a minimum. The following modifications are applied to account for these situations: the magnitude of α_n is bounded by κ, so that the current change never exceeds κ times the previous change, and when the estimated curvature indicates an uphill move, a gradient-descent step −η g̃_n is taken instead. The value of η is set by the LEARN= option in the TRAIN statement. The bound κ is set by the MAXMOMENTUM= option in the TRAIN statement.
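A sketch of the safeguarded false-position step for a single weight follows; for simplicity it uses the plain gradient rather than the modified gradient, and the learn and max_momentum values are illustrative, not necessarily PROC NEURAL's defaults:

```python
def qprop_update(weight, prev_change, grad, prev_grad,
                 learn=0.1, max_momentum=1.75):
    """One Quickprop-style step for a single weight: a false-position step on
    the local parabola, with safeguards in the spirit of Fahlman (1989)."""
    if prev_change == 0.0 or grad == prev_grad:
        # no usable curvature estimate: fall back to gradient descent
        change = -learn * grad
    else:
        # secant (false-position) estimate of the second derivative
        curvature = (grad - prev_grad) / prev_change
        if curvature <= 0.0:
            # the quadratic step would move uphill: use gradient descent
            change = -learn * grad
        else:
            # step toward the parabola's minimum, capped at
            # max_momentum times the previous change
            change = -grad / curvature
            limit = max_momentum * abs(prev_change)
            change = max(-limit, min(limit, change))
    return weight + change, change
```

On a truly quadratic error surface the uncapped step lands on the minimum in one move; the cap only slows the first few iterations.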
For TECH=QPROP, the PDETAIL option in the TRAIN statement results in the following quantities being printed:
• w_{n−1} is labeled "Previous Weight"
• g_n is labeled "Gradient"
• g̃_n is labeled "Modified Gradient"