The PROP Algorithms




The BPROP, RPROP, and QPROP algorithms differ in how the weight update $\Delta w_n$ is computed.



Standard Printing for the PROP Algorithms

When the PDETAIL option is not specified, a standard table is produced displaying the iteration number, the value of the objective function at that iteration, and the $L_\infty$ norm of the gradient vector $\nabla E(w_n)$.

This table is useful for assessing overall convergence behavior. However, unlike the table produced by the PDETAIL option, it gives no information about individual network weights.

In the case of sum of squared error, which results from specifying OBJECTIVE=DEV in the NETOPTS statement and ERROR=NORMAL in the TARGET statement, the error function serves as the objective function and is given by

$E(w) = \sum_i \left( y_i - \hat{y}_i(w) \right)^2$

where the sum runs over the training cases, $y_i$ is the target value, and $\hat{y}_i(w)$ is the corresponding network output computed with the weight vector $w$.

Candidate network weight values $\hat{w}$ that minimize the objective function $E(w)$ satisfy the first-order condition

$\nabla E(\hat{w}) = 0$

Hence, a natural convergence criterion is

$\left\| \nabla E(w_n) \right\|_\infty \le \epsilon$

for some small value $\epsilon$. This is, in fact, the convergence criterion for all of the prop methods. The value of $\epsilon$ is set by the ABSGCONV= option in the NLOPTIONS statement, with a default value of 1E-5. Note that the $L_\infty$ norm $\| z \|_\infty = \max_i |z_i|$ for a vector $z$ is simply the maximum of the absolute values of the components of $z$.

The standard table prints the following quantities:

- Iteration $n$
- Objective, $E(w_n)$, using the current network weight $w_n$
- Max Abs Gradient Element, $\left\| \nabla E(w_n) \right\|_\infty$

When the PDETAIL option is specified, this standard table is still printed; each line of the standard table follows the detail lines for each of the weights at that iteration.
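As a concrete illustration of these quantities, the following sketch computes the sum-of-squared-error objective, the maximum absolute gradient element, and the ABSGCONV-style convergence test for a toy least-squares problem. It is plain Python, not PROC NEURAL code or output; the data, the linear model, and the tolerance value are illustrative assumptions.

    import numpy as np

    # Toy least-squares problem: a linear "network" y_hat = X @ w, so the
    # objective and its gradient have simple closed forms.
    X = np.array([[1.0, 0.5], [1.0, -1.5], [1.0, 2.0]])
    y = np.array([1.0, -2.0, 3.0])

    def objective(w):
        # E(w) = sum_i (y_i - y_hat_i(w))^2
        resid = y - X @ w
        return float(resid @ resid)

    def gradient(w):
        # grad E(w) = -2 * X^T (y - X @ w)
        return -2.0 * X.T @ (y - X @ w)

    # The three quantities shown in the standard table at an iteration:
    w = np.zeros(2)                              # current network weights
    obj = objective(w)                           # "Objective"
    max_abs_grad = np.max(np.abs(gradient(w)))   # "Max Abs Gradient Element" (L-infinity norm)
    print("iteration 0:", obj, max_abs_grad)

    # Convergence test shared by the prop methods (epsilon plays the role of ABSGCONV=):
    epsilon = 1e-5
    converged = max_abs_grad <= epsilon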

The Standard Backprop Algorithm

The standard backprop algorithm is a gradient descent with momentum.

At the nth iteration, the update is computed as

$\Delta w_n = -\eta \, \nabla E(w_n) + \mu \, \Delta w_{n-1}, \qquad w_{n+1} = w_n + \Delta w_n$

where $\eta$ is the learning rate and $\mu$ is the momentum.

For TECH=BPROP, the PDETAIL option in the TRAIN statement results in the following quantities

being printed:

$\Delta w_{n-1}$ is labeled "Previous Change"
$\nabla E(w_n)$ is labeled "Gradient"
$\Delta w_n$ is labeled "Current Change"
$w_n$ is labeled "Previous Weight"
$w_{n+1}$ is labeled "Current Weight"

The learning rate $\eta$ and the momentum $\mu$ are printed at the beginning of the table. These quantities are set by the LEARN= and MOMENTUM= options in the TRAIN statement, respectively.
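The update rule itself is a short piece of arithmetic. The following sketch applies it to a simple quadratic objective; the values used for eta and mu are arbitrary illustrations, not the LEARN= and MOMENTUM= defaults.

    import numpy as np

    # One BPROP step: gradient descent with momentum,
    #   delta_w_n = -eta * grad_E(w_n) + mu * delta_w_{n-1};  w_{n+1} = w_n + delta_w_n
    def bprop_step(w, grad, prev_change, eta=0.1, mu=0.5):
        change = -eta * grad + mu * prev_change
        return w + change, change

    # Demonstration on E(w) = w0^2 + 2*w1^2, whose gradient is (2*w0, 4*w1).
    grad_E = lambda w: np.array([2.0 * w[0], 4.0 * w[1]])
    w = np.array([5.0, 1.0])
    change = np.zeros_like(w)
    for n in range(25):
        w, change = bprop_step(w, grad_E(w), change)
    print(w)   # approaches the minimizer (0, 0)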

The RPROP Algorithm



The RPROP algorithm (Riedmiller and Braun 1993) is unusual as a descent algorithm in that it does not

use the magnitude of the gradient in calculating the weight update. Instead, the signs of the current and

previous gradients are used to determine a step size $\Delta_n$ at each iteration.




To prevent oscillations and underflows, the step size $\Delta_n$ is bounded by

$\Delta_{\min} \le \Delta_n \le \Delta_{\max}$

The value of $\Delta_{\max}$ is set by the MAXLEARN= option, and the value of $\Delta_{\min}$ is set by the MINLEARN= option. Note that the default values are substantially different from the recommendations given in Schiffmann, Joost, and Werner (1994); these new values improved stability and convergence over a wide range of problems.

For each connection weight, an initial step size $\Delta_0$ is given a small value. According to Schiffmann, Joost, and Werner (1994), results are not typically dependent on the exact value given to $\Delta_0$. PROC NEURAL uses a default initial step size of 0.1 for all weights; this value is set by the LEARN= option in the TRAIN statement.

At the nth iteration, the step size for each weight is adjusted according to the signs of the current and previous gradient components for that weight, $g_n$ and $g_{n-1}$:

$\Delta_n = u \, \Delta_{n-1}$ if $g_n \, g_{n-1} > 0$
$\Delta_n = d \, \Delta_{n-1}$ if $g_n \, g_{n-1} < 0$
$\Delta_n = \Delta_{n-1}$ if $g_n \, g_{n-1} = 0$

The factors u and d are the acceleration and deceleration, respectively. The values of these factors are set by the ACCELERATE= and DECELERATE= options in the TRAIN statement. The default value for ACCELERATE= is 1.2; for DECELERATE= the default value is 0.5.
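The sign-based adjustment and the resulting weight update can be sketched in a few lines. The sketch below follows the general RPROP scheme of Riedmiller and Braun (1993) rather than the exact PROC NEURAL implementation; the step-size bounds shown are the values recommended in the original paper, not the MAXLEARN= and MINLEARN= defaults.

    import numpy as np

    # One RPROP adjustment for a single connection weight. Only the signs of the
    # current and previous gradients are used; the gradient magnitude never enters.
    # u and d play the roles of ACCELERATE= and DECELERATE=; step_max and step_min
    # play the roles of MAXLEARN= and MINLEARN= (values here are illustrative).
    def rprop_step(w, grad, prev_grad, step, u=1.2, d=0.5, step_max=50.0, step_min=1e-6):
        if grad * prev_grad > 0:
            step = min(step * u, step_max)    # same sign: accelerate
        elif grad * prev_grad < 0:
            step = max(step * d, step_min)    # sign change: decelerate
        change = -np.sign(grad) * step        # move a fixed distance against the gradient's sign
        return w + change, step, change

    # Usage on E(w) = w^2 (gradient 2w) for a single weight:
    w, step, prev_g = 3.0, 0.1, 0.0
    for n in range(30):
        g = 2.0 * w
        w, step, _ = rprop_step(w, g, prev_g, step)
        prev_g = g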

For TECH=RPROP, the PDETAIL option in the TRAIN statement results in the following quantities

being printed:

 is labeled "Previous Step Size"

 is labeled "Previous Gradient"

 is labeled "Current Gradient"



 is labeled "Current Step Size"

 is labeled "Current Change"

 is labeled "Previous Weight"

 is labeled "Current Weight"



The Quickprop Algorithm

The Quickprop algorithm (Fahlman 1989) assumes that the error function behaves locally like a

parabola, and uses the method of false position to find the minimum of the approximating quadratic.

Variations are required to ensure the change is not uphill and to handle cases where the gradient does not

change between iterations (causing the false position method to fail).

The quickprop algorithm uses a modified gradient $\tilde{g}_n$ related to the regular gradient $g_n = \nabla E(w_n)$. At the nth iteration, the weight update $\Delta w_n$ is applied as $w_{n+1} = w_n + \Delta w_n$.

For initialization at n=1, there is no previous step, so the update step becomes a gradient descent:

$\Delta w_1 = -\eta \, \tilde{g}_1$

At the second and subsequent iterations, $\Delta w_n$ and $\tilde{g}_n$ are computed as follows. Calculation of $\Delta w_n$ first involves evaluating the numerical estimate of the second derivative obtained from the change in the modified gradient over the previous step,

$\dfrac{\tilde{g}_n - \tilde{g}_{n-1}}{\Delta w_{n-1}}$

and the false-position step then moves to the minimum of the parabola defined by this estimate and the current modified gradient. This second derivative can become large in absolute value, or it can signal a move "up" the gradient away from a minimum. Modifications are applied to account for these situations; in particular, the magnitude of the step is not allowed to grow by more than a factor of $\mu$ from one iteration to the next.

The value of the learning rate $\eta$ is set by the LEARN= option in the TRAIN statement. The bound $\mu$ is set by the MAXMOMENTUM= option in the TRAIN statement.
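A compact way to see the false-position idea is a single-weight sketch of Fahlman's basic quickprop step. This is a generic illustration, not the PROC NEURAL implementation: it uses the plain gradient rather than the procedure's modified gradient, applies only the simplest safeguards, and eta and mu are arbitrary stand-ins for the LEARN= and MAXMOMENTUM= options.

    # One quickprop update for a single weight (simplified Fahlman 1989 scheme).
    def quickprop_step(w, grad, prev_grad, prev_change, eta=0.1, mu=1.75):
        if prev_change == 0.0:
            change = -eta * grad                      # no previous step: plain gradient descent
        else:
            denom = prev_grad - grad                  # proportional to the 2nd-derivative estimate
            if denom != 0.0 and abs(grad) < mu * abs(denom):
                change = prev_change * grad / denom   # false-position step to the parabola's minimum
            else:
                change = mu * prev_change             # limit growth to mu times the previous step
            if grad * prev_grad > 0.0:
                change -= eta * grad                  # slope kept its sign: add a descent term
        return w + change, change

    # Usage on E(w) = w^2 (gradient 2w) for a single weight:
    w, prev_g, prev_change = 3.0, 0.0, 0.0
    for n in range(8):
        g = 2.0 * w
        w, prev_change = quickprop_step(w, g, prev_g, prev_change)
        prev_g = g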

For TECH=QPROP, the PDETAIL option in the TRAIN statement results in the following quantities



being printed:

 is labeled "Previous Weight"

 is labeled "Gradient"

 is labeled "Modified Gradient"



