The NEURAL Procedure


the generalized delta rule. In backpropagation, the difference (delta) between the output

value and the target value is the error.


Requests the RPROP algorithm.


Requests Quickprop.

See the following table for the defaults for weight-based optimization techniques for a given value

of the OBJECT= option.

Defaults for Weight-based Optimization Techniques

(The table's layout was lost in the source. The defaults are broken out by the number of network weights, in three ranges: 0 to 100 weights, 101 to 500 weights, and 501 or more weights, with one set of defaults for certain OBJECT= values and another set for all other OBJECT= values.)

Copyright 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.

The NEURAL Procedure

USE Statement

Sets all weights to values from a data set.

Category: Action statement. Affects the network or the data sets. Options set in an action statement
affect only that statement.

USE SAS-data-set;

Required Arguments


SAS-data-set

Specifies an input data set that contains all the weights. Unlike the INITIAL statement, the USE
statement does not generate any random weights; therefore, the data set must contain all of the
network weights and parameters.



For details about neural network architecture and training, see the online Neural Network Node:

Reference documentation. For an introduction to predictive modeling, see the online Predictive

Modeling document. Both of these documents can be accessed by using the Help pull-down menu to

select the "Open the Enterprise Miner Nodes Help" item.

The BPROP, RPROP, and QPROP Algorithms Used in PROC NEURAL



While the standard backprop algorithm has been a very popular algorithm for training feedforward

networks, performance problems have motivated numerous attempts at finding faster algorithms.

The following discussion of the implementation of the backprop (BPROP), RPROP, and QPROP

algorithms in PROC NEURAL relates the details of these algorithms with the printed output resulting

from the use of the PDETAIL option. The discussion uses the algorithmic description and notation in

Schiffmann, Joost, and Werner (1994) as well as the Neural Net Frequently Asked Questions (FAQ)

available as a hypertext document readable by any World-Wide Web browser, such as Mosaic.


There is an important distinction between "backprop" ( or "back propagation of errors") and the

"backpropagation algorithm".

The "back propagation of errors" is an efficient computational technique for computing the derivative of

the error function with respect to the weights and biases of the network. This derivative, more commonly

known as the error gradient, is needed for any first order nonlinear optimization method. The standard

backpropagation algorithm is a method for updating the weights based on the gradient. It is a variation

of the simple "delta rule". See "What is backprop?" in part 2 of the FAQ for more details and references

on standard backprop, RPROP, and Quickprop.
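The simple delta rule with a momentum term, as used by standard backprop, can be sketched in a few lines of Python. This is an expository sketch only, not PROC NEURAL's implementation, and the learning-rate and momentum values shown are arbitrary:

```python
# Sketch of the standard backprop (generalized delta rule) update for one
# weight: move against the error gradient, scaled by a learning rate eta,
# plus a momentum term alpha times the previous update.

def bprop_step(w, grad, prev_delta, eta=0.1, alpha=0.9):
    """Return (updated weight, update applied) for one iteration."""
    delta = -eta * grad + alpha * prev_delta
    return w + delta, delta

# One iteration from w = 1.0 with gradient 2.0 and no previous update:
w, delta = bprop_step(1.0, 2.0, 0.0)   # delta = -0.2, w = 0.8
```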

With any of the "prop" algorithms, PROC NEURAL allows detailed printing of the iterations. The

PDETAIL option in the TRAIN statement prints, for each iteration, all quantities involved in the

algorithm for each weight in the network. This option should be used with caution as it can result in

voluminous output. However, by restricting the number of iterations and number of non-frozen weights,

a manageable amount of information is produced. The purpose of the PDETAIL option is to allow you to

follow the nonlinear optimization of the error function for each of the network weights. For any

particular propagation method, as described below, all quantities used to compute an updated weight are printed.


In standard backprop, too low a learning rate makes the network learn very slowly. Too high a learning

rate makes the weights and error function diverge, so there is no learning at all. If the error function is

quadratic, as in linear models, good learning rates can be computed from the Hessian matrix. If the error

function has many local and global optima, as in typical feedforward neural networks with hidden units,

the optimal learning rate often changes dramatically during the training process, because the Hessian

also changes dramatically. Trying to train a neural network using a constant learning rate is usually a

tedious process requiring much trial and error.
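The sensitivity to the learning rate is visible even on a one-dimensional quadratic error function. The following is a hypothetical Python illustration, not PROC NEURAL output:

```python
# Gradient descent on the quadratic error E(w) = w^2, with dE/dw = 2w.
# The iteration is w <- w - lr * 2w = (1 - 2*lr) * w, which converges
# only when |1 - 2*lr| < 1, that is, when 0 < lr < 1.

def descend(lr, w=1.0, steps=50):
    """Return the weight after `steps` constant-learning-rate updates."""
    for _ in range(steps):
        w -= lr * 2.0 * w
    return w

too_low  = descend(lr=0.01)  # after 50 steps, still far from the minimum at 0
good     = descend(lr=0.4)   # converges rapidly toward 0
too_high = descend(lr=1.1)   # |w| grows without bound: no learning at all
```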

With batch training, there is no need to use a constant learning rate. In fact, there is no reason to use

standard backprop at all, because vastly more efficient, reliable, and convenient batch training

algorithms exist such as Quickprop and RPROP.
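RPROP's central idea, adapting a separate step size for each weight from the sign history of its gradient, can be sketched as follows. This is an illustrative Python sketch, not the PROC NEURAL implementation; the factors 1.2 and 0.5 and the step bounds are values commonly quoted for RPROP, assumed here:

```python
def rprop_step(w, grad, prev_grad, step,
               eta_plus=1.2, eta_minus=0.5, step_max=50.0, step_min=1e-6):
    """One RPROP update for a single weight. Only the SIGN of the gradient
    moves the weight; the gradient's magnitude is ignored entirely."""
    if grad * prev_grad > 0:        # same sign as last iteration: speed up
        step = min(step * eta_plus, step_max)
    elif grad * prev_grad < 0:      # sign flipped: we overshot, slow down
        step = max(step * eta_minus, step_min)
    if grad > 0:                    # move against the gradient by `step`
        w -= step
    elif grad < 0:
        w += step
    return w, step
```

Because only the sign of the gradient enters the weight move, a sudden change in gradient magnitude cannot produce an erratic jump in the weights.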

Many other variants of backprop have been invented. Most suffer from the same theoretical flaw as

standard backprop: the magnitude of the change in the weights (the step size) should NOT be a function

of the magnitude of the gradient. In some regions of the weight space, the gradient is small and you need

a large step size; this happens when you initialize a network with small random weights. In other regions

of the weight space, the gradient is small and you need a small step size; this happens when you are close

to a local minimum. Likewise, a large gradient may call for either a small step or a large step. Many

algorithms try to adapt the learning rate, but any algorithm that multiplies the learning rate by the

gradient to compute the change in the weights is likely to produce erratic behavior when the gradient

changes abruptly. The great advantage of Quickprop and RPROP is that they do not have this excessive

dependence on the magnitude of the gradient. Conventional optimization algorithms use not only the

gradient but also second-order derivatives or a line search (or some combination thereof) to obtain a

good step size.
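Quickprop takes the second approach: for each weight it fits a parabola through the last two gradient values and jumps toward that parabola's minimum. The following Python sketch is illustrative only, not the PROC NEURAL code; the growth cap mu = 1.75 and the fallback rate are commonly quoted, assumed values:

```python
def quickprop_step(w, grad, prev_grad, prev_delta, mu=1.75, fallback_lr=0.1):
    """One Quickprop update for a single weight. The step
    prev_delta * grad / (prev_grad - grad) lands at the minimum of a
    parabola through the last two gradients; growth is capped at
    mu * |prev_delta| (the "maximum growth factor")."""
    denom = prev_grad - grad
    if prev_delta != 0.0 and denom != 0.0:
        delta = prev_delta * grad / denom
        cap = mu * abs(prev_delta)
        delta = max(-cap, min(cap, delta))
    else:
        delta = -fallback_lr * grad   # plain gradient step to get started
    return w + delta, delta
```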

Mathematical Notation

It is helpful to establish notation so that we can relate quantities and describe algorithms.

w_ij(n) is the weight associated with the connection between the ith unit in the current layer
and the jth unit in the previous layer. The argument n refers to the iteration.

Delta_w_ij(n) is the update, or change, for the weight w_ij(n). Applying this update yields the
(n+1)th iteration value w_ij(n+1).

g_ij(n) is the partial derivative of the error function E with respect to the weight w_ij at
the nth iteration.

o_k(m) is the kth component of the output vector for the mth case, as a function of the
inputs x(m) and the network weights w.

t_k(m) is the kth component of the target vector for the mth case, as a function of the inputs
x(m).

The basic algorithm in all methods is a generic update given by

    w_ij(n+1) = w_ij(n) + Delta_w_ij(n)
