The arboretum procedure



PROC G3D creates a plot of the predicted values. Note that this network underfits badly.

proc g3d data=mlpout;
   plot x2*x1=p_hipl / grid side ctop=blue
                       caxis=green ctext=black
                       zmin=-1.5 zmax=1.5;
run;



Copyright 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.

The PMBR Procedure



Procedure Syntax

PROC PMBR Statement

VAR Statement

TARGET Statement

CLASS Statement



The PMBR procedure is used for prediction as an alternative to other predictive modeling techniques in Enterprise Miner, such as the NEURAL, SPLIT, DMSPLIT, DMNEURL, and DMREG procedures. However, the technique in the PMBR procedure is different. Whereas the other techniques attempt to derive rules for predicting future examples, the PMBR procedure categorizes an observation in a score data set by retrieving its k closest neighbors from a training data set and having each neighbor vote on the target value according to its own value of the target variable. These votes then become the posterior probabilities for predicting the target, which are included in an output data set. Because no model is fitted in advance and the neighbor search happens at scoring time, training is faster than with the alternative techniques, but scoring is generally slower.
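The voting step described above can be sketched in a few lines of Python. This is an illustration of nearest-neighbor voting in general, not the PMBR implementation. The function name `knn_posteriors` and the toy data are assumptions made for the example.

```python
import math
from collections import Counter

def knn_posteriors(train_x, train_y, probe, k):
    """Return {class label: share of votes} among the k nearest training rows.

    Hypothetical helper for illustration; not part of any SAS product.
    """
    k = min(k, len(train_x))
    # Rank training rows by Euclidean distance to the probe observation.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], probe))
    # Each of the k nearest neighbors casts one vote for its own target value.
    votes = Counter(train_y[i] for i in order[:k])
    return {label: count / k for label, count in votes.items()}

# Toy training data: two inputs and a binary target.
train_x = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1), (1.2, 0.8)]
train_y = ["no", "no", "yes", "yes", "yes"]

posteriors = knn_posteriors(train_x, train_y, probe=(1.0, 0.9), k=4)
print(posteriors)
```

With k=4, three of the four nearest neighbors have the value "yes", so the vote shares are 0.75 for "yes" and 0.25 for "no". All of the work happens when the probe is scored, which is why scoring is the slow step.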

The target variable is expected to be binary, interval, or nominal. Ordinal targets are not specially supported at this time, but they can be modeled as interval targets. If the target variable is a class variable in the DMDB, one variable is created in the output data set for each value of the target, containing the corresponding posterior probability. Otherwise, a single predicted variable is created in the output data set, holding the average target value of the k neighbors.
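For an interval target, the prediction is the mean of the neighbors' target values. The following minimal Python sketch shows that case. The name `knn_mean_prediction` and the data are assumptions for illustration, not PMBR internals.

```python
import math

def knn_mean_prediction(train_x, train_y, probe, k):
    """Average the interval target over the k nearest training rows.

    Hypothetical helper for illustration only.
    """
    # Rank training rows by Euclidean distance to the probe observation.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], probe))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

# Toy training data: one input and an interval target.
train_x = [(0.0,), (1.0,), (2.0,), (10.0,)]
train_y = [1.0, 2.0, 3.0, 50.0]

prediction = knn_mean_prediction(train_x, train_y, probe=(1.2,), k=3)
print(prediction)
```

The distant row with target 50.0 is not among the three nearest neighbors, so it does not influence the prediction.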

The neighbors are determined by simple Euclidean distance between the probe observation and each training example, computed over the variables in the VAR statement. Thus, it is assumed that the variables are orthogonal to each other and standardized. If your input data is not in that form, you need to precede this procedure with one that creates numeric, orthogonal, and standardized variables, such as the
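To see why standardization matters for a Euclidean distance, consider the Python sketch below. It is an illustration under assumed data (income in currency units and a purchase count), and `standardize` is a hypothetical helper. It handles scale only. It does not make correlated variables orthogonal.

```python
import math
from statistics import mean, pstdev

def standardize(rows):
    """Rescale each column to mean 0 and standard deviation 1 (z-scores)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    # Guard against constant columns, whose standard deviation is zero.
    sds = [pstdev(c) or 1.0 for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(row, mus, sds)) for row in rows]

# Hypothetical rows: (income, number of purchases). Row 0 is the probe.
rows = [(30000.0, 1.0), (30500.0, 10.0), (34000.0, 1.0), (90000.0, 5.0)]
candidates = (1, 2, 3)

# On raw values, income dominates the distance because of its scale.
raw_nearest = min(candidates, key=lambda i: math.dist(rows[0], rows[i]))

# After standardization, both variables contribute comparably.
z = standardize(rows)
std_nearest = min(candidates, key=lambda i: math.dist(z[0], z[i]))

print(raw_nearest, std_nearest)
```

On the raw values the nearest row is row 1, which differs from the probe by nine purchases but only 500 in income. After standardization the nearest row is row 2, which has the same purchase count and a modestly higher income.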


The PMBR procedure must be run separately, and be given the DMDB name, for each data set to be scored, including any training, validation, test, or score data set.

Missing values in either the training or score data set are replaced by the mean of that variable as stored in the DMDB catalog.
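Mean replacement of this kind can be sketched as follows. In this Python illustration, `stored_means` is a hypothetical stand-in for the per-variable means kept in the DMDB catalog, and `None` marks a missing value. How PMBR reads the catalog is not shown here.

```python
def impute_with_means(rows, means):
    """Replace each missing value (None) with the stored mean of its variable."""
    return [tuple(m if v is None else v for v, m in zip(row, means)) for row in rows]

# Hypothetical per-variable means, standing in for values stored in the catalog.
stored_means = (4.0, 10.0)

# Score rows with one missing value each.
score_rows = [(1.0, None), (None, 12.0)]

filled = impute_with_means(score_rows, stored_means)
print(filled)
```

Because the same stored means are applied to the training and score data, a missing value is treated identically wherever it occurs.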


PROC PMBR Statement

Invokes the PMBR procedure.

PROC PMBR <option(s)>;

Required Arguments

DMDBCAT = SAS-catalog

Specifies the DMDB catalog.

Options


DATA = (or IN =) SAS-data-set

Specifies the DMDB-encoded input SAS data set to train on. If you omit the DATA= option, the procedure uses the most recently created SAS data set, which must be DMDB-encoded.

SCORE = SAS-data-set

Specifies the data set to be scored. This data set does not need to contain the target variable. It can be the same data set as the training data set.

OUT = SAS-data-set

Specifies the name of the output data set. This output data set contains all variables in the score data set and additional variables representing the posterior probabilities. If the target variable is categorical, the names of these variables generally begin with P_, followed by part of the original variable name, with the target value appended. Each posterior probability is the proportion of the k neighbors that have that value for the target. If the target variable is interval, a single prediction variable is produced that averages the target values across the k neighbors. This option is required if the SCORE= option is used.

K = integer

Specifies the number of nearest neighbors to retrieve.




Prints out training information and weights (if the WEIGHTED option is specified) to the OUTPUT window.

METHOD = method

Determines what data representation is used to store the training data set and then to retrieve the nearest neighbors. The following methods are available:
