Copyright 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.
The PROC SPLIT statement invokes the procedure. The DATA= option identifies
the training data set that is used to fit the model. The DMDBCAT= option identifies
the training data catalog.
proc split data=sampsio.dmdbase
dmdbcat=sampsio.dmdbase
The CRITERION= option specifies the PROBF method of searching for and evaluating
candidate splitting rules. For interval targets, PROBF (the p-value of the F test
associated with node variance) is the default method.
criterion=probf
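To illustrate the kind of quantity that the PROBF criterion refers to, the following DATA step is a minimal sketch that computes the p-value of a one-way F test for a single candidate binary split. The sums of squares and the node size are made-up numbers, and the variable names are hypothetical; the exact computation that PROC SPLIT performs internally may differ.

```sas
/* Sketch only: F test p-value for one hypothetical candidate split. */
data _null_;
   ss_between = 12.5;   /* between-branch sum of squares (made-up value) */
   ss_within  = 80.0;   /* within-branch sum of squares (made-up value)  */
   n          = 100;    /* observations in the node (made-up value)      */
   branches   = 2;      /* a binary split                                */

   /* F statistic: between-branch mean square over within-branch mean square */
   f = (ss_between / (branches - 1)) / (ss_within / (n - branches));

   /* PROBF returns the F distribution CDF, so the p-value is the upper tail */
   p = 1 - probf(f, branches - 1, n - branches);

   put f= p=;
run;
```

A smaller p-value indicates that the split separates the target values more strongly, so candidate rules can be ranked by this value.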
The PADJUST= option specifies the DEPTH method for adjusting p-values. DEPTH adjusts
for the number of ancestor splits.
padjust=depth
The OUTMATRIX= option names the output data set that contains tree summary
statistics for the training data.
outmatrix=trtree
The OUTTREE= option names the data set that contains tree information.
You can use the INTREE= option to read the OUTTREE= data set in a subsequent
execution of PROC SPLIT.
outtree=treedata
The OUTLEAF= option names the data set that contains statistics for
each leaf node.
outleaf=leafdata
The OUTSEQ= option names the data set that contains subtree statistics.
outseq=subtree;
Each INPUT statement specifies a set of input variables that have the
same measurement level. The LEVEL= option identifies the measurement level
of each input set.
input league division position / level=nominal;
input no_atbat no_hits no_home no_runs no_rbi no_bb
yr_major cr_atbat cr_hits cr_home cr_runs cr_rbi cr_bb
no_outs no_assts no_error / level=interval;
The TARGET statement specifies the target (response) variable.
target logsalar;
The SCORE statement specifies the data set that you want to score in
conjunction with training. The DATA= option identifies the score data set.
score data=sampsio.dmtbase nodmdb
The OUTFIT= option names the output data set that contains goodness-of-fit
statistics for the scored data set. The OUT= option names the output data set that
contains the predicted and residual values for the scored data set. Here, the RENAME=
data set option renames P_LOGSAL and R_LOGSAL to PREDICT and RESIDUAL.
outfit=splfit
out=splout(rename=(p_logsal=predict r_logsal=residual));
title 'Decision Tree: Baseball Data';
run;
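Because the tree is saved in the OUTTREE= data set, a later run can read it back through the INTREE= option instead of refitting. The following is a minimal sketch of that idea. It assumes that INTREE= is given in the PROC SPLIT statement and that a SCORE statement can follow it; the output data set name RESCORED is hypothetical. Whether other options (for example, DMDBCAT=) are also required in this case is not stated here, so check the procedure syntax before relying on it.

```sas
/* Sketch only: score with the tree saved earlier in OUTTREE=treedata. */
proc split intree=treedata;          /* read the previously created tree   */
   score data=sampsio.dmtbase nodmdb /* same score data set as above       */
         out=rescored;               /* hypothetical output data set name  */
run;
```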
PROC PRINT lists summary tree statistics for the training data set.
proc print data=trtree noobs label;
title2 'Summary Tree Statistics for the Training Data';
run;
PROC PRINT lists summary statistics for each leaf node.
proc print data=leafdata noobs label;
title2 'Leaf Node Summary Statistics';
run;
PROC PRINT lists summary statistics for each subtree in the subtree
sequence.
proc print data=subtree noobs label;
title2 'Subtree Summary Statistics';
run;
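The variables in the OUTMATRIX=, OUTLEAF=, and OUTSEQ= data sets are not listed in this example. One way to inspect them before you write further reports is to run PROC CONTENTS on each data set, as in this short sketch.

```sas
/* List the variables in each output data set created by PROC SPLIT above. */
proc contents data=trtree varnum;     /* tree summary statistics  */
run;

proc contents data=leafdata varnum;   /* leaf node statistics     */
run;

proc contents data=subtree varnum;    /* subtree statistics       */
run;
```

The VARNUM option lists the variables in the order in which they are stored, which matches the column order that PROC PRINT displays.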
PROC PRINT lists fit statistics for the scored test data set.
proc print data=splfit noobs label;
title2 'Summary Statistics for the Scored Test Data';
run;
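As a cross-check on the fit statistics, the residuals in the OUT= data set can be summarized directly. The following sketch computes the root mean squared error from the RESIDUAL variable in SPLOUT (the name given by the RENAME= option above). The column aliases N_SCORED and RMSE are hypothetical names chosen for this illustration.

```sas
/* Sketch only: root mean squared error from the scored residuals. */
proc sql;
   select count(residual)         as n_scored, /* rows with a nonmissing residual */
          sqrt(mean(residual**2)) as rmse      /* root mean squared error         */
   from splout;
quit;
```

Observations with a missing residual are excluded from both summary values.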
PROC GPLOT produces diagnostic plots for the scored test data set. The
first PLOT statement creates a scatter plot of the target values versus the
predicted values of the target. The second PLOT statement creates a scatter
plot of the residual values versus the predicted values of the target.
proc gplot data=splout;
plot logsalar*predict / haxis=axis1 vaxis=axis2 frame;
symbol c=black i=none v=dot h=3 pct;
axis1 minor=none color=black width=2.5;
axis2 minor=none color=black width=2.5;
title2 'Log of Salary versus the Predicted Log of Salary';
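The second PLOT statement, for the residual plot, does not appear in the listing above. The following is a minimal sketch of what such a plot could look like, assuming the PREDICT and RESIDUAL variables that the RENAME= option creates in SPLOUT. The title text and the VREF=0 reference line are additions for illustration and are not taken from the original example.

```sas
/* Sketch only: residuals versus predicted values for the scored data. */
symbol c=black i=none v=dot h=3 pct;
axis1 minor=none color=black width=2.5;
axis2 minor=none color=black width=2.5;

proc gplot data=splout;
   plot residual*predict / haxis=axis1 vaxis=axis2
                           vref=0      /* horizontal reference line at zero */
                           frame;
   title2 'Residuals versus the Predicted Log of Salary';
run;
quit;
```

A residual plot with no visible trend around the zero line suggests that the tree does not systematically overpredict or underpredict in any range of predicted values.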