PROC GPLOT creates a scatter plot of the Rings training data.
title 'SPLIT Example: RINGS Data';
title2 'Plot of the Rings Training Data';
goptions gunit=pct ftext=swiss ftitle=swissb htitle=4 htext=3;
proc gplot data=sampsio.dmdring;
plot y*x=c /haxis=axis1 vaxis=axis2;
symbol c=black i=none v=dot;
symbol2 c=red i=none v=square;
symbol3 c=green i=none v=triangle;
axis1 c=black width=2.5 order=(0 to 30 by 5);
axis2 c=black width=2.5 minor=none order=(0 to 20 by 2);
run;
The PROC SPLIT statement invokes the procedure. The DATA= option names the
DMDB-encoded training data set. The DMDBCAT= option names the DMDB-encoded
training catalog.
title2 'Entropy Criterion';
proc split data=sampsio.dmdring
dmdbcat=sampsio.dmdring
The CRITERION= option specifies the method used to search for and evaluate
candidate splitting rules. The ENTROPY method searches for splits based on
the reduction in the entropy measure of node impurity. For nominal targets,
the default CRITERION= method is PROBCHISQ.
criterion=entropy
The SPLITSIZE= option specifies the smallest number of training observations
a node must have for the procedure to consider splitting it.
splitsize=2
The MAXBRANCH=n option restricts the number of subsets
that a splitting rule can produce to n or fewer.
maxbranch=3
The OUTTREE= option names the data set that contains tree information.
outtree=tree;
The INPUT statement specifies the input variables. By default, the measurement
level of the inputs is set to INTERVAL.
input x y;
The TARGET statement specifies the target variable. The LEVEL= option
sets the measurement level to nominal.
target c / level=nominal;
Because the DATA= option is not specified, the SCORE statement scores
the training data set. The OUT= option names the output data set that contains
the scoring results. The OUTFIT= option names the output data set that contains
the fit statistics.
score out=out outfit=fit;
run;
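The entropy impurity that the CRITERION=ENTROPY option refers to can be
illustrated with a small DATA step. This is only a sketch: it assumes the usual
Shannon definition with base-2 logarithms, and the class counts are hypothetical
values chosen for illustration, not taken from the Rings data. The formula that
PROC SPLIT uses internally may differ in detail.

```sas
/* Sketch: entropy impurity of one hypothetical node with three classes. */
/* The counts are invented for illustration only.                        */
data _null_;
   array cnt[3] _temporary_ (40 10 10);   /* hypothetical class counts */
   total = 0;
   do i = 1 to 3;
      total = total + cnt[i];
   end;
   entropy = 0;
   do i = 1 to 3;
      p = cnt[i] / total;                 /* class proportion in the node */
      if p > 0 then entropy = entropy - p * log2(p);
   end;
   put 'Entropy of the node: ' entropy 8.4;
run;
```

A node that contains a single class has entropy 0, so a split that produces
purer child nodes yields a larger reduction in entropy.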
PROC PRINT creates a report of fit statistics for the training data.
proc print data=fit noobs label;
title3 'Fit Statistics for the Training Data';
run;
PROC FREQ creates a misclassification table for the training data. The
F_C variable is the actual target value for each case, and the I_C variable
is the target value into which the case is classified.
proc freq data=out;
tables f_c*i_c;
title3 'Misclassification Table';
run;
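The overall misclassification rate can also be computed directly from the scored
data. The following is a minimal sketch, assuming that the OUT data set contains
the F_C and I_C variables described above and that both have the same type.

```sas
/* Sketch: overall misclassification rate for the scored training data. */
/* Assumes F_C and I_C exist in OUT and are of the same type.           */
data _null_;
   set out end=last;
   total + 1;                    /* number of cases read so far       */
   wrong + (f_c ne i_c);         /* adds 1 when actual ne classified  */
   if last then do;
      rate = wrong / total;
      put 'Misclassification rate: ' rate 8.4;
   end;
run;
```

The result is written to the SAS log rather than to the output window.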
PROC GPLOT produces a plot of the classification results for the training
data.
proc gplot data=out;
plot y*x=i_c / haxis=axis1 vaxis=axis2;
symbol c=black i=none v=dot;
symbol2 c=red i=none v=square;
symbol3 c=green i=none v=triangle;
axis1 c=black width=2.5 order=(0 to 30 by 5);
axis2 c=black width=2.5 minor=none order=(0 to 20 by 2);
title3 'Classification Results';
run;
The INTREE= option reads the decision tree data set that the OUTTREE= option
created in the previous run of PROC SPLIT.
proc split intree=tree;
The SCORE statement scores the DATA= data set and writes the results
to the OUT= data set. The ROLE=SCORE option identifies the data set as a score
data set. The ROLE= option primarily affects which fit statistics are computed
and what their names and labels are.
score data=sampsio.dmsring nodmdb role=score out=gridout;
run;
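Before plotting, it can help to inspect a few scored observations. The following
sketch assumes that GRIDOUT contains the variables that the later steps plot
(X, Y, P_C1, P_C2, P_C3, and _NODE_). The choice of ten observations is arbitrary.

```sas
/* Sketch: list the first ten scored grid points.            */
/* Assumes the variables below exist in the GRIDOUT data set. */
proc print data=gridout(obs=10);
   var x y p_c1 p_c2 p_c3 _node_;
run;
```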
The GCONTOUR procedure creates contour plots of the posterior probabilities.
proc gcontour data=gridout;
plot y*x=p_c1 / pattern ctext=black coutline=gray;
plot y*x=p_c2 / pattern ctext=black coutline=gray;
plot y*x=p_c3 / pattern ctext=black coutline=gray;
title2 'Posterior Probabilities';
pattern v=msolid;
legend frame;
title3 'Posterior Probabilities';
run;
The GPLOT procedure creates a scatter plot of the leaf nodes.
proc gplot data=gridout;
plot y*x=_node_;
symbol c=blue i=none v=dot;
symbol2 c=red i=none v=square;
symbol3 c=green i=none v=triangle;
symbol4 c=black i=none v=star;
symbol5 c=orange i=none v=plus;
symbol6 c=brown i=none v=circle;
symbol7 c=cyan i=none v==;
symbol8 c=black i=none v=hash;
symbol9 c=gold i=none v=:;
symbol10 c=yellow i=none v=x;
title3 'Leaf Nodes';
run;
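The plot above defines ten SYMBOL statements, one for each leaf that it expects
to distinguish. A quick way to check how many leaves the scored grid actually
contains is a one-way frequency table of _NODE_. This is a sketch that assumes
_NODE_ is present in GRIDOUT, as the plot requires.

```sas
/* Sketch: count the scored grid points that fall in each leaf node. */
proc freq data=gridout;
   tables _node_ / nocum;
run;
```

If the table lists more than ten distinct values, additional SYMBOL definitions
may be needed to keep the leaves visually distinct.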