The arboretum procedure

Date: 30.04.2018

PROC GPLOT creates a scatter plot of the Rings training data.

title 'SPLIT Example: RINGS Data';

title2 'Plot of the Rings Training Data';

goptions gunit=pct ftext=swiss ftitle=swissb htitle=4 htext=3;

proc gplot data=sampsio.dmdring;

plot y*x=c /haxis=axis1 vaxis=axis2;

symbol c=black i=none v=dot;

symbol2 c=red i=none v=square;

symbol3 c=green i=none v=triangle;

axis1 c=black width=2.5 order=(0 to 30 by 5);

axis2 c=black width=2.5 minor=none order=(0 to 20 by 2);

run;
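
Alongside the scatter plot, a frequency table shows how many training cases fall in each class. This is a minimal sketch, assuming only that C is the class variable used in the PLOT statement above:

```sas
/* Tabulate the class variable C in the Rings training data. */
proc freq data=sampsio.dmdring;
   tables c;
run;
```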

The PROC SPLIT statement invokes the procedure. The DATA= option names the DMDB-encoded training data set. The DMDBCAT= option names the DMDB-encoded training catalog.

title2 'Entropy Criterion';

proc split data=sampsio.dmdring

dmdbcat=sampsio.dmdring

The CRITERION= option selects the method for searching for and evaluating candidate splitting rules. Here it is set to ENTROPY, which searches for splits that reduce the entropy measure of node impurity. For nominal targets, the default CRITERION= method is PROBCHISQ.

criterion=entropy
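
As a sketch of what the entropy criterion measures (the notation is illustrative and not taken from the procedure's documentation): for a node t containing N cases, in which class j has proportion p_j, entropy impurity is commonly defined as shown below. A candidate split into branches t_1, ..., t_B with N_b cases each is scored by the resulting reduction in impurity. The base-2 logarithm is an assumption; another base only rescales the values.

```latex
i(t) = -\sum_{j} p_j \log_2 p_j,
\qquad
\Delta i = i(t) - \sum_{b=1}^{B} \frac{N_b}{N}\, i(t_b)
```

For example, a node with three equally represented classes has i(t) = log_2 3, about 1.585. A three-way split that sends each class to its own branch leaves every branch pure, with impurity 0, so the reduction is about 1.585, the largest possible for that node.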

The SPLITSIZE= option specifies the smallest number of training observations

a node must have for the procedure to consider splitting it.

splitsize=2

The MAXBRANCH=n option restricts each splitting rule to producing n or fewer subsets (branches).

maxbranch=3

The OUTTREE= option names the data set that contains tree information.

outtree=tree;

The INPUT statement specifies the input variables. By default, the measurement

level of the inputs is set to INTERVAL.

input x y;

The TARGET statement specifies the target variable. The LEVEL= option

sets the measurement level to nominal.

target c / level=nominal;

Because the DATA= option is not specified, the SCORE statement scores the training data set. The OUT= option names the output data set that contains the scored cases. The OUTFIT= option names the output data set that contains the fit statistics.

score out=out outfit=fit;

run;

PROC PRINT creates a report of fit statistics for the training data.

proc print data=fit noobs label;

title3 'Fit Statistics for the Training Data';

run;

PROC FREQ creates a misclassification table for the training data. The

F_C variable is the actual target value for each case, and the I_C variable

is the target value into which the case is classified.

proc freq data=out;

tables f_c*i_c;

title3 'Misclassification Table';

run;
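
The table above can be reduced to a single training misclassification rate. This is a minimal sketch, assuming the OUT data set contains F_C and I_C as described and that both variables hold class values in the same form, so a direct comparison is meaningful. The names MISCLASS and MISS are illustrative and not part of the original example.

```sas
/* Flag cases whose predicted class (I_C) differs from the actual class (F_C). */
/* MISCLASS and MISS are illustrative names, not part of the original example. */
data misclass;
   set out;
   miss = (f_c ne i_c);   /* 1 = misclassified, 0 = correctly classified */
run;

/* The mean of the 0/1 flag is the training misclassification rate. */
proc means data=misclass n mean;
   var miss;
run;
```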

PROC GPLOT produces a plot of the classification results for the training

data.

proc gplot data=out;

plot y*x=i_c / haxis=axis1 vaxis=axis2;

symbol c=black i=none v=dot;

symbol2 c=red i=none v=square;

symbol3 c=green i=none v=triangle;

axis1 c=black width=2.5 order=(0 to 30 by 5);

axis2 c=black width=2.5 minor=none order=(0 to 20 by 2);

title3 'Classification Results';

run;

The INTREE= option reads the decision tree data set that the OUTTREE= option created in the previous run of PROC SPLIT.

proc split intree=tree;

The SCORE statement scores the DATA= data set and writes the results to the OUT= data set. The ROLE=SCORE option identifies the data set as a score data set. The ROLE= option primarily affects which fit statistics are computed and what their names and labels are.

score data=sampsio.dmsring nodmdb role=score out=gridout;

run;

The GCONTOUR procedure creates contour plots of the posterior probabilities.

proc gcontour data=gridout;

plot y*x=p_c1 / pattern ctext=black coutline=gray;

plot y*x=p_c2 / pattern ctext=black coutline=gray;

plot y*x=p_c3 / pattern ctext=black coutline=gray;

title2 'Posterior Probabilities';

pattern v=msolid;

legend frame;

title3 'Posterior Probabilities';

run;
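
A quick sanity check on the scored grid is that the posterior probabilities at each point sum to 1. This is a minimal sketch, assuming P_C1, P_C2, and P_C3 (the variables plotted above) are the only posterior variables in GRIDOUT. The variables TOTAL and BAD and the tolerance are illustrative choices.

```sas
/* Count grid points whose three posterior probabilities do not sum to 1. */
data _null_;
   set gridout end=last;
   total = sum(p_c1, p_c2, p_c3);
   if abs(total - 1) > 1e-8 then bad + 1;   /* sum statement: BAD starts at 0 and is retained */
   if last then put 'Rows with posteriors not summing to 1: ' bad;
run;
```

The count is written to the SAS log. A value of 0 means every grid point passed the check.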

The GPLOT procedure creates a scatter plot of the leaf nodes.

proc gplot data=gridout;

plot y*x=_node_;

symbol c=blue i=none v=dot;

symbol2 c=red i=none v=square;

symbol3 c=green i=none v=triangle;

symbol4 c=black i=none v=star;

symbol5 c=orange i=none v=plus;

symbol6 c=brown i=none v=circle;

symbol7 c=cyan i=none v==;

symbol8 c=black i=none v=hash;

symbol9 c=gold i=none v=:;

symbol10 c=yellow i=none v=x;

title3 'Leaf Nodes';

run;
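
The plot above supplies ten SYMBOL definitions, so it is worth checking how many distinct leaves the scored grid actually reaches. This is a minimal sketch, assuming only that GRIDOUT contains the _NODE_ variable used in the PLOT statement:

```sas
/* Count scored grid points per leaf; each table row is one leaf that was reached. */
proc freq data=gridout;
   tables _node_ / nocum;
run;
```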
