The arboretum procedure




 

PROC GPLOT creates a scatter plot of the Rings training data.

title  'SPLIT Example: RINGS Data';

title2  'Plot of the Rings Training Data';

goptions gunit=pct ftext=swiss ftitle=swissb htitle=4 htext=3;

proc gplot data=sampsio.dmdring;

   plot y*x=c /haxis=axis1 vaxis=axis2;

   symbol  c=black i=none v=dot;

   symbol2 c=red i=none v=square;

   symbol3 c=green i=none v=triangle;

   axis1 c=black width=2.5 order=(0 to 30 by 5);

   axis2 c=black width=2.5 minor=none order=(0 to 20 by 2);

run;



 

The PROC SPLIT statement invokes the procedure. The DATA= option names the
DMDB-encoded training data set. The DMDBCAT= option names the DMDB catalog
for the training data.

title2 'Entropy Criterion';

proc split data=sampsio.dmdring

           dmdbcat=sampsio.dmdring



 

The CRITERION= option selects the method for searching for and evaluating
candidate splitting rules. CRITERION=ENTROPY searches for splits based on the
reduction in entropy, a measure of node impurity. For nominal targets, the
default is CRITERION=PROBCHISQ.

           criterion=entropy
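
For reference, the textbook form of entropy-based impurity reduction is sketched below. Whether PROC SPLIT computes exactly this quantity (for example, which logarithm base it uses or whether it applies adjustments) is an assumption and is not stated in this example. The symbols p_j(t), n, n_b, and B are introduced here only for illustration: p_j(t) is the proportion of class j in node t, n is the number of training observations in t, and n_b is the number sent to branch b of a B-way split.

```latex
% Entropy of node t over the target classes j
H(t) = -\sum_{j} p_j(t)\,\log_2 p_j(t)

% Reduction in entropy achieved by splitting t into branches t_1, ..., t_B
\Delta H = H(t) - \sum_{b=1}^{B} \frac{n_b}{n}\, H(t_b)
```

As a quick check on the definition, a node holding two classes in equal proportion has H = 1 bit. A split that sends each class to its own branch leaves two pure children with H = 0, so the reduction is 1 bit.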




 

The SPLITSIZE= option specifies the smallest number of training observations

a node must have for the procedure to consider splitting it.

           splitsize=2




 

The MAXBRANCH=n option restricts the number of subsets that a splitting
rule can produce to n or fewer.

           maxbranch=3




 

The OUTTREE= option names the data set that contains tree information. 

           outtree=tree;



 

The INPUT statement specifies the input variables. By default, the measurement

level of the inputs is set to INTERVAL.

   input x y;




 

The TARGET statement specifies the target variable. The LEVEL= option

sets the measurement level to nominal.

   target c / level=nominal;




 

Because the SCORE statement does not specify the DATA= option, it scores
the training data set. The OUT= option names the output data set that contains
the scoring outputs. The OUTFIT= option names the output data set that contains
the fit statistics.

   score out=out outfit=fit;

run;
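
The fragments above belong to a single PROC SPLIT step. Assembled in one place, with the explanations condensed into comments, the step would read roughly as follows. Every option and statement is taken from the fragments above; only the comments are added.

```sas
title2 'Entropy Criterion';

proc split data=sampsio.dmdring      /* DMDB-encoded training data          */
           dmdbcat=sampsio.dmdring   /* DMDB catalog for the training data  */
           criterion=entropy         /* evaluate splits by entropy reduction */
           splitsize=2               /* smallest node considered for a split */
           maxbranch=3               /* at most 3 subsets per splitting rule */
           outtree=tree;             /* save the tree for a later INTREE=    */
   input x y;                        /* inputs; INTERVAL level by default    */
   target c / level=nominal;         /* nominal target                       */
   score out=out outfit=fit;         /* no DATA=, so the training data is scored */
run;
```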



 

PROC PRINT creates a report of fit statistics for the training data.

proc print data=fit noobs label;

  title3 'Fit Statistics for the Training Data';

run;



 

PROC FREQ creates a misclassification table for the training data. The

F_C variable is the actual target value for each case, and the I_C variable

is the target value into which the case is classified.

proc freq data=out;

  tables f_c*i_c;

  title3 'Misclassification Table';

run;
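
The misclassification table can also be reduced to a single overall error rate. The step below is a minimal sketch rather than part of the original example: it assumes that F_C and I_C in the OUT data set have the same type, so that they can be compared directly, and the column name misclass_rate is hypothetical.

```sas
/* Overall training misclassification rate: the share of cases whose  */
/* classified target value (I_C) differs from the actual one (F_C).   */
/* In PROC SQL a comparison evaluates to 1 (true) or 0 (false), so    */
/* the mean of the comparison is the proportion of mismatches.        */
proc sql;
   select mean(f_c ne i_c) as misclass_rate format=percent8.2
      from out;
quit;
```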



 

PROC GPLOT produces a plot of the classification results for the training

data.

proc gplot data=out;



   plot y*x=i_c / haxis=axis1 vaxis=axis2;

   symbol  c=black i=none v=dot;

   symbol2 c=red i=none v=square;

   symbol3 c=green i=none v=triangle;

   axis1 c=black width=2.5 order=(0 to 30 by 5);

   axis2 c=black width=2.5 minor=none order=(0 to 20 by 2);

   title3 'Classification Results';

run;



 

The INTREE= option reads the decision tree data set that the OUTTREE=
option created in the previous run of PROC SPLIT.

proc split intree=tree;




 

The SCORE statement scores the DATA= data set and writes the results
to the OUT= data set. The ROLE=SCORE option identifies the data set as a score
data set. The ROLE= option primarily affects which fit statistics are computed
and what their names and labels are.

   score data=sampsio.dmsring nodmdb role=score out=gridout;

run;



 

The GCONTOUR procedure creates contour plots of the posterior probabilities.

proc gcontour data=gridout;

   plot y*x=p_c1 / pattern ctext=black coutline=gray;

   plot y*x=p_c2 / pattern ctext=black coutline=gray;

   plot y*x=p_c3 / pattern ctext=black coutline=gray;

   title2 'Posterior Probabilities';

   pattern v=msolid;

   legend frame;

title3 'Posterior Probabilities';

run;



 

The GPLOT procedure creates a scatter plot of the leaf nodes.

proc gplot data=gridout;

   plot y*x=_node_;

   symbol  c=blue i=none v=dot;

   symbol2 c=red i=none v=square;

   symbol3 c=green i=none v=triangle;

   symbol4 c=black i=none v=star;

   symbol5 c=orange i=none v=plus;

   symbol6 c=brown i=none v=circle;

   symbol7 c=cyan i=none v==;

   symbol8 c=black i=none v=hash;

   symbol9 c=gold i=none v=:;

   symbol10 c=yellow i=none v=x;

   title3 'Leaf Nodes';

run;


