The ARBORETUM Procedure




      Absolute    of Squared     Squared      Squared      Divisor
OBS     Error       Errors        Error        Error      for VASE
 1     0.72727      1.52883     .0050961     0.071387        300

Misclassification Table for the Scored Data Set (OUT=)

Only one customer in the test data set was incorrectly classified. Ideally, you should validate the tree with a test data set that is mutually exclusive of the training data.

 

                Input Tree and Score Test Data

             Misclassification Table for the Test Data

                 TABLE OF F_PURCHA BY I_PURCHA

                      F_PURCHA(Formatted Target Value)
                                I_PURCHA(Predicted Category)
                      Frequency|
                      Percent  |
                      Row Pct  |
                      Col Pct  |No      |Yes     |  Total
                      ----------------------------
                      No       |     64 |      1 |     65
                               |  42.67 |   0.67 |  43.33
                               |  98.46 |   1.54 |
                               | 100.00 |   1.16 |
                      ----------------------------
                      Yes      |      0 |     85 |     85
                               |   0.00 |  56.67 |  56.67
                               |   0.00 | 100.00 |
                               |   0.00 |  98.84 |
                      ----------------------------
                      Total          64       86      150
                                  42.67    57.33   100.00
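
The percentages in the table can be checked by hand. The following is a minimal sketch in Python (not SAS, so that it runs without Enterprise Miner) that recomputes the misclassification rate and the row and column percentages from the four cell counts above. The helper names `row_pct` and `col_pct` are illustrative and do not correspond to any SAS option.

```python
# Cell counts copied from the misclassification table above.
# Keys are (actual F_PURCHA, predicted I_PURCHA).
counts = {
    ("No", "No"): 64,
    ("No", "Yes"): 1,
    ("Yes", "No"): 0,
    ("Yes", "Yes"): 85,
}

total = sum(counts.values())
# A case is misclassified when the predicted category differs from the actual one.
misclassified = sum(n for (actual, predicted), n in counts.items()
                    if actual != predicted)
misclassification_rate = misclassified / total


def row_pct(actual, predicted):
    """Percent of the cases with this actual value that received this prediction."""
    row_total = sum(n for (a, _), n in counts.items() if a == actual)
    return 100.0 * counts[(actual, predicted)] / row_total


def col_pct(actual, predicted):
    """Percent of the cases with this prediction that have this actual value."""
    col_total = sum(n for (_, p), n in counts.items() if p == predicted)
    return 100.0 * counts[(actual, predicted)] / col_total


print(total, misclassified, round(100 * misclassification_rate, 2))
print(round(row_pct("No", "Yes"), 2), round(col_pct("No", "Yes"), 2))
```

The single off-diagonal case (actual No, predicted Yes) gives 1 error in 150 cases, which is the 0.67 percent shown in that cell.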

Partial PROC PRINT Report of the Score Summary Data Set

 

                   Input Tree and Score Test Data

                               Score Summary Data

          Node       Assessment  Assessment:  Assessment:  Decision  Formatted
     Identification      of        PURCHASE     PURCHASE   Assigned   Target
 OBS     Number      Prediction     = Yes         = No     to Case     Value
   1        21         0.58527     0.41473      0.58527      No         No
   2        21         0.58527     0.41473      0.58527      No         Yes
   3       128         0.89474     0.89474      0.10526      Yes        Yes
   4        21         0.58527     0.41473      0.58527      No         No
   5       128         0.89474     0.89474      0.10526      Yes        Yes
   6        21         0.58527     0.41473      0.58527      No         No
   7        33         0.53237     0.53237      0.46763      Yes        Yes
   8       120         1.00000     1.00000      0.00000      Yes        Yes
   9        33         0.53237     0.53237      0.46763      Yes        Yes
  10        49         1.00000     0.00000      1.00000      No         No

               Predicted: Predicted:            Residual: Residual:
     Predicted  PURCHASE   PURCHASE  Predicted:  PURCHASE  PURCHASE Residual:
 OBS Category     = Yes      = No     PURCHASE    = Yes      = No    PURCHASE
   1    No       0.41473    0.58527    0.58527   -0.41473   0.41473   0.41473
   2    No       0.41473    0.58527    0.58527    0.58527  -0.58527  -0.58527
   3    Yes      0.89474    0.10526    0.89474    0.10526  -0.10526   0.10526
   4    No       0.41473    0.58527    0.58527   -0.41473   0.41473   0.41473
   5    Yes      0.89474    0.10526    0.89474    0.10526  -0.10526   0.10526
   6    No       0.41473    0.58527    0.58527   -0.41473   0.41473   0.41473
   7    Yes      0.53237    0.46763    0.53237    0.46763  -0.46763   0.46763
   8    Yes      1.00000    0.00000    1.00000    0.00000   0.00000   0.00000
   9    Yes      0.53237    0.46763    0.53237    0.46763  -0.46763   0.46763
  10    No       0.00000    1.00000    1.00000    0.00000   0.00000   0.00000
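
The listing does not say how the residual columns are derived. The rows shown are consistent with one reading: each "Residual: PURCHASE = level" value equals an indicator (1 if the formatted target value is that level, 0 otherwise) minus the predicted probability for that level, and the predicted category is the level with the larger probability. The Python sketch below reproduces those columns under that assumption. The function `score_case` and its tie-breaking rule are hypothetical, and the final unlabeled "Residual: PURCHASE" column is not covered.

```python
def score_case(p_yes, actual):
    """Return (predicted category, residual for Yes, residual for No).

    Assumption inferred from the listed rows, not stated in the source:
    residual = indicator(actual == level) - predicted probability of level.
    """
    p_no = 1.0 - p_yes
    # Hypothetical tie rule: equal probabilities fall to "No".
    predicted = "Yes" if p_yes > p_no else "No"
    resid_yes = (1.0 if actual == "Yes" else 0.0) - p_yes
    resid_no = (1.0 if actual == "No" else 0.0) - p_no
    return predicted, round(resid_yes, 5), round(resid_no, 5)


# Observations 1, 2 and 3 from the listing: (P(PURCHASE = Yes), target value).
for obs, (p_yes, actual) in enumerate(
        [(0.41473, "No"), (0.41473, "Yes"), (0.89474, "Yes")], start=1):
    print(obs, score_case(p_yes, actual))
```

Observation 2 is the instructive row: the target value is Yes but the node predicts No, so the residual for Yes is large and positive (0.58527).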

Copyright 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.



 

Before you analyze the data using the DMSPLIT procedure, you must create the DMDB encoded data set and catalog. For more information about how to do this, see "Example 1: Getting Started with the DMDB Procedure" in the DMDB procedure documentation.

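/* Create the DMDB encoded data set (OUT=) and catalog (DMDBCAT=). */
proc dmdb batch data=sampsio.dmexa1 out=dmbexa1 dmdbcat=catexa1;
     /* ID variable */
     id acctnum;
     /* Interval variables */
     var amount income homeval frequent recency age
         domestic apparel;
     /* Class variables; PURCHASE is ordered descending */
     class purchase(desc) marital ntitle gender telind
           origin job statecod numcars edlevel;
run;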



 

The PROC DMSPLIT statement invokes the procedure. The DATA= option identifies the DMDB encoded training data set that is used to fit the model. The DMDBCAT= option identifies the DMDB training data catalog.

proc dmsplit data=dmbexa1 dmdbcat=catexa1



 

The BINS= option specifies the number of categories into which the range of each interval variable is divided for splits.

              bins=30
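
To make the idea of binning an interval variable concrete, here is a small Python sketch. It assumes the bins are of equal width across the observed range, which the source does not state, so it should not be read as the DMSPLIT algorithm. The function name `bin_index` and the age range of 18 to 78 are hypothetical.

```python
def bin_index(value, lo, hi, bins=30):
    """Map a value in [lo, hi] to one of `bins` equal-width categories (0-based).

    Assumption: equal-width bins over the observed range. This illustrates
    the idea behind BINS=; it is not taken from the DMSPLIT documentation.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    width = (hi - lo) / bins
    idx = int((value - lo) / width)
    # Clamp so that value == hi (or anything outside the range) stays in range.
    return min(max(idx, 0), bins - 1)


# Hypothetical example: ages from 18 to 78 split into 30 bins of width 2.
print(bin_index(18, 18, 78), bin_index(40, 18, 78), bin_index(78, 18, 78))
```

With 30 bins there are at most 29 interior boundaries per interval variable, so a larger BINS= value offers more candidate split points at the cost of more work per variable.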




 

The CHISQ= option specifies the minimum Chi-Square value that a variable split must reach to remain eligible. The value of CHISQ= governs the number of splits that are performed. As you increase the CHISQ= value, the procedure performs fewer splits and makes fewer passes through the input data.

              chisq=2.00                         
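
The effect of the threshold can be illustrated with an ordinary Pearson chi-square statistic on a candidate split. This Python sketch assumes that statistic for illustration only, because the source does not say exactly which chi-square computation DMSPLIT uses. The function `pearson_chi_square` and both 2 x 2 tables of counts are hypothetical.

```python
def pearson_chi_square(table):
    """Pearson chi-square for a contingency table given as a list of rows.

    Rows are the branches of a candidate split and columns are the target
    levels. Every row and column total must be nonzero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


CHISQ_MIN = 2.00  # the threshold used in this example

# Hypothetical candidate splits: rows = branches, columns = (Yes, No) counts.
strong_split = [[30, 20], [20, 30]]
weak_split = [[26, 24], [24, 26]]

for name, table in (("strong", strong_split), ("weak", weak_split)):
    chi2 = pearson_chi_square(table)
    print(name, round(chi2, 2), "eligible" if chi2 >= CHISQ_MIN else "rejected")
```

Under this reading, raising the threshold rejects more weak candidates like the second table, which is why fewer splits are performed.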




