The arboretum procedure




Definition:

The DMINE procedure organizes numeric variables into 16 equally spaced groups or bins called AOV16 variables. The AOV16 variables are created to help identify nonlinear relationships with the target. Bins that have zero observations are eliminated; therefore, an AOV16 variable can have fewer than 16 bins.
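As an illustration of the binning described above, the following Python sketch assigns a numeric variable to 16 equal-width bins between its minimum and maximum and keeps only the bins that contain observations. The function name `aov16_bins` and the boundary handling (the maximum value is placed in the last bin) are assumptions made for this example; the text does not specify how PROC DMINE places bin edges.

```python
def aov16_bins(values, nbins=16):
    """Sketch: bin numeric values into `nbins` equal-width bins over
    [min, max] and return {bin_index: count} for non-empty bins only."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / nbins
    counts = {}
    for v in values:
        if width == 0:
            idx = 0  # constant variable: every value falls into one bin
        else:
            # Clamp so the maximum value lands in the last bin (index nbins - 1).
            idx = min(int((v - lo) / width), nbins - 1)
        counts[idx] = counts.get(idx, 0) + 1
    # Empty bins never appear as keys, mirroring the removal of empty bins.
    return dict(sorted(counts.items()))


print(aov16_bins([0, 1, 2, 15, 16]))  # {0: 1, 1: 1, 2: 1, 15: 2}
```

Here only 4 of the 16 bins are non-empty, so the resulting AOV16 variable would have 4 levels.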



Default:

The AOV16 variables are created. Note that there is no AOV16 option, only a NOAOV16 option, which prevents these variables from being used in the final forward stepwise selection process.



NOINTER

Specifies not to consider interactions between the categories of CLASS variables (that is, two-way interactions) in the variable selection process.

Definition:

A two-way interaction measures the effect of a classification input variable across the levels of another classification variable. For example, creditworthiness may not be consistent across job classifications. This lack of uniformity in the response may signify a creditworthiness-by-job interaction.
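To make the credit example concrete, the sketch below computes the mean target for every combination of levels of two class variables. The variable names `job`, `home`, and `good`, the function name, and the data are all hypothetical. If the difference between owners and renters is not the same for every job, the two variables interact.

```python
from collections import defaultdict


def interaction_means(rows, a, b, target):
    """Mean of `target` for each observed (level of a, level of b) cell."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        cell = (row[a], row[b])
        sums[cell] += row[target]
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


# Hypothetical data: job class, housing status, and a 0/1 credit outcome.
rows = [
    {"job": "clerk", "home": "own", "good": 1},
    {"job": "clerk", "home": "rent", "good": 0},
    {"job": "manager", "home": "own", "good": 1},
    {"job": "manager", "home": "rent", "good": 1},
]
means = interaction_means(rows, "job", "home", "good")
for job in ("clerk", "manager"):
    # Effect of owning versus renting within this job class.
    print(job, means[(job, "own")] - means[(job, "rent")])
```

Owning a home raises the mean from 0.0 to 1.0 for clerks but makes no difference for managers, which is the kind of non-uniform response that signals an interaction.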

Default:

Two-way interactions between categories of the class variables are considered in the variable selection process. Note that two-way interactions can dramatically increase the processing time of the DMINE procedure.
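A rough count shows why two-way interactions are expensive: with k class variables there are k(k-1)/2 pairs, and a pair whose variables have a and b levels has up to a * b level combinations. The sketch below only counts these candidate cells; how PROC DMINE actually evaluates interactions is not described here, and the function name is hypothetical.

```python
from itertools import combinations


def interaction_cells(level_counts):
    """Return (number of variable pairs, total level-by-level cells) for all
    two-way interactions among class variables with the given level counts."""
    pairs = list(combinations(level_counts, 2))
    return len(pairs), sum(a * b for a, b in pairs)


print(interaction_cells([2, 5, 10]))       # (3, 80)
print(interaction_cells([2, 5, 10, 100]))  # (6, 1780)
```

Adding a single 100-level variable, such as a ZIP-code-like input, raises the number of candidate cells from 80 to 1780.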

MAXROWS=value

Specifies the upper bound for the number of independent variables selected for the model. This is also an upper bound for the number of rows and columns of the X'X matrix of the regression problem.

Default:

3000. This means that for most models, the MINR2 and STOPR2 settings, not MAXROWS, determine the number of selected independent variables. The X'X matrix used for the stepwise regression requires $n(n+1)/2$ double-precision values of storage in RAM, where $n$ is the number of rows in the matrix. (For $n = 3000$, this corresponds to about 3000 * 1500 * 8 bytes, or roughly 36 megabytes, of RAM.)
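The storage estimate can be checked with a few lines of arithmetic. This sketch assumes the symmetric X'X matrix is held as a packed triangle of n(n+1)/2 doubles at 8 bytes each, which is consistent with the 3000 * 1500 * 8 figure; the function name is hypothetical.

```python
def xtx_storage_bytes(n, bytes_per_value=8):
    """Bytes for the packed triangle of a symmetric n x n X'X matrix:
    n(n+1)/2 double-precision values of 8 bytes each."""
    return n * (n + 1) // 2 * bytes_per_value


size = xtx_storage_bytes(3000)
print(size, size / 1e6)  # 36012000 36.012
```

The exact count, 36,012,000 bytes, is within a fraction of a percent of the 3000 * 1500 * 8 = 36,000,000 approximation.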



MINR2=value

Specifies a lower bound for the individual R-square value that a variable must have to be eligible for the model selection process. Variables with R-square values greater than or equal to value are included in the selection process.



Definition:

R-square is the ratio of the model sum of squares (SS) to the total sum of squares. It measures the sequential improvement in the model as input variables are selected.
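For a single numeric input, an individual R-square of the kind MINR2 screens can be illustrated, assuming an ordinary least-squares fit of the target on that one input, by computing the ratio of model SS to total SS directly. This is a generic calculation, not PROC DMINE's internal code, and `r_square` is a hypothetical name.

```python
def r_square(x, y):
    """R-square of a simple least-squares fit of y on x: model SS / total SS."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    fitted = [my + slope * (xi - mx) for xi in x]
    ss_model = sum((f - my) ** 2 for f in fitted)  # variation explained
    ss_total = sum((yi - my) ** 2 for yi in y)     # total variation about mean
    return ss_model / ss_total


print(r_square([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
print(r_square([1, 2, 3], [1, 3, 2]))        # 0.25
```

The first call fits a perfect line, so all variation is explained; in the second, the model SS of 0.5 is a quarter of the total SS of 2.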



Default:


NOMONITOR

Suppresses the status monitor output that indicates the progress of the computations.



Default:

The output of the status monitor is displayed.



NOPRINT

Suppresses all output that would otherwise be printed to the output window.



Default:

The output is printed to the output window.



STOPR2=value

Specifies a lower bound for the incremental model R-square value: when the improvement in R-square from adding another variable falls below this value, the variable selection process stops.
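The interplay of MINR2, STOPR2, and MAXROWS can be sketched as a generic greedy forward selection. This is an illustration under stated assumptions, not PROC DMINE's algorithm: `r2_of` is a hypothetical function returning the model R-square for a set of variables, the toy contributions are invented and additive (real R-square is not), the thresholds are chosen only for illustration, and the exact stopping comparison is assumed.

```python
def forward_select(candidates, r2_of, min_r2, stop_r2, max_vars):
    """Greedy forward selection sketch.

    r2_of(names) returns the model R-square for a tuple of variable names.
    min_r2   plays the role of MINR2 (screening on individual R-square),
    stop_r2  plays the role of STOPR2 (stop on small incremental R-square),
    max_vars plays the role of MAXROWS (cap on selected variables).
    """
    # Screening: keep variables whose own R-square is >= min_r2.
    pool = [v for v in candidates if r2_of((v,)) >= min_r2]
    selected, current = [], 0.0
    while pool and len(selected) < max_vars:
        # Pick the variable giving the largest model R-square when added.
        best = max(pool, key=lambda v: r2_of(tuple(selected) + (v,)))
        gain = r2_of(tuple(selected) + (best,)) - current
        if gain < stop_r2:
            break  # incremental R-square too small: stop selecting
        selected.append(best)
        current += gain
        pool.remove(best)
    return selected, current


# Hypothetical, additive R-square contributions for illustration only.
contrib = {"age": 0.20, "income": 0.10, "zip": 0.0003, "tenure": 0.002}


def toy_r2(names):
    return sum(contrib[n] for n in names)


print(forward_select(list(contrib), toy_r2, min_r2=0.001, stop_r2=0.005, max_vars=3000))
```

Here `zip` is screened out by `min_r2`, `age` and `income` are selected, and selection stops because adding `tenure` would improve R-square by only about 0.002, below `stop_r2`.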

Default:

USEGROUPS

PROC DMINE automatically tries to reduce the levels of each class variable to a group variable based on the relationship with the target. In this way, class variables with many categories (for example, ZIP codes) can be mapped into a smaller number of groups. If you specify the USEGROUPS option and a class variable can be reduced to a group variable, then only the group version of the variable is considered in the model. If you omit the USEGROUPS option, then both the group variable and the original class variable are allowed in the model.
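The text does not say how PROC DMINE forms the groups. As one hypothetical way to reduce many levels to fewer groups based on the target, the sketch below sorts levels by their mean target and merges levels whose means lie within a tolerance `tol` of the first level in the current group. The function name, tolerance rule, and data are assumptions for illustration only.

```python
from collections import defaultdict


def group_levels(levels, targets, tol):
    """Map each class level to a group id, merging levels whose mean target
    is within `tol` of the first (lowest-mean) level of the current group."""
    sums, counts = defaultdict(float), defaultdict(int)
    for lev, t in zip(levels, targets):
        sums[lev] += t
        counts[lev] += 1
    # Sort levels by mean target (ties broken by level name).
    means = sorted((sums[lev] / counts[lev], lev) for lev in sums)
    mapping, group, anchor = {}, -1, None
    for mean, lev in means:
        if anchor is None or mean - anchor > tol:
            group += 1  # start a new group
            anchor = mean
        mapping[lev] = group
    return mapping


# Hypothetical ZIP-code levels with a 0/1 target.
zips = ["27511", "27511", "27513", "90210", "90210", "10001"]
target = [1, 0, 1, 0, 0, 1]
print(group_levels(zips, target, tol=0.2))
```

The four ZIP levels collapse into three groups because `27513` and `10001` share the same mean target. Under USEGROUPS, only such a grouped version would be offered to the model.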

Copyright 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.




The DMINE Procedure

FREQ Statement

Alias: FREQUENCY

Tip: Specify the FREQ variable in PROC DMDB so that the information is saved in the catalog and the variable is automatically used as a FREQ variable in PROC DMINE. This also ensures that the FREQ variable is automatically used by all other Enterprise Miner procedures in the project.



FREQ variable;

Required Argument

variable

Specifies one numeric (interval-scaled) FREQUENCY variable.



Range:

Any integer. A noninteger value is truncated.
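A FREQ variable conventionally makes each observation count as if it appeared that many times. The sketch below shows this for a simple mean, truncating noninteger frequencies as the range note describes. The function name and data are hypothetical, and the sketch does not handle a zero total count.

```python
import math


def weighted_mean(values, freqs):
    """Mean of `values` where observation i counts freqs[i] times.
    Non-integer frequencies are truncated (toward zero, as math.trunc does)."""
    counts = [math.trunc(f) for f in freqs]
    total = sum(counts)  # assumes at least one positive count
    return sum(v * c for v, c in zip(values, counts)) / total


print(weighted_mean([10, 20], [1.9, 3.2]))  # counts 1 and 3 -> 17.5
```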


TARGET Statement

TARGET variable;

Required Argument

variable

Specifies the target (response) variable. Only one variable name can be specified; it identifies the target for the two regressions.


VARIABLES Statement

Alias: VAR

VARIABLES variable-list;

Required Argument

variable-list

Specifies all the variables (numeric and categorical, that is, INTERVAL and CLASS) that can be used as independent variables in the prediction or modeling of the target variable.
