The arboretum procedure



The TARGET statement identifies the target (response) variable.

   target bad;


The DMDB Procedure

Example 2: Specifying a FREQ Variable

Features: Specifying a FREQ variable with the FREQ statement

This example demonstrates how to define a FREQ variable in the DMDB data set and catalog. A FREQ variable gives, for each observation of the input data set, the number of times that observation occurs (its frequency of occurrence).

The following DATA step creates the WORK.FREQEX input data set.


data freqex;
   input count X1 X2 X3 Y;
   datalines;
 3     -0.17339    -0.04926    -0.61599    0
 2     -1.51586     0.31526    -1.65430    1
 1      1.04348     0.64517    -0.06878    0
 1     -1.74298     0.02592    -0.71203    1
 1      0.07806     1.45284    -0.39064    1
 4      0.20073     0.22533    -0.44507    0
 1     -0.08887    -1.24641    -0.73156    0
 1      0.10309     0.88542    -1.63595    1
 2     -0.57030    -1.35613    -1.58209    0
 1     -1.39170    -1.22333     1.98124    1
 2      0.51356    -0.36128     0.77962    0
 1     -0.89216    -0.01054    -0.76720    0
 1     -0.09882     1.43263     0.53820    0
 3      0.03225    -0.17737     0.25381    0
 1     -0.14203    -1.64183    -0.34028    0
 1     -0.24436    -0.83537    -2.00245    0
 2     -0.78277     0.00284    -0.75016    0
 1      0.77732    -0.28847    -0.77437    0
 1      1.55172    -0.21167    -0.53833    0
 2     -0.74054    -1.23276     0.11452    1
;
run;

proc dmdb batch data=freqex out=dmfout dmdbcat=outfcat;
   var x1 x2 x3;
   class y(desc);
   target y;
   freq count;
run;



1  data freqex;
2   input  count X1 X2  X3  Y ;
3   datalines;
NOTE: The data set WORK.FREQEX has 20 observations and 5 variables.
NOTE: The DATA statement used 0:00:00.44 real 0:00:00.15 cpu.
24  run;
25  proc dmdb batch data=freqex out=dmfout dmdbcat=outfcat;
26     var x1 x2 x3;
27     class y(desc);
28     target y;
30     freq count;
31  run;
Records processed=      20  Mem used = 511K.
NOTE: The PROCEDURE DMDB used 0:00:00.92 real 0:00:00.27 cpu.

Copyright 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.


The FREQ statement specifies the numeric variable that contains the frequency of each observation.

   freq count;
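To make the effect of the FREQ statement concrete, the following Python sketch (an illustration of the semantics, not how PROC DMDB is implemented) tallies the levels of Y in WORK.FREQEX twice: once counting each of the 20 rows once, and once counting each row COUNT times. The (COUNT, Y) pairs are copied from the DATA step above; the names `FREQEX` and `level_counts` are hypothetical.

```python
# Illustrative sketch only (not PROC DMDB's implementation): how a FREQ
# variable changes class-level counts. Each (count, y) pair is taken from
# the WORK.FREQEX data set; the X1-X3 columns are not needed here.
from collections import Counter

FREQEX = [
    (3, 0), (2, 1), (1, 0), (1, 1), (1, 1),
    (4, 0), (1, 0), (1, 1), (2, 0), (1, 1),
    (2, 0), (1, 0), (1, 0), (3, 0), (1, 0),
    (1, 0), (2, 0), (1, 0), (1, 0), (2, 1),
]


def level_counts(rows, use_freq):
    """Count each level of Y, optionally weighting rows by their FREQ value."""
    counts = Counter()
    for count, y in rows:
        # With a FREQ variable, one row stands for `count` identical observations.
        counts[y] += count if use_freq else 1
    return dict(counts)


if __name__ == "__main__":
    print("without FREQ:", level_counts(FREQEX, use_freq=False))
    print("with FREQ:   ", level_counts(FREQEX, use_freq=True))
```

Unweighted, Y=0 and Y=1 occur in 14 and 6 rows; weighted by COUNT, they represent 24 and 8 observations (32 in total). The data set itself still holds 20 rows, which is consistent with the `Records processed= 20` line in the log.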


The DMINE Procedure



Procedure Syntax

PROC DMINE Statement

FREQ Statement

TARGET Statement

VARIABLES Statement

WEIGHT Statement

Example 1: Modeling a Continuous Target with the DMINE Procedure (Simple Selection Settings)

Example 2: Including the AOV16 and Grouping Variables into the Analysis (Detailed Selection Settings)

Example 3: Modeling a Binary Target with the DMINE Procedure




Many data mining databases have hundreds of potential model inputs (independent variables). The DMINE procedure enables you to quickly identify the input variables that are useful for predicting the target variable(s), based on a linear models framework. The procedure supports both ordinary least squares regression and logistic regression. (Logistic regression is a form of regression analysis in which the response variable represents a binary or ordinal-level response.)
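As a rough illustration of the screening idea, the Python sketch below scores each candidate input by its R-square with an interval target (for a single input, this is the squared Pearson correlation) and drops inputs below a cutoff. It is a simplification under stated assumptions: it ignores logistic targets, CLASS inputs, and the forward stepwise step, and the function names and the cutoff parameter `min_r2` (with its illustrative value 0.005) are hypothetical, not documented DMINE settings.

```python
# Simplified sketch of R-square screening of candidate inputs.
# Assumptions (not DMINE's documented algorithm): each input is scored by
# its squared Pearson correlation with an interval target, and inputs whose
# score falls below a hypothetical cutoff `min_r2` are dropped.


def r_square(x, y):
    """Squared correlation of x and y (the R-square of a one-input OLS fit)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant input or target explains nothing
    return sxy * sxy / (sxx * syy)


def screen_inputs(inputs, target, min_r2=0.005):
    """Return (name, r2) pairs with r2 >= min_r2, best first."""
    scores = [(name, r_square(values, target)) for name, values in inputs.items()]
    kept = [(name, r2) for name, r2 in scores if r2 >= min_r2]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0, 4.0, 5.0]
    inputs = {
        "x_linear": [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly correlated with y
        "x_noise": [1.0, -1.0, 1.0, -1.0, 1.0],  # uncorrelated with y
    }
    print(screen_inputs(inputs, y))
```

A one-input R-square is cheap to compute for hundreds of inputs, which is why a screen like this can run before any more expensive multivariable selection.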

PROC DMINE and PROC DMSPLIT are underlying procedures for the Variable Selection node.



Procedure Syntax

PROC DMINE <option(s)>;

FREQ variable;

TARGET variable;

VARIABLES variable-list;

WEIGHT variable;



PROC DMINE Statement

Invokes the DMINE procedure.

PROC DMINE <option(s)>;

Required Arguments

DATA= SAS-data-set

Identifies the input data set generated by PROC DMDB. The data set is associated with a valid catalog specified by the DMDBCAT= option. This option must be specified; no default is permitted. The DATA= data set must contain interval-scaled variables and CLASS variables in the specific form written by PROC DMDB.



DMDBCAT= SAS-catalog

Identifies an input catalog of meta information generated by PROC DMDB. The information is associated with a valid data set specified by the DATA= option. The catalog contains information (for example, the range of each variable, the number of missing values of each variable, and the moments of each variable) that is used by many other Enterprise Miner procedures that require a DMDB data set. The DMDBCAT= catalog and the DATA= data set must correspond to each other (that is, be the catalog and data set that PROC DMDB created together); otherwise the results are not valid.
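The Python sketch below computes, for one variable, the kind of summary the text attributes to a DMDB catalog: the range, the number of missing values, and the first two moments. The function name `describe`, the dictionary keys, and the use of `None` for missing values are assumptions made for illustration; the real catalog format is not described here.

```python
# Hypothetical sketch of per-variable metadata of the kind a DMDB catalog
# is said to record (range, number of missing values, moments).
# The dictionary layout is invented for illustration; it is NOT the real
# catalog format. Missing values are represented as None.


def describe(values):
    """Summarize one variable: counts, range, mean, and second central moment."""
    present = [v for v in values if v is not None]
    n = len(present)
    summary = {
        "n": n,
        "n_missing": len(values) - n,
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }
    if n:
        mean = sum(present) / n
        summary["mean"] = mean
        # Second central moment (population variance) of the nonmissing values.
        summary["m2"] = sum((v - mean) ** 2 for v in present) / n
    return summary


if __name__ == "__main__":
    print(describe([1.0, None, 3.0, 5.0]))
```

Storing such summaries once means a later procedure can, for example, look up a variable's minimum and maximum without rescanning the data, which is one plausible reason the catalog and data set must stay matched.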





NOAOV16

By default, the DMINE procedure creates the AOV16 variables, calculates their R-squares with the target variable, and then uses the variables that remain significant in the final forward stepwise selection process. To create the AOV16 variables, each interval-scaled variable is grouped into categories: its range is divided into 16 equal-width intervals, and each observation (value) of the variable is mapped to one of these intervals. The NOAOV16 option prevents the procedure from including the AOV16 variables in the final stepwise selection process. Note that the R-square value is still calculated for each AOV16 variable even if you specify the NOAOV16 option.
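The grouping step can be sketched in Python as follows. The sketch assumes equal-width intervals over the observed range, places the maximum value in the last interval (an edge-case guess), and takes the R-square of a grouped variable to be the one-way ANOVA ratio of between-group to total sum of squares for an interval target. The names `aov16_bins` and `grouped_r_square` are hypothetical, and missing values, class targets, and DMINE's exact rules are not modeled.

```python
# Sketch of the AOV16 idea, under stated assumptions:
# * the variable's range [lo, hi] is cut into 16 equal-width intervals;
# * the maximum value is placed in the last interval (an edge-case guess);
# * the R-square of the grouped variable is the one-way ANOVA ratio
#   SS_between / SS_total for an interval target.
# Missing values and DMINE's exact rules are not modeled.

N_BINS = 16


def aov16_bins(x):
    """Map each value of x to an interval index 0..15."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return [0] * len(x)  # a constant variable falls into a single interval
    width = (hi - lo) / N_BINS
    return [min(int((v - lo) / width), N_BINS - 1) for v in x]


def grouped_r_square(groups, y):
    """R-square of y explained by group membership (one-way ANOVA)."""
    n = len(y)
    grand = sum(y) / n
    ss_total = sum((v - grand) ** 2 for v in y)
    if ss_total == 0:
        return 0.0  # a constant target has nothing to explain
    sums, counts = {}, {}
    for g, v in zip(groups, y):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    # Each group contributes (group size) * (group mean - grand mean)^2.
    ss_between = sum(counts[g] * (sums[g] / counts[g] - grand) ** 2 for g in sums)
    return ss_between / ss_total


if __name__ == "__main__":
    x = [0.0, 0.0, 10.0, 10.0]
    y = [1.0, 1.0, 5.0, 5.0]
    groups = aov16_bins(x)
    print(groups, grouped_r_square(groups, y))
```

Because each of the 16 groups gets its own mean, a grouped variable can capture step-shaped, nonlinear relationships that a single linear term would miss. In terms of this sketch, NOAOV16 would still run `grouped_r_square` for every variable but leave the grouped terms out of the candidates passed to the final stepwise selection.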
