The TARGET statement identifies the target (response) variable.
target bad;
run;
The DMDB Procedure
Example 2: Specifying a FREQ Variable
Features: Specifying a FREQ variable with the FREQ statement
This example demonstrates how to define a FREQ variable in the DMDB data set and catalog. A FREQ
variable contains, for each observation in the input data set, the number of times that observation occurs.
The DATA step that creates the WORK.FREQEX input data set is included.
Program
data freqex;
input count X1 X2 X3 Y ;
datalines;
3 -0.17339 -0.04926 -0.61599 0
2 -1.51586 0.31526 -1.65430 1
1 1.04348 0.64517 -0.06878 0
1 -1.74298 0.02592 -0.71203 1
1 0.07806 1.45284 -0.39064 1
4 0.20073 0.22533 -0.44507 0
1 -0.08887 -1.24641 -0.73156 0
1 0.10309 0.88542 -1.63595 1
2 -0.57030 -1.35613 -1.58209 0
1 -1.39170 -1.22333 1.98124 1
2 0.51356 -0.36128 0.77962 0
1 -0.89216 -0.01054 -0.76720 0
1 -0.09882 1.43263 0.53820 0
3 0.03225 -0.17737 0.25381 0
1 -0.14203 -1.64183 -0.34028 0
1 -0.24436 -0.83537 -2.00245 0
2 -0.78277 0.00284 -0.75016 0
1 0.77732 -0.28847 -0.77437 0
1 1.55172 -0.21167 -0.53833 0
2 -0.74054 -1.23276 0.11452 1
run;
proc dmdb batch data=freqex out=dmfout dmdbcat=outfcat;
var x1 x2 x3;
class y(desc);
target y;
freq count;
run;
Log
1 data freqex;
2 input count X1 X2 X3 Y ;
3 datalines;
NOTE: The data set WORK.FREQEX has 20 observations and 5 variables.
NOTE: The DATA statement used 0:00:00.44 real 0:00:00.15 cpu.
24 run;
25 proc dmdb batch data=freqex out=dmfout dmdbcat=outfcat;
26 var x1 x2 x3;
27 class y(desc);
28 target y;
29
30 freq count;
31 run;
Records processed= 20 Mem used = 511K.
NOTE: The PROCEDURE DMDB used 0:00:00.92 real 0:00:00.27 cpu.
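The log reports 20 records processed, but the COUNT values in WORK.FREQEX sum to 32, so under FREQ semantics those 20 records stand in for 32 observations. The following Python sketch illustrates that arithmetic. It is not SAS code and not how PROC DMDB is implemented. It tallies the levels of Y with COUNT as a frequency and checks that the result matches replicating each record COUNT times. The helper names `class_frequencies` and `expand` are hypothetical, and the sketch assumes positive integer frequencies, as in this example.

```python
from collections import Counter

# COUNT and Y columns transcribed from the WORK.FREQEX DATALINES above.
count = [3, 2, 1, 1, 1, 4, 1, 1, 2, 1, 2, 1, 1, 3, 1, 1, 2, 1, 1, 2]
y = [0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]


def class_frequencies(values, freq=None):
    """Tally class levels; with freq, row i contributes freq[i] instead of 1."""
    if freq is None:
        freq = [1] * len(values)
    totals = Counter()
    for value, f in zip(values, freq):
        totals[value] += f
    return dict(totals)


def expand(values, freq):
    """Replicate row i freq[i] times (assumes nonnegative integer frequencies)."""
    return [value for value, f in zip(values, freq) for _ in range(f)]


print(len(y), sum(count))           # 20 32
print(class_frequencies(y))         # {0: 14, 1: 6}
print(class_frequencies(y, count))  # {0: 24, 1: 8}

# Weighting by COUNT is equivalent to expanding the data set row by row.
assert class_frequencies(expand(y, count)) == class_frequencies(y, count)
```

A FREQ variable therefore changes statistics such as class proportions. Y=1 appears in 6 of 20 records (30%) but in only 8 of 32 frequency-weighted observations (25%).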
Copyright 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.
The FREQ statement specifies the numeric variable that contains the
frequency of each observation.
freq count;
run;
The DMINE Procedure
Overview
Procedure Syntax
PROC DMINE Statement
FREQ Statement
TARGET Statement
VARIABLES Statement
WEIGHT Statement
Details
Examples
Example 1: Modeling a Continuous Target with the DMINE Procedure (Simple Selection Settings)
Example 2: Including the AOV16 and Grouping Variables into the Analysis (Detailed Selection Settings)
Example 3: Modeling a Binary Target with the DMINE Procedure
Overview
Many data mining databases have hundreds of potential model inputs (independent variables). The
DMINE procedure enables you to quickly identify the input variables that are useful for predicting the
target variable(s) within a linear-models framework. The procedure supports ordinary least squares and
logistic regression methods. (Logistic regression is a form of regression analysis in which the response
variable represents a binary or ordinal-level response.)
PROC DMINE and PROC DMSPLIT are the underlying procedures for the Variable Selection node.
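The following Python sketch gives a rough idea of screening inputs within a linear-models framework. It scores each candidate input by the R-square of a one-input least-squares fit to the target and keeps the inputs that meet a cutoff. This is a simplified stand-in and not PROC DMINE's algorithm. It covers only the least-squares case, the function names and the `min_r2` cutoff are hypothetical, and the procedure's own selection is more involved (for example, it includes a forward stepwise stage, as described under NOAOV16).

```python
from statistics import mean


def r_square(x, y):
    """R-square of the least-squares line of y on x (the squared Pearson correlation)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant input or target explains nothing
    return sxy * sxy / (sxx * syy)


def screen_inputs(inputs, target, min_r2):
    """Score every input and keep those with R-square >= min_r2, best first."""
    scores = {name: r_square(x, target) for name, x in inputs.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [name for name in ranked if scores[name] >= min_r2], scores


# Toy data, made up for illustration.
target = [1, 2, 3, 4, 5]
inputs = {
    "linear": [2, 4, 6, 8, 10],  # exact linear function of the target
    "partial": [1, 3, 2, 5, 4],  # loosely related (R-square 0.64)
    "noise": [1, -1, 0, -1, 1],  # uncorrelated with the target
}
kept, scores = screen_inputs(inputs, target, min_r2=0.5)
print(kept)  # ['linear', 'partial']
```

Scoring each input on its own is cheap, which is why this kind of screen is practical when there are hundreds of candidate inputs. It ignores correlations among the inputs, and a stepwise stage is meant to account for those.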
Procedure Syntax
PROC DMINE <option(s)>;
FREQ variable;
TARGET variable;
VARIABLES variable-list;
WEIGHT variable;
PROC DMINE Statement
Invokes the DMINE procedure.
PROC DMINE <option(s)>;
Required Arguments
DATA= SAS-data-set
Identifies the input data set generated by PROC DMDB. The data set is associated with a valid
catalog specified by the DMDBCAT= option. This option is required; there is no default. The DATA=
data set must contain interval-scaled variables and CLASS variables in the specific form written by
PROC DMDB.
Default:
None.
DMDBCAT= SAS-catalog
Identifies an input catalog of metadata generated by PROC DMDB. The catalog is associated with the
valid data set specified by the DATA= option. It contains information (for example, the range of each
variable, the number of missing values of each variable, and the moments of each variable) that is used
by many other Enterprise Miner procedures that require a DMDB data set. The DMDBCAT= catalog and
the DATA= data set must correspond to each other to obtain proper results.
Default:
None.
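The following Python sketch makes this kind of metadata concrete for a single interval variable. It computes the items the catalog is described as holding: the range, the number of missing values, and the first two moments (mean and standard deviation). The function name, the dictionary keys, and the use of `None` for a missing value are hypothetical. The sketch does not reproduce the catalog's actual layout or contents.

```python
import math


def summarize(values):
    """Range, missing-value count, and first two moments of one interval variable.
    Missing values are represented here as None (an assumption for this sketch)."""
    present = [v for v in values if v is not None]
    n = len(present)
    summary = {"nmiss": len(values) - n, "n": n}
    if n == 0:
        return summary
    mean = sum(present) / n
    summary.update(min=min(present), max=max(present), mean=mean)
    if n > 1:
        # Sample standard deviation (divisor n - 1).
        summary["std"] = math.sqrt(sum((v - mean) ** 2 for v in present) / (n - 1))
    return summary


print(summarize([2.0, None, 4.0, 6.0]))
# {'nmiss': 1, 'n': 3, 'min': 2.0, 'max': 6.0, 'mean': 4.0, 'std': 2.0}
```

Precomputing summaries like these once means later procedures can reuse them instead of rescanning the data, which is why the text stresses that the catalog and the DATA= data set must correspond.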
Options
NOAOV16
By default, the DMINE procedure creates the AOV16 variables, calculates their R-squares with
the target variable, and then uses the remaining significant variables in the final forward stepwise
selection process. The AOV16 variables are created by grouping each interval-scaled variable into
categories: the range of the variable is divided into 16 equal-width intervals, and each observation
(value) of the variable is mapped into one of these intervals. The NOAOV16 option prevents the
procedure from including the AOV16 variables in the final stepwise selection process. Note that the
R-square value is calculated for each AOV16 variable even if you specify the NOAOV16 option.
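The Python sketch below illustrates the grouping step and the R-square that goes with it. It splits the range of an interval variable into 16 equal-width bins. It then measures the binned variable by the R-square of a one-way analysis of variance, which is the between-bin sum of squares divided by the total sum of squares. This is not PROC DMINE's code. The function names are hypothetical, placing the maximum value in the last bin is an assumption, and the sketch ignores missing values and empty bins.

```python
def aov16_bins(x, n_bins=16):
    """Map each value to one of n_bins equal-width bins spanning [min(x), max(x)].
    Assumption: the maximum value goes into the last bin (index n_bins - 1)."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return [0] * len(x)  # a constant variable collapses to a single bin
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in x]


def anova_r_square(groups, y):
    """One-way ANOVA R-square of y on the group labels: SS_between / SS_total."""
    grand = sum(y) / len(y)
    ss_total = sum((v - grand) ** 2 for v in y)
    if ss_total == 0:
        return 0.0
    sums, counts = {}, {}
    for g, v in zip(groups, y):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    ss_between = sum(counts[g] * (sums[g] / counts[g] - grand) ** 2 for g in sums)
    return ss_between / ss_total


# Toy example: values 0..16 span 16 units, so every bin is one unit wide.
x = [float(i) for i in range(17)]
bins = aov16_bins(x)
print(bins[:3], bins[-2:])  # [0, 1, 2] [15, 15]

# A target that is constant within each bin is fully explained by the bins.
y = [1.0 if v >= 8 else 0.0 for v in x]
print(round(anova_r_square(bins, y), 6))  # 1.0
```

This R-square is computed from bin means rather than from a single fitted line. As a result, it can reflect a nonlinear relationship between an input and the target that a straight-line fit would miss.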