The DMDB Procedure
Details
The data mining database (DMDB) is maintained as a SAS data set. The metadata information
associated with the DMDB is maintained in a SAS catalog. Metadata includes overall data set
information as well as statistical information for the variables according to their roles. For each CLASS
variable, the metadata contains information on each of the following: its class level value, its frequency,
and its ordering information. In the DMDB, the CLASS variables are stored as integers 0, 1, 2, ..., which
can be mapped into different class level values.
For each VAR variable, the metadata catalog contains the following statistics:
N
The number of observations with nonmissing values of the variable
NMISS
The number of observations with missing values of the variable
MIN
The minimum
MAX
The maximum
SUM
The sum of all the nonmissing values of the variable
SUMWGT
The sum of weights
CSS
The corrected sum of squares
USS
The uncorrected sum of squares
STD
The standard deviation
SKEWNESS
Measure of the tendency for the distribution of values to be more spread out on one side of the
mean than on the other
KURTOSIS
Measure of the "heaviness of the tails"
(Refer to the SAS Procedures Guide, Chapter 1 for formulas and other details.)
DMDBs are only created for training data and should not be used for validation or test during modeling.
Copyright 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.
The DMDB Procedure
Examples
The following examples were executed using the HP-UX version 10.20 operating system and the SAS
software release 6.12TS045.
Example 1: Getting Started with the DMDB Procedure
Example 2: Specifying a FREQ Variable
Copyright 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.
The DMDB Procedure
Example 1: Getting Started with the DMDB
Procedure
Features:
Specifying the Output DMDB Data Set and Catalog
q
Defining the Numeric Variables in a VAR Statement
q
Defining the Class Variables in a Class Statement
q
Setting the Order of the Class Variables
q
Defining the Target Variable in a Target Statement
q
This example demonstrates how to create a data mining database (DMDB) data set and catalog. The
example uses the fictitious mortgage data set name SAMPSIO.HMEQ. The data set contains 5,960
cases. Each case represents an applicant for a home equity loan. All applicants have an existing
mortgage. The binary target BAD indicates whether or not an applicant eventually defaulted or was ever
seriously delinquent. There are ten numeric inputs and two class inputs available for subsequent
modeling.
Program
proc dmdb batch data=sampsio.hmeq
out=dmhmeq
dmdbcat=cathmeq;
var loan derog mortdue value yoj delinq
clage ninq clno debtinc;
class bad(desc)
reason(ascending)
job;
target bad;
run;
Log
1 proc dmdb batch data=sampsio.hmeq
2
3 out=dmhmeq
4 dmdbcat=cathmeq;
5
6 var loan derog mortdue value yoj delinq
7 clage ninq clno debtinc;
8
9 class bad(desc)
10 reason(ascending)
11 job;
12
13 target bad;
14 run;
Records processed= 5960 Mem used = 511K.
NOTE: The PROCEDURE DMDB used 0:00:08.30 real 0:00:02.85 cpu.
Copyright 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.
The PROC DMDB statement invokes the procedure. The BATCH option requests
the creation of a new DMDB catalog. The DATA= option specifies the input data
set.
proc dmdb batch data=sampsio.hmeq
The OUT= option specifies the name of the output DMDB data set. The
DMDBCAT= option specifies the name of the output DMDB catalog.
out=dmhmeq
dmdbcat=cathmeq;
The VAR statement identifies the numeric analysis variables. If you
omit the VAR statement, PROC DMDB analyzes all numeric variables not listed
in other statements.
var loan derog mortdue value yoj delinq
clage ninq clno debtinc;
The CLASS statement specifies the categorical variables to be used in
the analysis. The ORDER option specifies the order to use when considering
the levels of the classification variables. Valid ORDER options include ASCENDING
(ASC), DESCENDING (DESC), ASCFORMATTED (ASCFMT), DESFORMATTED (DESFMT), or
DSORDER (DATA). The default for the ORDER option is set to ASCENDING.
class bad(desc)
reason(ascending)
job;
Dostları ilə paylaş: |