The DMREG Procedure
Details
Input
The input to the DMREG procedure can be assigned one of these roles:
Training
The DATA= data set is used to fit the initial model.
Validation
The VALIDATA= data set is used to compute assessment statistics and to fine-tune the model
during stepwise selection.
Test
The TESTDATA= data set is an additional "hold out" data set that you can use to compute
assessment statistics.
Score
The DATA= data set in the SCORE statement is used for predicting target values for a new data
set that may not contain the target.
Specification of Effects
Different types of effects can be used in the DMREG procedure. In the following list, assume that A, B,
and C are class variables and that X1, X2, and Y are continuous variables:
1. Regressor effects are specified by writing continuous variables individually:
   X1 X2
2. Polynomial effects are specified by joining two or more continuous variables with asterisks:
   X1*X1 X1*X2
3. Main effects are specified by writing class variables individually:
   A C
4. Crossed effects (interactions) are specified by joining class variables with asterisks:
   A*B B*C A*B*C
5. Continuous-by-class effects are written by joining continuous variables and class variables with
   asterisks:
   X1*A
Note: Nested effects are not supported.
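The effect types listed above can be told apart mechanically from the kinds of variables that an effect joins. The following Python sketch is an illustration only, not SAS code and not part of the DMREG procedure; the function name classify_effect and the two variable sets are hypothetical. It assumes that effects are written with asterisks as shown above, and it treats parentheses as the mark of a nested effect, which is an assumption based on common SAS notation.

```python
# Illustrative sketch only: this is not DMREG syntax and not how SAS parses effects.
# Variable roles follow the assumption stated in the text above.
CLASS_VARS = {"A", "B", "C"}
CONTINUOUS_VARS = {"X1", "X2", "Y"}


def classify_effect(effect, class_vars=CLASS_VARS, continuous_vars=CONTINUOUS_VARS):
    """Return the name of the effect type for one effect string such as 'X1*A'."""
    # Assumption: nested effects would be written with parentheses, e.g. B(A).
    if "(" in effect or ")" in effect:
        raise ValueError(f"nested effects are not supported: {effect!r}")

    factors = [factor.strip() for factor in effect.split("*")]
    known = class_vars | continuous_vars
    unknown = [factor for factor in factors if factor not in known]
    if unknown:
        raise ValueError(f"unknown or empty variable name(s) in {effect!r}: {unknown}")

    n_class = sum(factor in class_vars for factor in factors)
    n_continuous = len(factors) - n_class

    if n_class == 0:
        # Only continuous variables: a single one is a regressor effect,
        # a product of two or more is a polynomial effect.
        return "regressor" if n_continuous == 1 else "polynomial"
    if n_continuous == 0:
        # Only class variables: a single one is a main effect,
        # a product of two or more is a crossed effect.
        return "main" if n_class == 1 else "crossed"
    # A mix of continuous and class variables.
    return "continuous-by-class"


if __name__ == "__main__":
    for effect in ["X1", "X1*X2", "A", "A*B*C", "X1*A"]:
        print(effect, "->", classify_effect(effect))
```

The sketch only names the effect type; it does not build design-matrix columns.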
Optimization Methods
The following table provides a list of the general nonlinear optimization methods and the default
maximum number of iterations and function calls for each method.
Optimization Methods for the Regression node.

Optimization Method                Maximum Iterations    Maximum Function Calls
Conjugate Gradient                 400                   1000
Double Dogleg                      200                   500
Newton-Raphson with Line Search    50                    125
Newton-Raphson with Ridging        50                    125
Quasi-Newton                       200                   500
Trust-Region                       50                    125
You should set the optimization method based on the size of the data mining problem, as follows:
1. Small-to-medium problems - The Trust-Region, Newton-Raphson with Ridging, and
   Newton-Raphson with Line Search methods are appropriate for small and medium-sized
   optimization problems (number of model parameters up to 40) where the Hessian matrix is easy
   and cheap to compute. Sometimes, Newton-Raphson with Ridging can be faster than
   Trust-Region, but Trust-Region is numerically more stable. If the Hessian matrix is not singular at
   the optimum, then Newton-Raphson with Line Search can be a very competitive method.
2. Medium problems - The Quasi-Newton and Double Dogleg methods are appropriate for medium
   optimization problems (number of model parameters up to 400) where the objective function and
   the gradient are much faster to compute than the Hessian matrix. Quasi-Newton and Double Dogleg
   require more iterations than the Trust-Region or Newton-Raphson methods, but each
   iteration is much faster.
3. Large problems - The Conjugate Gradient method is appropriate for large data mining problems
   (number of model parameters greater than 400) where the objective function and the gradient are
   much faster to compute than the Hessian matrix, and where storing the approximate Hessian matrix
   requires too much memory.
Note: To learn about these optimization methods, see the SAS/OR Technical Report: The NLP
Procedure (1997).
The method that the "Default" optimization entry uses depends on the number of parameters in the model.
If the number of parameters is less than or equal to 40, then the default method is Newton-Raphson
with Ridging. If the number of parameters is greater than 40 and less than 400, then the default method is
Quasi-Newton. If the number of parameters is greater than 400, then the default method is
Conjugate Gradient.
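The selection rule for the default method, together with the default limits from the table of optimization methods, can be sketched in Python as follows. This is an illustration only, not SAS code; the names DEFAULT_LIMITS and default_method are hypothetical. The text does not say which method is used when the model has exactly 400 parameters, so the sketch assumes Quasi-Newton in that case, in line with the "up to 400" guidance for medium problems.

```python
# Illustrative sketch only: not SAS code and not the procedure's implementation.
# Default limits copied from the table of optimization methods:
# method -> (maximum iterations, maximum function calls)
DEFAULT_LIMITS = {
    "Conjugate Gradient": (400, 1000),
    "Double Dogleg": (200, 500),
    "Newton-Raphson with Line Search": (50, 125),
    "Newton-Raphson with Ridging": (50, 125),
    "Quasi-Newton": (200, 500),
    "Trust-Region": (50, 125),
}


def default_method(n_params):
    """Return the default optimization method for a model with n_params parameters."""
    if n_params < 1:
        raise ValueError("the number of parameters must be at least 1")
    if n_params <= 40:
        return "Newton-Raphson with Ridging"
    # ASSUMPTION: the text leaves exactly 400 parameters unspecified;
    # this sketch treats 400 as a medium problem.
    if n_params <= 400:
        return "Quasi-Newton"
    return "Conjugate Gradient"


if __name__ == "__main__":
    for n in (10, 40, 41, 399, 401):
        method = default_method(n)
        max_iter, max_calls = DEFAULT_LIMITS[method]
        print(n, method, max_iter, max_calls)
```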
Fit Statistics for OUTEST and OUTFIT Data Sets
The OUTEST= data set in the PROC DMREG statement contains fit statistics for the training, test,
and/or validation data. Depending on the ROLE= option in the SCORE statement, the OUTFIT= data set
contains fit statistics for either the training, test, or validation data.
Fit Statistics for the Training Data

Fit Statistic    Training Data
_AIC_            Train: Akaike's Information Criterion
_ASE_            Train: Average Squared Error
_AVERR_          Train: Average Error Function
_DFE_            Train: Degrees of Freedom for Error
_DFM_            Train: Model Degrees of Freedom
_DFT_            Train: Total Degrees of Freedom
_DIV_            Train: Divisor for ASE
_ERR_            Train: Error Function
_FPE_            Train: Final Prediction Error