*The DMREG Procedure*
**Details**
**Input**
The input to the DMREG procedure can be assigned one of these roles:

Training

The DATA= data set is used to fit the initial model.

Validation

The VALIDATA= data set is used to compute assessment statistics and to fine-tune the model during stepwise selection.

Test

The TESTDATA= data set is an additional holdout data set that you can use to compute assessment statistics.

Score

The DATA= data set in the SCORE statement is used to predict target values for a new data set that may not contain the target.
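The division of labor among these roles can be illustrated with a toy Python sketch. This is not how DMREG works internally. The helper names (`fit_mean`, `fit_line`, `ase`, `run_roles`) are hypothetical, and stepwise selection is reduced to choosing between two fixed candidate models. Candidates are fitted on training data only. Validation data picks the winner. Test data gives an assessment that played no part in the choice. Score data needs no target.

```python
def fit_mean(xs, ys):
    """Intercept-only candidate: always predicts the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Least-squares line y = b0 + b1*x fitted to the training data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return lambda x: b0 + b1 * x


def ase(model, xs, ys):
    """Average squared error of a fitted model on one data set."""
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


def run_roles(train, valid, test, score_x):
    # Training role: every candidate is fitted on the training data only.
    candidates = [fit(*train) for fit in (fit_mean, fit_line)]
    # Validation role: keep the candidate with the smallest validation ASE.
    best = min(candidates, key=lambda model: ase(model, *valid))
    # Test role: holdout assessment of the chosen model.
    test_ase = ase(best, *test)
    # Score role: predictions for new rows that carry no target values.
    return best, test_ase, [best(x) for x in score_x]


model, test_ase, predictions = run_roles(
    train=([1, 2, 3, 4], [2.0, 4.0, 6.0, 8.0]),
    valid=([5, 6], [10.0, 12.0]),
    test=([7], [14.0]),
    score_x=[10],
)
print(test_ase, predictions)
```

With these perfectly linear toy data, the line wins on validation, its test ASE is 0.0, and the score prediction for x = 10 is 20.0.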

**Specification of Effects**
Different types of effects can be used in the DMREG procedure. In the following list, assume that A, B, and C are class variables and that X1, X2, and Y are continuous variables:

1. Regressor effects are specified by writing continuous variables individually:

   X1 X2

2. Polynomial effects are specified by joining two or more continuous variables with asterisks:

   X1*X1 X1*X2

3. Main effects are specified by writing class variables individually:

   A C

4. Crossed effects (interactions) are specified by joining class variables with asterisks:

   A*B B*C A*B*C

5. Continuous-by-class effects are specified by joining continuous variables and class variables with asterisks:

   X1*A

**Note:** Nested effects are not supported.
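Each asterisk multiplies columns together, which shows how these effects turn into design-matrix columns. The following Python sketch makes that concrete. Two points are assumptions and do not describe DMREG's documented behavior. First, class variables are expanded with simple 0/1 indicator coding, one column per level, while DMREG's own coding of class variables may differ. Second, the function names are hypothetical.

```python
from itertools import product


def indicator_columns(name, values):
    """One 0/1 column per level of a class variable (illustrative coding only)."""
    return {f"{name}={lvl}": [1.0 if v == lvl else 0.0 for v in values]
            for lvl in sorted(set(values))}


def effect_columns(effect, data, class_vars):
    """Expand an effect such as 'X1*X2', 'A*B', or 'X1*A' into design columns.

    data maps variable names to equal-length lists; class_vars is the set of
    class (categorical) variable names. All other variables are continuous.
    """
    terms = effect.split("*")
    n_obs = len(data[terms[0]])
    columns = {"": [1.0] * n_obs}  # start from a column of ones
    for term in terms:
        if term in class_vars:
            factor = indicator_columns(term, data[term])
        else:
            factor = {term: [float(x) for x in data[term]]}
        # Multiply every existing column by every column of the new factor.
        columns = {
            (new if old == "" else f"{old}*{new}"):
                [a * b for a, b in zip(c_old, c_new)]
            for (old, c_old), (new, c_new) in product(columns.items(), factor.items())
        }
    return columns


data = {"X1": [1, 2, 3], "A": ["a", "b", "a"], "B": ["u", "u", "v"]}
print(effect_columns("X1*X1", data, {"A", "B"}))
print(effect_columns("X1*A", data, {"A", "B"}))
```

A polynomial effect such as X1*X1 yields a single product column. A continuous-by-class effect such as X1*A yields one slope column per level of A. A crossed effect such as A*B yields one column per combination of levels.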

**Optimization Methods**
The following table lists the general nonlinear optimization methods and the default maximum number of iterations and function calls for each method.

**Optimization Methods for the Regression Node**

| Optimization Method | Maximum Iterations | Maximum Function Calls |
|---|---|---|
| Conjugate Gradient | 400 | 1000 |
| Double Dogleg | 200 | 500 |
| Newton-Raphson with Line Search | 50 | 125 |
| Newton-Raphson with Ridging | 50 | 125 |
| Quasi-Newton | 200 | 500 |
| Trust-Region | 50 | 125 |

Choose the optimization method according to the size of the data mining problem:

1. Small-to-medium problems: The Trust-Region, Newton-Raphson with Ridging, and Newton-Raphson with Line Search methods are appropriate for small and medium-sized optimization problems (up to 40 model parameters) where the Hessian matrix is easy and cheap to compute. Newton-Raphson with Ridging is sometimes faster than Trust-Region, but Trust-Region is numerically more stable. If the Hessian matrix is not singular at the optimum, Newton-Raphson with Line Search can be a very competitive method.

2. Medium problems: The quasi-Newton and Double Dogleg methods are appropriate for medium optimization problems (up to 400 model parameters) where the objective function and the gradient are much faster to compute than the Hessian. Quasi-Newton and Double Dogleg require more iterations than the Trust-Region or Newton-Raphson methods, but each iteration is much faster.

3. Large problems: The Conjugate Gradient method is appropriate for large data mining problems (more than 400 model parameters) where the objective function and the gradient are much faster to compute than the Hessian matrix, and where storing the approximate Hessian matrix would require too much memory.
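A rough count of stored values shows why memory shifts the choice toward Conjugate Gradient as the number of parameters grows. The storage model below is a simplifying assumption made for illustration and is not a description of DMREG's implementation. It assumes that Newton-type and quasi-Newton methods keep a dense symmetric (approximate) Hessian stored as its lower triangle. It also assumes that conjugate gradient keeps a small fixed number of parameter-length vectors, taken here as three: parameters, gradient, and search direction.

```python
def stored_values(n_params):
    """Rough count of floating-point values kept between iterations.

    Illustrative assumptions: a dense symmetric (approximate) Hessian is
    stored as its lower triangle; conjugate gradient keeps 3 vectors of
    length n_params (parameters, gradient, search direction).
    """
    return {
        "gradient": n_params,
        "hessian_triangle": n_params * (n_params + 1) // 2,
        "conjugate_gradient_vectors": 3 * n_params,
    }


for p in (40, 400, 4000):
    counts = stored_values(p)
    print(p, counts["hessian_triangle"], counts["conjugate_gradient_vectors"])
```

The Hessian triangle grows quadratically (820, 80,200, and 8,002,000 values for 40, 400, and 4,000 parameters), while the conjugate gradient vectors grow only linearly (120, 1,200, and 12,000).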

**Note:** To learn about these optimization methods, see the **SAS/OR Technical Report: The NLP Procedure** (1997).

The method used for the "Default" optimization setting depends on the number of parameters in the model. If the number of parameters is less than or equal to 40, the default method is Newton-Raphson with Ridging. If the number of parameters is greater than 40 and less than 400, the default method is quasi-Newton. If the number of parameters is greater than 400, the default method is Conjugate Gradient.
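The default-selection rule and the limits from the optimization table can be combined into one lookup, sketched here in Python. The function and dictionary names are hypothetical. The text does not say which method applies at exactly 400 parameters. This sketch assumes quasi-Newton, which matches the "up to 400 model parameters" range given for medium problems.

```python
# Default (max iterations, max function calls), copied from the optimization table.
DEFAULT_LIMITS = {
    "Conjugate Gradient": (400, 1000),
    "Double Dogleg": (200, 500),
    "Newton-Raphson with Line Search": (50, 125),
    "Newton-Raphson with Ridging": (50, 125),
    "Quasi-Newton": (200, 500),
    "Trust-Region": (50, 125),
}


def default_technique(n_params):
    """Return (method, max_iterations, max_function_calls) for the "Default" setting.

    Assumption: exactly 400 parameters maps to quasi-Newton, because the
    text leaves that boundary unspecified.
    """
    if n_params <= 40:
        method = "Newton-Raphson with Ridging"
    elif n_params <= 400:
        method = "Quasi-Newton"
    else:
        method = "Conjugate Gradient"
    max_iter, max_calls = DEFAULT_LIMITS[method]
    return method, max_iter, max_calls


print(default_technique(25))
print(default_technique(1200))
```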

**Fit Statistics for OUTEST and OUTFIT Data Sets**
The OUTEST= data set in the PROC DMREG statement contains fit statistics for the training, test, and/or validation data. Depending on the ROLE= option in the SCORE statement, the OUTFIT= data set contains fit statistics for the training, test, or validation data.
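To show how the training statistics in the following table relate to one another, here is a Python sketch for a least-squares fit to a single interval target. It rests on several illustrative assumptions: _ERR_ is the sum of squared errors, _DIV_ and _DFT_ both equal the number of observations, and _DFM_ equals the number of estimated parameters. AIC and FPE use the textbook least-squares forms n·ln(SSE/n) + 2p and ASE·(n + p)/(n − p). DMREG's exact computations, especially for other error functions or multiple targets, may differ.

```python
import math


def training_fit_stats(y, yhat, n_params):
    """Illustrative least-squares fit statistics for one interval target.

    Assumptions: _ERR_ = sum of squared errors, _DIV_ = _DFT_ = number of
    observations, _DFM_ = number of estimated parameters. AIC and FPE use
    textbook least-squares formulas, not necessarily DMREG's.
    """
    n = len(y)
    sse = sum((a - b) ** 2 for a, b in zip(y, yhat))
    err = sse                 # _ERR_: error function (squared error here)
    div = n                   # _DIV_: divisor for ASE
    dft = n                   # _DFT_: total degrees of freedom
    dfm = n_params            # _DFM_: model degrees of freedom
    dfe = dft - dfm           # _DFE_: degrees of freedom for error
    ase = sse / div           # _ASE_: average squared error
    averr = err / div         # _AVERR_: average error function
    # _AIC_: undefined (log of zero) for a perfect fit, reported as -inf here.
    aic = n * math.log(sse / n) + 2 * dfm if sse > 0 else float("-inf")
    # _FPE_: Akaike's final prediction error; infinite when no error df remain.
    fpe = ase * (dft + dfm) / dfe if dfe > 0 else float("inf")
    return {"_AIC_": aic, "_ASE_": ase, "_AVERR_": averr, "_DFE_": dfe,
            "_DFM_": dfm, "_DFT_": dft, "_DIV_": div, "_ERR_": err, "_FPE_": fpe}


stats = training_fit_stats([1, 2, 3, 4], [1.5, 1.5, 3.5, 3.5], n_params=2)
print(stats["_ASE_"], stats["_FPE_"])
```

Under these assumptions, _ASE_ and _AVERR_ coincide because the error function is squared error. _FPE_ inflates _ASE_ by (n + p)/(n − p) to penalize the number of parameters.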

**Fit Statistics for the Training Data**

| Fit Statistic | Training Data Label |
|---|---|
| _AIC_ | Train: Akaike's Information Criterion |
| _ASE_ | Train: Average Squared Error |
| _AVERR_ | Train: Average Error Function |
| _DFE_ | Train: Degrees of Freedom for Error |
| _DFM_ | Train: Model Degrees of Freedom |
| _DFT_ | Train: Total Degrees of Freedom |
| _DIV_ | Train: Divisor for ASE |
| _ERR_ | Train: Error Function |
| _FPE_ | Train: Final Prediction Error |