The arboretum procedure



Date: 30.04.2018

The DMREG Procedure

Example 3: Comparison of the DMREG and LOGISTIC Procedures when Using a Categorical Input Variable

Example Features

Creating the Design Matrix Data Set for Classification Inputs

Comparing the Results of the Procedures

This example compares the DMREG and LOGISTIC procedures when a categorical input is used to model a binary target. The example data set SAMPSIO.HMEQ contains fictitious mortgage data in which each case represents an applicant for a home equity loan. All applicants have an existing mortgage. The binary target BAD indicates whether an applicant eventually defaulted or was ever seriously delinquent. There are nine continuous inputs available for modeling. JOB is the only categorical input used to predict the target BAD.

When you compare the output from the DMREG and LOGISTIC procedures, you must take into account how each procedure handles categorical variables. By default, DMREG uses deviations-from-the-mean coding for classification variables. In the design matrix for a class effect, each nonreference level is coded 1 in its own column and 0 in the other columns, and the reference level is coded -1 in every column. This coding is sometimes referred to as "effects", "center-point", or "full-rank" coding. The parameter for each categorical indicator measures the difference between that level and the average across levels.
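As a worked illustration of this coding, the model can be written out for a class input with K levels, where level K is the reference level. The symbols below are notation introduced for this sketch and do not appear in the original documentation; the continuous inputs are omitted for brevity.

```latex
\operatorname{logit} P(\mathrm{BAD}=1) = \beta_0 + \sum_{j=1}^{K-1} \beta_j x_j ,
\qquad
x_j =
\begin{cases}
 1 & \text{if JOB is level } j, \\
-1 & \text{if JOB is the reference level } K, \\
 0 & \text{otherwise.}
\end{cases}

% Implied effect of the reference level: the level effects sum to zero
\beta_K = -\sum_{j=1}^{K-1} \beta_j
```

Under this parameterization the intercept is the average of the level-specific intercepts on the logit scale, which is why each parameter reads as a deviation from the average across levels.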

Because the LOGISTIC procedure does not enable you to specify class inputs directly in the MODEL statement, you must first create an input data set that contains the design matrix for the class variables. To create this design matrix data set, you can use a SAS DATA step, the TRANSREG procedure, or the GENMOD procedure. If you use deviations-from-the-mean coding for the class variables, then the LOGISTIC output matches the output from the DMREG run without any further options. If you use the GLM non-full-rank coding (0, 1) instead, you must set the DMREG CODE= MODEL statement option to GLM. In that case, both procedures also generate the same output.
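The GLM (0, 1) alternative can be sketched in a DATA step as follows. This is a minimal illustration and not part of the original example: the data set name glmhmeq and the g_ variable names are hypothetical, HMEQ is the imputed data set created in the program that follows, and the choice of Self as the level whose parameter is zero assumes that the last level in ascending order plays that role on the DMREG side.

```sas
/* Sketch only: glmhmeq and the g_ names are hypothetical. */
data glmhmeq;
   set hmeq;
   /* A SAS comparison evaluates to 1 when true and 0 when false */
   g_mgr   = (job = 'Mgr');
   g_off   = (job = 'Office');
   g_other = (job = 'Other');
   g_prof  = (job = 'ProfExe');
   g_sales = (job = 'Sales');
   g_self  = (job = 'Self');   /* redundant column, left out of the model */
run;

/* Fit the model on five of the six indicators so that it is full rank */
proc logistic data=glmhmeq descending;
   model bad = g_mgr g_off g_other g_prof g_sales
               loan mortdue value yoj derog clage ninq clno debtinc;
   title 'LOGISTIC Home Equity Data: GLM Coding';
run;
```

Leaving g_self out of the MODEL statement corresponds to fixing its parameter at zero, so each remaining parameter compares a level with Self rather than with the average across levels.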

Program: Deviations from the Mean Coding

 

/* Tabulate JOB, counting missing values as a level */
proc freq data=sampsio.hmeq;
   tables job / missing;
   title 'JOB Classification Table';
run;

 

/* Impute missing JOB values with the mode, Other */
data hmeq;
   set sampsio.hmeq;
   if job = ' ' then job = 'Other';
run;

 

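/* Create the deviation-coded design variables for JOB.
   PROC TRANSREG stores their names in the macro variable _trgind. */
proc transreg data=hmeq design;
   model class (job / deviations);
   id bad loan mortdue value yoj derog clage ninq clno debtinc;
   output;
run;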



 

/*
Alternative to PROC TRANSREG: create the deviation-coded design
variables in a DATA step. Self is the reference level, so it is coded
-1 on every design variable and needs no column of its own.

data dumyhmeq;
   set hmeq;
   select (job);
      when ('Mgr')     do; j_mgr= 1; j_off= 0; j_other= 0; j_prof= 0; j_sales= 0; end;
      when ('Office')  do; j_mgr= 0; j_off= 1; j_other= 0; j_prof= 0; j_sales= 0; end;
      when ('Other')   do; j_mgr= 0; j_off= 0; j_other= 1; j_prof= 0; j_sales= 0; end;
      when ('ProfExe') do; j_mgr= 0; j_off= 0; j_other= 0; j_prof= 1; j_sales= 0; end;
      when ('Sales')   do; j_mgr= 0; j_off= 0; j_other= 0; j_prof= 0; j_sales= 1; end;
      when ('Self')    do; j_mgr=-1; j_off=-1; j_other=-1; j_prof=-1; j_sales=-1; end;
      otherwise;
   end;
run;
*/

 

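/* With no DATA= option, PROC LOGISTIC reads the most recently created
   data set, which here is the PROC TRANSREG output */
proc logistic descending;
   model bad = &_trgind loan mortdue value yoj
               derog clage ninq clno debtinc;
   title 'LOGISTIC Home Equity Data: Deviations from the Mean Coding';
run;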


 

/* Build the data mining database and catalog that PROC DMREG reads */
proc dmdb batch data=hmeq
          out=dm_data dmdbcat=dm_cat;
   var loan mortdue value yoj derog
       clage ninq clno debtinc;
   class bad(desc)
         job(asc);
   target bad;
run;

 

/* JOB is modeled as a class input with the default deviation coding */
proc dmreg data=dm_data dmdbcat=dm_cat;
   class bad job;
   model bad = job loan mortdue value yoj derog
               clage ninq clno debtinc;
   title1 'DMREG Home Equity Data: Default Deviations from the Mean Coding';
run;

Output: Deviations from the Mean Coding

FREQ Classification Table for JOB.

The categorical input JOB contains 7 levels when missing is counted as a level. Notice that 279 cases have missing values. Both the DMREG and LOGISTIC procedures omit observations that have missing values from the analysis. For this example, the missing values of JOB are therefore imputed with the mode of JOB, Other, before the models are fit.
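The example program hard-codes the mode as Other. As a hedged alternative, the mode could be looked up from the data instead of typed in. The sketch below is not part of the original example: the data set name jobfreq and the macro variable job_mode are hypothetical names chosen for illustration.

```sas
/* Sketch only: jobfreq and job_mode are hypothetical names. */

/* Count the nonmissing JOB levels */
proc freq data=sampsio.hmeq(where=(job ne ' ')) noprint;
   tables job / out=jobfreq;
run;

/* Put the most frequent level first */
proc sort data=jobfreq;
   by descending count;
run;

/* Store the most frequent level in a macro variable */
data _null_;
   set jobfreq(obs=1);
   call symputx('job_mode', job);
run;

/* Replace missing JOB values with the stored mode */
data hmeq;
   set sampsio.hmeq;
   if missing(job) then job = "&job_mode";
run;
```

If two levels tie for the highest count, the level that PROC SORT happens to place first is used, so ties may need to be resolved by hand.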



LOGISTIC Output


DMREG Output

Notice that the DMREG output matches the output generated from the LOGISTIC run.






