PROC FREQ creates a frequency table for each of the character variables
REASON and JOB. You can use these tables to determine the mode of each
character variable in the SAMPSIO.HMEQ data set.
proc freq data=sampsio.hmeq;
   tables reason job;
run;
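As an optional convenience that is not part of the original example, the ORDER=FREQ option in the PROC FREQ statement lists the levels in descending frequency order, so the mode of each variable should appear in the first row of its table. A minimal sketch:

```sas
/* Sketch: list levels from most to least frequent so that   */
/* the mode of REASON and JOB is the first row of each table. */
proc freq data=sampsio.hmeq order=freq;
   tables reason job;
run;
```

Missing values are still reported separately in the "Frequency Missing" line, so they do not compete with the nonmissing levels for the first row.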
The SAS DATA step replaces each missing character value with that variable's
mode: DebtCon for REASON and Other for JOB.
data hmeq;
   set sampsio.hmeq;
   if reason=' ' then reason='DebtCon';
   if job=' ' then job='Other';
run;
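The REPLACE option tells the STDIZE procedure to replace the missing
values and to standardize the numeric variables. With METHOD=MEAN, the location
measure is the variable's mean and the scale measure is 1, so each variable is
centered at its mean. A missing value is replaced by the mean, which becomes 0
after centering. Therefore, every replaced value in the output data set is 0.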
proc stdize data=hmeq
            out=rshmeq
            method=mean
            replace;
   var mortdue value yoj derog delinq
       clage ninq clno debtinc;
   title 'Impute and Standardize';
run;
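One way to check this behavior is to summarize the OUT= data set with PROC MEANS. The sketch below is an illustration rather than part of the original example; it assumes that the RSHMEQ data set was created by the PROC STDIZE step above. For each variable, NMISS should be 0 and the mean should be 0 apart from rounding error.

```sas
/* Sketch: verify the OUT= data set from the PROC STDIZE step. */
/* Expect NMISS = 0 and MEAN close to 0 for every variable in  */
/* the VAR statement (centered values plus replaced zeros).    */
proc means data=rshmeq n nmiss mean;
   var mortdue value yoj derog delinq
       clage ninq clno debtinc;
run;
```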
PROC PRINT prints the first 10 observations in the OUT= imputed/standardized
data set.
proc print data=rshmeq(obs=10);
   title 'Partial Listing of the Imputed/Standardized Data Set';
run;
PROC PRINT prints the first 10 observations in the HMEQ input data
set.
proc print data=hmeq(obs=10);
   title 'Partial Listing of the Input Data Set';
run;
The STDIZE Procedure
Example 4: Replacing Missing Values without Standardizing the Variables
Features:
Using the REPONLY option.
This example demonstrates how to replace missing numeric values in the SAMPSIO.HMEQ (home equity) data set.
When you use the REPONLY option, the STDIZE procedure does not standardize the numeric variables; it only
replaces the missing values with the METHOD= location statistic.
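Because the replacement value follows the METHOD= location statistic, choosing a different method changes what is imputed. The sketch below is an illustration only, not part of the original example: it uses METHOD=MEDIAN so that each missing numeric value is replaced by the variable's median, and it writes to MEDHMEQ, a hypothetical output data set name. It reads SAMPSIO.HMEQ directly, so the character variables pass through unchanged.

```sas
/* Sketch: REPONLY imputation with the median as the location  */
/* statistic. MEDHMEQ is a hypothetical data set name chosen   */
/* for this illustration.                                      */
proc stdize data=sampsio.hmeq
            out=medhmeq
            method=median
            reponly;
   var mortdue value yoj derog delinq
       clage ninq clno debtinc;
run;
```

The median can be a reasonable choice for skewed variables, but the example that follows uses METHOD=MEAN.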
Because the STDIZE procedure accepts only numeric variables, a preceding SAS DATA step replaces
the missing character values with each variable's mode.
Program
proc freq data=sampsio.hmeq;
   tables reason job;
run;

data hmeq;
   set sampsio.hmeq;
   if reason=' ' then reason='DebtCon';
   if job=' ' then job='Other';
run;

proc stdize data=hmeq
            out=replhmeq
            method=mean
            reponly;
   var mortdue value yoj derog delinq
       clage ninq clno debtinc;
   title 'Impute Missing Numeric Values';
run;

proc print data=replhmeq(obs=10);
   title 'Partial Listing of the Imputed Data Set';
run;

proc print data=hmeq(obs=10);
   title 'Partial Listing of the Input Data Set';
run;
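To see where the imputed values come from, you can compute the means of the nonmissing input values. The sketch below is an illustration, not part of the original program, and assumes the HMEQ data set created by the DATA step above. With METHOD=MEAN and REPONLY, these means are the values that should appear in place of the missing numeric values in REPLHMEQ; for example, the MORTDUE mean should agree with the value 73760.82 shown for observations 4 and 10 in the imputed listing.

```sas
/* Sketch: means of the nonmissing input values. With           */
/* METHOD=MEAN and REPONLY, each missing numeric value in the   */
/* OUT= data set is replaced by the corresponding mean.         */
proc means data=hmeq n nmiss mean;
   var mortdue value yoj derog delinq
       clage ninq clno debtinc;
run;
```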
OUTPUT
PROC FREQ Frequency Table of the Home Equity Character Variables
Partial Listing of the Input Data Set

                                Cumulative Cumulative
REASON     Frequency    Percent  Frequency    Percent
-----------------------------------------------------
DebtCon         3928       68.8       3928       68.8
HomeImp         1780       31.2       5708      100.0
Frequency Missing = 252

                                Cumulative Cumulative
JOB        Frequency    Percent  Frequency    Percent
-----------------------------------------------------
Mgr              767       13.5        767       13.5
Office           948       16.7       1715       30.2
Other           2388       42.0       4103       72.2
ProfExe         1276       22.5       5379       94.7
Sales            109        1.9       5488       96.6
Self             193        3.4       5681      100.0
Frequency Missing = 279
PROC PRINT Listing of the Imputed Data Set and the Input Data Set
Partial Listing of the Imputed Data Set
OBS  BAD  LOAN   MORTDUE      VALUE  REASON   JOB         YOJ
  1    1  1100  25860.00   39025.00  HomeImp  Other   10.5000
  2    1  1300  70053.00   68400.00  HomeImp  Other    7.0000
  3    1  1500  13500.00   16700.00  HomeImp  Other    4.0000
  4    1  1500  73760.82  101776.05  DebtCon  Other    8.9223
  5    0  1700  97800.00  112000.00  HomeImp  Office   3.0000
  6    1  1700  30548.00   40320.00  HomeImp  Other    9.0000
  7    1  1800  48649.00   57037.00  HomeImp  Other    5.0000
  8    1  1800  28502.00   43034.00  HomeImp  Other   11.0000
  9    1  2000  32700.00   46740.00  HomeImp  Other    3.0000
 10    1  2000  73760.82   62250.00  HomeImp  Sales   16.0000

OBS    DEROG   DELINQ    CLAGE     NINQ     CLNO  DEBTINC
  1  0.00000  0.00000   94.367  1.00000   9.0000  33.7799
  2  0.00000  2.00000  121.833  0.00000  14.0000  33.7799
  3  0.00000  0.00000  149.467  1.00000  10.0000  33.7799
  4  0.25457  0.44944  179.766  1.18606  21.2961  33.7799
  5  0.00000  0.00000   93.333  0.00000  14.0000  33.7799
  6  0.00000  0.00000  101.466  1.00000   8.0000  37.1136
  7  3.00000  2.00000   77.100  1.00000  17.0000  33.7799
  8  0.00000  0.00000   88.766  0.00000   8.0000  36.8849
  9  0.00000  2.00000  216.933  1.00000  12.0000  33.7799
 10  0.00000  0.00000  115.800  0.00000  13.0000  33.7799
Partial Listing of the Input Data Set
OBS  BAD  LOAN  MORTDUE   VALUE  REASON   JOB      YOJ  DEROG  DELINQ    CLAGE  NINQ  CLNO  DEBTINC
  1    1  1100    25860   39025  HomeImp  Other   10.5      0       0   94.367     1     9        .
  2    1  1300    70053   68400  HomeImp  Other    7.0      0       2  121.833     0    14        .
  3    1  1500    13500   16700  HomeImp  Other    4.0      0       0  149.467     1    10        .
  4    1  1500        .       .  DebtCon  Other      .      .       .        .     .     .        .
  5    0  1700    97800  112000  HomeImp  Office   3.0      0       0   93.333     0    14        .
  6    1  1700    30548   40320  HomeImp  Other    9.0      0       0  101.466     1     8  37.1136
  7    1  1800    48649   57037  HomeImp  Other    5.0      3       2   77.100     1    17        .
  8    1  1800    28502   43034  HomeImp  Other   11.0      0       0   88.766     0     8  36.8849
  9    1  2000    32700   46740  HomeImp  Other    3.0      0       2  216.933     1    12        .
 10    1  2000        .   62250  HomeImp  Sales   16.0      0       0  115.800     0    13        .
Copyright 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.