The order statistics approach for estimating quantiles is faster than the P2 method, but it requires
that the entire data set be stored in memory. The accuracy of the quantile estimates is comparable for
both methods. The default is PCTLMTD=ORD_STAT if enough memory is available; otherwise,
PCTLMTD=P2.
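As a minimal sketch, the percentile method can be requested explicitly when a quantile-based METHOD= is in effect. The input data set and variable names are borrowed from Example 1; the output data set names STD_ORD and STD_P2 are placeholders chosen for illustration.

```sas
/* Order statistics: faster, but the data must fit in memory */
proc stdize data=sampsio.hmeq out=std_ord method=iqr pctlmtd=ord_stat;
   var mortdue value;
run;

/* P2 approximation: an alternative when memory is limited */
proc stdize data=sampsio.hmeq out=std_p2 method=iqr pctlmtd=p2;
   var mortdue value;
run;
```

Because the two methods are described as comparably accurate, the standardized values in the two output data sets should be close but are not guaranteed to be identical.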
Missing Values
Missing values can be replaced by the LOCATION measure or by any specified constant (see the
REPLACE option and the MISSING= option). You can also suppress standardization if you only want to
replace missing values (see the REPONLY option).
If the NOMISS option is used, PROC STDIZE omits observations that have any missing values in the
analyzed variables from computation of the location and scale measures. Otherwise, all nonmissing
values are used.
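The options named above might be combined as in the following sketch. The input data set and variables come from Example 1; the output data set names are placeholders, and the constant 0 passed to MISSING= is an arbitrary illustrative value.

```sas
/* Standardize and replace missing values with the location measure (the median here) */
proc stdize data=sampsio.hmeq out=hmeq_rep method=median replace;
   var mortdue value debtinc;
run;

/* Replace missing values only; nonmissing values are not standardized */
proc stdize data=sampsio.hmeq out=hmeq_reponly method=median reponly;
   var mortdue value debtinc;
run;

/* Replace missing values with a specified constant instead of the location measure */
proc stdize data=sampsio.hmeq out=hmeq_zero reponly missing=0;
   var mortdue value debtinc;
run;

/* Omit observations with any missing analyzed value when computing location and scale */
proc stdize data=sampsio.hmeq out=hmeq_nomiss method=std nomiss;
   var mortdue value debtinc;
run;
```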
Output Data Sets
OUT=Data-Set
The output data set is a copy of the DATA= data set except that the analyzed variables (those in the
VAR statement, or if there is no VAR statement, all numeric variables not listed in any other statement)
have been standardized.
OUTSTAT=Data-Set
The new data set contains the following variables:
- the BY variables, if any
- a new character variable, _TYPE_
- the variables analyzed, that is, those in the VAR statement, or if there is no VAR statement, all
  numeric variables not listed in any other statement
Each observation in the new data set contains some type of statistic as indicated by the _TYPE_ variable.
The values of the _TYPE_ variable are as follows:
Values of the _TYPE_ Variable

_TYPE_      Contents
--------    ---------------------------------------------------------------------
LOCATION    Location measure of each variable.
SCALE       Scale measure of each variable.
ADD         Constant from ADD=. This value is the same for each variable.
MULT        Constant from MULT=. This value is the same for each variable.
N           Total number of nonmissing positive frequencies of each variable.
NORM        Norm measure of each variable. This observation is produced only when
            the NORM option is specified with METHOD=AGK, IQR, MAD, or SPACING,
            or when the SNORM option is specified with METHOD=SPACING.
Pn          Percentiles of each variable specified by PCTLPTS=, where n is any
            real number such that 0 ≤ n ≤ 100.
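A short sketch of how an OUTSTAT= data set could be created and inspected follows. The data set names STDHMEQ and STDSTATS, the ADD= and MULT= constants, and the single percentile point are illustrative choices, not values taken from this section.

```sas
/* Standardize two variables and save the statistics that were used */
proc stdize data=sampsio.hmeq out=stdhmeq outstat=stdstats
            method=std add=10 mult=2 pctlpts=90;
   var mortdue value;
run;

/* Each observation holds one statistic, identified by _TYPE_ */
proc print data=stdstats;
run;
```

With these options, the printed data set would be expected to include LOCATION, SCALE, ADD, MULT, and N observations, plus one percentile observation for the requested point.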
Displayed Output
If you specify the PSTAT option, PROC STDIZE displays the following statistics for each variable:
- Name: the name of the variable
- Location: the location estimate
- Scale: the scale estimate
- Norm: the norm estimate
- N: the total nonmissing positive frequencies
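For instance, the statistics listed above could be requested with a call such as the following sketch. The data set and variable names are taken from Example 1, and the title text is only a suggestion.

```sas
/* PSTAT displays the location and scale measures used for each variable */
proc stdize data=sampsio.hmeq out=stdhmeq method=std pstat;
   var mortdue value;
   title 'Location and Scale Measures from PSTAT';
run;
```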
Unstandardization
The formula for unstandardization is based on the location and scale measures and on the constants for
addition and multiplication. All of these are identified by the _TYPE_ variable in the SAS-data-set.
The SAS-data-set must have a _TYPE_ variable and must contain a _TYPE_=LOCATION observation and a
_TYPE_=SCALE observation. The _TYPE_=ADD and _TYPE_=MULT observations are optional; if they are
not found in the SAS-data-set, the constants specified in the ADD= and MULT= options (or their default
values) are used for unstandardization. See OUTSTAT= for details about the kind of statistics
represented by each value of _TYPE_.
The formula for unstandardization is:
   original = scale * (result - adder) / multiplier + location
where:
result
is the value obtained from the previous standardization
adder
is the constant to add (the value in the _TYPE_=ADD observation of the SAS-data-set or the value
specified in the ADD= option)
multiplier
is the constant to multiply by (the value in the _TYPE_=MULT observation of the SAS-data-set or the
value specified in the MULT= option)
original
is the original input value
location
is the location measure
scale
is the scale measure
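The round trip might look like the following sketch. It assumes that the saved statistics are read back with METHOD=IN(SAS-data-set) and that the UNSTD option requests unstandardization; both come from the general PROC STDIZE syntax rather than from this section. The data set names are placeholders.

```sas
/* Step 1: standardize and save the location and scale measures */
proc stdize data=sampsio.hmeq out=stdhmeq outstat=stdstats method=std;
   var mortdue value;
run;

/* Step 2: read the saved statistics and reverse the standardization */
proc stdize data=stdhmeq out=unstdhmeq method=in(stdstats) unstd;
   var mortdue value;
run;
```

If the options behave as assumed, the analyzed variables in UNSTDHMEQ should match the original values in SAMPSIO.HMEQ up to floating-point rounding.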
Copyright 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.
The STDIZE Procedure
Examples
The following examples were executed using the HP-UX version 10.20 operating system and the SAS
software release 6.12TS045.
Example 1: Getting Started with the STDIZE Procedure
Example 2: Unstandardizing a Data Set
Example 3: Replacing Missing Values with Standardizing
Example 4: Replacing Missing Values without Standardizing the Variables
Example 1: Getting Started with the STDIZE Procedure
Features:
- Setting the METHOD= standardization statistic
- Standardizing observations using BY-group processing
- Outputting the OUT= standardized data set
- Outputting the OUTSTAT= summary statistics data set
This example demonstrates how to center numeric variables by their medians with the STDIZE procedure.
Observations in the input data set are standardized separately in groups for each level of the binary target.
The example uses a fictitious mortgage data set named SAMPSIO.HMEQ, which contains 5,960 cases. It is stored
in the sample library. Each case represents an applicant for a home equity loan. All applicants have an existing
mortgage. The binary target BAD indicates whether or not an applicant eventually defaulted or was ever seriously
delinquent.
Program
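/* Sort by the binary target so that BY-group processing can be used */
proc sort data=sampsio.hmeq out=hmeq;
   by bad;
run;

/* Center each variable by its median within each level of BAD */
proc stdize data=hmeq
            out=stdhmeq        /* standardized copy of the input data */
            method=median
            outstat=stdstats;  /* location, scale, and other statistics */
   var mortdue value yoj derog delinq
       clage ninq clno debtinc;
   by bad;
   title 'Standardize using METHOD=Median';
   title2 'For Each Level of the Target BAD';
run;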
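One way to check the result, offered as a sketch rather than as part of the original example, is to print the saved statistics and then confirm that the standardized variables have a median near zero within each level of BAD. The title text is a suggestion only.

```sas
/* List the statistics that PROC STDIZE saved for each BY group */
proc print data=stdstats;
   title 'Statistics Saved in the OUTSTAT= Data Set';
run;

/* After median centering, each variable's median should be about 0 per group */
proc means data=stdhmeq median maxdec=4;
   class bad;
   var mortdue value yoj derog delinq
       clage ninq clno debtinc;
   title 'Medians of the Standardized Variables by BAD';
run;
```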