The STDIZE Procedure
Overview
The STDIZE procedure standardizes one or more numeric variables in a SAS data set by subtracting a
location measure and dividing by a scale measure. A variety of location and scale measures are provided,
including estimates that are resistant to outliers and clustering (see the METHOD= option). You can also
multiply each standardized value by a constant and add a constant. Thus the result is:

result = adder + multiplier * (original - location) / scale

where:
result
is the final output value
adder
is the constant to add (the value specified in the ADD= option)
multiplier
is the constant to multiply by (the value specified in the MULT= option)
original
is the original input value
location
is the location measure
scale
is the scale measure
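The arithmetic can be illustrated with a short Python sketch. This is not SAS code and not the procedure's implementation; the function name standardize is hypothetical, and the use of the mean and the sample standard deviation (n - 1 divisor) to mimic the default METHOD=STD is an assumption made for illustration.

```python
import statistics

def standardize(values, location, scale, mult=1.0, add=0.0):
    """Apply result = add + mult * (original - location) / scale to each value."""
    return [add + mult * (v - location) / scale for v in values]

data = [2.0, 4.0, 6.0, 8.0]
loc = statistics.mean(data)    # 5.0
scl = statistics.stdev(data)   # sample standard deviation (assumed n - 1 divisor)

# Plain standardization: location 0 and scale 1 afterwards.
z = standardize(data, loc, scl)

# Same data rescaled to mean 50 and standard deviation 10,
# which is the role the MULT= and ADD= constants play in the formula.
rescaled = standardize(data, loc, scl, mult=10.0, add=50.0)
```

Because the multiplier is applied before the adder, the adder ends up as the location of the output and the multiplier as its scale.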
PROC STDIZE also finds quantiles in one pass of the data. This capability is especially useful when the
data set is very large and PROC UNIVARIATE might either run out of memory or take a long time to
compute the quantiles.
Copyright 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.
Procedure Syntax
PROC STDIZE <option(s)>;
BY variable-1 <... variable-n>;
FREQ variable;
LOCATION variable(s);
SCALE variable(s);
VAR variable(s);
WEIGHT variable;
PROC STDIZE Statement
Invokes the STDIZE procedure.
PROC STDIZE <option(s)>;
Options
ADD=number
Specifies the constant to add to each value after standardizing and multiplying by the MULT=
number.
Default:
0
DATA= SAS-data-set
Specifies the input data source to be standardized.
Default:
_LAST_
FUZZ=c
Specifies the relative fuzz factor for writing the output.
Default:
1E-14.
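For the OUT= data set: if [condition not preserved in the source], then result = 0.
For the OUTSTAT= data set: if [condition not preserved in the source], then SCALE = 0;
otherwise, if [condition not preserved in the source], then LOCATION = 0.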
INITIAL=method-name
Specifies the method for computing initial estimates for the A estimates: ABW, AWAVE, and
AHUBER. See the Table of Methods for Computing Location and Scale Measures for the list of
methods.
CAUTION:
ABW, AWAVE, AHUBER, and IN are not valid as INITIAL methods.
Default:
MAD
METHOD=method-name
Specifies the name of the standardization method. See the Standardization Methods section for more
information on the method names that are available for computing LOCATION and SCALE measures.
Default:
STD
MISSING=method | numeric-value/<missing-option(s)>
Specifies the method or a numeric value for replacing missing values.
Use the MISSING= option when you want to replace missing values with something other
than the location measure associated with the METHOD= option, which is the replacement
value that the REPLACE option uses. The usual methods include MEAN, MEDIAN, and
MIDRANGE. Any value that is valid for the METHOD= option can also be specified for the
MISSING= option, in which case the corresponding location measure replaces missing
values. If a numeric value is given, it replaces missing values after the data are
standardized. To replace missing values without standardizing, specify the REPONLY
option together with the MISSING= option.
See the Table of Methods for Computing Location and Scale Measures for a list of the
values that can be specified for the MISSING= option (with the exception of
MISSING=IN).
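The replacement-only case (MISSING= combined with REPONLY) can be sketched in Python as follows. This is an illustration, not SAS code: the function names are hypothetical, NaN stands in for a SAS missing value, and only the three method names mentioned above are handled.

```python
import math
import statistics

def location_measure(values, method):
    """Location of the nonmissing values for three common method names."""
    present = [v for v in values if not math.isnan(v)]
    if method == "MEAN":
        return statistics.mean(present)
    if method == "MEDIAN":
        return statistics.median(present)
    if method == "MIDRANGE":
        return (min(present) + max(present)) / 2
    raise ValueError(f"unsupported method: {method}")

def replace_missing(values, replacement):
    """Replace NaN entries without standardizing.

    `replacement` is either a method name (its location measure is used)
    or a number (used as is).
    """
    if isinstance(replacement, str):
        replacement = location_measure(values, replacement)
    return [replacement if math.isnan(v) else v for v in values]

x = [1.0, float("nan"), 3.0, 10.0]
by_median = replace_missing(x, "MEDIAN")      # missing value becomes 3.0
by_midrange = replace_missing(x, "MIDRANGE")  # missing value becomes 5.5
by_constant = replace_missing(x, 0.0)         # missing value becomes 0.0
```

The nonmissing values are left untouched in all three calls, which is the point of suppressing standardization.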
MULT=c
Specifies the constant to multiply each value by, after standardizing.
Default:
1
NMARKERS=n
Specifies the number of markers for the P2 algorithm (PCTLMTD=P2).
Range:
Integer, where n ≥ 5.
Default:
101
NOMISS
Omits observations that have missing values in the analyzed variables from computation of the
location and scale measures. Otherwise, all nonmissing values are used.
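The difference between the two behaviors can be seen in a small Python sketch. It is only an illustration under stated assumptions: the function column_means is hypothetical, NaN stands in for a missing value, and the mean is used as the location measure.

```python
import math
import statistics

def column_means(rows, nomiss=False):
    """Mean of each column.

    With nomiss=True, any row containing a NaN is dropped before the
    means are computed. Otherwise each column uses all of its own
    nonmissing values.
    """
    if nomiss:
        rows = [r for r in rows if not any(math.isnan(v) for v in r)]
    means = []
    for col in zip(*rows):
        present = [v for v in col if not math.isnan(v)]
        means.append(statistics.mean(present))
    return means

rows = [
    (1.0, 10.0),
    (2.0, float("nan")),   # second variable is missing in this observation
    (6.0, 20.0),
]
default_means = column_means(rows)              # first column uses all 3 rows
nomiss_means = column_means(rows, nomiss=True)  # first column uses 2 rows
```

Here the first variable's mean changes from 3.0 to 3.5 when the incomplete observation is omitted, while the second variable's mean is 15.0 either way.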
NORM
For METHOD= AGK, IQR, MAD, or SPACING, normalizes the scale estimator to be consistent
for the standard deviation of a normal distribution.
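As an illustration of what such a normalization means, the Python sketch below rescales the MAD and the interquartile range so that they estimate the standard deviation when the data are normal. The constants are the standard ones derived from the normal quantile function; that PROC STDIZE uses exactly these constants is an assumption, and the function names are hypothetical.

```python
import statistics

# 75th percentile of the standard normal distribution, about 0.6745.
Q75 = statistics.NormalDist().inv_cdf(0.75)

def mad(values):
    """Median absolute deviation from the median."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])

def normalized_mad(values):
    """MAD divided by Q75 (equivalently, multiplied by about 1.4826)."""
    return mad(values) / Q75

def normalized_iqr(q1, q3):
    """Interquartile range divided by 2 * Q75 (about 1.349)."""
    return (q3 - q1) / (2 * Q75)

sample = [1.0, 2.0, 3.0, 4.0, 100.0]
raw = mad(sample)                 # 1.0; the outlier 100.0 has no effect
scaled = normalized_mad(sample)
```

For a standard normal distribution the quartiles are -Q75 and +Q75, so normalized_iqr(-Q75, Q75) returns 1, matching the standard deviation.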
OUT= SAS-data-set
Specifies the output data set created by PROC STDIZE. The output data set is a copy of the
DATA= data set except that the analyzed variables (those in the VAR statement, or in the absence
of a VAR statement, all numeric variables not listed in any other statement) have been
standardized.
Default:
_DATA_. If the OUT= option is omitted, PROC STDIZE creates an output
data set and names it according to the DATAn convention, just as if you had
omitted a data set name in a DATA statement.
OUTSTAT= SAS-data-set
Specifies the output statistics data set that contains the location and scale measures and some other
simple statistics. A _TYPE_ variable is also created to help identify the type of statistics for each
observation. The value of the _TYPE_ variable can be:
LOCATION
Contains the location measure of each variable.
SCALE
Contains the scale measure of each variable.
NORM
Contains the norm measure of each variable.
ADD
Contains the constant from the ADD= option.
MULT
Contains the constant from the MULT= option.
N
Contains the total number of non-missing positive frequencies of each variable.
Pn
Contains the percentiles of each variable specified through the PCTLPTS= option.
Range:
0 ≤ n ≤ 100
PCTLDEF=value
Specifies which of the five percentile definitions is used to calculate percentiles when
PCTLMTD=ORD_STAT is specified. The definitions are described in the Computational Methods
section of the UNIVARIATE procedure.
Default:
5.
Range:
1, 2, 3, 4, 5
Tip:
When PCTLMTD=P2, the value of PCTLDEF is always 5.
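For orientation, definition 5 is commonly described as the empirical distribution function with averaging. The Python sketch below follows that description; it is an illustration rather than SAS code, the function name percentile_def5 is hypothetical, and the rule it encodes is taken from the usual statement of definition 5, not from the text above.

```python
from fractions import Fraction
import math

def percentile_def5(values, pct):
    """Percentile by the empirical distribution function with averaging.

    With x_1 <= ... <= x_n the sorted data and p = pct / 100, write
    n * p = j + g, where j is the integer part and g the fractional part.
    If g == 0 the result is the average of x_j and x_(j+1);
    otherwise it is x_(j+1).
    """
    xs = sorted(values)
    n = len(xs)
    # Exact rational arithmetic avoids floating-point error in the g == 0 test.
    np_ = Fraction(str(pct)) / 100 * n
    j = math.floor(np_)
    g = np_ - j
    if g == 0:
        lo = xs[max(j, 1) - 1]        # x_j, clamped at the first value
        hi = xs[min(j + 1, n) - 1]    # x_(j+1), clamped at the last value
        return (lo + hi) / 2
    return xs[j]                      # x_(j+1) in 1-based terms

even_median = percentile_def5([1, 2, 3, 4], 50)     # averages 2 and 3
odd_median = percentile_def5([1, 2, 3, 4, 5], 50)   # takes the middle value
```

The averaging branch only applies when n * p is a whole number, which is why the two medians above are computed differently.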
PCTLMTD= method
Specifies the method used to estimate percentiles.
ORD_STAT