For METHOD=ABW(c), METHOD=AHUBER(c), or METHOD=AWAVE(c), c is a positive
numeric tuning constant (Iglewicz, 1983).
METHOD=COUNT in the ACECLUS procedure (refer to SAS/STAT Software: Changes and
Enhancements for Release 6.12, p. 229).
contained in the spacing.
For METHOD=L(p), p is a numeric constant greater than or equal to 1 that specifies the power to
a _TYPE_ variable which identifies the observations that contain location and scale
measures. For example, PROC STDIZE produces an OUTSTAT= data set that contains
LOCATION and SCALE measures and some other statistics. _TYPE_='LOCATION'
identifies the observation that contains location measures and _TYPE_='SCALE' identifies
the observation that contains scale measures. You can also use the data set created by the
OUTSTAT= option from another PROC STDIZE statement as the IN= data set name. See
the Output Data Sets section below for the contents of the OUTSTAT= data set.
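As a rough illustration of how the LOCATION and SCALE observations might be consumed, the sketch below standardizes each value as (value - location) / scale. The dictionary layout and the names `stats` and `standardize` are hypothetical stand-ins for the data set, not part of PROC STDIZE.

```python
# Hypothetical in-memory stand-in for an OUTSTAT=-style data set: each entry is
# keyed by its _TYPE_ value and maps variable names to the stored measures.
stats = {
    "LOCATION": {"x": 10.0, "y": 2.0},
    "SCALE": {"x": 5.0, "y": 0.5},
}


def standardize(rows, stats):
    """Return rows with each variable replaced by (value - location) / scale."""
    loc, scale = stats["LOCATION"], stats["SCALE"]
    return [
        {var: (value - loc[var]) / scale[var] for var, value in row.items()}
        for row in rows
    ]


data = [{"x": 15.0, "y": 2.5}, {"x": 0.0, "y": 1.0}]
result = standardize(data, stats)
```

Here the _TYPE_ value alone selects which stored row supplies the location and which supplies the scale.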
the location and scale variables specified by the LOCATION and SCALE statements.
STDIZE continues to search for all variables specified in the VAR statement. If the _TYPE_ variable is
not found, PROC STDIZE searches for the location variables specified in the LOCATION statement and
the scale variables specified in the SCALE statement.
For robust estimators, see Goodall (1983) and Iglewicz (1983). MAD has the highest breakdown point
(50%) but is not very efficient. ABW, AHUBER, and AWAVE provide a good compromise between
breakdown and efficiency. L(p) location estimates are increasingly robust as p drops from 2 (least
squares, that is, the mean) to 1 (least absolute value, that is, the median), but the L(p) scale estimates are not robust.
Spacing is robust to both outliers and clustering (Janssen et al., 1983) and is therefore a good choice for
cluster analysis or nonparametric density estimation. The mid minimum spacing estimates the mode for
small p. AGK is also robust to clustering and more efficient than SPACING, but it is not as robust to
outliers and takes longer to compute. If you expect g clusters, the argument to SPACING or AGK should
be 1/g or less. AGK is less biased than SPACING in small samples. It would generally be reasonable to
use AGK for samples of size 100 or less and to use SPACING for samples of size 1000 or more, with the
treatment of intermediate sample sizes depending on the available computer resources.
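The breakdown behavior of MAD can be seen in a small numeric sketch. It compares the median absolute deviation with the ordinary standard deviation on a sample before and after one value is replaced by a gross outlier. The helper name `mad` and the sample values are illustrative assumptions, and no rescaling constant is applied.

```python
import statistics


def mad(values):
    """Median absolute deviation from the median (no rescaling constant)."""
    med = statistics.median(values)
    return statistics.median(abs(v - med) for v in values)


clean = [1, 2, 3, 4, 5, 6, 7]
contaminated = [1, 2, 3, 4, 5, 6, 700]  # one gross outlier replaces the 7

mad_clean = mad(clean)
mad_contaminated = mad(contaminated)
std_clean = statistics.stdev(clean)
std_contaminated = statistics.stdev(contaminated)
```

In this sample the MAD is unchanged by the outlier, while the standard deviation grows by roughly two orders of magnitude.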
Formulas for the statistics used by METHOD=MEAN, MEDIAN, SUM, USTD, STD, RANGE, and IQR are
given in Chapter 1, "SAS Elementary Statistics Procedure", in the SAS Procedures Guide. Note that the
computations of median and upper and lower quartiles depend on the PCTLMTD= option.
The rest of the statistics used in the above Table of Methods for Computing Location and Scale
Measures, with the exception of METHOD=IN, are described as follows:
x_i is the ith observation and n is the total number of observations in the
"The FASTCLUS Procedure" in the SAS/STAT User's Guide). Specifying METHOD=L(p) in
the PROC STDIZE statement is almost the same as specifying the LEAST=(p) option with
MAXCLUS=1 and using the default value of the MAXITER= option in the PROC FASTCLUS
statement. The only difference comes from the fact that the maximum number of iterations is a
criterion for convergence on all variables simultaneously in PROC STDIZE while it is a criterion
for convergence on a single multivariate statistic in PROC FASTCLUS. The location and scale
measures for L(p) are output to the OUTSEED= data set in PROC FASTCLUS.
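To make the role of p concrete, the following sketch finds a one-variable L(p) location estimate by directly minimizing the sum of |x - m|**p. It assumes a simple ternary search on that convex cost, which is not the iterative algorithm used by PROC STDIZE or PROC FASTCLUS; the function name `lp_location` and the data are hypothetical.

```python
def lp_location(values, p, iterations=200):
    """Approximate the m that minimizes sum(|x - m|**p) for p >= 1.

    Uses a ternary search, which is valid because the cost is convex in m.
    This is an illustrative stand-in, not the procedure's own algorithm.
    """
    lo, hi = min(values), max(values)

    def cost(m):
        return sum(abs(v - m) ** p for v in values)

    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if cost(m1) < cost(m2):
            hi = m2  # the minimum lies to the left of m2
        else:
            lo = m1  # the minimum lies to the right of m1
    return (lo + hi) / 2


data = [1, 2, 3, 4, 100]
least_squares = lp_location(data, 2)   # close to the mean, 22
least_absolute = lp_location(data, 1)  # close to the median, 3
```

With p=2 the single large value pulls the estimate far from the bulk of the data, while with p=1 it stays at the middle observation.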
1-step M-estimate. Also refer to pp. 416-418, Chapter 12 of Iglewicz (1983) for the biweight
Huber's. Refer to pp. 371-374, Chapter 11 of Goodall (1983) for the Huber 1-step M-estimate. Also
refer to pp. 416-418, Chapter 12 of Iglewicz (1983) for the Huber A-estimate.
Andrews' Wave. Refer to p. 376, Chapter 11 of Goodall (1983) for the Wave 1-step M-estimate.
Also refer to pp. 416-418, Chapter 12 of Iglewicz (1983) for the Wave A-estimate.
This is the non-iterative univariate form of the estimator described by Art, Gnanadesikan, and
The AGK estimate is documented as the METHOD= option in the PROC ACECLUS statement of
the ACECLUS procedure. (See "The ACECLUS Procedure" in the SAS/STAT User's Guide.)
Specifying METHOD=AGK(p) in the PROC STDIZE statement is the same as specifying
METHOD=COUNT and P=p in the PROC ACECLUS statement.
A spacing is the absolute difference between two data values. The minimum spacing for a
proportion p is the minimum absolute difference between two data values that contain a
proportion p of the data between them. The mid minimum spacing is the mean of these two data values.
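The definition of minimum spacing can be sketched as a sliding window over the sorted data. How a proportion p translates into a number of ranks is not stated here, so the window width `ceil(p * n)` below is an assumption, as are the function name `min_spacing` and the sample data.

```python
import math


def min_spacing(values, p):
    """Return (minimum spacing, mid minimum spacing) for proportion p.

    Assumes at least two values, and that a window spanning ceil(p * n)
    rank steps "contains a proportion p of the data"; the exact counting
    rule used by the procedure may differ.
    """
    xs = sorted(values)
    n = len(xs)
    k = max(1, min(n - 1, math.ceil(p * n)))  # assumed window width in ranks
    # Pick the window start whose end points are closest together.
    best = min(range(n - k), key=lambda i: xs[i + k] - xs[i])
    low, high = xs[best], xs[best + k]
    return high - low, (low + high) / 2


data = [0, 10, 11, 12, 13, 14, 50, 90]
spacing, mid = min_spacing(data, 0.5)
```

For this sample the narrowest window runs from 10 to 14, so the outlying values 0, 50, and 90 do not affect either result.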
PROC STDIZE offers two methods for computing quantiles:
the P2 approach
algorithm for histograms proposed by Jain
and Chlamtac (1985). The main difference comes from the movement of markers. P2 allows a marker to
move to the right (or left) by more than one position (to the largest possible integer) as long as it would
not result in two markers being in the same position. This modification is necessary to prorate the FREQ
Using the P2 approach to estimate quantiles beyond the quartiles (below P25 or above P75) will not always
produce accurate results, and a large sample size (10,000 or more) is required if the tail quantiles (P10
and P90) are requested. Also, tail quantiles are not recommended for highly skewed and/or