The arboretum procedure




The STDIZE Procedure

Overview

The STDIZE procedure standardizes one or more numeric variables in a SAS data set by subtracting a location measure and dividing by a scale measure. A variety of location and scale measures are provided, including estimates that are resistant to outliers and clustering (see the METHOD= option). You can also multiply each standardized value by a constant and add a constant. Thus the result is:

   result = adder + multiplier * (original - location) / scale

where:

result
   is the final output value

adder
   is the constant to add (the value specified in the ADD= option)

multiplier
   is the constant to multiply by (the value specified in the MULT= option)

original
   is the original input value

location
   is the location measure

scale
   is the scale measure
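
As a rough illustration of the arithmetic described above, the following Python sketch applies it to a short list. It is not SAS code. The function name `standardize` and its parameters are hypothetical, and using the mean and the sample standard deviation as the location and scale measures is an assumption chosen to resemble METHOD=STD.

```python
from statistics import mean, stdev

def standardize(values, location, scale, multiplier=1.0, adder=0.0):
    """Apply result = adder + multiplier * (original - location) / scale."""
    return [adder + multiplier * (x - location) / scale for x in values]

data = [2.0, 4.0, 6.0, 8.0]
loc = mean(data)     # location measure (assumed: mean)
scl = stdev(data)    # scale measure (assumed: sample standard deviation)

z = standardize(data, loc, scl)                           # like MULT=1, ADD=0
t = standardize(data, loc, scl, multiplier=10, adder=50)  # like MULT=10, ADD=50
```

With these inputs, `z` has mean 0 and standard deviation 1, while `t` is the same set of scores rescaled to mean 50 and standard deviation 10.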



PROC STDIZE also finds quantiles in one pass of the data. It is especially useful when the data set is very large and PROC UNIVARIATE may either run out of memory or take a long time to compute the quantiles.

Copyright 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.





Procedure Syntax

PROC STDIZE <option(s)>;
   BY variable-1 <... variable-n>;
   FREQ variable;
   LOCATION variable(s);
   SCALE variable(s);
   VAR variable(s);
   WEIGHT variable;






PROC STDIZE Statement

Invokes the STDIZE procedure.

PROC STDIZE <option(s)>;

Options

ADD=number
   Specifies the constant to add to each value after standardizing and multiplying by the MULT= number.
   Default: 0

DATA= SAS-data-set
   Specifies the input data source to be standardized.
   Default: _LAST_


FUZZ=c
   Specifies the relative fuzz factor for writing the output.
   Default: 1E-14
   For the OUT= data set: if [condition missing in source], then result = 0.
   For the OUTSTAT= data set: if [condition missing in source], then SCALE = 0; otherwise, if [condition missing in source], then LOCATION = 0.

INITIAL=method-name
   Specifies the method for computing initial estimates for the A estimates: ABW, AWAVE, and AHUBER. See the Table of Methods for Computing Location and Scale Measures for the list of methods.
   CAUTION: ABW, AWAVE, AHUBER, and IN are not valid as INITIAL methods.
   Default: MAD


METHOD=method-name
   Specifies the name of the standardization method. See the Standardization Methods section for more information on the method names that are available for computing LOCATION and SCALE measures.
   Default: STD


MISSING=method | numeric-value/<missing-option(s)>
   Specifies the method or a numeric value for replacing missing values.
   Use the MISSING= option when you want to replace missing values by something other than the location measure associated with the METHOD= option, which is what the REPLACE option uses as the replacement value. The usual methods include MEAN, MEDIAN, and MIDRANGE. Any of the values for the METHOD= option can also be specified for the MISSING= option, and the corresponding location measure will be used to replace missing values. If a numeric value is given, it replaces missing values after standardizing the data. However, the REPONLY option can be used together with the MISSING= option to suppress standardization in case you only want to replace missing values.
   See the Table of Methods for Computing Location and Scale Measures for a list of the values that can be specified for the MISSING= option (with the exception of MISSING=IN).
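
The replacement behavior described for MISSING= combined with REPONLY can be sketched in Python as follows. This is only an illustration of the idea (fill in missing values and leave everything else unstandardized), not how PROC STDIZE is implemented. The function `replace_missing`, the use of `None` to stand for a missing value, and the choice of the median as the default location measure are all assumptions made for the example.

```python
from statistics import median

def replace_missing(values, replacement=None):
    """Return a copy of values with None entries filled in.

    If no numeric replacement is given, the median of the nonmissing
    values is used (assumed here to play the role of MISSING=MEDIAN).
    """
    present = [x for x in values if x is not None]
    fill = median(present) if replacement is None else replacement
    return [fill if x is None else x for x in values]

raw = [3.0, None, 7.0, 5.0, None]
filled = replace_missing(raw)        # fill with the median of 3, 7, 5
fixed = replace_missing(raw, 0.0)    # fill with a fixed numeric value
```

The nonmissing values are returned unchanged in both calls, which corresponds to suppressing standardization.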


MULT=c
   Specifies the constant to multiply each value by, after standardizing.
   Default: 1

NMARKERS=n
   Specifies the number of markers for the P2 algorithm (PCTLMTD=P2).
   Range: Integer where n ≥ 5.
   Default: 101


NOMISS
   Omits observations that have missing values in the analyzed variables from computation of the location and scale measures. Otherwise, all nonmissing values are used.
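
The difference between the two behaviors can be sketched in Python. The sketch assumes that NOMISS drops an observation when any analyzed variable is missing, and that otherwise each variable uses all of its own nonmissing values. The data, the `location` function, and the use of the mean as the location measure are hypothetical.

```python
from statistics import mean

rows = [
    {"x": 1.0, "y": 10.0},
    {"x": 2.0, "y": None},
    {"x": None, "y": 30.0},
    {"x": 4.0, "y": 40.0},
]

def location(rows, var, nomiss=False):
    """Mean of var; with nomiss=True, use only rows complete in every variable."""
    if nomiss:
        rows = [r for r in rows if all(v is not None for v in r.values())]
    return mean(r[var] for r in rows if r[var] is not None)

x_default = location(rows, "x")               # uses 1, 2, 4
x_nomiss = location(rows, "x", nomiss=True)   # uses 1, 4 only
```

Here the second observation contributes to the location of `x` by default, but is excluded under the NOMISS-like setting because its `y` value is missing.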

NORM
   For METHOD= AGK, IQR, MAD, or SPACING, normalizes the scale estimator to be consistent for the standard deviation of a normal distribution.
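
To show what "consistent for the standard deviation of a normal distribution" means, the Python sketch below rescales the median absolute deviation (MAD) by the textbook consistency factor. The source does not state which constants PROC STDIZE applies, so the factor and the helper names `mad` and `normalized_mad` are assumptions used for illustration only.

```python
from statistics import NormalDist, median

def mad(values):
    """Median absolute deviation from the median."""
    med = median(values)
    return median(abs(x - med) for x in values)

# For normal data, MAD is about sigma * Phi^{-1}(0.75), so dividing by
# that quantile gives an estimate of sigma. The factor is about 1.4826.
MAD_TO_SIGMA = 1.0 / NormalDist().inv_cdf(0.75)

def normalized_mad(values):
    """MAD rescaled to estimate the standard deviation under normality."""
    return MAD_TO_SIGMA * mad(values)

sample = [1, 2, 3, 4, 100]
raw_scale = mad(sample)               # unaffected by the outlier 100
norm_scale = normalized_mad(sample)
```

The same idea applies to other resistant scale estimators, each with its own constant.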

OUT= SAS-data-set
   Specifies the output data set created by PROC STDIZE. The output data set is a copy of the DATA= data set except that the analyzed variables (those in the VAR statement, or in the absence of a VAR statement, all numeric variables not listed in any other statement) have been standardized.
   Default: _DATA_. If the OUT= option is omitted, PROC STDIZE creates an output data set and names it according to the DATAn convention, just as if you had omitted a data set name in a DATA statement.



OUTSTAT= SAS-data-set
   Specifies the output statistics data set that contains the location and scale measures and some other simple statistics. A _TYPE_ variable is also created to help identify the type of statistics for each observation. The value of the _TYPE_ variable can be:

   LOCATION
      Contains the location measure of each variable.

   SCALE
      Contains the scale measure of each variable.

   NORM
      Contains the norm measure of each variable.

   ADD
      Contains the constant from the ADD= option.

   MULT
      Contains the constant from the MULT= option.

   N
      Contains the total number of non-missing positive frequencies of each variable.

   Pn
      Contains the percentiles of each variable specified through the PCTLPTS= option.
      Range: 0 ≤ n ≤ 100



PCTLDEF=value
   Specifies which of the five percentile definitions, described in the Computational Methods section of the UNIVARIATE procedure, is used to calculate percentiles when PCTLMTD=ORD_STAT is specified.
   Default: 5
   Range: 1, 2, 3, 4, 5
   Tip: When PCTLMTD=P2, the value of PCTLDEF is always 5.
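
For readers unfamiliar with the default definition, the following Python sketch computes a percentile under the assumption that definition 5 is the empirical distribution function with averaging: write n*p = j + g, then take the average of the j-th and (j+1)-th order statistics when g = 0, and the (j+1)-th order statistic otherwise. The source does not spell this rule out, and the function name `percentile_def5` and the restriction to integer percentile points are choices made for the example.

```python
def percentile_def5(values, pct):
    """Percentile via the empirical distribution function with averaging.

    pct is an integer from 0 to 100. Integer arithmetic avoids
    floating-point trouble when testing whether n*p is a whole number.
    """
    x = sorted(values)
    n = len(x)
    j, remainder = divmod(n * pct, 100)   # n*p = j + g, g > 0 iff remainder > 0
    if remainder > 0:
        return x[j]                       # x_(j+1) in 1-based notation
    lo = x[max(j - 1, 0)]                 # x_(j), clamped at the minimum
    hi = x[min(j, n - 1)]                 # x_(j+1), clamped at the maximum
    return (lo + hi) / 2

data = list(range(1, 11))                 # 1, 2, ..., 10
q1 = percentile_def5(data, 25)
med = percentile_def5(data, 50)
```

This sorts the full data, which is the order-statistics approach (PCTLMTD=ORD_STAT), not the one-pass P2 approximation.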



PCTLMTD= method

Specifies the method used to estimate percentiles.

ORD_STAT




