The arboretum procedure

Date: 30.04.2018

The order statistics approach for estimating quantiles is faster than the P2 method, but it requires that the entire data set be stored in memory. The accuracy of the quantile estimates is comparable for both methods. The default is PCTLMTD=ORD_STAT if enough memory is available; otherwise, PCTLMTD=P2.
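To make the memory trade-off concrete, here is a minimal Python sketch of both approaches; it is an illustration, not SAS's implementation. `order_stat_percentile` sorts the full sample, so every value must be held in memory; it assumes the averaging percentile definition that SAS calls PCTLDEF=5. `p2_quantile` follows the P2 algorithm of Jain and Chlamtac, which keeps only five marker heights and positions however many values stream past. Both function names are hypothetical.

```python
from itertools import islice


def order_stat_percentile(values, pct):
    """Percentile from the sorted sample (averaging definition, assumed)."""
    xs = sorted(values)  # the whole sample is held in memory
    n = len(xs)
    pos = n * pct / 100.0
    j = int(pos)
    g = pos - j
    if j >= n:
        return xs[-1]
    if g > 0 or j == 0:
        return xs[j]  # x_(j+1) in 1-based order-statistic notation
    return (xs[j - 1] + xs[j]) / 2  # average of x_(j) and x_(j+1)


def p2_quantile(values, p):
    """P2 estimate of the p-quantile (0 < p < 1) using five markers."""
    it = iter(values)
    q = sorted(islice(it, 5))  # marker heights
    if len(q) < 5:
        raise ValueError("P2 needs at least five observations")
    n = [1, 2, 3, 4, 5]  # actual marker positions (1-based)
    want = [1, 1 + 2 * p, 1 + 4 * p, 3 + 2 * p, 5]  # desired positions
    step = [0, p / 2, p, (1 + p) / 2, 1]  # per-observation increments
    for x in it:
        # Find the cell k with q[k] <= x < q[k + 1], widening the extremes.
        if x < q[0]:
            q[0], k = x, 0
        elif x >= q[4]:
            q[4], k = x, 3
        else:
            k = next(i for i in range(4) if q[i] <= x < q[i + 1])
        for i in range(k + 1, 5):
            n[i] += 1
        for i in range(5):
            want[i] += step[i]
        # Nudge the three middle markers toward their desired positions.
        for i in range(1, 4):
            d = want[i] - n[i]
            if (d >= 1 and n[i + 1] - n[i] > 1) or (d <= -1 and n[i - 1] - n[i] < -1):
                s = 1 if d > 0 else -1
                # Piecewise-parabolic prediction of the new marker height.
                h = q[i] + s / (n[i + 1] - n[i - 1]) * (
                    (n[i] - n[i - 1] + s) * (q[i + 1] - q[i]) / (n[i + 1] - n[i])
                    + (n[i + 1] - n[i] - s) * (q[i] - q[i - 1]) / (n[i] - n[i - 1])
                )
                if not q[i - 1] < h < q[i + 1]:
                    # Fall back to linear interpolation if the parabola overshoots.
                    h = q[i] + s * (q[i + s] - q[i]) / (n[i + s] - n[i])
                q[i] = h
                n[i] += s
    return q[2]  # the middle marker tracks the p-quantile
```

On a shuffled sample of the integers 1 to 10,000, both functions return a median near 5,000, but only the first needs the whole list in memory.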

Missing Values

Missing values can be replaced by the LOCATION measure or by any specified constant (see the REPLACE option and the MISSING= option). You can also suppress standardization if you only want to replace missing values (see the REPONLY option).

If the NOMISS option is used, PROC STDIZE omits from the computation of the location and scale measures any observation that has a missing value in any analyzed variable. Otherwise, all nonmissing values of each variable are used.
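The difference between NOMISS and the default, and the effect of replacing missing values with the location measure, can be sketched in a few lines of Python. This is an illustration only: `None` stands in for a SAS missing value, the median stands in for whichever LOCATION measure METHOD= selects, and the helper names are hypothetical. The fill step mirrors replacing missing values without standardizing, as REPONLY does.

```python
from statistics import median


def location_by_column(rows, nomiss=False):
    """Median of each column; None marks a missing value."""
    if nomiss:
        # NOMISS-style: keep only rows with no missing analyzed value.
        rows = [r for r in rows if all(v is not None for v in r)]
    return [median([v for v in col if v is not None]) for col in zip(*rows)]


def replace_missing(rows, fill):
    """Replace each missing value with the fill value for its column."""
    return [[f if v is None else v for v, f in zip(r, fill)] for r in rows]


rows = [[1, 10], [None, 20], [3, None], [5, 40]]
print(location_by_column(rows))               # [3, 20]
print(location_by_column(rows, nomiss=True))  # [3.0, 25.0]
print(replace_missing(rows, location_by_column(rows)))
```

Without NOMISS each column uses all of its own nonmissing values; with it, only the two complete rows contribute, so the second column's median shifts from 20 to 25.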

Output Data Sets

OUT=Data-Set

The output data set is a copy of the DATA= data set except that the analyzed variables (those in the VAR statement or, if there is no VAR statement, all numeric variables not listed in any other statement) have been standardized.
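A minimal Python sketch of the arithmetic behind the OUT= values, assuming the standardization rule result = add + multiplier × (original - location) / scale, which is the inverse of the unstandardization formula described in this section, with ADD= and MULT= defaulting to 0 and 1. The helper name `standardize` is hypothetical, and the location/scale pairs shown are illustrative rather than a complete list of METHOD= choices.

```python
from statistics import mean, stdev


def standardize(values, location, scale, add=0.0, mult=1.0):
    """Apply result = add + mult * (original - location) / scale to each value."""
    return [add + mult * (v - location) / scale for v in values]


x = [2.0, 4.0, 6.0]
# Assumed METHOD=STD-style measures: location = mean, scale = standard deviation.
print(standardize(x, mean(x), stdev(x)))  # [-1.0, 0.0, 1.0]
# Same measures, rescaled with ADD=50 and MULT=10.
print(standardize(x, mean(x), stdev(x), add=50, mult=10))  # [40.0, 50.0, 60.0]
# METHOD=MEDIAN-style centering: location = median, scale = 1.
print(standardize(x, 4.0, 1.0))  # [-2.0, 0.0, 2.0]
```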

OUTSTAT=Data-Set

The new data set contains the following variables:

the BY variables, if any;

_TYPE_, a new character variable;

the analyzed variables, that is, those in the VAR statement or, if there is no VAR statement, all numeric variables not listed in any other statement.

Each observation in the new data set contains one type of statistic, identified by the value of the _TYPE_ variable.

The values of the _TYPE_ variable are as follows:

Values of the _TYPE_ Variable

LOCATION: location measure of each variable.

SCALE: scale measure of each variable.

ADD: constant from ADD=. This value is the same for each variable.

MULT: constant from MULT=. This value is the same for each variable.

N: total number of nonmissing positive frequencies of each variable.

NORM: norm measure of each variable. This observation is produced only when either the NORM option is specified with METHOD=AGK, IQR, MAD, or SPACING, or the SNORM option is specified with METHOD=SPACING.

Pn: the nth percentile of each variable, for each n specified by PCTLPTS=, where n is any real number such that 0 ≤ n ≤ 100.
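As a rough picture of this layout, the Python sketch below builds OUTSTAT-style rows: one dictionary per _TYPE_ value, keyed by variable name. It assumes METHOD=STD-style measures (mean and standard deviation), no BY groups, no missing values, and a frequency of 1 per observation, so N is simply the column length; `outstat_rows` is a hypothetical helper, and the NORM and Pn rows are omitted.

```python
from statistics import mean, stdev


def outstat_rows(columns, add=0.0, mult=1.0):
    """One row per _TYPE_, one key per analyzed variable (illustrative)."""
    per_type = {
        "LOCATION": {v: mean(xs) for v, xs in columns.items()},
        "SCALE": {v: stdev(xs) for v, xs in columns.items()},
        "ADD": {v: add for v in columns},    # same constant for every variable
        "MULT": {v: mult for v in columns},  # same constant for every variable
        "N": {v: len(xs) for v, xs in columns.items()},
    }
    return [{"_TYPE_": t, **stats} for t, stats in per_type.items()]


for row in outstat_rows({"x": [1, 2, 3], "y": [2, 4, 6]}):
    print(row)
```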

Displayed Output

If you specify the PSTAT option, PROC STDIZE displays the following statistics for each variable:

Name: the name of the variable

Location: the location estimate

Scale: the scale estimate

Norm: the norm estimate

N: the total number of nonmissing positive frequencies

Unstandardization

The formula for unstandardization is based on the location and scale measures and the constants for addition and multiplication. Each of these is identified by the value of the _TYPE_ variable in the SAS-data-set.

The SAS-data-set must contain a _TYPE_ variable with at least two observations: one with _TYPE_=LOCATION and one with _TYPE_=SCALE. The _TYPE_=ADD and _TYPE_=MULT observations are optional; if they are not found in the SAS-data-set, the constants specified in the ADD= and MULT= options (or their default values) are used for unstandardization. See OUTSTAT= for details about the statistics represented by each value of _TYPE_.

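The formula for unstandardization is:

$$\text{original} = \text{scale} \times \frac{\text{result} - \text{adder}}{\text{multiplier}} + \text{location}$$

where: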

result: the value obtained from the previous standardization

adder: the constant to add (the value in the _TYPE_=ADD observation of the SAS-data-set, or the value specified in the ADD= option)

multiplier: the constant to multiply by (the value in the _TYPE_=MULT observation, or the value specified in the MULT= option)

original: the original input value

location: the location measure

scale: the scale measure
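A Python sketch of the unstandardization step, assuming the statistics arrive as OUTSTAT-style rows (one dictionary per _TYPE_ value, keyed by variable name); `unstandardize` is a hypothetical helper. The LOCATION and SCALE rows are required; when the ADD or MULT row is absent, the function falls back to its `add` and `mult` arguments, which stand in for the ADD= and MULT= options with assumed defaults of 0 and 1.

```python
def unstandardize(results, var, stat_rows, add=0.0, mult=1.0):
    """original = scale * (result - adder) / multiplier + location."""
    by_type = {row["_TYPE_"]: row for row in stat_rows}
    try:
        location = by_type["LOCATION"][var]
        scale = by_type["SCALE"][var]
    except KeyError as exc:
        raise ValueError(f"missing LOCATION or SCALE for {var}") from exc
    # Optional rows: fall back to the ADD=/MULT= stand-ins when absent.
    adder = by_type["ADD"][var] if "ADD" in by_type else add
    multiplier = by_type["MULT"][var] if "MULT" in by_type else mult
    return [scale * (r - adder) / multiplier + location for r in results]


stats = [
    {"_TYPE_": "LOCATION", "x": 4.0},
    {"_TYPE_": "SCALE", "x": 2.0},
    {"_TYPE_": "ADD", "x": 50.0},
    {"_TYPE_": "MULT", "x": 10.0},
]
print(unstandardize([40.0, 50.0, 60.0], "x", stats))  # [2.0, 4.0, 6.0]
```

Passing only the first two rows makes the function use the `add`/`mult` arguments instead, mirroring the fallback to ADD= and MULT= described above.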

The STDIZE Procedure

Examples

The following examples were executed using the HP-UX version 10.20 operating system and the SAS

software release 6.12TS045.

Example 1: Getting Started with the STDIZE Procedure

Example 2: Unstandardizing a Data Set

Example 3: Replacing Missing Values with Standardizing

Example 4: Replacing Missing Values without Standardizing the Variables


Example 1: Getting Started with the STDIZE Procedure

Features:

Setting the METHOD= standardization statistic

Standardizing observations using BY-group processing

Outputting the standardized OUT= data set

Outputting the OUTSTAT= summary statistics data set

This example demonstrates how to use the STDIZE procedure to center numeric variables on their medians. Observations in the input data set are standardized separately for each level of the binary target.

The example uses SAMPSIO.HMEQ, a fictitious mortgage data set in the sample library that contains 5,960 cases. Each case represents an applicant for a home equity loan, and all applicants have an existing mortgage. The binary target BAD indicates whether an applicant eventually defaulted or was ever seriously delinquent.

Program

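/* BY-group processing requires the data to be sorted by BAD */
proc sort data=sampsio.hmeq out=hmeq;
   by bad;
run;

/* Center each variable on its median, separately for each level of BAD; */
/* the location and scale measures are written to STDSTATS               */
proc stdize data=hmeq
            out=stdhmeq
            method=median
            outstat=stdstats;
   var mortdue value yoj derog delinq
       clage ninq clno debtinc;
   by bad;
   title 'Standardize using METHOD=Median';
   title2 'For Each Level of the Target BAD';
run;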
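What this program computes can be sketched in Python: with METHOD=MEDIAN the location is the median and the scale is 1, so each value is simply centered on the median of its own BY group. The rows below reuse the column names BAD and DEBTINC but the values are made up for illustration, not taken from SAMPSIO.HMEQ; the sketch assumes no missing values, and `center_by_group` is a hypothetical helper. Unlike a SAS BY statement, grouping in a dictionary does not require the rows to be sorted first.

```python
from collections import defaultdict
from statistics import median


def center_by_group(rows, by, var):
    """Subtract each BY group's median of `var` (METHOD=MEDIAN: scale is 1)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row[var])
    medians = {level: median(vals) for level, vals in groups.items()}
    centered = [{**row, var: row[var] - medians[row[by]]} for row in rows]
    return centered, medians


rows = [
    {"BAD": 0, "DEBTINC": 30.0},
    {"BAD": 0, "DEBTINC": 34.0},
    {"BAD": 0, "DEBTINC": 38.0},
    {"BAD": 1, "DEBTINC": 41.0},
    {"BAD": 1, "DEBTINC": 45.0},
]
out, locations = center_by_group(rows, "BAD", "DEBTINC")
print(locations)  # {0: 34.0, 1: 43.0}
print([r["DEBTINC"] for r in out])  # [-4.0, 0.0, 4.0, -2.0, 2.0]
```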
