The arboretum procedure

PROC FREQ creates a frequency table for each character variable. You can use
these tables to determine the mode of each character variable in the
SAMPSIO.HMEQ data set.

proc freq data=sampsio.hmeq;

tables reason job;

run;
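
As a hedged variation on the step above, the ORDER=FREQ option sorts each table by descending frequency, so the mode appears near the top, and the MISSING option on the TABLES statement counts blank values as a level, so the number of values to impute is visible. Both are standard PROC FREQ options; using them here is an addition for illustration and not part of the original example. If blanks happen to be the most frequent level, the mode is the first nonmissing row.

```sas
/* Sketch: list levels by descending frequency so the mode is easy to read. */
/* MISSING treats blank values as a level, showing how many need imputing.  */
proc freq data=sampsio.hmeq order=freq;
   tables reason job / missing;
run;
```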

The DATA step replaces each missing character value with the mode of its
variable.

data hmeq;

set sampsio.hmeq;

if reason=' ' then reason='DebtCon';

if job=' ' then job='Other';

run;

The REPONLY option tells the STDIZE procedure to replace missing values
without standardizing the numeric variables. The replacement value is the
location measure of the METHOD= statistic, which for METHOD=MEAN is the mean.
Each missing numeric value is therefore replaced by the mean of its variable.

proc stdize data=hmeq

out=replhmeq

method=mean

reponly;

var mortdue value yoj derog delinq

clage ninq clno debtinc;

title 'Impute Missing Numeric Values';

run;
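
One way to check the result, offered as a sketch and not as part of the original example, is to count the missing values that remain in the output data set. The code assumes the REPLHMEQ data set created by the PROC STDIZE step above; the title text is illustrative.

```sas
/* Sketch: confirm that no numeric values remain missing after imputation. */
/* NMISS should be 0 for every variable listed in the VAR statement.       */
proc means data=replhmeq n nmiss mean;
   var mortdue value yoj derog delinq
       clage ninq clno debtinc;
   title 'Missing-Value Check After Imputation';
run;
```

Because the imputed value is the variable's own mean, the MEAN column should match the means of the nonmissing values in HMEQ.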

PROC PRINT prints the first 10 observations in the imputed data set.

proc print data=replhmeq(obs=10);

title 'Partial Listing of the Imputed Data Set';

run;

PROC PRINT prints the first 10 observations in the HMEQ input data set.

proc print data=hmeq(obs=10);

title 'Partial Listing of the Input Data Set';

run;

The TPARS Procedure

Overview

Procedure Syntax

PROC TPARS Statement

COPY Statement

OUTPUT Statement

Output

Overview

The TPARS procedure creates a term-by-document frequency table from a
collection of documents. Each document in the collection can be stored either
in a variable of a SAS data set or as a file on the file system. When a
document resides on the file system, a data set variable holds the path to
that document.

Procedure Syntax

PROC TPARS <option(s)>;

COPY variables;

OUTPUT option(s);

PROC TPARS Statement

Invoke the TPARS procedure.

PROC TPARS <option(s)>;

Options

DATA = SAS-data-set

Specifies the name of the input data set. This data set has a variable that contains the text to be

parsed or a variable that contains the file system path to the text that is to be parsed.

STOPLIST or STOP = SAS-data-set

Specifies the name of the data set that contains the list of words that are not to be indexed by the

TPARS procedure. In most situations this list should include articles, prepositions, pronouns, etc.

The variable name for the words in the data set must be Term.
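
A stop list of this kind can be built with an ordinary DATA step. The following is a minimal sketch: the data set name WORK.STOPLIST, the variable length, and the particular words are assumptions chosen for illustration; only the variable name Term comes from the description above.

```sas
/* Sketch: a small stop list. The data set name WORK.STOPLIST and the   */
/* words are illustrative assumptions; the variable must be named Term. */
data work.stoplist;
   length Term $ 16;
   input Term $;
   datalines;
a
an
the
of
in
it
they
;
run;
```

The resulting data set would then be named in the STOPLIST= (or STOP=) option.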

VAR or TEXT = variable-name

Specifies the name of the variable that contains the text to be parsed. This option cannot be used

with the FVAR option.

FVAR or FILE = variable-name

Specifies the name of the variable that contains the file system path to the
text to be parsed. This option cannot be used with the VAR option.
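
The two input layouts that the DATA= option allows can be sketched as follows. All data set names, variable names, sample sentences, and file paths here are hypothetical placeholders; the sketch only prepares input data and does not invoke PROC TPARS.

```sas
/* Layout 1: the text itself is stored in a character variable. */
/* The data set and variable names are illustrative.            */
data work.docs_text;
   length text $ 200;
   infile datalines truncover;
   input text $200.;
   datalines;
The borrower requested a loan for debt consolidation.
The applicant has held the same job for ten years.
;
run;

/* Layout 2: each observation holds the path to a document file. */
/* The paths are placeholders, not real files.                   */
data work.docs_path;
   length path $ 200;
   infile datalines truncover;
   input path $200.;
   datalines;
/tmp/docs/doc1.txt
/tmp/docs/doc2.txt
;
run;
```

With the first layout, the variable holding the text would be named in the VAR= (or TEXT=) option; with the second, the variable holding the path would be named in the FVAR= (or FILE=) option.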

IN_KEY = SAS-data-set

Specifies the name of the SAS data set that contains the KEY data, a set of
index/term pairs. The option has two uses. First, it supplies input after the
initial parsing (for training) has been done: successive runs take as input
the KEY data set that was output by the training run. Second, it supports
term/concept extraction. In this case, the KEY data set is created outside
the TPARS procedure, and its entries are the terms that you want to identify
in the input data. For example, if the following data is supplied as the
IN_KEY data set, then only these words will be indexed.
