PROC FREQ creates a frequency table for each of the character variables REASON and JOB. You
can use these tables to determine the mode of each character variable
in the SAMPSIO.HMEQ data set.
proc freq data=sampsio.hmeq;
tables reason job;
run;
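PROC FREQ reports the counts, and the mode is the most frequent nonmissing level. The Python sketch below is not SAS and is not part of the original example. It shows the same logic on a small hypothetical list of REASON values. Like PROC FREQ's default tables, it leaves blank values out of the counts. The function names `frequency_table` and `mode` are illustrative only.

```python
from collections import Counter

def frequency_table(values):
    """Count nonmissing values, as a one-way frequency table would.

    None, empty, and all-blank strings are treated as missing and excluded.
    """
    return Counter(v for v in values if v is not None and v.strip() != "")

def mode(values):
    """Return the most frequent nonmissing value (ties: first one encountered)."""
    counts = frequency_table(values)
    if not counts:
        raise ValueError("no nonmissing values")
    return counts.most_common(1)[0][0]

# Hypothetical sample of REASON values; not the actual HMEQ data.
reason = ["DebtCon", "HomeImp", " ", "DebtCon", "DebtCon", "HomeImp"]
print(dict(frequency_table(reason)))  # {'DebtCon': 3, 'HomeImp': 2}
print(mode(reason))                   # DebtCon
```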
The following DATA step replaces the missing character values with each variable's
mode.
data hmeq;
set sampsio.hmeq;
/* Replace missing values with the modes reported by PROC FREQ */
if reason=' ' then reason='DebtCon';
if job=' ' then job='Other';
run;
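The DATA step hard-codes the modes read from the PROC FREQ output. The Python sketch below performs the same replacement. The function name `impute_with_mode` and the sample rows are hypothetical. A missing, empty, or all-blank string counts as missing, which mirrors how SAS treats a blank character value as equal to `' '`.

```python
def impute_with_mode(rows, modes):
    """Return copies of rows with missing or blank character values replaced.

    rows  -- list of dicts, one per observation (hypothetical representation)
    modes -- mapping of variable name to its mode, e.g. read from PROC FREQ output
    """
    imputed = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not modified
        for var, mode_value in modes.items():
            value = new_row.get(var)
            if value is None or value.strip() == "":
                new_row[var] = mode_value
        imputed.append(new_row)
    return imputed

rows = [{"reason": "HomeImp", "job": " "}, {"reason": " ", "job": "Mgr"}]
print(impute_with_mode(rows, {"reason": "DebtCon", "job": "Other"}))
# [{'reason': 'HomeImp', 'job': 'Other'}, {'reason': 'DebtCon', 'job': 'Mgr'}]
```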
The REPONLY option tells the STDIZE procedure to replace missing values without
standardizing the numeric variables. Because METHOD=MEAN makes the mean the location
measure, each missing value is replaced by the mean of its variable.
/* REPONLY: replace missing values with the METHOD=MEAN location; do not standardize */
proc stdize data=hmeq
out=replhmeq
method=mean
reponly;
var mortdue value yoj derog delinq
clage ninq clno debtinc;
title 'Impute Missing Numeric Values';
run;
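The Python sketch below makes the replace-only behavior concrete. It assumes missing values are represented as None or NaN. The function name and sample columns are hypothetical. Each missing entry becomes its column's mean, and nonmissing values are left untouched, which matches the effect of REPONLY with METHOD=MEAN described above. This sketch does not show how PROC STDIZE handles a variable that has no nonmissing values; it simply leaves such a column unchanged.

```python
import math

def _is_missing(v):
    """Treat None and NaN as missing (an assumption of this sketch)."""
    return v is None or math.isnan(v)

def replace_missing_with_mean(columns):
    """Replace missing entries in each column with that column's mean.

    Nonmissing values are not centered or scaled (replace only).
    """
    result = {}
    for name, values in columns.items():
        present = [v for v in values if not _is_missing(v)]
        if not present:
            # No mean can be computed; leave the column as is.
            result[name] = list(values)
            continue
        mean = sum(present) / len(present)
        result[name] = [mean if _is_missing(v) else v for v in values]
    return result

cols = {"yoj": [10.0, None, 20.0], "debtinc": [30.0, float("nan"), 36.0]}
print(replace_missing_with_mean(cols))
# {'yoj': [10.0, 15.0, 20.0], 'debtinc': [30.0, 33.0, 36.0]}
```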
PROC PRINT prints the first 10 observations in the imputed data set.
proc print data=replhmeq(obs=10);
title 'Partial Listing of the Imputed Data Set';
run;
PROC PRINT prints the first 10 observations in the HMEQ input data set.
proc print data=hmeq(obs=10);
title 'Partial Listing of the Input Data Set';
run;
Copyright 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.
The TPARS Procedure
Overview
Procedure Syntax
PROC TPARS Statement
COPY Statement
OUTPUT Statement
Output
Copyright 2001 by SAS Institute Inc., Cary, NC, USA. All rights reserved.
Overview
The TPARS procedure creates a term-by-document frequency table from a collection of
documents. Each document in the collection can be stored either in a variable of a SAS data set or as
a file on the file system. If a document resides on the file system, a data set variable holds the path
to that document.
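A few lines of Python are enough to sketch the idea of a term-by-document frequency table. This is not how TPARS is implemented. The tokenizer (lowercased runs of letters), the optional case-insensitive stop-word set (loosely analogous to the STOPLIST= data set), and the dense list-of-rows layout are all simplifying assumptions. The layout of the data sets that TPARS actually writes is not shown here.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into runs of letters (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())

def term_by_document(documents, stop_terms=()):
    """Build a term-by-document frequency table.

    Returns (terms, table), where table[i][j] is the number of times
    terms[i] occurs in documents[j]. Words in stop_terms are not indexed.
    """
    stop = {t.lower() for t in stop_terms}
    counts = [Counter(w for w in tokenize(doc) if w not in stop) for doc in documents]
    terms = sorted(set().union(*counts))
    table = [[c[term] for c in counts] for term in terms]
    return terms, table

docs = ["The loan was approved.", "Loan denied; the loan amount was too high."]
terms, table = term_by_document(docs, stop_terms=["the", "was", "too"])
for term, row in zip(terms, table):
    print(f"{term:10s} {row}")
```

Each row is a term and each column is a document. For example, the row for `loan` is `[1, 2]` because the word appears once in the first document and twice in the second.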
Procedure Syntax
PROC TPARS <option(s)>;
COPY variables;
OUTPUT option(s);
PROC TPARS Statement
Invokes the TPARS procedure.
PROC TPARS <option(s)>;
Options
DATA = SAS-data-set
Specifies the name of the input data set. This data set contains either a variable that holds the text
to be parsed or a variable that holds the file system path to that text.
STOPLIST or STOP = SAS-data-set
Specifies the name of the data set that contains the list of words that the TPARS procedure should
not index. In most situations, this list should include articles, prepositions, pronouns, and similar
common words. The words must be stored in a variable named Term.
VAR or TEXT = variable-name
Specifies the name of the variable that contains the text to be parsed. This option cannot be used
with the FVAR option.
FVAR or FILE = variable-name
Specifies the name of the variable that contains the file system path to the text to be parsed. This
option cannot be used with the VAR option.
IN_KEY = SAS-data-set
Specifies the name of the SAS data set that contains the KEY data. The KEY data set contains
index/term pairs. It is used as input after the initial parsing (for training) has been done: successive
runs take as input the KEY data set that was output by the training runs. A second use for the
IN_KEY option is term/concept extraction. In this case, the KEY data set is created outside the
TPARS procedure, and its entries are the terms that you want to identify in the input data set. For
example, if the following data is specified as the IN_KEY data set, then only these words are
indexed.