The arboretum procedure




 

PROC FREQ creates a frequency table for each character variable. You can use these tables to determine the mode (most frequent value) of each character variable in the SAMPSIO.HMEQ data set.

proc freq data=sampsio.hmeq;
  /* One-way frequency table for each character variable */
  tables reason job;
run;
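For illustration only, the Python sketch below (not SAS code) shows how a frequency count yields the mode. It tallies each character variable's values, skips blank values (SAS treats a blank character value as missing), and returns the most frequent value. The function name mode_of and the sample rows are hypothetical; they are not taken from SAMPSIO.HMEQ.

```python
from collections import Counter

def mode_of(values):
    """Return the most frequent non-blank value, or None if every value is blank."""
    # Blank strings stand in for missing character values and are not counted.
    counts = Counter(v for v in values if v.strip())
    if not counts:
        return None
    # most_common(1) returns a one-element list holding the (value, count) pair.
    return counts.most_common(1)[0][0]

# Hypothetical sample rows, not SAMPSIO.HMEQ data.
rows = [
    {"reason": "DebtCon", "job": "Other"},
    {"reason": " ", "job": "Mgr"},
    {"reason": "HomeImp", "job": "Other"},
    {"reason": "DebtCon", "job": " "},
]

for column in ("reason", "job"):
    print(column, mode_of(r[column] for r in rows))
```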



 

The following DATA step replaces missing character values with each variable's mode, as identified in the PROC FREQ output: DebtCon for REASON and Other for JOB.

data hmeq;
  set sampsio.hmeq;
  /* Replace blank (missing) character values with each variable's mode */
  if reason=' ' then reason='DebtCon';
  if job=' ' then job='Other';
run;
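The DATA step's logic can be sketched in Python for illustration. The function name impute_character and the sample rows are hypothetical, and every row is assumed to contain every listed column. Each blank value in a listed column is replaced by a fixed fill value, here the modes hard-coded in the DATA step. Every other value is left unchanged.

```python
def impute_character(rows, fill_values):
    """Return copies of rows with blank character values replaced.

    fill_values maps a column name to the value used when that column is
    blank, mirroring the IF ... THEN assignments in the DATA step.
    Assumes every row has every column named in fill_values.
    """
    imputed = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows are not modified
        for column, fill in fill_values.items():
            if not new_row[column].strip():  # blank means missing
                new_row[column] = fill
        imputed.append(new_row)
    return imputed

rows = [{"reason": " ", "job": "Mgr"}, {"reason": "HomeImp", "job": ""}]
print(impute_character(rows, {"reason": "DebtCon", "job": "Other"}))
```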



 

The REPONLY option tells the STDIZE procedure to replace missing values in the numeric variables without standardizing them. The replacement value is the location measure of the method specified in METHOD=. For METHOD=MEAN, that measure is the mean. Therefore, every missing value is replaced by the mean of its variable.

proc stdize data=hmeq
            out=replhmeq
            method=mean
            reponly;     /* replace missing values only; do not standardize */
   var  mortdue value yoj derog delinq
        clage ninq clno debtinc;
   title 'Impute Missing Numeric Values';
run;
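For a single numeric variable, the replacement that METHOD=MEAN with REPONLY performs can be sketched in Python. This is an illustration only, not how PROC STDIZE is implemented. It assumes missing values are represented as None and handles one variable at a time. The VAR statement above lists several variables, and each one is imputed with its own mean.

```python
from statistics import fmean

def replace_with_mean(values):
    """Replace None entries in a list with the mean of the non-missing values.

    Non-missing values are returned unchanged (no centering or scaling),
    which corresponds to the effect of REPONLY in the PROC STDIZE step.
    """
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # no non-missing values to estimate a mean from
    mean = fmean(present)  # the mean uses non-missing values only
    return [mean if v is None else v for v in values]

print(replace_with_mean([10.0, None, 20.0, 30.0]))  # [10.0, 20.0, 20.0, 30.0]
```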



 

PROC PRINT prints the first 10 observations in the imputed data set.

proc print data=replhmeq(obs=10);
  title 'Partial Listing of the Imputed Data Set';
run;



 

PROC PRINT prints the first 10 observations of the HMEQ data set, the input to PROC STDIZE, so that you can compare them with the imputed values.

proc print data=hmeq(obs=10);
  title 'Partial Listing of the Input Data Set';
run;




Copyright 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.




The TPARS Procedure


Overview

Procedure Syntax

PROC TPARS Statement

COPY Statement

OUTPUT Statement



Output

Copyright 2001 by SAS Institute Inc., Cary, NC, USA. All rights reserved.





Overview

The TPARS procedure creates a term-by-document frequency table from a collection of documents. Each document in the collection can be stored either in a variable of a SAS data set or as a file on the file system. If a document resides on the file system, a data set variable holds the path to the document.
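As a rough Python illustration of what a term-by-document frequency table contains, the sketch below counts how often each term occurs in each document and stores only the nonzero cells. It does not show how TPARS builds the table. The tokenizer, which lowercases the text and keeps runs of letters, is an assumption made for this example; this section does not specify the TPARS parsing rules.

```python
import re
from collections import Counter

def term_by_document(documents):
    """Build a sparse term-by-document frequency table.

    Returns a dict mapping (term, document index) to the number of times
    the term occurs in that document. The tokenization (lowercased runs of
    letters) is an assumption for illustration only.
    """
    table = Counter()
    for doc_id, text in enumerate(documents):
        for term in re.findall(r"[a-z]+", text.lower()):
            table[(term, doc_id)] += 1
    return dict(table)

docs = ["The loan was approved.", "The loan was denied; the loan is closed."]
for (term, doc_id), count in sorted(term_by_document(docs).items()):
    print(term, doc_id, count)
```

Storing only the nonzero (term, document) cells keeps the table small, because most terms occur in only a few documents.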


Procedure Syntax

PROC TPARS <option(s)>;
COPY variables;
OUTPUT option(s);


PROC TPARS Statement

Invoke the TPARS procedure.

PROC TPARS <option(s)>;

Options

DATA = SAS-data-set

Specifies the name of the input data set. This data set has a variable that contains the text to be parsed or a variable that contains the file system path to the text that is to be parsed.

STOPLIST or STOP = SAS-data-set

Specifies the name of the data set that contains the list of words that the TPARS procedure should not index. In most situations, this list should include articles, prepositions, pronouns, and similar function words. The words must be stored in a variable named Term.
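The stop-list rule can be pictured with the following Python sketch, which is illustrative only. Words found in the Term variable of the stop-list data set are dropped before terms are counted. The function name, the tokenizer, and the case-insensitive comparison are assumptions, not documented TPARS behavior.

```python
import re

def index_terms(text, stoplist_rows):
    """Return the terms of text that are not on the stop list.

    stoplist_rows mimics a stop-list data set: each row is a dict whose
    'Term' key holds one word, matching the required variable name Term.
    Case-insensitive matching is an assumption made for this sketch.
    """
    stop = {row["Term"].lower() for row in stoplist_rows}
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in stop]

stoplist = [{"Term": "the"}, {"Term": "of"}, {"Term": "it"}]
print(index_terms("The value of the home exceeds it", stoplist))
# ['value', 'home', 'exceeds']
```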



VAR or TEXT = variable-name

Specifies the name of the variable that contains the text to be parsed. This option cannot be used with the FVAR option.

FVAR or FILE = variable-name

Specifies the name of the variable that contains the file system path to the text to be parsed. This option cannot be used with the VAR option.
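The difference between the two options can be sketched in Python with a hypothetical helper that is not part of SAS. A row either carries the document text itself, as with VAR, or carries a path from which the text is read, as with FVAR. Giving both, or neither, is rejected. Reading the file as UTF-8 is an assumption of this sketch.

```python
from pathlib import Path

def document_text(row, var=None, fvar=None):
    """Return the text of one document from a data-set row (a dict).

    Exactly one of var (column holding the text itself) or fvar (column
    holding a file system path to the text) must be given, mirroring the
    rule that VAR and FVAR cannot be used together.
    """
    if (var is None) == (fvar is None):
        raise ValueError("specify exactly one of var or fvar")
    if var is not None:
        return row[var]
    # fvar: the column holds a path, so read the document from the file system.
    return Path(row[fvar]).read_text(encoding="utf-8")

print(document_text({"text": "Home equity loan"}, var="text"))
```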

IN_KEY = SAS-data-set

Specifies the name of the SAS data set that contains the KEY data. The KEY data set contains index/term pairs. It is used as input after the initial parsing (for training) has been done. Successive runs take as input the KEY data set that was output from the training run. A second use for the IN_KEY option is term/concept extraction. In this case, the KEY data set is created outside the TPARS procedure, and its entries are the terms that you want to find in the input data set. For example, if the following data is used as the IN_KEY data set, then only these words will be indexed.
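A Python sketch of key-restricted indexing follows. It is illustrative only and is not the TPARS implementation. The tuple layout, function name, and sample key are hypothetical and are not the key data referred to above. The key supplies index/term pairs. Each key term found in a document is counted under the index the key assigns to it, and any term not in the key is skipped. In this sketch, passing the same key to several runs keeps each term's index the same across those runs.

```python
import re
from collections import Counter

def count_key_terms(text, key_rows):
    """Count only the terms listed in a key of index/term pairs.

    key_rows is a list of (index, term) tuples standing in for an IN_KEY
    data set. Returns a dict mapping each key index to that term's count
    in text. Terms not in the key are ignored, so only key terms are indexed.
    """
    index_of = {term.lower(): index for index, term in key_rows}
    counts = Counter()
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in index_of:
            counts[index_of[token]] += 1
    return dict(counts)

key = [(1, "mortgage"), (2, "default")]  # hypothetical key data
print(count_key_terms("Mortgage default risk: the mortgage was refinanced.", key))
# {1: 2, 2: 1}
```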




