The arboretum procedure




This enables you to identify which documents contain words about the United States and North Carolina, respectively.



TOKEN = ALPHANUM|ALPHA|SPACEDELIM

Specifies what qualifies as a term to be indexed. The default value is ALPHANUM. The TOKEN option works in conjunction with the ENHANCE option. The following descriptions apply when the ENHANCE option is not used.



ALPHANUM -- Terms are space and punctuation delimited. Each term will consist only of alpha/numeric characters.

ALPHA -- Terms are space, punctuation, and digit delimited. Each term will consist only of alphabetical characters.

SPACEDELIM -- Terms are space delimited. Terms will contain all characters including punctuation.
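The three TOKEN settings (without ENHANCE) can be illustrated with a minimal Python sketch. This is not how PROC TPARS is implemented; the function name `tokenize`, the restriction to ASCII letters and digits, and the absence of case folding are all assumptions made for illustration.

```python
import re

def tokenize(text, token="ALPHANUM"):
    """Split text into terms under one TOKEN setting (ENHANCE not used).

    Hypothetical helper: ASCII-only and case-preserving by assumption.
    """
    if token == "ALPHANUM":
        # Spaces and punctuation delimit terms; only letters and digits survive.
        return re.findall(r"[A-Za-z0-9]+", text)
    if token == "ALPHA":
        # Digits also act as delimiters; only letters survive.
        return re.findall(r"[A-Za-z]+", text)
    if token == "SPACEDELIM":
        # Only whitespace delimits; punctuation stays attached to the term.
        return text.split()
    raise ValueError(f"unknown TOKEN value: {token}")

sample = "Call 555-1234, it's free!"
for mode in ("ALPHANUM", "ALPHA", "SPACEDELIM"):
    print(mode, tokenize(sample, mode))
```

Note how the contraction "it's" is split into "it" and "s" under ALPHANUM and ALPHA, because the apostrophe acts as a delimiter when ENHANCE is not used.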

ENHANCE = language

This option performs a limited amount of language-specific parsing. Currently only ENGLISH is supported. When parsing any other language, do not use the ENHANCE option. The ENHANCE option works in conjunction with the TOKEN option.

The following descriptions apply when ENHANCE=ENGLISH is used.

ALPHANUM -- Terms are space delimited. Terms that contain punctuation are omitted unless the term is a contraction or has a single punctuation character at the end. Contractions are kept in their original form. A term with a single punctuation character at the end is kept, but the punctuation is removed.

ALPHA -- Terms are space delimited. Terms that contain punctuation or digits are omitted unless the term is a contraction or has a single punctuation character at the end. Contractions are kept in their original form. A term with a single punctuation character at the end is kept, but the punctuation is removed.

SPACEDELIM -- Terms are space delimited. In addition, end-of-word punctuation is removed.
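The ENHANCE=ENGLISH rules can be approximated with the following Python sketch. It is an interpretation of the descriptions above, not the procedure's actual parser. The function name `tokenize_enhanced`, the contraction pattern (letters, one apostrophe, letters), and the choice to strip a trailing punctuation character before testing for a contraction are assumptions.

```python
import re
import string

# Assumed shape of a contraction: letters, one apostrophe, letters (e.g. "don't").
CONTRACTION = re.compile(r"[A-Za-z]+'[A-Za-z]+")

def tokenize_enhanced(text, token="ALPHANUM"):
    """Approximate the TOKEN settings when ENHANCE=ENGLISH is used (hypothetical helper)."""
    if token not in ("ALPHANUM", "ALPHA", "SPACEDELIM"):
        raise ValueError(f"unknown TOKEN value: {token}")
    terms = []
    for word in text.split():  # every mode is space delimited
        if token == "SPACEDELIM":
            # Remove end-of-word punctuation; keep everything else.
            word = word.rstrip(string.punctuation)
            if word:
                terms.append(word)
            continue
        # A single punctuation character at the end is removed, the term is kept.
        if len(word) > 1 and word[-1] in string.punctuation:
            word = word[:-1]
        allowed = r"[A-Za-z0-9]+" if token == "ALPHANUM" else r"[A-Za-z]+"
        # Keep clean terms and contractions; omit anything else.
        if re.fullmatch(allowed, word) or CONTRACTION.fullmatch(word):
            terms.append(word)
    return terms

sample = "Don't call 555-1234, it's free! v2."
for mode in ("ALPHANUM", "ALPHA", "SPACEDELIM"):
    print(mode, tokenize_enhanced(sample, mode))
```

In this sketch "555-1234," is omitted under ALPHANUM and ALPHA because punctuation remains inside the term after the trailing comma is removed, while "v2." is kept as "v2" under ALPHANUM and omitted under ALPHA because it contains a digit.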

Copyright 2001 by SAS Institute Inc., Cary, NC, USA. All rights reserved.




The TPARS Procedure

COPY Statement

COPY variables;

variables

Specifies the variables that you want to keep from the input data set.


OUTPUT Statement

OUTPUT option(s);

Options

OUT = SAS-data-set

Specifies the name of the data set that will contain the term-by-document frequency table.



KEY = SAS-data-set

Specifies the name of the data set that will contain the index/term pairs, which were indexed in the OUT data set.

MERGE = SAS-data-set

Specifies the data set that will contain all of the variables listed in the COPY statement along with a new variable called DOC. The DOC variable is an index to the document number. This document number corresponds to the numbers in the _DOCUMENT_ variable of the OUT data set. The MERGE data set is used after a call to the SPSVD procedure and enables you to merge the original data set with the reduced dimension data.
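The role of the DOC variable as a join key can be sketched in Python. Everything here is hypothetical: the variable `author`, the coordinate values, the `dim1`/`dim2` names, and the layout of the reduced dimension data are invented for illustration, and the sketch does not describe the actual output of the SPSVD procedure.

```python
# Rows shaped like a MERGE data set: copied variables plus the DOC index.
# "author" is a hypothetical variable named in a COPY statement.
merge_rows = [
    {"DOC": 1, "author": "Smith"},
    {"DOC": 2, "author": "Jones"},
    {"DOC": 3, "author": "Lee"},
]

# Hypothetical reduced dimension data, keyed by document number.
reduced_by_doc = {
    1: [0.12, -0.40],
    2: [0.75, 0.10],
    3: [-0.33, 0.58],
}

def merge_on_doc(merge_rows, reduced_by_doc):
    """Attach reduced-dimension coordinates to each original row by DOC."""
    merged = []
    for row in merge_rows:
        coords = reduced_by_doc.get(row["DOC"])
        if coords is None:
            continue  # no reduced representation for this document
        combined = dict(row)
        for i, value in enumerate(coords, start=1):
            combined[f"dim{i}"] = value  # hypothetical column names
        merged.append(combined)
    return merged

for row in merge_on_doc(merge_rows, reduced_by_doc):
    print(row)
```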


Output

The TPARS procedure generates two output data sets. One of the output data sets is a table in sparse matrix format that contains the following variables:

_TERM_ -- is the parsed text.

_TERMNUM_ -- is a unique numerical index associated with each term.

_DOCUMENT_ -- is the document number.

_COUNT_ -- is the number of times that the term appears in the document.

The table can be interpreted as an encoding of a sparse matrix. The following example represents a collection of four documents.

The collection is indexed by only three words. House appears one time in Document 1. House appears two times in Document 4. Garage appears three times in Document 2, etc.
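The sparse encoding described above can be sketched in Python. The document texts below are hypothetical and chosen only to match the counts stated in the text. The content of Document 3 (one occurrence of Sleep) is an assumption, since the source does not state it. The helper names `to_sparse_rows` and `to_dense` are also illustrative.

```python
from collections import Counter

# Hypothetical four-document collection consistent with the counts in the text.
documents = {
    1: "House",
    2: "Garage Garage Garage",
    3: "Sleep",  # assumed: the source does not give the Sleep counts
    4: "House House",
}
term_index = {"House": 1, "Garage": 2, "Sleep": 3}

def to_sparse_rows(documents, term_index):
    """Emit one row per (term, document) pair that has a nonzero count."""
    rows = []
    for doc_id in sorted(documents):
        counts = Counter(documents[doc_id].split())
        for term, count in counts.items():
            rows.append({
                "_TERM_": term,
                "_TERMNUM_": term_index[term],
                "_DOCUMENT_": doc_id,
                "_COUNT_": count,
            })
    return rows

def to_dense(rows, n_terms, n_docs):
    """Decode the sparse rows into a term-by-document matrix of counts."""
    matrix = [[0] * n_docs for _ in range(n_terms)]
    for row in rows:
        matrix[row["_TERMNUM_"] - 1][row["_DOCUMENT_"] - 1] = row["_COUNT_"]
    return matrix

rows = to_sparse_rows(documents, term_index)
for line in to_dense(rows, n_terms=3, n_docs=4):
    print(line)
```

Only four rows are stored for a matrix with twelve cells, which is the point of the sparse format: zero counts are never written.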

Since the words are encoded into numerical representation, a KEY data set that contains the following variables is also output.

TERM -- is the parsed text.

KEY -- is a unique numerical index associated with each term.

FREQ -- is the total number of times that a term appears in the document collection.

NUMDOC -- is the number of documents in the collection that contain the term.

As an example, the following KEY data set indicates that the terms House, Garage, and Sleep are being identified by 1, 2, and 3, respectively. The term House appears three times in the document collection and two documents in the collection contain the word House.
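The relationship between the sparse table and the KEY data set can be sketched in Python: FREQ is the sum of the counts for a term, and NUMDOC is the number of sparse rows for that term. The Sleep row is an assumption (the source gives no counts for Sleep), and `build_key` is a hypothetical helper, not part of the procedure.

```python
from collections import defaultdict

# (term, term number, document, count) rows of the sparse table.
sparse_rows = [
    ("House", 1, 1, 1),
    ("Garage", 2, 2, 3),
    ("Sleep", 3, 3, 1),  # assumed row: not stated in the source
    ("House", 1, 4, 2),
]

def build_key(rows):
    """Summarize sparse rows into TERM, KEY, FREQ, and NUMDOC per term."""
    key_of = {}
    freq = defaultdict(int)
    numdoc = defaultdict(int)
    for term, termnum, _document, count in rows:
        key_of[term] = termnum
        freq[term] += count  # total occurrences across the collection
        numdoc[term] += 1    # each sparse row is one distinct document
    return [
        {"TERM": term, "KEY": key_of[term], "FREQ": freq[term], "NUMDOC": numdoc[term]}
        for term in sorted(key_of, key=key_of.get)
    ]

for entry in build_key(sparse_rows):
    print(entry)
```

For House this yields FREQ 3 and NUMDOC 2, matching the description in the text.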




Note: The values of _TERMNUM_ and KEY are identical.




