The arboretum procedure



Date: 30.04.2018

Method=VARIANCE

Reduction in squared error from node means.

Method=PROBF

p-value of F-test associated with node variance. Default for INTERVAL.

Method=F


F statistic associated with node variance.
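
The VARIANCE and F criteria can be illustrated with a minimal sketch. It assumes that "reduction in squared error" means the total sum of squares about the parent mean minus the within-branch sum of squares, and that the F statistic is the usual one-way ANOVA ratio; the exact normalization PROC SPLIT uses is not stated here. The function name `split_stats` is hypothetical.

```python
def split_stats(branches):
    """Score a candidate split on an interval target.

    branches: one list of target values per child node (assumed layout).
    Returns (reduction in squared error, F statistic).
    """
    all_values = [y for branch in branches for y in branch]
    n = len(all_values)          # observations in the parent node
    k = len(branches)            # number of branches in the split
    grand_mean = sum(all_values) / n

    # Squared error about the parent node mean.
    total_ss = sum((y - grand_mean) ** 2 for y in all_values)

    # Squared error about each child node mean.
    within_ss = 0.0
    for branch in branches:
        mean = sum(branch) / len(branch)
        within_ss += sum((y - mean) ** 2 for y in branch)

    reduction = total_ss - within_ss
    if within_ss == 0:
        return reduction, float("inf")  # perfectly pure branches
    f_stat = (reduction / (k - 1)) / (within_ss / (n - k))
    return reduction, f_stat


reduction, f_stat = split_stats([[1, 2, 3], [7, 8, 9]])
print(reduction, f_stat)
```

The PROBF criterion would then be the upper-tail p-value of this F statistic with (k - 1, n - k) degrees of freedom, which requires an F distribution function and is omitted from the sketch.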

EXCLUDEMISS

Specifies that missing values be excluded during a split search.



EXHAUSTIVE=n

Specifies the maximum number of candidate splits to find in an exhaustive search. If more candidates

would have to be considered, a heuristic search is used instead. The EXHAUSTIVE option applies

to multiway splits and to binary splits on nominal targets with more than two values.



Default:

The default value is 5000.
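
To give a feel for when the limit is reached, the following sketch counts the two-way partitions of a nominal input with a given number of categories, using the standard combinatorial count 2^(k-1) - 1, and compares it with the EXHAUSTIVE value. How PROC SPLIT counts candidates internally is not described here, so the function names and the comparison are illustrative assumptions.

```python
def binary_partitions(levels):
    """Number of distinct two-way partitions of `levels` nominal categories."""
    if levels < 2:
        return 0
    return 2 ** (levels - 1) - 1


def search_mode(candidate_count, exhaustive=5000):
    """Assumed rule: exhaustive search unless more than `exhaustive`
    candidates would have to be considered."""
    return "exhaustive" if candidate_count <= exhaustive else "heuristic"


for levels in (13, 14):
    count = binary_partitions(levels)
    print(levels, count, search_mode(count))
```

Under these assumptions, a 13-category input (4095 partitions) stays within the default limit, while a 14-category input (8191 partitions) would fall back to the heuristic search.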



INDMSPLIT

Requests that the tree created by PROC DMSPLIT be input to PROC SPLIT. The tree is expected

in the DMDBCAT= catalog. The DMDBCAT= option is required, and the INDMTREE and

INTREE= options are prohibited.



INTREE=SAS-tree-model

Names a data set created from the PROC SPLIT OUTTREE= option.



Caution:

When using the INTREE= option, the IN, TARGET, and FREQ statements are

prohibited, as are the DECISION and PRIORS statements.

LEAFSIZE=n

Specifies the smallest number of training observations a node can have.



LIFTDEPTH=n

Specifies the proportion of observations to use with ASSESS=LIFT.



MAXBRANCH=n

Restricts the number of subsets a splitting rule can produce to n or fewer. A value of 2 results in

binary trees.

Range:

2 - 100


Default:

2

MAXDEPTH=depth

Specifies the maximum number of generations of nodes. The original node, generation 0, is called

the root node. The children of the root node are the first generation. PROC SPLIT considers

splitting a node in the nth generation only when n is less than the value of depth.

Default:

6
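
The generation rule can be written out as a one-line check. This is only a restatement of the rule above in Python; the function name `may_split` is hypothetical.

```python
def may_split(generation, depth=6):
    """A node in generation n is a split candidate only when n < depth."""
    return generation < depth


# With the default depth of 6, nodes in generations 0 through 5 may be split,
# so the deepest nodes the tree can contain are in generation 6.
for generation in range(8):
    print(generation, may_split(generation))
```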



NODESAMPLE=n

Specifies the within-node sample size used for finding splits. If the number of training

observations in a node is larger than n, then the split search for that node is based on a random

sample of size n.



Default:

5000


Range:

1 ≤ n ≤ 32767
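
The within-node sampling rule can be sketched as follows. The sketch assumes simple random sampling without replacement, which the text does not specify; `node_sample` and its `seed` parameter are hypothetical.

```python
import random


def node_sample(observations, n=5000, seed=None):
    """Return the observations used for the split search in one node.

    If the node holds more than n training observations, draw a random
    sample of size n (assumed here to be without replacement); otherwise
    use every observation in the node.
    """
    observations = list(observations)
    if len(observations) <= n:
        return observations
    rng = random.Random(seed)
    return rng.sample(observations, n)


small = node_sample(range(100), n=5000)
large = node_sample(range(20000), n=5000, seed=1)
print(len(small), len(large))
```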



NRULES=n

Specifies how many splitting rules are saved with each node. The tree uses only one rule. The

remaining rules are saved for comparison. Based on the criterion you selected, you can see how

well the variable that was used split the data, and how well the next n-1 rules would have split the data.



Default:

5

NSURRS=n

Specifies the number of surrogate rules sought in each non-leaf node. A surrogate rule is a backup to

the main splitting rule. When the main splitting rule relies on an input whose value is missing, the

first surrogate rule is invoked. For more information, see Missing Values in the Detail section.

Note: The option to save surrogate rules in each node is often used by advocates of CART.

Default:

0
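
The fallback from the main rule to a surrogate can be sketched as below. The text only says that the first surrogate is invoked when the main rule's input is missing; continuing to later surrogates when the first one also relies on a missing input is an assumption borrowed from the usual CART convention. The rule representation and the name `choose_branch` are hypothetical.

```python
def choose_branch(observation, rules):
    """Assign an observation to a branch.

    observation: dict mapping input names to values (None means missing).
    rules: list of (input_name, assign) pairs, main rule first, then
           surrogate rules in order; assign maps a value to a branch label.
    Returns the branch label, or None if every rule's input is missing.
    """
    for input_name, assign in rules:
        value = observation.get(input_name)
        if value is not None:
            return assign(value)
    return None


rules = [
    ("income", lambda v: "left" if v < 40000 else "right"),  # main rule
    ("age", lambda v: "left" if v < 30 else "right"),        # first surrogate
]

print(choose_branch({"income": 25000, "age": 55}, rules))  # main rule applies
print(choose_branch({"income": None, "age": 55}, rules))   # surrogate applies
```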

OUTAFDS=SAS-data-set

Names the output data set that is to contain a tree description suitable for inputting data into

SAS/AF widgets such as ORGCHART and TREERING.



Definition:

A SAS/AF Widget is a visible part of a window, which can be treated as a

separate, isolated entity. For example, a SAS/AF Widget can be a scrollbar, a

text field, a pushbutton, and so on. It is an individual component of the user

interface.

OUTLEAF=SAS-data-set

Names the output data set that contains statistics for each leaf node.



OUTMATRIX=SAS-data-set

Names the output data set that contains tree summary statistics. For nominal targets, the summary

statistics consist of the counts and proportions of observations correctly classified. For interval

targets, the summary statistics include the average squared prediction error and R-squared, which

equals



OUTSEQ=SAS-data-set

Names the output data set that contains statistics on each sub-tree in the sub-tree sequence.



OUTTREE=SAS-data-set

Names the output data set that contains all the tree information. This data set can then be used on

subsequent executions of PROC SPLIT.

PADJUST=methods

Names methods of adjusting the p-values used with the PROBCHISQ and PROBFTEST criteria.

Possible methods are:

KASSAFTER

Bonferroni adjustment applied after the split is chosen.

KASSBEFORE

Bonferroni adjustment applied before the split is chosen.

DEVILLE

Adjustment independent of the number of branches in the split.

DEPTH

Adjustment for the number of ancestor splits.

NOGABRIEL

Turns off the adjustment that sometimes overrides KASS.

NONE

No adjustment is made.

Caution:

This option is ignored unless CRITERION=PROBCHISQ or CRITERION=PROBFTEST.
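
The difference between adjusting before and after the split is chosen can be seen in a small sketch. It uses a plain Bonferroni adjustment (the raw p-value times a multiplier, capped at 1); the multiplier PROC SPLIT actually applies for each candidate is not given here, so the multipliers and candidate names below are made-up illustration values.

```python
def bonferroni(p_value, multiplier):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * multiplier)


def pick_split(candidates, adjust_before):
    """Choose the candidate split with the smallest p-value.

    candidates: list of (name, raw_p_value, multiplier) tuples (assumed layout).
    adjust_before=True ranks candidates by adjusted p-value (KASSBEFORE-like);
    adjust_before=False ranks by raw p-value and adjusts only the winner
    (KASSAFTER-like). Returns (name, adjusted p-value of the winner).
    """
    if adjust_before:
        best = min(candidates, key=lambda c: bonferroni(c[1], c[2]))
    else:
        best = min(candidates, key=lambda c: c[1])
    name, p_value, multiplier = best
    return name, bonferroni(p_value, multiplier)


candidates = [("A", 0.01, 10), ("B", 0.02, 2)]
print(pick_split(candidates, adjust_before=False))
print(pick_split(candidates, adjust_before=True))
```

In this example the two orderings select different splits, because candidate A has the smaller raw p-value but the larger multiplier.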



PVARS=n

Specifies the number of inputs to consider uncorrelated when adjusting p-values for the number of

inputs.

SPLITSIZE=n

Specifies the smallest number of training observations a node must have for PROC SPLIT to

consider splitting it.

Range:

Maximum is 32767 on most machines.



Default:

The greater of 50 and the total number of cases in the training data set

divided by 100.
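
As a worked example of the default, the sketch below evaluates the rule for a few training set sizes. Whether PROC SPLIT rounds the quotient is not stated, so the sketch leaves it unrounded; the function name is hypothetical.

```python
def default_splitsize(n_training_cases):
    """Default SPLITSIZE: the greater of 50 and n_training_cases / 100."""
    return max(50, n_training_cases / 100)


for n_cases in (1000, 5000, 20000):
    print(n_cases, default_splitsize(n_cases))
```

Under this rule the default stays at 50 for any training set of 5000 cases or fewer and grows in proportion to the data set above that size.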

SUBTREE=method

Specifies how to construct the sub-tree in terms of selection methods. The following methods are



