The arboretum procedure



Date: 30.04.2018

SCAN
Retrieves a nearest neighbor by naively going through every observation in the training data set and calculating its distance to a probe observation.

KDTREE
Uses a KD-Tree to store the observations in the training data set in memory. This enables the nearest neighbors to a point to be found in O(log n) time, assuming the number of variables is small enough (fewer than ten to twenty). For more information about KD-Trees, see Friedman, Bentley, and Finkel (1977). This method has not been implemented yet.

RDTREE
Uses an RD-Tree to store the observations in the training data set in memory. This is a proprietary representation that, like a KD-Tree, operates in O(log n) time, but will generally examine fewer nodes than a KD-Tree to find the neighbors, and can be applied with somewhat greater dimensionality.
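The SCAN method described above can be illustrated with a minimal Python sketch. This is not the procedure's own code: the function name scan_nearest, the parameter k, and the use of Euclidean distance are assumptions made for illustration, since the text does not state which distance measure is used.

```python
import math

def scan_nearest(train, probe, k=1):
    """Brute-force scan: return (row_index, distance) for the k closest rows.

    Euclidean distance is an assumption; the source text does not name a metric.
    """
    # Compute the distance from the probe to every training observation.
    scored = [(math.dist(row, probe), i) for i, row in enumerate(train)]
    # Sort by distance (ties broken by row index) and keep the k closest.
    scored.sort()
    return [(i, d) for d, i in scored[:k]]

# Hypothetical training data with two numeric variables.
train = [(0.0, 0.0), (1.0, 1.0), (3.0, 4.0), (-2.0, 0.5)]
neighbors = scan_nearest(train, probe=(0.9, 1.2), k=2)
```

Every probe costs one distance calculation per training observation, which is the cost that the tree-based methods are meant to reduce.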



EPSILON = positive number

Indicates an approximate nearest neighbor search when a non-zero number is specified. The search terminates once the nearest neighbors determined so far are at most "epsilon" away from the actual nearest neighbors. For large dimensionality, judicious use of epsilon can result in radically improved performance. This option applies only to the KD-Tree or RD-Tree methods.

Default:

0.0


BUCKET = positive integer

Indicates the number of buckets that a leaf node is allowed to grow to before it splits into a branch with two new leaves. This value must be greater than or equal to 2. This option applies only to the KD-Tree or RD-Tree methods.

Default:

8

SHOWNODES

Includes a variable _nnodes_ in the output data set that shows the number of point comparisons that had to be done to determine the answer. This is useful as a point of comparison.
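How a bucket size, an epsilon tolerance, and a comparison count interact can be sketched with a generic KD-tree in Python. This is only an illustration under stated assumptions: it is not the proprietary RD-Tree or the procedure's implementation, the names build, nearest, and stats are hypothetical, Euclidean distance is assumed, and epsilon is read here as an additive tolerance on the distance.

```python
import math
import random

def build(points, bucket=8, depth=0):
    """Build a KD-tree in which a leaf holds at most `bucket` points."""
    if len(points) <= bucket:
        return {"leaf": points}
    axis = depth % len(points[0])          # cycle through the variables
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                 # split the leaf into two new leaves
    return {"axis": axis, "split": points[mid][axis],
            "left": build(points[:mid], bucket, depth + 1),
            "right": build(points[mid:], bucket, depth + 1)}

def nearest(node, probe, epsilon=0.0, best=None, stats=None):
    """Return (distance, point); the result is within epsilon of the true nearest."""
    if best is None:
        best = (math.inf, None)
    if stats is None:
        stats = {"comparisons": 0}
    if "leaf" in node:
        for p in node["leaf"]:
            stats["comparisons"] += 1      # analogous in spirit to _nnodes_
            d = math.dist(p, probe)
            if d < best[0]:
                best = (d, p)
        return best
    diff = probe[node["axis"]] - node["split"]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    best = nearest(near, probe, epsilon, best, stats)
    # Visit the far side only if it could hold a point more than
    # epsilon closer than the best neighbor found so far.
    if abs(diff) + epsilon < best[0]:
        best = nearest(far, probe, epsilon, best, stats)
    return best

# Hypothetical data: 500 points with two variables, fixed seed.
rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(500)]
tree = build(points, bucket=8)
probe = (0.5, 0.5)
exact_stats, approx_stats = {"comparisons": 0}, {"comparisons": 0}
exact = nearest(tree, probe, 0.0, stats=exact_stats)
approx = nearest(tree, probe, 0.05, stats=approx_stats)
```

With epsilon set to 0.0 the pruning test never discards a subtree that could contain a closer point, so the search is exact. A positive epsilon lets the search skip subtrees that could improve the answer by no more than epsilon.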

Copyright 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.



The PMBR Procedure

VAR Statement

VAR <variable>;

variable

Specifies all numeric variables that you want to treat as dimensions for the nearest neighbor lookup. These should be standardized and orthogonal for the nearest neighbor search to be accurate. If no VAR statement is specified, all numeric variables in the DMDB-encoded data set will be used.
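The standardization requirement can be illustrated with a short Python sketch. The function name standardize and the sample rows are hypothetical, and only the rescaling step is shown; making the variables orthogonal is a separate step that is not covered here.

```python
import statistics

def standardize(rows):
    """Rescale each column to mean 0 and (population) standard deviation 1."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    stdevs = [statistics.pstdev(c) for c in columns]
    # A constant column (standard deviation 0) is mapped to 0.0.
    return [
        tuple((v - m) / s if s > 0 else 0.0
              for v, m, s in zip(row, means, stdevs))
        for row in rows
    ]

# Hypothetical rows whose two variables are on very different scales.
rows = [(1.0, 100.0), (2.0, 300.0), (3.0, 500.0)]
scaled = standardize(rows)
```

Without rescaling, the second variable would dominate any distance calculation simply because its values are larger.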


TARGET Statement

Specifies one variable to be used as the target. It can be numeric or character.

TARGET <variable>;


CLASS Statement

This statement is currently ignored. The procedure determines whether the target is a class variable based on the contents of the DMDB, and this cannot be changed in this procedure.

CLASS <variable>;


The RULEGEN Procedure

Overview

Procedure Syntax

PROC RULEGEN Statement



Details

Example

Example 1: Performing an Association Discovery


Overview

PROC RULEGEN takes the output data set created by PROC ASSOC, generates association rules, and computes statistics such as confidence and lift for those rules. PROC ASSOC identifies item sets that are related; the RULEGEN procedure generates the rules governing their association. PROC RULEGEN output is saved as a SAS data set that can be viewed or browsed with SAS procedures, which you can set up to reflect your own evaluation criteria.


Procedure Syntax

PROC RULEGEN <option(s)>;


PROC RULEGEN Statement

Invokes the RULEGEN procedure.

PROC RULEGEN <option(s)>;

Required Argument

OUT=SAS-data-set

Specifies the output data set to which the rules are written. The output data set has the following variables: CONF, COUNT, EXP_CONF, ITEM1, ITEM2, ..., ITEMn+1, _LHAND, LIFT, _RHAND, RULE, SET_SIZE, SUPPORT.

CONF=COUNT/lhs_count

Contains the percent of confidence.

Definition: lhs_count is the number of transactions satisfying the left side of the rule.


COUNT

Contains the number of transactions meeting the rule.

EXP_CONF=rhs_count/total

Contains the percent of expected confidence.

Definition: rhs_count is the number of transactions satisfying the right side of the rule.


ITEM1, ITEM2, ..., ITEMn+1

Contain the individual items that make up the rule, including the arrow.

_LHAND

Identifies the left side of the rule, where the rule is expressed: _LHAND ==> _RHAND.



LIFT=CONF/EXP_CONF

Contains the lift ratio.

_RHAND

Identifies the right side of the rule, where the rule is expressed: _LHAND ==> _RHAND.



RULE

Contains the text of the rule, for example, A & B ==> C & D.

SET_SIZE
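The formulas given above for CONF, EXP_CONF, and LIFT can be worked through with a small Python sketch. This is not PROC RULEGEN itself: the function rule_stats and the sample transactions are hypothetical, and expressing CONF and EXP_CONF on a 0 to 100 scale is an assumption based on the word "percent" in the descriptions. SET_SIZE and SUPPORT are left out because their definitions are not given in this section.

```python
def rule_stats(transactions, lhs, rhs):
    """Compute COUNT, CONF, EXP_CONF, LIFT, and RULE for lhs ==> rhs."""
    lhs, rhs = set(lhs), set(rhs)
    total = len(transactions)
    # Transactions satisfying the left side, the right side, and the whole rule.
    lhs_count = sum(1 for t in transactions if lhs <= set(t))
    rhs_count = sum(1 for t in transactions if rhs <= set(t))
    count = sum(1 for t in transactions if (lhs | rhs) <= set(t))
    if lhs_count == 0 or rhs_count == 0:
        raise ValueError("each side of the rule must occur at least once")
    conf = 100.0 * count / lhs_count       # CONF = COUNT / lhs_count, as a percent
    exp_conf = 100.0 * rhs_count / total   # EXP_CONF = rhs_count / total, as a percent
    return {
        "COUNT": count,
        "CONF": conf,
        "EXP_CONF": exp_conf,
        "LIFT": conf / exp_conf,           # LIFT = CONF / EXP_CONF
        "RULE": " & ".join(sorted(lhs)) + " ==> " + " & ".join(sorted(rhs)),
    }

# Hypothetical transactions, each a set of purchased items.
transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "C"},
]
stats = rule_stats(transactions, lhs=["A", "B"], rhs=["C"])
```

In this sample, three transactions contain both A and B, two of those also contain C, and four of the five transactions contain C. The rule's confidence is therefore lower than its expected confidence, and the lift falls below 1.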


