SCAN
Retrieves a nearest neighbor by naively going through every observation in the
training data set and calculating its distance to a probe observation.
KDTREE
Uses a KD-Tree to store the observations in the training data set in memory. This
enables the nearest neighbors to a point to be found in O(log n) time, assuming the
number of variables is small enough (fewer than ten to twenty). For more
information about KD-Trees, see Friedman, Bentley, and Finkel (1977). This method
has not been implemented yet.
RDTREE
Uses an RD-Tree to store the observations in the training data set in memory. This is
a proprietary representation that, like a KD-Tree, also operates in O(log n) time, but
will generally examine fewer nodes than a KD-Tree to find the neighbors, and can
be applied with somewhat greater dimensionality.
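As a rough illustration of the SCAN method described above, the following Python sketch walks every training observation and keeps the closest one. The function name scan_nearest, the sample data, and the use of Euclidean distance are assumptions made for illustration; they are not taken from PROC PMBR.

```python
import math

def scan_nearest(train, probe):
    """Return (index, distance) of the training observation closest to probe."""
    best_index, best_dist = -1, math.inf
    for i, obs in enumerate(train):
        # Euclidean distance is an assumption; the text does not name the metric.
        d = math.dist(obs, probe)
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index, best_dist

# Hypothetical training data with two variables per observation.
train = [(0.0, 0.0), (1.0, 1.0), (3.0, 4.0)]
index, dist = scan_nearest(train, (0.9, 1.2))
```

Every lookup costs one distance calculation per training observation, which is the cost that the tree-based methods aim to avoid.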
EPSILON = positive number
Requests an approximate nearest neighbor search when a nonzero number is specified. The
search terminates once the nearest neighbors found so far are at most "epsilon" away from
the actual nearest neighbors. For large dimensionality, judicious use of epsilon can result in
radically improved performance. This option applies only to the KD-Tree or RD-Tree methods.
Default:
0.0
BUCKET = positive integer
Specifies how many observations a leaf node can hold (the bucket size) before it splits into a
branch with two new leaves. This value must be greater than or equal to 2. This option applies
only to the KD-Tree or RD-Tree methods.
Default:
8
SHOWNODES
Includes a variable _nnodes_ in the output data set that shows the number of point comparisons
that were performed to determine the answer. This is useful for comparing the cost of
different search methods and option settings.
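The sketch below shows, under stated assumptions, how a bucket size, an epsilon tolerance, and a comparison counter can interact in a tree search. It uses a plain KD-Tree, not the proprietary RD-Tree, and it reads epsilon as an additive tolerance on distance, which is only one possible interpretation of the option. The names build and nearest, the dictionary layout, and the nnodes counter are all hypothetical.

```python
import math

def build(points, bucket=8, depth=0):
    """Build a KD-tree; a leaf holds at most `bucket` points before it splits."""
    if len(points) <= bucket:
        return {"leaf": points}
    axis = depth % len(points[0])          # cycle through the variables
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "axis": axis,
        "split": points[mid][axis],
        "left": build(points[:mid], bucket, depth + 1),
        "right": build(points[mid:], bucket, depth + 1),
    }

def nearest(node, probe, epsilon=0.0, state=None):
    """Search the tree; state["nnodes"] counts point comparisons."""
    if state is None:
        state = {"best": None, "dist": math.inf, "nnodes": 0}
    if "leaf" in node:
        for p in node["leaf"]:
            state["nnodes"] += 1
            d = math.dist(p, probe)
            if d < state["dist"]:
                state["best"], state["dist"] = p, d
        return state
    diff = probe[node["axis"]] - node["split"]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    nearest(near, probe, epsilon, state)
    # Assumed rule: visit the far side only if it could hold a point that
    # beats the current best by more than epsilon.
    if abs(diff) + epsilon < state["dist"]:
        nearest(far, probe, epsilon, state)
    return state

# Hypothetical data: a 10 x 10 grid of points.
points = [(x, y) for x in range(10) for y in range(10)]
tree = build(points, bucket=8)
exact = nearest(tree, (2.2, 7.6))                 # epsilon = 0: exact search
approx = nearest(tree, (2.2, 7.6), epsilon=0.5)   # may stop earlier
```

With epsilon set to 0 the search is exact. With a positive epsilon, the distance to the returned point exceeds the true nearest distance by at most epsilon under this pruning rule, and comparing exact["nnodes"] with approx["nnodes"] shows how much work each setting required.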
Copyright 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.
The PMBR Procedure
VAR Statement
VAR <variable>;
variable
Specifies all numeric variables that you want to treat as dimensions for the nearest neighbor
lookup. These should be standardized and orthogonal for the nearest neighbor search to be
accurate. If no VAR statement is specified, all numeric variables in the DMDB-encoded data set
will be used.
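To illustrate why standardization matters for a distance-based lookup, the sketch below rescales each variable to mean 0 and standard deviation 1 so that no single variable dominates the distance. The function name standardize and the sample rows are hypothetical, and the sketch addresses only standardization, not orthogonality.

```python
import statistics

def standardize(rows):
    """Rescale each column to mean 0 and sample standard deviation 1."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    # A constant column has standard deviation 0 and would need special handling.
    stdevs = [statistics.stdev(c) for c in columns]
    return [
        tuple((v - m) / s for v, m, s in zip(row, means, stdevs))
        for row in rows
    ]

# Hypothetical observations: the second variable is on a much larger scale.
rows = [(1.0, 1000.0), (2.0, 3000.0), (3.0, 2000.0)]
z = standardize(rows)
```

Before rescaling, differences in the second variable would swamp those in the first; afterward both contribute on the same scale.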
TARGET Statement
Specifies one variable to be used as the target. It can be numeric or character.
TARGET <variable>;
CLASS Statement
This statement is currently ignored. The procedure determines whether the target is a class
variable based on the contents of the DMDB, and this cannot be changed in this procedure.
CLASS <variable>;
The RULEGEN Procedure
Overview
Procedure Syntax
PROC RULEGEN Statement
Details
Example
Example 1: Performing an Association Discovery
Overview
PROC RULEGEN reads the output data set created by PROC ASSOC, generates association rules, and
computes statistics, such as confidence and lift, for those rules. PROC ASSOC identifies item sets that
are related; the RULEGEN procedure generates the rules governing their association. PROC RULEGEN
output is saved as a SAS data set that can be viewed or browsed by SAS procedures, or by programs
that you create to reflect your own evaluation criteria.
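As a small worked example of these statistics, the sketch below computes COUNT, CONF, EXP_CONF, and LIFT for one rule from the formulas that this chapter gives for the OUT= data set (CONF=COUNT/lhs_count, EXP_CONF=rhs_count/total, LIFT=CONF/EXP_CONF). The function name rule_stats and the transactions are hypothetical, and scaling CONF and EXP_CONF by 100 to express them as percents is an assumption.

```python
def rule_stats(transactions, lhs, rhs):
    """Compute rule statistics for lhs ==> rhs over a list of item sets."""
    lhs, rhs = set(lhs), set(rhs)
    total = len(transactions)
    lhs_count = sum(1 for t in transactions if lhs <= set(t))
    rhs_count = sum(1 for t in transactions if rhs <= set(t))
    # COUNT: transactions that contain both sides of the rule.
    count = sum(1 for t in transactions if (lhs | rhs) <= set(t))
    # Assumes lhs_count and rhs_count are nonzero; expressed as percents.
    conf = 100.0 * count / lhs_count
    exp_conf = 100.0 * rhs_count / total
    return {"COUNT": count, "CONF": conf, "EXP_CONF": exp_conf,
            "LIFT": conf / exp_conf}

# Hypothetical transactions.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]
stats = rule_stats(transactions, {"bread"}, {"butter"})
```

Here three transactions contain bread, two contain butter, and two contain both, so the rule bread ==> butter has a confidence of about 66.7 percent against an expected confidence of 50 percent, for a lift of about 1.33.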
Procedure Syntax
PROC RULEGEN <option(s)>;
PROC RULEGEN Statement
Invokes the RULEGEN procedure.
PROC RULEGEN <option(s)>;
Required Argument
OUT=SAS-data-set
Specifies the output data set to which the rules are written. The output data set has the following
variables: CONF, COUNT, EXP_CONF, ITEM1, ITEM2, ..., ITEMn+1, _LHAND, LIFT,
_RHAND, RULE, SET_SIZE, SUPPORT.
CONF=COUNT/lhs_count
Contains the percent of confidence.
Definition:
lhs_count is the number of transactions satisfying the left side of the rule.
COUNT
Contains the number of transactions meeting the rule.
EXP_CONF=rhs_count/total
Contains the percent of expected confidence.
Definition:
rhs_count is the number of transactions satisfying the right side of the rule, and total is
the total number of transactions.
ITEM1, ITEM2, ..., ITEMn+1
Contain the individual items that make up the rule, including the arrow.
_LHAND
Identifies the left side of the rule, where the rule is expressed: _LHAND ==> _RHAND.
LIFT=CONF/EXP_CONF
Contains the lift ratio.
_RHAND
Identifies the right side of the rule, where the rule is expressed: _LHAND ==> _RHAND.
RULE
Contains the text of the rule, for example, A & B ==> C & D.
SET_SIZE