yes, then
it must be in class no—a form of closed world assumption. If this is
the case, then rules cannot conflict and there is no ambiguity in rule interpre-
tation: any interpretation strategy will give the same result. Such a set of rules
can be written as a logic expression in what is called disjunctive normal form:
that is, as a disjunction (OR) of conjunctive (AND) conditions.
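As a minimal sketch of this special case, the following Python fragment treats a rule set as a disjunction of conjunctions and checks that the order of the rules does not affect the result. The two rules and the attribute values are hypothetical, chosen only for illustration; they are not taken from the text.

```python
from itertools import permutations

# Hypothetical rules: each one is a conjunction (AND) of attribute tests
# and, when it fires, predicts the class "yes".
RULES = [
    {"outlook": "overcast"},
    {"humidity": "normal", "windy": "false"},
]

def fires(rule, instance):
    """True if every condition of the rule holds for the instance."""
    return all(instance.get(attr) == value for attr, value in rule.items())

def classify(rules, instance):
    """Disjunction (OR) over the rules; closed world: otherwise 'no'."""
    return "yes" if any(fires(rule, instance) for rule in rules) else "no"

instance = {"outlook": "sunny", "humidity": "normal", "windy": "false"}

# Try every ordering of the rule set and collect the distinct outcomes.
results = {classify(list(order), instance) for order in permutations(RULES)}
print(results)
```

Because the outcome is a plain OR over the rules, every ordering yields the same single class, which is the sense in which each rule is an independent piece of information here.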
It is this simple special case that seduces people into assuming rules are very
easy to deal with, because here each rule really does operate as a new, inde-
pendent piece of information that contributes in a straightforward way to the
disjunction. Unfortunately, it only applies to Boolean outcomes and requires the
closed world assumption, and both these constraints are unrealistic in most
practical situations. Machine learning algorithms that generate rules invariably
produce ordered rule sets in multiclass situations, and this sacrifices any possi-
bility of modularity because the order of execution is critical.
3.4 Association rules
Association rules are really no different from classification rules except that they
can predict any attribute, not just the class, and this gives them the freedom to
predict combinations of attributes too. Also, association rules are not intended
to be used together as a set, as classification rules are. Different association rules
express different regularities that underlie the dataset, and they generally predict
different things.
Because so many different association rules can be derived from even a tiny
dataset, interest is restricted to those that apply to a reasonably large number of
instances and have a reasonably high accuracy on the instances to which they
apply. The coverage of an association rule is the number of instances for which
it predicts correctly; this is often called its support. Its accuracy, often called
confidence, is the number of instances that it predicts correctly, expressed as a
proportion of all instances to which it applies. For example, with the rule:
If temperature = cool then humidity = normal
the coverage is the number of days that are both cool and have normal humid-
ity (4 days in the data of Table 1.2), and the accuracy is the proportion of cool
days that have normal humidity (100% in this case). It is usual to specify
minimum coverage and accuracy values and to seek only those rules whose cov-
erage and accuracy are both at least these specified minima. In the weather data,
for example, there are 58 rules whose coverage and accuracy are at least 2 and
95%, respectively. (It may also be convenient to specify coverage as a percent-
age of the total number of instances instead.)
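The definitions of coverage and accuracy can be checked with a short Python sketch. It assumes the standard 14-instance nominal weather data, taken here to match Table 1.2; the function and variable names are illustrative only.

```python
# Weather data, assumed to match Table 1.2.
# Columns: outlook, temperature, humidity, windy, play.
RAW = """
sunny hot high false no
sunny hot high true no
overcast hot high false yes
rainy mild high false yes
rainy cool normal false yes
rainy cool normal true no
overcast cool normal true yes
sunny mild high false no
sunny cool normal false yes
rainy mild normal false yes
sunny mild normal true yes
overcast mild high true yes
overcast hot normal false yes
rainy mild high true yes
"""
ATTRS = ["outlook", "temperature", "humidity", "windy", "play"]
DATA = [dict(zip(ATTRS, line.split())) for line in RAW.strip().splitlines()]

def matches(instance, conditions):
    """True if the instance satisfies every attribute = value condition."""
    return all(instance[attr] == value for attr, value in conditions.items())

def coverage_and_accuracy(antecedent, consequent, data=DATA):
    """Coverage (support) and accuracy (confidence) of an association rule."""
    applies = [x for x in data if matches(x, antecedent)]
    correct = [x for x in applies if matches(x, consequent)]
    coverage = len(correct)
    accuracy = coverage / len(applies) if applies else 0.0
    return coverage, accuracy

cov, acc = coverage_and_accuracy({"temperature": "cool"}, {"humidity": "normal"})
print(cov, acc)  # 4 1.0
```

Dividing the coverage by the total number of instances (14 here) would give the percentage form of coverage mentioned above.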
Association rules that predict multiple consequences must be interpreted
rather carefully. For example, with the weather data in Table 1.2 we saw this rule:
If windy = false and play = no then outlook = sunny and humidity = high
This is not just a shorthand expression for the two separate rules:
If windy = false and play = no then outlook = sunny
If windy = false and play = no then humidity = high
It does imply that both of these rules meet the minimum coverage and accuracy
figures, but it also implies more. The original rule means that the number of
examples that are nonwindy, nonplaying, with sunny outlook and high humidity,
is at least as great as the specified minimum coverage figure. It also means that
the number of such days, expressed as a proportion of nonwindy, nonplaying days,
is at least the specified minimum accuracy figure. This implies that the rule
If humidity = high and windy = false and play = no then outlook = sunny
also holds, because it has the same coverage as the original rule, and its accuracy
must be at least as high as the original rule’s: the number of high-humidity,
nonwindy, nonplaying days can be no greater than the number of nonwindy,
nonplaying days, so the accuracy cannot be lower.
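The relationship between these rules can be checked numerically. The Python sketch below again assumes the 14-instance weather data (taken to match Table 1.2) and uses illustrative helper names; it compares the original two-consequent rule with the rule obtained by moving humidity = high into the antecedent.

```python
# Weather data, assumed to match Table 1.2.
# Columns: outlook, temperature, humidity, windy, play.
RAW = """
sunny hot high false no
sunny hot high true no
overcast hot high false yes
rainy mild high false yes
rainy cool normal false yes
rainy cool normal true no
overcast cool normal true yes
sunny mild high false no
sunny cool normal false yes
rainy mild normal false yes
sunny mild normal true yes
overcast mild high true yes
overcast hot normal false yes
rainy mild high true yes
"""
ATTRS = ["outlook", "temperature", "humidity", "windy", "play"]
DATA = [dict(zip(ATTRS, line.split())) for line in RAW.strip().splitlines()]

def stats(antecedent, consequent):
    """Return (coverage, accuracy) of the rule 'if antecedent then consequent'."""
    applies = [x for x in DATA
               if all(x[a] == v for a, v in antecedent.items())]
    correct = [x for x in applies
               if all(x[a] == v for a, v in consequent.items())]
    return len(correct), (len(correct) / len(applies) if applies else 0.0)

# Original rule with two consequences.
original = stats({"windy": "false", "play": "no"},
                 {"outlook": "sunny", "humidity": "high"})

# Derived rule: one consequence moved into the antecedent.
derived = stats({"humidity": "high", "windy": "false", "play": "no"},
                {"outlook": "sunny"})

print(original, derived)  # (2, 1.0) (2, 1.0)
```

On this data the two accuracies happen to be equal; in general the derived rule's accuracy can only match or exceed that of the original rule, while the coverage is identical.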
As we have seen, there are relationships between particular association
rules: some rules imply others. To reduce the number of rules that are produced,
in cases where several rules are related it makes sense to present only the
strongest one to the user. In the preceding example, only the first rule should
be printed.
3.5 Rules with exceptions
Returning to classification rules, a natural extension is to allow them to have
exceptions. Then incremental modifications can be made to a rule set by express-
ing exceptions to existing rules rather than reengineering the entire set. For
example, consider the iris problem described earlier. Suppose a new flower was
found with the dimensions given in Table 3.1, and an expert declared it to be
an instance of Iris setosa. If this flower was classified by the rules given in Chapter
1 (pages 15–16) for this problem, it would be misclassified by two of them:
Table 3.1  A new iris flower.

Sepal length (cm)   Sepal width (cm)   Petal length (cm)   Petal width (cm)   Type
5.1                 3.5                2.6                 0.2                ?