3.9 Clusters
When clusters rather than a classifier are learned, the output takes the form of a
diagram that shows how the instances fall into clusters. In the simplest case this
involves associating a cluster number with each instance, which might be
depicted by laying the instances out in two dimensions and partitioning the
space to show each cluster, as illustrated in Figure 3.9(a).
Some clustering algorithms allow one instance to belong to more than one
cluster, so the diagram might lay the instances out in two dimensions and draw
overlapping subsets representing each cluster (a Venn diagram), as in Figure
3.9(b). Some algorithms associate instances with clusters probabilistically
rather than categorically. In this case, for every instance there is a probability
or degree of membership with which it belongs to each of the clusters, as shown
in Figure 3.9(c). Because this particular association is meant to be probabilistic,
the numbers for each instance sum to one, although that is not always the
case. Other algorithms produce a hierarchical structure of clusters so that at
the top level the instance space divides into just a few clusters, each of which
divides into its own subclusters at the next level down, and so on. In this case a
diagram such as the one in Figure 3.9(d) is used, in which elements joined
together at lower levels are more tightly clustered than ones joined together at
higher levels. Diagrams such as this are called dendrograms. This term means
just the same thing as tree diagrams (the Greek word dendron means "a tree"),
but in clustering the more exotic version seems to be preferred, perhaps
because biological species are a prime application area for clustering techniques,
and ancient languages are often used for naming in biology.

[Figure 3.9 Different ways of representing clusters: (a) a partition of the instance space; (b) overlapping clusters drawn as a Venn diagram; (c) probabilistic degrees of membership, one row per instance; (d) a dendrogram.]
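The structure a dendrogram depicts can be sketched in a few lines of code. The following is a minimal illustration of single-linkage agglomerative clustering on one-dimensional points (the data values and the function name are invented for illustration): each point starts in its own cluster, and the two closest clusters are repeatedly merged. Pairs that merge earlier, at smaller distances, are the ones joined together at lower levels of the dendrogram.

```python
# Illustrative sketch: single-linkage agglomerative clustering on a few
# one-dimensional points, recording the order in which clusters merge.
# Merges that happen at smaller distances correspond to joins lower
# down in a dendrogram.

def single_linkage(points):
    # Start with every point in its own cluster.
    clusters = [[p] for p in points]
    merges = []  # (cluster_a, cluster_b, distance), in merge order
    while len(clusters) > 1:
        # Find the pair of clusters whose closest members are nearest
        # to each other (the "single linkage" criterion).
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return merges

merges = single_linkage([1, 2, 8, 9, 20])
for a, b, d in merges:
    print(a, "+", b, "at distance", d)
```

On these five points the tight pairs {1, 2} and {8, 9} merge first at distance 1, the two resulting clusters join at distance 6, and the outlier 20 is absorbed last at distance 11, exactly the nesting a dendrogram would show.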
Clustering is often followed by a stage in which a decision tree or rule set is
inferred that allocates each instance to the cluster in which it belongs. Then, the
clustering operation is just one step on the way to a structural description.
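This two-stage idea can be sketched concretely. The toy example below (the attribute values, function names, and the single-threshold rule form are all invented for illustration) first runs a simple two-cluster k-means on one numeric attribute, then infers an explicit rule, here a threshold halfway between the two centroids, that allocates each instance to its cluster:

```python
# Stage one: a minimal two-cluster k-means on a single numeric attribute.
# (Toy data chosen so both groups stay non-empty during the iterations.)

def two_means(values, iterations=10):
    # Initialize the two centroids at the extremes of the data.
    c0, c1 = min(values), max(values)
    for _ in range(iterations):
        g0 = [v for v in values if abs(v - c0) <= abs(v - c1)]
        g1 = [v for v in values if abs(v - c0) > abs(v - c1)]
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
    return c0, c1

values = [1.0, 2.0, 1.5, 8.0, 9.0, 8.5]
c0, c1 = two_means(values)

# Stage two: turn the cluster assignment into an explicit rule.
# With one attribute and two clusters, the inferred "rule set" is a
# single threshold midway between the centroids.
threshold = (c0 + c1) / 2
rule = lambda v: 0 if v < threshold else 1

labels = [rule(v) for v in values]
print("centroids:", c0, c1)
print("rule: cluster 0 if value <", threshold)
print("labels:", labels)
```

The rule reproduces the clustering exactly here, but the point is the form of the output: once the threshold rule exists, the clusters have become a structural description that can classify new instances without rerunning the clustering.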
3.10 Further reading
Knowledge representation is a key topic in classical artificial intelligence and is
well represented by a comprehensive series of papers edited by Brachman and
Levesque (1985). However, these are about ways of representing handcrafted
rather than learned knowledge, and the kinds of representations that can be learned from
examples are quite rudimentary in comparison. In particular, the shortcomings
of propositional rules, which in logic are referred to as the propositional calcu-
lus, and the extra expressive power of relational rules, or the predicate calculus,
are well described in introductions to logic such as that in Chapter 2 of the book
by Genesereth and Nilsson (1987).
We mentioned the problem of dealing with conflict among different rules.
Various ways of doing this, called conflict resolution strategies, have been devel-
oped for use with rule-based programming systems. These are described in
books on rule-based programming, such as that by Brownston et al. (1985).
Again, however, they are designed for use with handcrafted rule sets rather than
ones that have been learned. The use of handcrafted rules with exceptions for
a large dataset has been studied by Gaines and Compton (1995), and Richards
and Compton (1998) describe their role as an alternative to classic knowledge
engineering.
Further information on the various styles of concept representation can be
found in the papers that describe machine learning methods of inferring con-
cepts from examples, and these are covered in the Further reading section of
Chapter 4 and the Discussion sections of Chapter 6.
Chapter 4
Algorithms: The Basic Methods
Now that we’ve seen how the inputs and outputs can be represented, it’s time
to look at the learning algorithms themselves. This chapter explains the basic
ideas behind the techniques that are used in practical data mining. We will not
delve too deeply into the trickier issues—advanced versions of the algorithms,
optimizations that are possible, complications that arise in practice. These topics
are deferred to Chapter 6, where we come to grips with real implementations
of machine learning methods such as the ones included in data mining toolkits
and used for real-world applications. It is important to understand these more
advanced issues so that you know what is really going on when you analyze a
particular dataset.
In this chapter we look at the basic ideas. One of the most instructive lessons
is that simple ideas often work very well, and we strongly recommend the adop-
tion of a “simplicity-first” methodology when analyzing practical datasets. There
are many different kinds of simple structure that datasets can exhibit. In one
dataset, there might be a single attribute that does all the work and the others
may be irrelevant or redundant. In another dataset, the attributes might