Top 10 algorithms in data mining
Fig. 5 The AdaBoost algorithm
AdaBoost and its variants have been applied to diverse domains with great success. For
example, Viola and Jones [84] combined AdaBoost with a cascade process for face detection.
They regarded rectangular features as weak learners, and by using AdaBoost to weight the
weak learners, they obtained very intuitive features for face detection. To achieve high accuracy
as well as high efficiency, they used a cascade process (which is beyond the scope of this
article). As a result, they reported a very strong face detector: on a 466 MHz machine,
face detection on a 384 × 288 image cost only 0.067 seconds, about 15 times faster than
state-of-the-art face detectors at that time, with comparable accuracy. This face detector
has been recognized as one of the most exciting breakthroughs in computer vision (in
particular, face detection) during the past decade. It is no surprise that "boosting" has become
a buzzword in computer vision and many other application areas.
7.3 Further research
Many interesting topics are worth further study. Here we discuss only one theoretical topic
and one applied topic.
Many empirical studies show that AdaBoost often does not overfit, i.e., the test error of
AdaBoost often continues to decrease even after the training error reaches zero. Many researchers have
studied this, and several theoretical explanations have been given, e.g. [38]. Schapire et al.
[68] presented a margin-based explanation. They argued that AdaBoost is able to increase the
margins even after the training error is zero, and thus it does not overfit even after a large
number of rounds. However, Breiman [8] indicated that a larger margin does not necessarily mean
better generalization, which seriously challenged the margin-based explanation. Recently,
Reyzin and Schapire [65] found that Breiman considered the minimum margin instead of the
average or median margin, which suggests that the margin-based explanation still has a chance
to survive. If this explanation succeeds, a strong connection between AdaBoost and SVM could
be established. This topic is clearly well worth studying.
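To make the quantity under discussion concrete, the following minimal sketch computes the (normalized) margin of an example under a weighted voting ensemble, i.e., y · Σ_t α_t h_t(x) / Σ_t α_t, which lies in [−1, 1]; larger values mean a more confident correct vote. The function name, the toy stumps, and the weights are all illustrative assumptions, not from the original paper.

```python
def adaboost_margin(x, y, weak_learners, alphas):
    """Normalized margin of example (x, y) under a weighted ensemble:
    y * sum_t alpha_t * h_t(x) / sum_t alpha_t, a value in [-1, 1].
    Larger margins mean a more confident correct classification."""
    weighted_vote = sum(a * h(x) for h, a in zip(weak_learners, alphas))
    return y * weighted_vote / sum(alphas)

# Toy ensemble of threshold stumps on a 1-D input; labels are in {-1, +1}
h1 = lambda x: 1 if x > 0.3 else -1
h2 = lambda x: 1 if x > 0.6 else -1
h3 = lambda x: 1 if x > 0.1 else -1
learners, alphas = [h1, h2, h3], [0.8, 0.5, 0.3]

print(adaboost_margin(0.9, +1, learners, alphas))  # all stumps agree -> 1.0
print(adaboost_margin(0.5, +1, learners, alphas))  # mixed votes -> 0.375
```

Schapire et al.'s argument concerns the distribution of such margins over the training set; Breiman's critique looked only at the minimum of this distribution.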
Many real-world applications are born with high dimensionality, i.e., with a large number
of input features. There are two paradigms that can help us deal with such data:
dimension reduction and feature selection. Dimension reduction methods are usually based
on mathematical projections, which attempt to transform the original features into an
appropriate feature space. After dimension reduction, the original meaning of the features is usually
lost. Feature selection methods directly select some of the original features to use, and therefore
they preserve the original meaning of the features, which is very desirable in many
applications. However, feature selection methods are usually based on heuristics and lack a solid
theoretical foundation. Inspired by Viola and Jones's work [84], we think AdaBoost could
be very useful in feature selection, especially considering that it has a solid theoretical
foundation. Current research focuses mainly on images, yet we think general AdaBoost-based
feature selection techniques are well worth studying.
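The idea can be sketched as follows: when the weak learners are decision stumps that each read a single feature (as Viola and Jones's rectangular features effectively do), the features used by the stumps AdaBoost selects form a feature subset. The code below is a minimal, illustrative sketch under that assumption; the function names and the toy data are hypothetical, not from the paper.

```python
import math

def best_stump(X, y, w):
    """Pick the decision stump (feature, threshold, polarity) with
    minimum weighted error; each stump reads exactly one feature."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for pol in (+1, -1):
                pred = [pol if x[j] > thr else -pol for x in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    return best

def adaboost_select(X, y, rounds=3):
    """Run AdaBoost with one-feature stumps and return the set of
    feature indices the chosen stumps use (a feature-selection sketch)."""
    n = len(X)
    w = [1.0 / n] * n
    selected = set()
    for _ in range(rounds):
        err, j, thr, pol = best_stump(X, y, w)
        selected.add(j)
        err = max(err, 1e-10)                      # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        # Reweight: misclassified examples get heavier, correct ones lighter
        pred = [pol if x[j] > thr else -pol for x in X]
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, pred)]
        s = sum(w)
        w = [wi / s for wi in w]
    return selected

# Toy data: only feature 0 is informative; feature 1 is noise
X = [(0.1, 0.9), (0.2, 0.1), (0.8, 0.5), (0.9, 0.2)]
y = [-1, -1, +1, +1]
print(adaboost_select(X, y))  # -> {0}: only the informative feature is picked
```

On this toy data the boosting rounds repeatedly choose stumps on the informative feature, so the "selected" set contains only feature 0; on real data one would rank features by how often (or with what total weight α) they are chosen.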
8 kNN: k-nearest neighbor classification
8.1 Description of the algorithm
One of the simplest, and rather trivial classifiers is the Rote classifier, which memorizes the
entire training data and performs classification only if the attributes of the test object match
one of the training examples exactly. An obvious drawback of this approach is that many test
records will not be classified because they do not exactly match any of the training records. A
more sophisticated approach, k-nearest neighbor (kNN) classification [23,75], finds a group
of k objects in the training set that are closest to the test object, and bases the assignment of
a label on the predominance of a particular class in this neighborhood. There are three key
elements of this approach: a set of labeled objects, e.g., a set of stored records, a distance
or similarity metric to compute distance between objects, and the value of k, the number of
nearest neighbors. To classify an unlabeled object, the distance of this object to the labeled
objects is computed, its k-nearest neighbors are identified, and the class labels of these nearest
neighbors are then used to determine the class label of the object.
Figure 6 provides a high-level summary of the nearest-neighbor classification method.
Given a training set D and a test object z = (x′, y′), the algorithm computes the distance (or
similarity) between z and all the training objects (x, y) ∈ D to determine its nearest-neighbor
list, D_z. (x is the data of a training object, while y is its class. Likewise, x′ is the data of the
test object and y′ is its class.)
Once the nearest-neighbor list is obtained, the test object is classified based on the majority
class of its nearest neighbors:

Majority Voting:  y' = \arg\max_{v} \sum_{(x_i, y_i) \in D_z} I(v = y_i),   (18)

where v is a class label, y_i is the class label of the ith nearest neighbor, and I(·) is an
indicator function that returns the value 1 if its argument is true and 0 otherwise.
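The procedure above can be sketched in a few lines: compute distances from the test object to every training object, keep the k closest to form D_z, and take the majority vote of Eq. (18). The sketch below assumes Euclidean distance and Python tuples for the data; the function name and toy dataset are illustrative.

```python
import math
from collections import Counter

def knn_classify(D, x_test, k=3):
    """Classify x_test by majority vote among its k nearest
    neighbors in the labeled training set D, as in Eq. (18)."""
    # Euclidean distance between the test object and a training object
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    # Nearest-neighbor list D_z: the k training objects closest to x_test
    D_z = sorted(D, key=lambda xy: dist(xy[0], x_test))[:k]
    # y' = argmax_v sum over (x_i, y_i) in D_z of I(v = y_i)
    votes = Counter(y_i for _, y_i in D_z)
    return votes.most_common(1)[0][0]

# Toy training set: two well-separated clusters with labels 'a' and 'b'
D = [((0.0, 0.0), 'a'), ((0.1, 0.2), 'a'), ((0.2, 0.1), 'a'),
     ((5.0, 5.0), 'b'), ((5.1, 4.9), 'b'), ((4.9, 5.2), 'b')]
print(knn_classify(D, (0.3, 0.3), k=3))  # -> 'a'
print(knn_classify(D, (5.0, 5.1), k=3))  # -> 'b'
```

Note that the three key elements named above appear directly in the code: the labeled set D, the distance function, and the parameter k.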