for both methods. The default implementation of classifyInstance() calls distributionForInstance(). If the class is nominal, it predicts the class with maximum probability, or a missing value if all probabilities returned by distributionForInstance() are zero. If the class is numeric, distributionForInstance() must return a single-element array that holds the numeric prediction, and this is what classifyInstance() extracts and returns. Conversely, the default implementation of distributionForInstance() wraps the prediction obtained from classifyInstance() into a single-element array. If the class is nominal, distributionForInstance() assigns a probability of one to the class predicted by classifyInstance() and a probability of zero to the others. If classifyInstance() returns a missing value, then all probabilities are set to zero. To give you a better feeling for just what these methods do, the weka.classifiers.trees.Id3 class overrides them both.
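The nominal-class conversions just described can be sketched in isolation. The following is a simplified, self-contained illustration, not Weka's actual source: the method names are invented, and NaN stands in for a missing class value.

```java
// Simplified sketch of the default classifyInstance()/distributionForInstance()
// conversions for a nominal class. Names are illustrative, not Weka's code.
public class DefaultConversionSketch {

    // Mimics the default classifyInstance(): pick the class index with
    // maximum probability, or "missing" (NaN here) if all probabilities are zero.
    static double classifyFromDistribution(double[] dist) {
        int best = 0;
        double max = 0.0;
        for (int i = 0; i < dist.length; i++) {
            if (dist[i] > max) {
                max = dist[i];
                best = i;
            }
        }
        return max > 0.0 ? best : Double.NaN;
    }

    // Mimics the default distributionForInstance(): wrap a predicted class
    // index in a one-hot distribution; all zeros if the prediction is missing.
    static double[] distributionFromPrediction(double prediction, int numClasses) {
        double[] dist = new double[numClasses];
        if (!Double.isNaN(prediction)) {
            dist[(int) prediction] = 1.0;
        }
        return dist;
    }
}
```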
Let’s look first at classifyInstance(), which predicts a class value for a given
instance. As mentioned in the previous section, nominal class values, like
nominal attribute values, are coded and stored in double variables, representing
the index of the value’s name in the attribute declaration. This representation is used instead of a more elegant object-oriented approach to increase the speed of execution. In
the implementation of ID3, classifyInstance() first checks whether there are
missing values in the instance to be classified; if so, it throws an exception.
Otherwise, it descends the tree recursively, guided by the instance’s attribute
values, until a leaf is reached. Then it returns the class value m_ClassValue stored
at the leaf. Note that this might be a missing value, in which case the instance
is left unclassified. The method distributionForInstance() works in exactly the
same way, returning the probability distribution stored in m_Distribution.
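The recursive descent can be sketched without Weka's classes. The node fields below (attributeIndex, successors, classValue) are illustrative stand-ins for Id3's private members such as m_ClassValue, and the instance is just an array of value indices, reflecting the double encoding of nominal values described above.

```java
// Stripped-down sketch of the recursive descent in Id3's classifyInstance():
// follow the branch given by the instance's value for the node's test
// attribute until a leaf is reached, then return the class value stored there.
public class Id3NodeSketch {
    int attributeIndex = -1;      // attribute tested at this node; -1 marks a leaf
    Id3NodeSketch[] successors;   // one child per attribute value
    double classValue;            // analogue of m_ClassValue; may be NaN (missing)

    double classify(double[] instance) {
        if (attributeIndex == -1) {
            return classValue;                       // leaf: return the stored class
        }
        int branch = (int) instance[attributeIndex]; // nominal values are value indices
        return successors[branch].classify(instance);
    }
}
```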
Most machine learning models, and in particular decision trees, serve as a
more or less comprehensible explanation of the structure found in the data.
Accordingly, each of Weka’s classifiers, like many other Java objects, implements
a toString() method that produces a textual representation of itself in the form
of a String variable. ID3’s toString() method outputs a decision tree in roughly
the same format as J4.8 (Figure 10.5). It recursively prints the tree structure into
a String variable by accessing the attribute information stored at the nodes. To
obtain each attribute’s name and values, it uses the name() and value() methods
from weka.core.Attribute. Empty leaves without a class value are indicated by the
string null.
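The recursive printing can be sketched in the same self-contained style. The fields here stand in for the attribute information that Id3's nodes obtain via name() and value(); the output format loosely imitates the J4.8-style layout, and an empty leaf would print whatever classValueName holds (hence the string null).

```java
// Illustrative sketch (not Id3's actual code) of recursively printing a
// decision tree into a String, one "attribute = value" branch per line.
public class TreePrintSketch {
    String attributeName;         // null at a leaf
    String[] valueNames;          // one name per branch
    TreePrintSketch[] successors; // one subtree per branch
    String classValueName;        // class name at a leaf; "null" if empty

    String toText(int depth) {
        StringBuilder sb = new StringBuilder();
        if (attributeName == null) {
            sb.append(": ").append(classValueName);          // leaf
        } else {
            for (int i = 0; i < successors.length; i++) {
                sb.append("\n");
                for (int d = 0; d < depth; d++) sb.append("|  "); // indentation bars
                sb.append(attributeName).append(" = ").append(valueNames[i]);
                sb.append(successors[i].toText(depth + 1));
            }
        }
        return sb.toString();
    }
}
```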
main()
The only method in weka.classifiers.trees.Id3 that hasn’t been described is
main(), which is called whenever the class is executed from the command line.
As you can see, it’s simple: it basically just tells Weka’s Evaluation class to eval-
uate Id3 with the given command-line options and prints the resulting string.
The one-line expression that does this is enclosed in a try–catch statement,
which catches the various exceptions that can be thrown by Weka’s routines or
other Java methods.
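The shape of that main() method can be sketched as follows. In Weka itself the one-line expression is a call such as Evaluation.evaluateModel(new Id3(), args); the evaluate() method below is a simplified stand-in, not the real API.

```java
// Hedged, self-contained sketch of the main() pattern described in the
// text: a one-line evaluation call wrapped in a try-catch that prints
// any resulting error message.
public class MainSketch {

    // Stand-in for Weka's Evaluation: fails unless a -t (training file)
    // option is present among the command-line options.
    static String evaluate(String[] options) {
        for (String opt : options) {
            if (opt.equals("-t")) {
                return "=== Evaluation with options: " + String.join(" ", options);
            }
        }
        throw new IllegalArgumentException("No training file given (-t option)");
    }

    public static void main(String[] args) {
        try {
            System.out.println(evaluate(args));   // the one-line expression
        } catch (Exception e) {
            System.err.println(e.getMessage());   // catch whatever gets thrown
        }
    }
}
```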
The evaluation() method in weka.classifiers.Evaluation interprets the generic
scheme-independent command-line options described in Section 13.3 and acts
appropriately. For example, it takes the -t option, which gives the name of the training file, and loads the corresponding dataset. If there is no test file it performs a cross-validation by creating a classifier object and repeatedly calling buildClassifier() and classifyInstance() or distributionForInstance() on different
subsets of the training data. Unless the user suppresses output of the model by
setting the corresponding command-line option, it also calls the toString()
method to output the model built from the full training dataset.
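The fold bookkeeping behind such a cross-validation can be sketched abstractly: each instance is held out for testing in exactly one fold, and the classifier is rebuilt on the remainder each time. Weka's real code partitions Instances objects; this toy version, with invented names, partitions indices only.

```java
// Rough sketch of k-fold partitioning: returns, for each of k folds,
// the instance indices held out as that fold's test set.
public class FoldSketch {

    static int[][] foldIndices(int n, int k) {
        int[][] folds = new int[k][];
        for (int f = 0; f < k; f++) {
            int size = n / k + (f < n % k ? 1 : 0); // spread the remainder
            folds[f] = new int[size];
            int j = 0;
            for (int i = f; i < n; i += k) {
                folds[f][j++] = i;                  // instance i tested in fold i mod k
            }
        }
        return folds;
    }
}
```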
What happens if the scheme needs to interpret a specific option such as a
pruning parameter? This is accomplished using the OptionHandler interface in
weka.core. A classifier that implements this interface contains three methods,
listOptions(), setOptions(), and
getOptions(), which can be used to list all the
classifier’s scheme-specific options, to set some of them, and to get the options
that are currently set. The evaluation() method in Evaluation automatically calls
these methods if the classifier implements the OptionHandler interface. Once
the scheme-independent options have been processed, it calls setOptions() to
process the remaining options before using buildClassifier() to generate a new
classifier. When it outputs the classifier, it uses getOptions() to output a list of
the options that are currently set. For a simple example of how to implement
these methods, look at the source code for weka.classifiers.rules.OneR.
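The three methods can be sketched for a hypothetical pruning parameter set with -P. This is an illustrative, self-contained analogue, not OneR's actual code; Weka's real implementations use helper classes such as weka.core.Option and Utils.getOption().

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the three OptionHandler-style methods for a
// hypothetical -P <value> pruning option (names are assumptions).
public class OptionSketch {
    private double pruningParameter = 0.25;

    // List all scheme-specific options with a short description.
    public List<String> listOptions() {
        List<String> opts = new ArrayList<>();
        opts.add("-P <value>: pruning parameter (default 0.25)");
        return opts;
    }

    // Set options from a command-line-style array.
    public void setOptions(String[] options) {
        for (int i = 0; i < options.length - 1; i++) {
            if (options[i].equals("-P")) {
                pruningParameter = Double.parseDouble(options[i + 1]);
            }
        }
    }

    // Report the options that are currently set.
    public String[] getOptions() {
        return new String[] { "-P", String.valueOf(pruningParameter) };
    }
}
```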
OptionHandler makes it possible to set options from the command line. To
set them from within the graphical user interfaces, Weka uses the Java beans
framework. All that is required are set...() and get...() methods for every parameter used by the class. For example, the methods setPruningParameter() and getPruningParameter() would be needed for a pruning parameter. There should
also be a pruningParameterTipText() method that returns a description of the
parameter for the graphical user interface. Again, see weka.classifiers.rules.OneR
for an example.
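For the hypothetical pruning parameter used as the example above, the bean-style accessors would look roughly like this (a sketch with assumed names, not code from any Weka class):

```java
// Sketch of the bean-style accessors and tip-text method that Weka's
// graphical user interfaces rely on, for a hypothetical pruning parameter.
public class PruningBeanSketch {
    private double pruningParameter = 0.25;

    public void setPruningParameter(double p) { pruningParameter = p; }

    public double getPruningParameter() { return pruningParameter; }

    // Description shown as a tool tip in the GUI.
    public String pruningParameterTipText() {
        return "Controls the amount of pruning applied to the model.";
    }
}
```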
Some classifiers can be incrementally updated as new training instances arrive; they don’t have to process all the data in one batch. In Weka, incremental classifiers implement the UpdateableClassifier interface in weka.classifiers.
This interface declares only one method, namely, updateClassifier(), which takes
a single training instance as its argument. For an example of how to use this
interface, look at the source code for weka.classifiers.lazy.IBk.
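The incremental-update pattern can be sketched with a toy model that keeps per-class counts and predicts the majority class. The names and signatures below are illustrative stand-ins, not Weka's (the real updateClassifier() takes an Instance object, not a class index).

```java
// Toy sketch of the UpdateableClassifier pattern: one method that folds
// a single training instance into the model, with no batch reprocessing.
public class IncrementalSketch {
    private int[] classCounts;

    // Analogue of buildClassifier(): initialize an empty model.
    public void buildClassifier(int numClasses) {
        classCounts = new int[numClasses];
    }

    // Analogue of updateClassifier(Instance): process one training instance.
    public void updateClassifier(int classValue) {
        classCounts[classValue]++;
    }

    // Predict the majority class seen so far.
    public int classify() {
        int best = 0;
        for (int i = 1; i < classCounts.length; i++) {
            if (classCounts[i] > classCounts[best]) best = i;
        }
        return best;
    }
}
```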
If a classifier is able to make use of instance weights, it should implement the
WeightedInstancesHandler interface from weka.core. Then other algorithms,
such as those for boosting, can make use of this property.
In weka.core are many other useful interfaces for classifiers—for example,
interfaces for classifiers that are randomizable, summarizable, drawable, and