Data Mining: Practical Machine Learning Tools and Techniques, Second Edition




for both methods. The default implementation of classifyInstance() calls distributionForInstance(). If the class is nominal, it predicts the class with maximum probability, or a missing value if all probabilities returned by distributionForInstance() are zero. If the class is numeric, distributionForInstance() must return a single-element array that holds the numeric prediction, and this is what classifyInstance() extracts and returns. Conversely, the default implementation of distributionForInstance() wraps the prediction obtained from classifyInstance() into a single-element array. If the class is nominal, distributionForInstance() assigns a probability of one to the class predicted by classifyInstance() and a probability of zero to the others. If classifyInstance() returns a missing value, then all probabilities are set to zero. To give you a better feeling for just what these methods do, the weka.classifiers.trees.Id3 class overrides them both.
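The mutual wrap-around between the two defaults can be sketched with a self-contained stand-in. Everything here is hypothetical and simplified: ToyClassifier and AlwaysClassOne are invented names, the class count is hard-wired, and only the nominal-class case is shown (the real base class also handles the numeric single-element-array case and reads the number of classes from the training data).

```java
// Hypothetical, simplified stand-in for Weka's Classifier base class,
// showing only how the two default methods are defined in terms of
// each other. A subclass must override at least one of them.
abstract class ToyClassifier {
    int numClasses = 2;  // assumed known; real Weka reads it from the data

    // Default: derive the single prediction from the distribution.
    public double classifyInstance(double[] instance) {
        double[] dist = distributionForInstance(instance);
        int best = 0;
        double max = 0.0;
        for (int i = 0; i < dist.length; i++) {
            if (dist[i] > max) { max = dist[i]; best = i; }
        }
        // An all-zero distribution means "missing" (NaN in Weka's encoding).
        return (max == 0.0) ? Double.NaN : best;
    }

    // Default: wrap the single prediction in a distribution, assigning
    // probability one to the predicted class and zero to the others.
    public double[] distributionForInstance(double[] instance) {
        double pred = classifyInstance(instance);
        double[] dist = new double[numClasses];
        if (!Double.isNaN(pred)) {
            dist[(int) pred] = 1.0;
        }  // all probabilities stay zero if the prediction is missing
        return dist;
    }
}

// A subclass only needs to override one of the two methods.
class AlwaysClassOne extends ToyClassifier {
    @Override
    public double classifyInstance(double[] instance) {
        return 1.0;
    }
}
```

Because AlwaysClassOne overrides only classifyInstance(), a call to its distributionForInstance() falls through to the default, which wraps the prediction as the distribution {0, 1}.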

Let’s look first at classifyInstance(), which predicts a class value for a given instance. As mentioned in the previous section, nominal class values, like nominal attribute values, are coded and stored in double variables that represent the index of the value’s name in the attribute declaration. This representation is used instead of a more elegant object-oriented approach in order to increase execution speed. In the implementation of ID3, classifyInstance() first checks whether there are missing values in the instance to be classified; if so, it throws an exception. Otherwise, it descends the tree recursively, guided by the instance’s attribute values, until a leaf is reached. Then it returns the class value m_ClassValue stored at the leaf. Note that this might be a missing value, in which case the instance is left unclassified. The method distributionForInstance() works in exactly the same way, returning the probability distribution stored in m_Distribution.
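The recursive descent can be sketched with a hypothetical, stripped-down tree node. The field names m_ClassValue and m_Distribution echo the real Id3 source, but the node structure, ToyId3Node, and the plain-array instance encoding are simplifications for illustration.

```java
// Hypothetical, stripped-down sketch of Id3's recursive descent.
// The real implementation first throws an exception if the instance
// contains missing values; that check is omitted here.
class ToyId3Node {
    int m_SplitAttribute = -1;        // -1 marks a leaf
    ToyId3Node[] m_Successors;        // one child per attribute value
    double m_ClassValue = Double.NaN; // class at a leaf (NaN = missing)
    double[] m_Distribution;          // class distribution at a leaf

    // Descend the tree, guided by the instance's nominal attribute
    // values (stored as indices in doubles), until a leaf is reached.
    double classifyInstance(double[] instance) {
        if (m_SplitAttribute == -1) {
            return m_ClassValue;  // may be NaN: instance left unclassified
        }
        return m_Successors[(int) instance[m_SplitAttribute]]
                .classifyInstance(instance);
    }

    // Exactly the same descent, returning the stored distribution.
    double[] distributionForInstance(double[] instance) {
        if (m_SplitAttribute == -1) {
            return m_Distribution;
        }
        return m_Successors[(int) instance[m_SplitAttribute]]
                .distributionForInstance(instance);
    }
}
```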

Most machine learning models, and in particular decision trees, serve as a more or less comprehensible explanation of the structure found in the data. Accordingly, each of Weka’s classifiers, like many other Java objects, implements a toString() method that produces a textual representation of itself in the form of a String variable. ID3’s toString() method outputs a decision tree in roughly the same format as J4.8 (Figure 10.5). It recursively prints the tree structure into a String variable by accessing the attribute information stored at the nodes. To obtain each attribute’s name and values, it uses the name() and value() methods from weka.core.Attribute. Empty leaves without a class value are indicated by the string null.
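The recursive printing scheme can be sketched as follows. This is a hypothetical simplification: names are held as plain strings in an invented PrintNode class, whereas the real Id3 obtains them from weka.core.Attribute via name() and value().

```java
// Hypothetical sketch of recursive J4.8-style tree printing.
// One line per branch, with "|  " indentation per level of depth,
// and ": <class>" appended at a leaf (": null" for an empty leaf).
class PrintNode {
    String attributeName;  // null at a leaf
    String[] valueNames;   // one name per branch
    PrintNode[] children;
    String leafLabel;      // class name at a leaf, or "null" if empty

    String toText(int depth) {
        if (attributeName == null) {
            return ": " + leafLabel;  // e.g. ": no"
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < children.length; i++) {
            sb.append('\n');
            for (int j = 0; j < depth; j++) sb.append("|  ");
            sb.append(attributeName).append(" = ").append(valueNames[i]);
            sb.append(children[i].toText(depth + 1));
        }
        return sb.toString();
    }
}
```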



main()

The only method in weka.classifiers.trees.Id3 that hasn’t been described is main(), which is called whenever the class is executed from the command line. As you can see, it’s simple: it basically just tells Weka’s Evaluation class to evaluate Id3 with the given command-line options and prints the resulting string. The one-line expression that does this is enclosed in a try–catch statement, which catches the various exceptions that can be thrown by Weka’s routines or other Java methods.

The evaluation() method in weka.classifiers.Evaluation interprets the generic scheme-independent command-line options described in Section 13.3 and acts appropriately. For example, it takes the -t option, which gives the name of the training file, and loads the corresponding dataset. If there is no test file it performs a cross-validation by creating a classifier object and repeatedly calling buildClassifier() and classifyInstance() or distributionForInstance() on different subsets of the training data. Unless the user suppresses output of the model by setting the corresponding command-line option, it also calls the toString() method to output the model built from the full training dataset.

What happens if the scheme needs to interpret a specific option such as a pruning parameter? This is accomplished using the OptionHandler interface in weka.core. A classifier that implements this interface contains three methods, listOptions(), setOptions(), and getOptions(), which can be used to list all the classifier’s scheme-specific options, to set some of them, and to get the options that are currently set. The evaluation() method in Evaluation automatically calls these methods if the classifier implements the OptionHandler interface. Once the scheme-independent options have been processed, it calls setOptions() to process the remaining options before using buildClassifier() to generate a new classifier. When it outputs the classifier, it uses getOptions() to output a list of the options that are currently set. For a simple example of how to implement these methods, look at the source code for weka.classifiers.rules.OneR.
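As a minimal sketch of the three-method pattern, consider the hypothetical classifier below. The class name, the -P flag, and the pruning-parameter default are all invented for illustration; the real setOptions() implementations use helper methods from weka.core.Utils to parse the option array.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the OptionHandler pattern: list, set, and
// report a single scheme-specific option, -P (a pruning parameter).
class ToyPruningClassifier {
    private double pruningParameter = 0.25;  // illustrative default

    // List all scheme-specific options with a short description.
    public List<String> listOptions() {
        List<String> opts = new ArrayList<>();
        opts.add("-P <value>\tSet the pruning parameter.");
        return opts;
    }

    // Parse the scheme-specific options from the command line.
    public void setOptions(String[] options) {
        for (int i = 0; i < options.length - 1; i++) {
            if (options[i].equals("-P")) {
                pruningParameter = Double.parseDouble(options[i + 1]);
            }
        }
    }

    // Report the options that are currently set.
    public String[] getOptions() {
        return new String[] {"-P", String.valueOf(pruningParameter)};
    }
}
```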

OptionHandler makes it possible to set options from the command line. To set them from within the graphical user interfaces, Weka uses the Java beans framework. All that is required are set...() and get...() methods for every parameter used by the class. For example, the methods setPruningParameter() and getPruningParameter() would be needed for a pruning parameter. There should also be a pruningParameterTipText() method that returns a description of the parameter for the graphical user interface. Again, see weka.classifiers.rules.OneR for an example.
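The bean-style trio for a pruning parameter might look like this. The class name and the tip text are invented; the method names follow the setX()/getX()/xTipText() naming convention the text describes.

```java
// Hypothetical bean-style accessors for a pruning parameter, in the
// form the Weka GUI discovers via the Java beans naming convention.
class PruningBeanExample {
    private double pruningParameter = 0.25;  // illustrative default

    public void setPruningParameter(double p) {
        pruningParameter = p;
    }

    public double getPruningParameter() {
        return pruningParameter;
    }

    // Description of the parameter, shown in the graphical user interface.
    public String pruningParameterTipText() {
        return "Threshold used when pruning the tree.";
    }
}
```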

Some classifiers can be incrementally updated as new training instances arrive; they don’t have to process all the data in one batch. In Weka, incremental classifiers implement the UpdateableClassifier interface in weka.classifiers. This interface declares only one method, namely updateClassifier(), which takes a single training instance as its argument. For an example of how to use this interface, look at the source code for weka.classifiers.lazy.IBk.
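The incremental pattern can be illustrated with a hypothetical running majority-class predictor. The class name is invented, and instances are reduced to bare class indices; the real updateClassifier() takes a weka.core.Instance.

```java
// Hypothetical incremental classifier in the spirit of
// UpdateableClassifier: buildClassifier() starts from an empty batch,
// and updateClassifier() folds in one training instance at a time.
class ToyMajorityUpdateable {
    private int[] counts;

    // Initialize with no training data yet.
    public void buildClassifier(int numClasses) {
        counts = new int[numClasses];
    }

    // The single method the interface declares: absorb one instance.
    public void updateClassifier(int classValue) {
        counts[classValue]++;
    }

    // Predict the class seen most often so far.
    public double classifyInstance() {
        int best = 0;
        for (int i = 1; i < counts.length; i++) {
            if (counts[i] > counts[best]) best = i;
        }
        return best;
    }
}
```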

If a classifier is able to make use of instance weights, it should implement the WeightedInstancesHandler interface from weka.core. Then other algorithms, such as those for boosting, can make use of this property.
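A minimal sketch of what honoring instance weights means, using an invented weighted majority-vote learner (class values and weights are passed as plain arrays rather than weighted weka.core.Instance objects):

```java
// Hypothetical weighted majority vote: the learner sums instance
// weights per class rather than counting instances, which is the kind
// of behavior boosting relies on when it reweights the training data.
class ToyWeightedMajority {
    private double[] weightSums;

    public void buildClassifier(int[] classValues, double[] weights,
                                int numClasses) {
        weightSums = new double[numClasses];
        for (int i = 0; i < classValues.length; i++) {
            weightSums[classValues[i]] += weights[i];  // weight, not count
        }
    }

    // Predict the class with the largest total weight.
    public double classifyInstance() {
        int best = 0;
        for (int i = 1; i < weightSums.length; i++) {
            if (weightSums[i] > weightSums[best]) best = i;
        }
        return best;
    }
}
```

Here a single heavily weighted instance can outvote several lightly weighted ones, which is exactly the effect boosting exploits.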

In weka.core there are many other useful interfaces for classifiers, for example, interfaces for classifiers that are randomizable, summarizable, drawable, and



