15 Writing New Learning Schemes

Suppose you need to implement a special-purpose learning algorithm that is
not included in Weka. Or suppose you are engaged in machine learning research
and want to investigate a new learning scheme. Or suppose you just want to
learn more about the inner workings of an induction algorithm by actually
programming it yourself. This section uses a simple example to show how to make
full use of Weka’s class hierarchy when writing classifiers.
Weka includes the elementary learning schemes listed in Table 15.1, mainly
for educational purposes. None take any scheme-specific command-line
options. They are all useful for understanding the inner workings of a classifier.
As an example, we describe the weka.classifiers.trees.Id3 scheme, which imple-
ments the ID3 decision tree learner from Section 4.3.
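All of these schemes can be run with Weka’s general command-line options alone. For
example, assuming weka.jar is on the classpath and the nominal weather data is
available as weather.nominal.arff, ID3 can be trained and evaluated with

java weka.classifiers.trees.Id3 -t weather.nominal.arff

where -t names the training file.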
15.1 An example classifier
Figure 15.1 gives the source code of weka.classifiers.trees.Id3, which, as you can
see from the code, extends the Classifier class. Every classifier in Weka does so,
whether it predicts a nominal class or a numeric one.
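To show the minimal structure that this implies, here is a small sketch (not part of
Weka; the class name SkeletonClassifier is made up for illustration) of a do-nothing
classifier. It overrides buildClassifier(), which learns the model, and
classifyInstance(), which predicts the class of a single instance; a classifier may
override distributionForInstance() instead if it produces class probabilities.

package weka.classifiers.misc;

import weka.classifiers.*;
import weka.core.*;

/** A hypothetical, do-nothing classifier showing the minimal structure. */
public class SkeletonClassifier extends Classifier {

  /** Builds the model from the training data. */
  public void buildClassifier(Instances data) throws Exception {
    // learn the model here
  }

  /** Returns the prediction for a single instance. */
  public double classifyInstance(Instance instance) throws Exception {
    return 0; // index of the predicted class value (or a number, for numeric classes)
  }
}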
The first method in
weka.classifiers.trees.Id3 is
globalInfo(): we mention
it here before moving on to the more interesting parts. It simply returns a
string that is displayed in Weka’s graphical user interfaces when this scheme is
selected.
buildClassifier()
The buildClassifier() method constructs a classifier from a training dataset. In
this case it first checks the data for a nonnominal class, missing attribute value,
or any attribute that is not nominal, because the ID3 algorithm cannot handle
these. It then makes a copy of the training set (to avoid changing the original
data) and calls a method from weka.core.Instances to delete all instances with
missing class values, because these instances are useless in the training process.
Finally, it calls makeTree(), which actually builds the decision tree by recursively
generating all subtrees attached to the root node.
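A rough sketch of these steps follows, using the weka.core API described above; the
exception messages and variable names are illustrative rather than quoted from the
actual Id3 source.

public void buildClassifier(Instances data) throws Exception {

  // ID3 requires a nominal class ...
  if (!data.classAttribute().isNominal()) {
    throw new Exception("Id3: nominal class, please.");
  }

  // ... nominal attributes, and no missing attribute values.
  Enumeration attEnum = data.enumerateAttributes();
  while (attEnum.hasMoreElements()) {
    Attribute att = (Attribute) attEnum.nextElement();
    if (!att.isNominal()) {
      throw new Exception("Id3: only nominal attributes, please.");
    }
    Enumeration instEnum = data.enumerateInstances();
    while (instEnum.hasMoreElements()) {
      if (((Instance) instEnum.nextElement()).isMissing(att)) {
        throw new Exception("Id3: no missing values, please.");
      }
    }
  }

  // Copy the training set so the caller's data is left untouched, and
  // drop instances whose class value is missing.
  data = new Instances(data);
  data.deleteWithMissingClass();

  // Build the decision tree recursively.
  makeTree(data);
}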
makeTree()
The first step in makeTree() is to check whether the dataset is empty. If it is, a
leaf is created by setting m_Attribute to null. The class value m_ClassValue
assigned to this leaf is set to be missing, and the estimated probability for each
of the dataset’s classes in m_Distribution is initialized to 0. If training instances
are present, makeTree() finds the attribute that yields the greatest information
gain for them. It first creates a Java enumeration of the dataset’s attributes. If the
index of the class attribute is set—as it will be for this dataset—the class is auto-
matically excluded from the enumeration.
Inside the enumeration, each attribute’s information gain is computed by
computeInfoGain() and stored in an array. We will return to this method later.
The index() method from weka.core.Attribute returns the attribute’s index in the
dataset, which is used to index the array. Once the enumeration is complete, the
attribute with the greatest information gain is stored in the instance variable
m_Attribute. The maxIndex() method from weka.core.Utils returns the index of
the greatest value in an array of integers or doubles. (If there is more than one
element with the maximum value, the first is returned.) The index of this attribute
is passed to the attribute() method from weka.core.Instances, which returns the
corresponding attribute.
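Putting the steps described so far together, the first part of makeTree() might look
like the following sketch; the subsequent test for zero information gain and the
recursive construction of the subtrees are omitted.

private void makeTree(Instances data) throws Exception {

  // An empty dataset becomes a leaf: no split attribute, a missing
  // class value, and an all-zero class distribution.
  if (data.numInstances() == 0) {
    m_Attribute = null;
    m_ClassValue = Instance.missingValue();
    m_Distribution = new double[data.numClasses()];
    return;
  }

  // Compute each attribute's information gain; enumerateAttributes()
  // skips the class attribute automatically.
  double[] infoGains = new double[data.numAttributes()];
  Enumeration attEnum = data.enumerateAttributes();
  while (attEnum.hasMoreElements()) {
    Attribute att = (Attribute) attEnum.nextElement();
    infoGains[att.index()] = computeInfoGain(data, att);
  }

  // Remember the attribute with the greatest gain; Utils.maxIndex()
  // returns the first maximum if there are ties.
  m_Attribute = data.attribute(Utils.maxIndex(infoGains));

  // Making a leaf (if the gain is zero) or splitting on m_Attribute and
  // building the subtrees recursively is not shown in this sketch.
}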
Table 15.1  Simple learning schemes in Weka.

Scheme                                     Description              Book section
weka.classifiers.bayes.NaiveBayesSimple    Probabilistic learner    4.2
weka.classifiers.trees.Id3                 Decision tree learner    4.3
weka.classifiers.rules.Prism               Rule learner             4.4
weka.classifiers.lazy.IB1                  Instance-based learner   4.7
package weka.classifiers.trees;
import weka.classifiers.*;
import weka.core.*;
import java.io.*;
import java.util.*;
/**
* Class implementing an Id3 decision tree classifier.
*/
public class Id3 extends Classifier {
/** The node's successors. */
private Id3[] m_Successors;
/** Attribute used for splitting. */
private Attribute m_Attribute;
/** Class value if node is leaf. */
private double m_ClassValue;
/** Class distribution if node is leaf. */
private double[] m_Distribution;
/** Class attribute of dataset. */
private Attribute m_ClassAttribute;
/**
* Returns a string describing the classifier.
* @return a description suitable for the GUI.
*/
public String globalInfo() {
return "Class for constructing an unpruned decision tree based on the ID3 "
+ "algorithm. Can only deal with nominal attributes. No missing values "
+ "allowed. Empty leaves may result in unclassified instances. For more "
+ "information see: \n\n"
+ " R. Quinlan (1986). \"Induction of decision "
+ "trees\". Machine Learning. Vol.1, No.1, pp. 81-106";
}
Figure 15.1 Source code for the ID3 decision tree learner.