Data Mining: Practical Machine Learning Tools and Techniques, Second Edition





Suppose you need to implement a special-purpose learning algorithm that is not included in Weka. Or suppose you are engaged in machine learning research and want to investigate a new learning scheme. Or suppose you just want to learn more about the inner workings of an induction algorithm by actually programming it yourself. This section uses a simple example to show how to make full use of Weka's class hierarchy when writing classifiers.

Weka includes the elementary learning schemes listed in Table 15.1, mainly for educational purposes. None take any scheme-specific command-line options. They are all useful for understanding the inner workings of a classifier. As an example, we describe the weka.classifiers.trees.Id3 scheme, which implements the ID3 decision tree learner from Section 4.3.

Chapter 15: Writing New Learning Schemes

15.1 An example classifier

Figure 15.1 gives the source code of weka.classifiers.trees.Id3, which, as you can see from the code, extends the Classifier class. Every classifier in Weka does so, whether it predicts a nominal class or a numeric one.




The first method in weka.classifiers.trees.Id3 is globalInfo(): we mention it here before moving on to the more interesting parts. It simply returns a string that is displayed in Weka's graphical user interfaces when this scheme is selected.



buildClassifier()

The buildClassifier() method constructs a classifier from a training dataset. In this case it first checks the data for a nonnominal class, missing attribute values, or any attribute that is not nominal, because the ID3 algorithm cannot handle these. It then makes a copy of the training set (to avoid changing the original data) and calls a method from weka.core.Instances to delete all instances with missing class values, because these instances are useless in the training process. Finally, it calls makeTree(), which actually builds the decision tree by recursively generating all subtrees attached to the root node.
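The copy-then-clean step can be illustrated in plain Java. This is a standalone sketch with an invented representation (each instance is a String array, and a null class label stands for "missing"), not Weka's Instances API:

```java
import java.util.ArrayList;
import java.util.List;

public class CleanTrainingData {

    /**
     * Returns a copy of the dataset with every row whose class label is
     * missing (null) removed, leaving the original list untouched.
     * This mirrors the copy-then-delete pattern buildClassifier() uses
     * via Instances.deleteWithMissingClass(), in miniature.
     */
    static List<String[]> withoutMissingClass(List<String[]> data, int classIndex) {
        List<String[]> copy = new ArrayList<>();
        for (String[] row : data) {
            if (row[classIndex] != null) {
                copy.add(row);
            }
        }
        return copy;
    }

    public static void main(String[] args) {
        List<String[]> data = new ArrayList<>();
        data.add(new String[] {"sunny", "hot", "no"});
        data.add(new String[] {"rainy", "mild", null});   // missing class: useless for training
        data.add(new String[] {"overcast", "cool", "yes"});

        List<String[]> train = withoutMissingClass(data, 2);
        System.out.println(train.size());  // 2
        System.out.println(data.size());   // original is unchanged: 3
    }
}
```

Working on a copy matters because a classifier should not mutate the caller's dataset as a side effect of training.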

makeTree()

The first step in makeTree() is to check whether the dataset is empty. If it is, a leaf is created by setting m_Attribute to null. The class value m_ClassValue assigned to this leaf is set to be missing, and the estimated probability for each of the dataset's classes in m_Distribution is initialized to 0. If training instances are present, makeTree() finds the attribute that yields the greatest information gain for them. It first creates a Java enumeration of the dataset's attributes. If the index of the class attribute is set, as it will be for this dataset, the class is automatically excluded from the enumeration.
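The leaf bookkeeping just described (an all-zero distribution and a "missing" class value for an empty node) can be sketched in self-contained Java. The names and the int-array encoding of class labels are invented for this illustration; this is not Weka's m_Distribution/m_ClassValue code itself:

```java
import java.util.Arrays;

public class LeafDemo {

    /**
     * Class distribution for a leaf: normalized counts of each class value.
     * numClasses plays the role of the class attribute's number of values.
     * For an empty node the distribution stays all zeros, just as
     * makeTree() initializes m_Distribution for an empty dataset.
     */
    static double[] distribution(int[] classIndices, int numClasses) {
        double[] dist = new double[numClasses];   // all zeros for an empty node
        for (int c : classIndices) {
            dist[c] += 1.0;
        }
        if (classIndices.length > 0) {
            for (int i = 0; i < numClasses; i++) {
                dist[i] /= classIndices.length;
            }
        }
        return dist;
    }

    /** Majority class of a leaf, or NaN ("missing") when the node is empty. */
    static double classValue(int[] classIndices, int numClasses) {
        if (classIndices.length == 0) {
            return Double.NaN;   // empty leaf: class value is missing
        }
        double[] dist = distribution(classIndices, numClasses);
        int best = 0;
        for (int i = 1; i < numClasses; i++) {
            if (dist[i] > dist[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Three instances of a two-class problem: one of class 0, two of class 1.
        System.out.println(Arrays.toString(distribution(new int[] {0, 1, 1}, 2)));
        System.out.println(classValue(new int[] {}, 2));  // prints NaN
    }
}
```

Using NaN for a missing class value matches the convention the text describes: an empty leaf cannot commit to any class, and its zeroed distribution records that no evidence reached it.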

Inside the enumeration, each attribute's information gain is computed by computeInfoGain() and stored in an array. We will return to this method later. The index() method from weka.core.Attribute returns the attribute's index in the dataset, which is used to index the array. Once the enumeration is complete, the attribute with the greatest information gain is stored in the instance variable m_Attribute. The maxIndex() method from weka.core.Utils returns the index of the greatest value in an array of integers or doubles. (If there is more than one element with the maximum value, the first is returned.) The index of this attrib-



Table 15.1  Simple learning schemes in Weka.

Scheme                                     Description              Book section
weka.classifiers.bayes.NaiveBayesSimple    Probabilistic learner    4.2
weka.classifiers.trees.Id3                 Decision tree learner    4.3
weka.classifiers.rules.Prism               Rule learner             4.4
weka.classifiers.lazy.IB1                  Instance-based learner   4.7
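The roles of computeInfoGain() and Utils.maxIndex() described above can be mirrored in self-contained Java. The dataset, method names, and parallel-array representation below are invented for this sketch; it illustrates the calculation, not Weka's actual source:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class InfoGainDemo {

    /** Entropy (in bits) of a set of nominal class labels. */
    static double entropy(String[] labels) {
        Map<String, Integer> counts = new HashMap<>();
        for (String l : labels) {
            counts.merge(l, 1, Integer::sum);
        }
        double e = 0.0;
        for (int c : counts.values()) {
            double p = (double) c / labels.length;
            e -= p * Math.log(p) / Math.log(2);   // log base 2
        }
        return e;
    }

    /** Information gain from splitting the labels on one nominal attribute. */
    static double infoGain(String[] values, String[] labels) {
        double gain = entropy(labels);
        // Group the class labels by attribute value...
        Map<String, List<String>> groups = new HashMap<>();
        for (int i = 0; i < values.length; i++) {
            groups.computeIfAbsent(values[i], k -> new ArrayList<>()).add(labels[i]);
        }
        // ...and subtract each subset's entropy, weighted by its size.
        for (List<String> g : groups.values()) {
            gain -= ((double) g.size() / labels.length) * entropy(g.toArray(new String[0]));
        }
        return gain;
    }

    /** Index of the largest value; like weka.core.Utils.maxIndex, ties go to the first. */
    static int maxIndex(double[] a) {
        int best = 0;
        for (int i = 1; i < a.length; i++) {
            if (a[i] > a[best]) {   // strict '>' keeps the first maximum on ties
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy weather-style data, one entry per instance.
        String[] outlook = {"sunny", "sunny", "overcast", "rainy"};
        String[] windy   = {"true", "false", "true", "true"};
        String[] play    = {"no", "yes", "yes", "yes"};

        double[] gains = {infoGain(outlook, play), infoGain(windy, play)};
        System.out.println("best attribute index: " + maxIndex(gains));
    }
}
```

With this toy data, outlook separates the classes better than windy, so index 0 is selected, just as makeTree() would store the corresponding attribute in m_Attribute.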






package weka.classifiers.trees;

import weka.classifiers.*;
import weka.core.*;
import java.io.*;
import java.util.*;

/**
 * Class implementing an Id3 decision tree classifier.
 */
public class Id3 extends Classifier {

  /** The node's successors. */
  private Id3[] m_Successors;

  /** Attribute used for splitting. */
  private Attribute m_Attribute;

  /** Class value if node is leaf. */
  private double m_ClassValue;

  /** Class distribution if node is leaf. */
  private double[] m_Distribution;

  /** Class attribute of dataset. */
  private Attribute m_ClassAttribute;

  /**
   * Returns a string describing the classifier.
   *
   * @return a description suitable for the GUI.
   */
  public String globalInfo() {
    return "Class for constructing an unpruned decision tree based on the ID3 "
      + "algorithm. Can only deal with nominal attributes. No missing values "
      + "allowed. Empty leaves may result in unclassified instances. For more "
      + "information see: \n\n"
      + " R. Quinlan (1986). \"Induction of decision "
      + "trees\". Machine Learning. Vol.1, No.1, pp. 81-106";
  }

Figure 15.1 Source code for the ID3 decision tree learner.




