graphable. For more information on these and other interfaces, look at the Javadoc for the classes in weka.core.
15.2 Conventions for implementing classifiers
There are some conventions that you must obey when implementing classifiers in Weka. If you do not, things will go awry. For example, Weka’s evaluation module might not compute the classifier’s statistics properly when evaluating it.
The first convention has already been mentioned: each time a classifier’s buildClassifier() method is called, it must reset the model. The CheckClassifier class performs tests to ensure that this is the case. When buildClassifier() is called on a dataset, the same result must always be obtained, regardless of how often the classifier has previously been applied to the same or other datasets. However, buildClassifier() must not reset instance variables that correspond to scheme-specific options, because these settings must persist through multiple calls of buildClassifier(). Also, calling buildClassifier() must never change the input data.
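The reset convention can be illustrated with a minimal sketch. This is not actual Weka code: the class name, the int[] stand-in for a dataset, and the option field are all invented for illustration. The point is the split between option state (kept across calls) and learned state (reset on every call), and that the input is never modified.

```java
// Illustrative sketch of the buildClassifier() reset convention.
// Not real Weka code: a plain int[] of class values stands in for a dataset.
public class MajorityClassifier {

    // Scheme-specific option: must NOT be reset by buildClassifier(),
    // so that it persists across repeated calls.
    private boolean verbose = false;

    // Learned model state: MUST be reset on every call, so repeated
    // training on the same data always yields the same result.
    private int predictedClass;

    public void setVerbose(boolean v) { verbose = v; }

    public void buildClassifier(int[] classValues) {
        predictedClass = 0;                 // reset learned state
        int[] counts = new int[10];         // local scratch space only:
        for (int v : classValues) {         // the input array is read,
            counts[v]++;                    // never written
        }
        for (int c = 0; c < counts.length; c++) {
            if (counts[c] > counts[predictedClass]) {
                predictedClass = c;
            }
        }
    }

    public double classifyInstance() {
        return predictedClass;
    }
}
```

Calling buildClassifier() twice on different data must leave no trace of the first call in the second result, which is exactly what CheckClassifier-style tests probe for.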
Two other conventions have also been mentioned. One is that when a classifier can’t make a prediction, its classifyInstance() method must return Instance.missingValue() and its distributionForInstance() method must return probabilities of zero for all classes. The ID3 implementation in Figure 15.1 does this. Another convention is that with classifiers for numeric prediction, classifyInstance() returns the numeric value that the classifier predicts. Some classifiers, however, are able to predict nominal classes and their class probabilities, as well as numeric class values; weka.classifiers.lazy.IBk is an example. These implement the distributionForInstance() method, and if the class is numeric it returns an array of size 1 whose only element contains the predicted numeric value.
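These prediction conventions can be sketched as follows. This is an assumption-laden illustration rather than real Weka code: Double.NaN plays the role of the missing-value marker that Instance.missingValue() returns, and the method names are invented to show the three return shapes side by side.

```java
// Illustrative sketch of the prediction-return conventions.
// Double.NaN stands in here for Weka's missing-value marker.
public class PredictionConventions {

    // Nominal class, no prediction possible: distributionForInstance()
    // must return a probability of zero for every class.
    public static double[] noPredictionDistribution(int numClasses) {
        return new double[numClasses];  // Java zero-initializes the array
    }

    // No prediction possible: classifyInstance() returns the
    // missing-value marker rather than throwing an exception.
    public static double noPredictionValue() {
        return Double.NaN;  // role of Instance.missingValue()
    }

    // Numeric class: distributionForInstance() wraps the single
    // predicted value in an array of size 1.
    public static double[] numericDistribution(double prediction) {
        return new double[] { prediction };
    }
}
```

Following these shapes exactly is what lets Weka’s evaluation module treat nominal and numeric predictors uniformly.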
Another convention, not absolutely essential but useful nonetheless, is that every classifier implements a toString() method that outputs a textual description of itself.
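A toString() implementation of this kind might look like the following hypothetical sketch; the class name and the "learned" field are invented for illustration.

```java
// Hypothetical classifier with a self-describing toString(),
// in the spirit of the convention above. Not real Weka code.
public class DescribableClassifier {

    private double learnedMean = 4.2;  // pretend model state

    @Override
    public String toString() {
        // A human-readable summary of the learned model.
        return "DescribableClassifier\n  predicted mean: " + learnedMean;
    }
}
```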