Data Mining: Practical Machine Learning Tools and Techniques, Second Edition



graphable. For more information on these and other interfaces, look at the Javadoc for the classes in weka.core.



15.2 Conventions for implementing classifiers

There are some conventions that you must obey when implementing classifiers in Weka. If you do not, things will go awry. For example, Weka's evaluation module might not compute the classifier's statistics properly when evaluating it.

The first convention has already been mentioned: each time a classifier's buildClassifier() method is called, it must reset the model. The CheckClassifier class performs tests to ensure that this is the case. When buildClassifier() is called on a dataset, the same result must always be obtained, regardless of how often the classifier has previously been applied to the same or other datasets. However, buildClassifier() must not reset instance variables that correspond to scheme-specific options, because these settings must persist through multiple calls of buildClassifier(). Also, calling buildClassifier() must never change the input data.
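To make these rules concrete, here is a minimal sketch of a classifier that follows them, written against the Weka API described in this chapter. The class name MajorityClassSketch and the option field m_MinBucketSize are invented for illustration; they are not part of Weka.

import weka.classifiers.Classifier;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.Utils;

public class MajorityClassSketch extends Classifier {

  /** Scheme-specific option: must NOT be reset by buildClassifier(). */
  private int m_MinBucketSize = 1;

  /** Learned model state: rebuilt from scratch on every call. */
  private double[] m_Counts;

  public void buildClassifier(Instances data) throws Exception {
    m_Counts = null;                       // reset the model, not the options
    Instances copy = new Instances(data);  // never modify the input data
    copy.deleteWithMissingClass();         // cleanup happens on the copy only
    m_Counts = new double[copy.numClasses()];
    for (int i = 0; i < copy.numInstances(); i++) {
      m_Counts[(int) copy.instance(i).classValue()]++;
    }
  }

  public double classifyInstance(Instance instance) {
    return Utils.maxIndex(m_Counts);  // most frequent class in the training data
  }
}

Because the model is derived only from the data passed in, repeated calls to buildClassifier() on the same dataset necessarily produce the same result, which is exactly what CheckClassifier verifies.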

Two other conventions have also been mentioned. One is that when a classifier can't make a prediction, its classifyInstance() method must return Instance.missingValue() and its distributionForInstance() method must return probabilities of zero for all classes. The ID3 implementation in Figure 15.1 does this. Another convention is that with classifiers for numeric prediction, classifyInstance() returns the numeric value that the classifier predicts. Some classifiers, however, are able to predict nominal classes and their class probabilities, as well as numeric class values; weka.classifiers.lazy.IBk is an example. These implement the distributionForInstance() method, and if the class is numeric it returns an array of size 1 whose only element contains the predicted numeric value.
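The sketch below shows how these prediction-side conventions might fit together in one place. The abstract class and its predictFor() hook are invented for illustration (a real classifier computes the distribution directly); predictFor() is assumed to return null when no prediction can be made and, for a numeric class, an array of size 1 holding the predicted value.

import weka.classifiers.Classifier;
import weka.core.Instance;
import weka.core.Utils;

public abstract class PredictionSketch extends Classifier {

  /** Invented hook: null means "no prediction possible." */
  protected abstract double[] predictFor(Instance instance);

  public double classifyInstance(Instance instance) throws Exception {
    double[] dist = predictFor(instance);
    if (dist == null) {
      return Instance.missingValue();   // convention: no prediction possible
    }
    if (instance.classAttribute().isNumeric()) {
      return dist[0];                   // numeric class: the predicted value itself
    }
    return Utils.maxIndex(dist);        // nominal class: most probable class
  }

  public double[] distributionForInstance(Instance instance) throws Exception {
    double[] dist = predictFor(instance);
    if (dist == null) {
      return new double[instance.numClasses()];  // all zeros: no prediction
    }
    return dist;  // for a numeric class this is the size-1 array
  }
}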


Another convention, not absolutely essential but useful nonetheless, is that every classifier implements a toString() method that outputs a textual description of itself.
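Continuing the hypothetical MajorityClassSketch above, a suitable toString() need only report what the model is, or say that none has been built yet:

public String toString() {
  if (m_Counts == null) {
    return "MajorityClassSketch: no model built yet.";
  }
  return "MajorityClassSketch: always predicts class index "
      + Utils.maxIndex(m_Counts);
}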




