Acknowledgments
Man Lam, James Lau, Deyi Li, George (Wenmin) Li, Jin Li, Ze-Nian Li, Nancy Liao,
Gang Liu, Junqiang Liu, Ling Liu, Alan (Yijun) Lu, Hongjun Lu, Tong Lu, Wei Lu,
Xuebin Lu, Wo-Shun Luk, Heikki Mannila, Runying Mao, Abhay Mehta, Gabor Melli,
Alberto Mendelzon, Tim Merrett, Harvey Miller, Drew Miners, Behzad Mortazavi-Asl,
Richard Muntz, Raymond T. Ng, Vicent Ng, Shojiro Nishio, Beng-Chin Ooi, Tamer
Ozsu, Jian Pei, Gregory Piatetsky-Shapiro, Helen Pinto, Fred Popowich, Amynmohamed
Rajan, Peter Scheuermann, Shashi Shekhar, Wei-Min Shen, Avi Silberschatz, Evangelos
Simoudis, Nebojsa Stefanovic, Yin Jenny Tam, Simon Tang, Zhaohui Tang, Dick Tsur,
Anthony K. H. Tung, Ke Wang, Wei Wang, Zhaoxia Wang, Tony Wind, Lara Winstone,
Ju Wu, Betty (Bin) Xia, Cindy M. Xin, Xiaowei Xu, Qiang Yang, Yiwen Yin, Clement Yu,
Jeffrey Yu, Philip S. Yu, Osmar R. Zaiane, Carlo Zaniolo, Shuhua Zhang, Zhong Zhang,
Yvonne Zheng, Xiaofang Zhou, and Hua Zhu.
We are also grateful to Jean Hou, Helen Pinto, Lara Winstone, and Hua Zhu for their
help with some of the original figures in this book, and to Eugene Belchev for his careful
proofreading of each chapter.
We also wish to thank Diane Cerra, our Executive Editor at Morgan Kaufmann
Publishers, for her enthusiasm, patience, and support during our writing of this book, as
well as Howard Severson, our Production Editor, and his staff for their conscientious
efforts regarding production. We are indebted to all of the reviewers for their invaluable
feedback. Finally, we thank our families for their wholehearted support throughout this
project.
About the Authors
Jiawei Han is a Bliss Professor of Engineering in the Department of Computer Science
at the University of Illinois at Urbana-Champaign. He has received numerous awards
for his contributions to research on knowledge discovery and data mining, including
the ACM SIGKDD Innovation Award (2004), the IEEE Computer Society Technical
Achievement Award (2005), and the IEEE W. Wallace McDowell Award (2009). He is a
Fellow of ACM and IEEE. He served as founding Editor-in-Chief of ACM Transactions on
Knowledge Discovery from Data (2006–2011) and as an editorial board member of several
journals, including IEEE Transactions on Knowledge and Data Engineering and Data Mining
and Knowledge Discovery.
Micheline Kamber has a master’s degree in computer science (specializing in
artificial intelligence) from Concordia University in Montreal, Quebec. She was an NSERC
Scholar and has worked as a researcher at McGill University, Simon Fraser University,
and in Switzerland. Her background in data mining and passion for writing in
easy-to-understand terms help make this text a favorite of professionals, instructors, and
students.
Jian Pei is currently an associate professor at the School of Computing Science, Simon
Fraser University in British Columbia. He received a Ph.D. degree in computing
science from Simon Fraser University in 2002 under Dr. Jiawei Han’s supervision. He has
published prolifically in the premier academic forums on data mining, databases, Web
searching, and information retrieval, and has actively served the academic community. His
publications have received thousands of citations and several prestigious awards. He is
an associate editor of several data mining and data analytics journals.
1 Introduction
This book is an introduction to the young and fast-growing field of data mining (also known
as knowledge discovery from data, or KDD for short). The book focuses on fundamental
data mining concepts and techniques for discovering interesting patterns from data in
various applications. In particular, we emphasize prominent techniques for developing
effective, efficient, and scalable data mining tools.
This chapter is organized as follows. In Section 1.1, you will learn why data mining is
in high demand and how it is part of the natural evolution of information technology.
Section 1.2 defines data mining with respect to the knowledge discovery process. Next,
you will learn about data mining from many aspects, such as the kinds of data that can
be mined (Section 1.3), the kinds of knowledge to be mined (Section 1.4), the kinds of
technologies to be used (Section 1.5), and targeted applications (Section 1.6). In this
way, you will gain a multidimensional view of data mining. Finally, Section 1.7 outlines
major data mining research and development issues.
1.1 Why Data Mining?
Necessity, who is the mother of invention. – Plato
We live in a world where vast amounts of data are collected daily. Analyzing such data
is an important need. Section 1.1.1 looks at how data mining can meet this need by
providing tools to discover knowledge from data. In Section 1.1.2, we observe how data
mining can be viewed as a result of the natural evolution of information technology.
1.1.1 Moving toward the Information Age
“We are living in the information age” is a popular saying; however, we are actually living
in the data age. Terabytes or petabytes¹ of data pour into our computer networks, the
World Wide Web (WWW), and various data storage devices every day from business,

¹ A petabyte is a unit of information or computer storage equal to 1 quadrillion bytes, or a thousand
terabytes, or 1 million gigabytes.
© 2012 Elsevier Inc. All rights reserved.
Data Mining: Concepts and Techniques