HAN
04-fore-xix-xxii-9780123814791
2011/6/1
3:32
Page xix
#1
Foreword
Analyzing large amounts of data is a necessity. Even popular science books, like “Super
Crunchers,” give compelling cases where large amounts of data yield discoveries and
intuitions that surprise even experts. Every enterprise benefits from collecting and
analyzing its data: Hospitals can spot trends and anomalies in their patient records, search
engines can do better ranking and ad placement, and environmental and public health
agencies can spot patterns and abnormalities in their data. The list continues, with
cybersecurity and computer network intrusion detection; monitoring of the energy
consumption of household appliances; pattern analysis in bioinformatics and
pharmaceutical data; financial and business intelligence data; spotting trends in blogs,
Twitter, and many more. Storage is inexpensive and getting even less so, as are data
sensors. Thus, collecting and storing data is easier than ever before.
The problem then becomes how to analyze the data. This is exactly the focus of this
Third Edition of the book. Jiawei, Micheline, and Jian give encyclopedic coverage of all
the related methods, from the classic topics of clustering and classification to database
methods (e.g., association rules, data cubes) to more recent and advanced topics (e.g.,
SVD/PCA, wavelets, support vector machines).
The exposition is extremely accessible to beginners and advanced readers alike. The
book gives the fundamental material first and the more advanced material in follow-up
chapters. It also has numerous rhetorical questions, which I found extremely helpful for
maintaining focus.
We have used the first two editions as textbooks in data mining courses at Carnegie
Mellon and plan to continue to do so with this Third Edition. The new version has
significant additions: Notably, it has more than 100 citations to works from 2006
onward, focusing on more recent material such as graphs and social networks, sensor
networks, and outlier detection. This book has a new section on visualization, has
expanded outlier detection into a whole chapter, and has separate chapters for advanced
methods (for example, pattern mining with top-k patterns and more, and clustering
methods with biclustering and graph clustering).
Overall, it is an excellent book on classic and modern data mining methods, and it is
ideal not only for teaching but also as a reference book.
Christos Faloutsos
Carnegie Mellon University
Foreword to Second Edition
We are deluged by data—scientific data, medical data, demographic data, financial data,
and marketing data. People have no time to look at this data. Human attention has
become the precious resource. So, we must find ways to automatically analyze the
data, to automatically classify it, to automatically summarize it, to automatically
discover and characterize trends in it, and to automatically flag anomalies. This is one
of the most active and exciting areas of the database research community. Researchers
in areas including statistics, visualization, artificial intelligence, and machine learning
are contributing to this field. The breadth of the field makes it difficult to grasp the
extraordinary progress over the last few decades.
Six years ago, Jiawei Han’s and Micheline Kamber’s seminal textbook organized and
presented Data Mining. It heralded a golden age of innovation in the field. This revision
of their book reflects that progress; more than half of the references and historical notes
are to recent work. The field has matured with many new and improved algorithms, and
has broadened to include many more data types: streams, sequences, graphs, time series,
geospatial, audio, images, and video. We are certainly not at the end of the golden age—
indeed research and commercial interest in data mining continues to grow—but we are
all fortunate to have this modern compendium.
The book gives quick introductions to database and data mining concepts with
particular emphasis on data analysis. It then covers, in a chapter-by-chapter tour, the
concepts and techniques that underlie classification, prediction, association, and
clustering. These topics are presented with examples, a tour of the best algorithms for each
problem class, and pragmatic rules of thumb about when to apply each technique.
The Socratic presentation style is both very readable and very informative. I certainly
learned a lot from reading the first edition and got re-educated and updated in reading
the second edition.
Jiawei Han and Micheline Kamber have been leading contributors to data mining
research. This is the text they use with their students to bring them up to speed on
the field. The field is evolving very rapidly, but this book is a quick way to learn the
basic ideas, and to understand where the field is today. I found it very informative and
stimulating, and believe you will too.
Jim Gray
In his memory