HAN
05-pref-xxiii-xxx-9780123814791
2011/6/1
3:35
Page xxiii
#1
Preface
The computerization of our society has substantially enhanced our capabilities for both
generating and collecting data from diverse sources. A tremendous amount of data has
flooded almost every aspect of our lives. This explosive growth in stored or transient
data has generated an urgent need for new techniques and automated tools that can
intelligently assist us in transforming the vast amounts of data into useful information
and knowledge. This need has given rise to a promising and flourishing frontier
in computer science called data mining, along with its various applications. Data mining, also
popularly referred to as knowledge discovery from data (KDD), is the automated or
convenient extraction of patterns representing knowledge implicitly stored or captured in
large databases, data warehouses, the Web, other massive information repositories, or
data streams.
This book explores the concepts and techniques of knowledge discovery and data
mining. As a multidisciplinary field, data mining draws on work from areas including statistics,
machine learning, pattern recognition, database technology, information retrieval,
network science, knowledge-based systems, artificial intelligence, high-performance
computing, and data visualization. We focus on issues relating to the feasibility,
usefulness, effectiveness, and scalability of techniques for the discovery of patterns hidden
in large data sets. As a result, this book is not intended as an introduction to
statistics, machine learning, database systems, or other such areas, although we do provide
some background knowledge to facilitate the reader’s comprehension of their respective
roles in data mining. Rather, the book is a comprehensive introduction to data mining.
It is useful for computing science students, application developers, and business
professionals, as well as researchers involved in any of the disciplines previously listed.
Data mining emerged during the late 1980s, made great strides during the 1990s, and
continues to flourish into the new millennium. This book presents an overall picture
of the field, introducing interesting data mining techniques and systems and discussing
applications and research directions. An important motivation for writing this book was
the need to build an organized framework for the study of data mining—a challenging
task, owing to the extensive multidisciplinary nature of this fast-developing field. We
hope that this book will encourage people with different backgrounds and experiences
to exchange their views regarding data mining so as to contribute toward the further
promotion and shaping of this exciting and dynamic field.
Organization of the Book
Since the publication of the first two editions of this book, great progress has been
made in the field of data mining. Many new data mining methodologies, systems, and
applications have been developed, especially for handling new kinds of data,
including information networks, graphs, complex structures, and data streams, as well as text,
Web, multimedia, time-series, and spatiotemporal data. Such fast development and rich,
new technical contents make it difficult to cover the full spectrum of the field in a single
book. Instead of continuously expanding the coverage of this book, we have decided to
cover the core material in sufficient scope and depth, and leave the handling of complex
data types to a separate forthcoming book.
The third edition substantially revises the first two editions of the book, with
numerous enhancements and a reorganization of the technical contents. The core technical
material, which handles mining on general data types, is expanded and substantially
enhanced. Several individual chapters for topics from the second edition (e.g., data
preprocessing, frequent pattern mining, classification, and clustering) are now augmented
and each split into two chapters for this new edition. For these topics, one chapter
encapsulates the basic concepts and techniques while the other presents advanced concepts
and methods.
Chapters from the second edition on mining complex data types (e.g., stream data,
sequence data, graph-structured data, social network data, and multirelational data,
as well as text, Web, multimedia, and spatiotemporal data) are now reserved for a new
book that will be dedicated to advanced topics in data mining. Still, to support readers
in learning such advanced topics, we have placed an electronic version of the relevant
chapters from the second edition onto the book’s web site as companion material for
the third edition.
The chapters of the third edition are described briefly as follows, with emphasis on
the new material.
Chapter 1 provides an introduction to the multidisciplinary field of data mining. It
discusses the evolutionary path of information technology, which has led to the need
for data mining, and the importance of its applications. It examines the data types to be
mined, including relational, transactional, and data warehouse data, as well as complex
data types such as time-series, sequences, data streams, spatiotemporal data, multimedia
data, text data, graphs, social networks, and Web data. The chapter presents a general
classification of data mining tasks, based on the kinds of knowledge to be mined, the
kinds of technologies used, and the kinds of applications that are targeted. Finally, major
challenges in the field are discussed.
Chapter 2 introduces the general features of data. It first discusses data objects and
attribute types and then introduces typical measures for basic statistical data
descriptions. It gives an overview of data visualization techniques for various kinds of data. In addition
to methods of numeric data visualization, methods for visualizing text, tags, graphs,
and multidimensional data are introduced. Chapter 2 also introduces ways to measure
similarity and dissimilarity for various kinds of data.
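To give a flavor of the kinds of measures Chapter 2 refers to, here is a minimal Python sketch. It is illustrative only and is not taken from the book: the function names (`describe`, `euclidean_distance`, `jaccard_dissimilarity`) are hypothetical, and Euclidean distance and Jaccard dissimilarity are assumed here as common examples of dissimilarity measures for numeric and set-valued data, respectively.

```python
import math
import statistics


def describe(values):
    """Basic statistical description of a numeric attribute (illustrative)."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Population standard deviation; a sample estimate would use stdev().
        "stdev": statistics.pstdev(values),
    }


def euclidean_distance(a, b):
    """Dissimilarity between two numeric vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def jaccard_dissimilarity(a, b):
    """One minus the ratio of shared items to all items in two sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Two empty sets are treated as identical (an assumption of this sketch).
        return 0.0
    return 1.0 - len(a & b) / len(union)


if __name__ == "__main__":
    print(describe([1, 2, 3, 4]))
    print(euclidean_distance([0, 0], [3, 4]))
    print(jaccard_dissimilarity({"a", "b"}, {"b", "c"}))
```

A dissimilarity of 0 means the two objects are identical under the chosen measure; which measure is appropriate depends on the attribute types involved.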