HAN
20-ch13-585-632-9780123814791
2011/6/1
3:26
Page 624
#40
624
Chapter 13 Data Mining Trends and Research Frontiers
popular use of cellular phones, GPS, sensors, and other wireless equipment. As
outlined in Section 13.1.3, there are many challenging research issues in realizing
real-time and effective knowledge discovery with such data.
Mining multimedia, text, and web data: As outlined in Section 13.1.3, mining such
kinds of data is a recent focus in data mining research. Great progress has been made,
yet there are still many open issues to be solved.
Mining biological and biomedical data: The unique combination of complexity,
richness, size, and importance of biological and biomedical data warrants spe-
cial attention in data mining. Mining DNA and protein sequences, mining high-
dimensional microarray data, and biological pathway and network analysis are just
a few topics in this field. Other areas of biological data mining research include
mining biomedical literature, link analysis across heterogeneous biological data, and
information integration of biological data by data mining.
Data mining with software engineering and system engineering: Software programs
and large computer systems have become increasingly bulky in size and
sophisticated in complexity, and they tend to originate from the integration of multiple
components developed by different implementation teams. This trend has made it
an increasingly challenging task to ensure software robustness and reliability. The
analysis of the executions of a buggy software program is essentially a data mining
process—tracing the data generated during program executions may disclose impor-
tant patterns and outliers that could lead to the eventual automated discovery of
software bugs. We expect that the further development of data mining methodolo-
gies for software/system debugging will enhance software robustness and bring new
vigor to software/system engineering.
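
As a minimal sketch of the idea that execution traces can be mined for patterns pointing toward a bug, the following ranks traced events by how much more often they appear in failing runs than in passing runs. The scoring rule, the function name `rank_suspicious_events`, and the trace data are illustrative assumptions, not a method prescribed in this chapter.

```python
from collections import Counter


def rank_suspicious_events(runs):
    """Rank traced events by (failing-run rate - passing-run rate).

    `runs` is a list of (events, failed) pairs: `events` is the set of
    event names observed in one execution, `failed` is True if it failed.
    """
    n_fail = sum(1 for _, failed in runs if failed)
    n_pass = len(runs) - n_fail
    in_fail, in_pass = Counter(), Counter()
    for events, failed in runs:
        # Count each event at most once per run.
        (in_fail if failed else in_pass).update(set(events))

    scores = {}
    for event in set(in_fail) | set(in_pass):
        fail_rate = in_fail[event] / n_fail if n_fail else 0.0
        pass_rate = in_pass[event] / n_pass if n_pass else 0.0
        scores[event] = fail_rate - pass_rate
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical traces: event names are made up for illustration.
runs = [
    ({"open_file", "parse_header", "close_file"}, False),
    ({"open_file", "parse_header", "null_branch"}, True),
    ({"open_file", "close_file"}, False),
    ({"open_file", "null_branch", "close_file"}, True),
]
ranking = rank_suspicious_events(runs)
for event, score in ranking:
    print(f"{event:14s} {score:+.2f}")
```

On these made-up traces, `null_branch` ranks first because it occurs in every failing run and in no passing run. A real debugging tool would need far richer instrumentation and statistical care.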
Visual and audio data mining: Visual and audio data mining is an effective way
to engage humans’ visual and auditory systems in discovering knowledge from
huge amounts of data. A systematic development of such techniques will facilitate
human participation in effective and efficient data analysis.
Distributed data mining and real-time data stream mining: Traditional data min-
ing methods, designed to work at a centralized location, do not work well in
many of the distributed computing environments present today (e.g., the Inter-
net, intranets, local area networks, high-speed wireless networks, sensor networks,
and cloud computing). Advances in distributed data mining methods are expected.
Moreover, many applications involving stream data (e.g., e-commerce, Web mining,
stock analysis, intrusion detection, mobile data mining, and data mining for coun-
terterrorism) require dynamic data mining models to be built in real time. Additional
research is needed in this direction.
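
To illustrate what building a model in real time over stream data can look like, here is a small sketch that keeps a running mean and variance in a single pass (Welford's update) and flags values far from the current mean. The class name `RunningStats`, the threshold of three standard deviations, and the sample stream are assumptions chosen for illustration only.

```python
import math


class RunningStats:
    """Single-pass mean and sample variance (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0

    def is_outlier(self, x, threshold=3.0):
        """True if x lies more than `threshold` std devs from the mean."""
        std = math.sqrt(self.variance)
        if self.n < 2 or std == 0.0:
            return False
        return abs(x - self.mean) / std > threshold


def flag_outliers(stream, threshold=3.0):
    """Test each value against the model built so far, then absorb it."""
    stats = RunningStats()
    flagged = []
    for x in stream:
        if stats.is_outlier(x, threshold):
            flagged.append(x)
        stats.update(x)  # flagged values still update the model
    return flagged, stats


# Hypothetical readings arriving one at a time.
flagged, stats = flag_outliers([10.0, 11.0, 9.0, 10.0, 12.0, 10.0, 50.0, 11.0])
print(flagged, stats.n, stats.mean)
```

The model uses constant memory regardless of stream length, which is the property stream settings typically require. Whether flagged values should also update the statistics is a design choice; this sketch includes them for simplicity.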
Privacy protection and information security in data mining: An abundance of
personal or confidential information available in electronic forms, coupled with
increasingly powerful data mining tools, poses a threat to data privacy and security.
Growing interest in data mining for counterterrorism also adds to the concern.
Further development of privacy-preserving data mining methods is foreseen. The
collaboration of technologists, social scientists, law experts, governments, and
companies is needed to produce a rigorous privacy and security protection mech-
anism for data publishing and data mining.
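
As one concrete and deliberately simple illustration of a privacy check for data publishing, the sketch below computes the k-anonymity level of a table: the size of the smallest group of records that share the same quasi-identifier values. k-anonymity is a well-known criterion from the privacy literature and is used here as an assumed example, not as the mechanism this section calls for. The function name, field names, and records are hypothetical.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records that agree on
    every quasi-identifier; 0 for an empty table."""
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return min(groups.values()) if groups else 0


# Hypothetical, already generalized records (ZIP code and age coarsened).
records = [
    {"zip": "130**", "age": "20-29", "diagnosis": "flu"},
    {"zip": "130**", "age": "20-29", "diagnosis": "asthma"},
    {"zip": "148**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "148**", "age": "30-39", "diagnosis": "diabetes"},
    {"zip": "148**", "age": "30-39", "diagnosis": "flu"},
]
k = k_anonymity(records, ["zip", "age"])
print(k)
```

Here every combination of `zip` and `age` is shared by at least two records, so the table is 2-anonymous with respect to those attributes. A check like this covers only one narrow aspect of privacy and does not by itself amount to a rigorous protection mechanism.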
With confidence, we look forward to the next generation of data mining technology
and the further benefits that it will bring.
13.6 Summary
Mining complex data types poses challenging issues, for which there are many dedi-
cated lines of research and development. This chapter presents a high-level overview
of mining complex data types, which includes mining sequence data such as time
series, symbolic sequences, and biological sequences; mining graphs and networks;
and mining other kinds of data, including spatiotemporal and cyber-physical system
data, multimedia, text and Web data, and data streams.
Several well-established statistical methods have been proposed for data analysis,
such as regression, generalized linear models, analysis of variance, mixed-effect
models, factor analysis, discriminant analysis, survival analysis, and quality control. Full
coverage of statistical data analysis methods is beyond the scope of this book. Inter-
ested readers are referred to the statistical literature cited in the bibliographic notes
(Section 13.8).
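
For readers who want a concrete instance of the first method listed, the following is a minimal sketch of simple linear regression fitted by ordinary least squares in plain Python. The function name `fit_line` and the sample points are assumptions for illustration; statistical packages offer far more complete implementations.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical points lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

Because the sample points lie exactly on a line, the fit recovers a slope of 2 and an intercept of 1. With noisy data the same formulas give the line that minimizes the sum of squared vertical errors.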
Researchers have been striving to build theoretical foundations for data mining. Sev-
eral interesting proposals have appeared, based on data reduction, data compression,
probability and statistics theory, microeconomic theory, and pattern discovery–based
inductive databases.
Visual data mining integrates data mining and data visualization to discover implicit
and useful knowledge from large data sets. Visual data mining includes data
visualization, data mining result visualization, data mining process visualization,
and interactive visual data mining. Audio data mining uses audio signals to
indicate data patterns or features of data mining results.
Many customized data mining tools have been developed for domain-specific
applications, including finance, the retail and telecommunication industries, science
and engineering, intrusion detection and prevention, and recommender systems.
Such application domain-based studies integrate domain-specific knowledge with
data analysis techniques and provide mission-specific data mining solutions.
Ubiquitous data mining is the constant presence of data mining in many aspects
of our daily lives. It can influence how we shop, work, search for information, and
use a computer, as well as our leisure time, health, and well-being. In invisible data
mining, “smart” software, such as search engines, customer-adaptive web services