HAN
20-ch13-585-632-9780123814791
2011/6/1
3:26
Page 628
#44
628
Chapter 13 Data Mining Trends and Research Frontiers
13.13 What are the major challenges faced in bringing data mining research to market?
Illustrate one data mining research issue that, in your view, may have a strong impact on the
market and on society. Discuss how to approach such a research issue.
13.14 Based on your view, what is the most challenging research problem in data mining? If
you were given a number of years and a good number of researchers and implementors,
what would your plan be to make good progress toward an effective solution to such a
problem?
13.15 Based on your experience and knowledge, suggest a new frontier in data mining that was
not mentioned in this chapter.
13.8 Bibliographic Notes
For mining complex data types, there are many research papers and books covering
various themes. We list here some recent books and well-cited survey or research articles
for reference.
Time-series analysis has been studied in the statistics and computer science
communities for decades, with many textbooks such as Box, Jenkins, and Reinsel [BJR08];
Brockwell and Davis [BD02]; Chatfield [Cha03b]; Hamilton [Ham94]; and Shumway
and Stoffer [SS05]. A fast subsequence matching method in time-series databases
was presented by Faloutsos, Ranganathan, and Manolopoulos [FRM94]. Agrawal, Lin,
Sawhney, and Shim [ALSS95] developed a method for fast similarity search in the
presence of noise, scaling, and translation in time-series databases. Shasha and Zhu present
an overview of the methods for high-performance discovery in time series [SZ04].
Sequential pattern mining methods have been studied by many researchers,
including Agrawal and Srikant [AS95]; Zaki [Zak01]; Pei, Han, Mortazavi-Asl, et al.
[PHM-A+04]; and Yan, Han, and Afshar [YHA03]. The study on sequence
classification includes Ji, Bailey, and Dong [JBD05] and Ye and Keogh [YK09], with a survey by
Xing, Pei, and Keogh [XPK10]. Dong and Pei [DP07] provide an overview on sequence
data mining methods.
Methods for analysis of biological sequences including Markov chains and hidden
Markov models are introduced in many books or tutorials such as Waterman [Wat95];
Setubal and Meidanis [SM97]; Durbin, Eddy, Krogh, and Mitchison [DEKM98];
Baldi and Brunak [BB01]; Krane and Raymer [KR03]; Rabiner [Rab89]; Jones and
Pevzner [JP04]; and Baxevanis and Ouellette [BO04]. Information about BLAST
(see also Korf, Yandell, and Bedell [KYB03]) can be found at the NCBI web site
www.ncbi.nlm.nih.gov/BLAST/.
Graph pattern mining has been studied extensively, including by Holder, Cook, and
Djoko [HCD94]; Inokuchi, Washio, and Motoda [IWM98]; Kuramochi and Karypis
[KK01]; Yan and Han [YH02, YH03a]; Borgelt and Berthold [BB02]; Huan, Wang,
Bandyopadhyay, et al. [HWB+04]; and the Gaston tool by Nijssen and Kok [NK04].
There has been a great deal of research on social and information network analysis,
including Newman [New10]; Easley and Kleinberg [EK10]; Yu, Han, and Faloutsos
[YHF10]; Wasserman and Faust [WF94]; Watts [Wat03]; and Newman, Barabasi,
and Watts [NBW06]. Statistical modeling of networks has been widely studied, for example by
Albert and Barabasi [AB99]; Watts [Wat03]; Faloutsos, Faloutsos, and Faloutsos
[FFF99]; Kumar, Raghavan, Rajagopalan, et al. [KRR+00]; and Leskovec, Kleinberg, and
Faloutsos [LKF05].
Data cleaning, integration, and validation by information network
analysis have been studied by many, including Bhattacharya and Getoor [BG04] and
Yin, Han, and Yu [YHY07, YHY08].
Clustering, ranking, and classification in networks have been studied extensively,
including in Brin and Page [BP98]; Chakrabarti, Dom, and Indyk [CDI98];
Kleinberg [Kle99]; Getoor, Friedman, Koller, and Taskar [GFKT01]; Newman and Girvan
[NG04]; Yin, Han, Yang, and Yu [YHYY04]; Yin, Han, and Yu [YHY05]; Xu, Yuruk,
Feng, and Schweiger [XYFS07]; Kulis, Basu, Dhillon, and Mooney [KBDM09]; Sun,
Han, Zhao, et al. [SHZ+09]; Neville, Gallaher, and Eliassi-Rad [NGE-R09]; and Ji, Sun,
Danilevsky, et al. [JSD+10]. Role discovery and link prediction in information
networks have been studied extensively as well, such as by Krebs [Kre02]; Kubica, Moore,
and Schneider [KMS03]; Liben-Nowell and Kleinberg [L-NK03]; and Wang, Han, Jia,
et al. [WHJ+10].
Similarity search and OLAP in information networks have been studied by many,
including Tian, Hankins, and Patel [THP08] and Chen, Yan, Zhu, et al. [CYZ+08].
Evolution of social and information networks has been studied by many researchers,
such as Chakrabarti, Kumar, and Tomkins [CKT06]; Chi, Song, Zhou, et al. [CSZ+07];
Tang, Liu, Zhang, and Nazeri [TLZN08]; Xu, Zhang, Yu, and Long [XZYL08]; Kim and
Han [KH09]; and Sun, Tang, and Han [STH+10].
Spatial and spatiotemporal data mining has been studied extensively, with a
collection of papers by Miller and Han [MH09], and has been introduced in some textbooks,
such as Shekhar and Chawla [SC03] and Hsu, Lee, and Wang [HLW07]. Spatial
clustering algorithms are covered extensively in Chapters 10 and 11 of this book.
Research has been conducted on spatial warehouses and OLAP, such as by Stefanovic,
Han, and Koperski [SHK00], and on spatial and spatiotemporal data mining, such as by
Koperski and Han [KH95]; Mamoulis, Cao, Kollios, Hadjieleftheriou, et al. [MCK+04];
Tsoukatos and Gunopulos [TG01]; and Hadjieleftheriou, Kollios, Gunopulos, and
Tsotras [HKGT03]. Mining moving-object data has been studied by many, such as
Vlachos, Gunopulos, and Kollios [VGK02]; Tao, Faloutsos, Papadias, and Liu [TFPL04];
Li, Han, Kim, and Gonzalez [LHKG07]; Lee, Han, and Whang [LHW07]; and Li, Ding,
Han, et al. [LDH+10]. For the bibliography of temporal, spatial, and spatiotemporal
data mining research, see a collection by Roddick, Hornsby, and Spiliopoulou [RHS01].
Multimedia data mining has deep roots in image processing and pattern
recognition, which have been studied extensively in many textbooks, including Gonzalez and
Woods [GW07]; Russ [Rus06]; Duda, Hart, and Stork [DHS01]; and Z. Zhang and
R. Zhang [ZZ09]. Searching and mining of multimedia data has been studied by many
(see, e.g., Fayyad and Smyth [FS93]; Faloutsos and Lin [FL95]; Natsev, Rastogi, and