HAN
20-ch13-585-632-9780123814791
2011/6/1
3:26
Page 626
#42
626
Chapter 13 Data Mining Trends and Research Frontiers
(e.g., using recommender algorithms), email managers, and so on, incorporates data
mining into its functional components, often unbeknownst to the user.
A major social concern of data mining is the issue of privacy and data security.
Privacy-preserving data mining deals with obtaining valid data mining results with-
out disclosing underlying sensitive values. Its goal is to ensure privacy protection and
security while preserving the overall quality of data mining results.
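As one illustration of obtaining valid aggregate results without disclosing individual sensitive values, the following minimal sketch uses randomized response, a classic randomization technique. It is only one of several possible approaches, not a method prescribed by this chapter. The function names, the truth probability p = 0.8, and the synthetic data are assumptions made for the example.

```python
import random

def randomize(true_value: bool, p: float, rng: random.Random) -> bool:
    """Report the true bit with probability p, otherwise report its negation."""
    return true_value if rng.random() < p else not true_value

def estimate_proportion(reports, p: float) -> float:
    """Estimate the true 'yes' proportion from randomized reports.

    The observed rate is p*pi + (1 - p)*(1 - pi); solving for pi gives
    the estimator below. p = 0.5 carries no information.
    """
    if p == 0.5:
        raise ValueError("p must differ from 0.5")
    observed = sum(reports) / len(reports)
    return (observed - (1 - p)) / (2 * p - 1)

# Hypothetical data: 30% of 10,000 individuals hold the sensitive attribute.
rng = random.Random(0)
true_bits = [i < 3000 for i in range(10000)]
reports = [randomize(b, 0.8, rng) for b in true_bits]
estimate = estimate_proportion(reports, 0.8)
```

The miner sees only the randomized reports, so no single report reveals an individual's true value with certainty, yet the aggregate estimate stays close to the true proportion. Lowering p strengthens privacy at the cost of a noisier estimate.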
Data mining trends include further efforts toward the exploration of new applica-
tion areas; improved scalable, interactive, and constraint-based mining methods; the
integration of data mining with web service, database, warehousing, and cloud com-
puting systems; and mining social and information networks. Other trends include
the mining of spatiotemporal and cyber-physical system data, biological data, soft-
ware/system engineering data, and multimedia and text data, in addition to web
mining, distributed and real-time data stream mining, visual and audio mining, and
privacy and security in data mining.
13.7 Exercises
13.1 Sequence data are ubiquitous and have diverse applications. This chapter presented a
general overview of sequential pattern mining, sequence classification, sequence sim-
ilarity search, trend analysis, biological sequence alignment, and modeling. However,
we have not covered sequence clustering. Present an overview of methods for sequence
clustering.
13.2 This chapter presented an overview of sequence pattern mining and graph pattern
mining methods. Mining tree patterns and partial order patterns is also studied in
research. Summarize the methods for mining structured patterns, including sequences,
trees, graphs, and partial order relationships. Examine what kinds of structural pattern
mining have not been covered in research. Propose applications that can be created for
such new mining problems.
13.3 Many studies analyze homogeneous information networks (e.g., social networks
consisting of friends linked with friends). However, many other applications involve
heterogeneous information networks (i.e., networks linking multiple types of objects, such
as research papers, conferences, authors, and topics). What are the major differences
between methodologies for mining heterogeneous information networks and methods
for their homogeneous counterparts?
13.4 Research and describe a data mining application that was not presented in this chapter.
Discuss how different forms of data mining can be used in the application.
13.5 Why is the establishment of theoretical foundations important for data mining? Name
and describe the main theoretical foundations that have been proposed for data min-
ing. Comment on how they each satisfy (or fail to satisfy) the requirements of an ideal
theoretical framework for data mining.
13.6 (Research project) Building a theory of data mining requires setting up a theoretical
framework so that the major data mining functions can be explained under this
framework. Take one theory as an example (e.g., data compression theory) and examine
how the major data mining functions fit into this framework. If some functions do not
fit well into the current theoretical framework, can you propose a way to extend the
framework to explain these functions?
13.7 There is a strong linkage between statistical data analysis and data mining. Some people
think of data mining as automated and scalable methods for statistical data analysis.
Do you agree or disagree with this perception? Present one statistical analysis method
that can be automated and/or scaled up nicely by integration with current data mining
methodology.
13.8 What are the differences between visual data mining and data visualization? Data visu-
alization may suffer from the data abundance problem. For example, it is not easy to
visually discover interesting properties of network connections if a social network is
huge, with complex and dense connections. Propose a visualization method that may
help people see through the network topology to the interesting features of a social
network.
13.9 Propose a few implementation methods for audio data mining. Can we integrate audio
and visual data mining to bring fun and power to data mining? Is it possible to develop
some video data mining methods? State some scenarios and your solutions to make such
integrated audiovisual mining effective.
13.10 General-purpose computers and domain-independent relational database systems have
become a large market in the last several decades. However, many people feel that generic
data mining systems will not prevail in the data mining market. What do you think? For
data mining, should we focus our efforts on developing domain-independent data mining
tools or on developing domain-specific data mining solutions? Present your reasoning.
13.11 What is a recommender system? In what ways does it differ from a customer- or
product-based clustering system? How does it differ from a typical classification or predictive
modeling system? Outline one method of collaborative filtering. Discuss why it works
and what its limitations are in practice.
13.12 Suppose that your local bank has a data mining system. The bank has been studying
your debit card usage patterns. Noticing that you make many transactions at home
renovation stores, the bank decides to contact you, offering information regarding their
special loans for home improvements.
(a) Discuss how this may conflict with your right to privacy.
(b) Describe another situation in which you feel that data mining can infringe on your
privacy.
(c) Describe a privacy-preserving data mining method that may allow the bank to per-
form customer pattern analysis without infringing on its customers’ right to privacy.
(d) What are some examples where data mining could be used to help society? Can you
think of ways it could be used that may be detrimental to society?
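As a possible starting point for Exercise 13.11, the following minimal sketch shows user-based collaborative filtering: a missing rating is predicted as a similarity-weighted average of the ratings given by other users. It is one simple variant, not the only method. The rating data, the function names, and the choice to compute cosine similarity only over co-rated items are assumptions made for illustration.

```python
from math import sqrt

def cosine(u: dict, v: dict) -> float:
    """Cosine similarity restricted to the items both users rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def predict(ratings: dict, user: str, item: str):
    """Similarity-weighted average of other users' ratings for the item.

    Returns None when no other user has rated the item.
    """
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine(ratings[user], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None

# Hypothetical ratings on a 1-5 scale.
ratings = {
    "ann": {"A": 5, "B": 3},
    "bob": {"A": 5, "B": 3, "C": 4},
    "cat": {"A": 1, "B": 5, "C": 2},
}
prediction = predict(ratings, "ann", "C")
```

Here the prediction for "ann" on item "C" leans toward the rating from "bob", whose past ratings match hers exactly. The sketch also exposes two practical limitations worth discussing: it cannot score an item nobody has rated (the cold-start problem), and similarities computed from very few co-rated items are unreliable when ratings are sparse.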