Data Mining: Concepts and Techniques, 3rd Edition

HAN 08-ch01-001-038-9780123814791


HAN 08-ch01-001-038-9780123814791 2011/6/1 3:12 Page 31 #31

1.7 Major Issues in Data Mining

into the knowledge discovery process. Such knowledge can be used for pattern evaluation as well as to guide the search toward interesting patterns.

Ad hoc data mining and data mining query languages: Query languages (e.g., SQL) have played an important role in flexible searching because they allow users to pose ad hoc queries. Similarly, high-level data mining query languages or other high-level flexible user interfaces will give users the freedom to define ad hoc data mining tasks. This should facilitate specification of the relevant sets of data for analysis, the domain knowledge, the kinds of knowledge to be mined, and the conditions and constraints to be enforced on the discovered patterns. Optimization of the processing of such flexible mining requests is another promising area of study.
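
To make the four parts of an ad hoc mining task concrete, here is a minimal Python sketch. It is not a real data mining query language: the `MiningTask` class, its field names, and the `run_task` function are hypothetical names chosen for illustration, and the only kind of knowledge it supports is frequent single items.

```python
from dataclasses import dataclass, field
from typing import Callable


# Hypothetical illustration: MiningTask and its fields are assumptions,
# not part of any existing data mining query language.
@dataclass
class MiningTask:
    relevant_data: Callable[[dict], bool]  # selects the records to analyze
    knowledge_kind: str                    # kind of knowledge to be mined
    min_support: float                     # constraint on discovered patterns
    # Domain knowledge: maps an item to a higher-level concept.
    concept_hierarchy: dict[str, str] = field(default_factory=dict)


def run_task(task: MiningTask, records: list[dict]) -> dict[str, float]:
    """Select data, generalize items, count them, and apply the constraint."""
    if task.knowledge_kind != "frequent_items":
        raise ValueError(f"unsupported knowledge kind: {task.knowledge_kind}")
    selected = [r for r in records if task.relevant_data(r)]
    if not selected:
        return {}
    counts: dict[str, int] = {}
    for record in selected:
        # Replace each item by its higher-level concept when one is defined.
        items = {task.concept_hierarchy.get(i, i) for i in record["items"]}
        for item in items:
            counts[item] = counts.get(item, 0) + 1
    total = len(selected)
    # Keep only patterns whose support meets the user's threshold.
    return {i: c / total for i, c in counts.items() if c / total >= task.min_support}
```

A user would pose a new task by constructing a different `MiningTask` (another selection predicate, threshold, or hierarchy) rather than by writing a new algorithm, which is the flexibility the paragraph above asks for.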

Presentation and visualization of data mining results: How can a data mining system present data mining results, vividly and flexibly, so that the discovered knowledge can be easily understood and directly used by humans? This is especially crucial if the data mining process is interactive. It requires the system to adopt expressive knowledge representations, user-friendly interfaces, and visualization techniques.

1.7.3 Efficiency and Scalability

Efficiency and scalability are always considered when comparing data mining algorithms. As data volumes continue to grow, these two factors become especially critical.

Efficiency and scalability of data mining algorithms: Data mining algorithms must be efficient and scalable in order to effectively extract information from huge amounts of data in many data repositories or in dynamic data streams. In other words, the running time of a data mining algorithm must be predictable, short, and acceptable to applications. Efficiency, scalability, performance, optimization, and the ability to execute in real time are key criteria that drive the development of many new data mining algorithms.

Parallel, distributed, and incremental mining algorithms: The enormous size of many data sets, the wide distribution of data, and the computational complexity of some data mining methods are factors that motivate the development of parallel and distributed data-intensive mining algorithms. Such algorithms first partition the data into “pieces.” Each piece is processed, in parallel, by searching for patterns. The parallel processes may interact with one another. The patterns from each partition are eventually merged.

Cloud computing and cluster computing, which use computers in a distributed and collaborative way to tackle very large-scale computational tasks, are also active research themes in parallel data mining. In addition, the high cost of some data mining processes and the incremental nature of input promote incremental data mining, which incorporates new data updates without having to mine the entire data set “from scratch.” Such methods perform knowledge modification incrementally to amend and strengthen what was previously discovered.
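
The partition, process, and merge pattern, and the incremental update that follows it, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not an algorithm from the text: the "patterns" are co-occurring item pairs, the function names are hypothetical, and the pieces do not interact with one another. Because pair counts are additive, merging partial results is exact here; many real mining methods need a more careful merge step. A thread pool keeps the sketch self-contained, although CPython threads do not speed up CPU-bound work, so a process pool or a cluster framework would take its place in practice.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def partition(transactions, n_parts):
    """Split the transactions into n_parts roughly equal pieces."""
    return [transactions[i::n_parts] for i in range(n_parts)]


def count_pairs(piece):
    """Local pattern search: count co-occurring item pairs in one piece."""
    counts = Counter()
    for items in piece:
        # Sort so that each pair has one canonical form, e.g. ("a", "b").
        counts.update(combinations(sorted(set(items)), 2))
    return counts


def mine_parallel(transactions, n_parts=2):
    """Count pairs in each piece concurrently, then merge the partial counts."""
    pieces = partition(transactions, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(count_pairs, pieces))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts together
    return merged


def update_incrementally(counts, new_transactions):
    """Fold a new batch into existing counts without rescanning old data."""
    counts.update(count_pairs(new_transactions))
    return counts


def frequent(counts, min_count):
    """Keep only the pairs that occur at least min_count times."""
    return {pair: c for pair, c in counts.items() if c >= min_count}
```

Note that `update_incrementally` reuses the same merge operation as `mine_parallel`: a new batch of data is treated as one more piece whose counts are added to what was previously discovered.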



1.7.4 Diversity of Database Types

The wide diversity of database types poses challenges for data mining. These include:

Handling complex types of data: Diverse applications generate a wide spectrum of new data types, from structured data such as relational and data warehouse data to semi-structured and unstructured data; from stable data repositories to dynamic data streams; from simple data objects to temporal data, biological sequences, sensor data, spatial data, hypertext data, multimedia data, software program code, Web data, and social network data. It is unrealistic to expect one data mining system to mine all kinds of data, given the diversity of data types and the different goals of data mining. Domain- or application-dedicated data mining systems are being constructed for in-depth mining of specific kinds of data. The construction of effective and efficient data mining tools for diverse applications remains a challenging and active area of research.

Mining dynamic, networked, and global data repositories: Multiple sources of data are connected by the Internet and various kinds of networks, forming gigantic, distributed, and heterogeneous global information systems and networks. The discovery of knowledge from different sources of structured, semi-structured, or unstructured yet interconnected data with diverse data semantics poses great challenges to data mining. Mining such gigantic, interconnected information networks may help disclose many more patterns and much more knowledge in heterogeneous data sets than can be discovered from a small set of isolated data repositories. Web mining, multisource data mining, and information network mining have become challenging and fast-evolving data mining fields.

1.7.5 Data Mining and Society

How does data mining impact society? What steps can data mining take to preserve the privacy of individuals? Do we use data mining in our daily lives without even knowing that we do? These questions raise the following issues:

Social impacts of data mining: With data mining penetrating our everyday lives, it is important to study the impact of data mining on society. How can we use data mining technology to benefit society? How can we guard against its misuse? The improper disclosure or use of data and the potential violation of individual privacy and data protection rights are areas of concern that need to be addressed.

Privacy-preserving data mining: Data mining will help scientific discovery, business management, economic recovery, and security protection (e.g., the real-time discovery of intruders and cyberattacks). However, it poses the risk of disclosing an individual’s personal information. Studies on privacy-preserving data publishing and data mining are ongoing. The philosophy is to observe data sensitivity and preserve people’s privacy while performing successful data mining.
