Data Mining. Concepts and Techniques, 3rd Edition

HAN 08-ch01-001-038-9780123814791


What Is Data Mining

2011/6/1


Chapter 1 Introduction

Since the 1960s, database and information technology has evolved systematically from primitive file processing systems to sophisticated and powerful database systems. The research and development in database systems since the 1970s progressed from early hierarchical and network database systems to relational database systems (where data are stored in relational table structures; see Section 1.3.1), data modeling tools, and indexing and accessing methods. In addition, users gained convenient and flexible data access through query languages, user interfaces, query optimization, and transaction management. Efficient methods for online transaction processing (OLTP), where a query is viewed as a read-only transaction, contributed substantially to the evolution and wide acceptance of relational technology as a major tool for efficient storage, retrieval, and management of large amounts of data.
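As a rough illustration of the relational ideas named above (table structures, indexing, a query language, and transaction management), the following minimal sketch uses Python's built-in sqlite3 module. The `customer` table, its columns, and the sample rows are hypothetical and chosen only for illustration; they do not come from the text.

```python
import sqlite3

# In-memory relational database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# A relational table structure (hypothetical schema) plus an index on one column.
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("CREATE INDEX idx_customer_city ON customer (city)")

# Transaction management: both inserts are committed together.
with conn:
    conn.executemany(
        "INSERT INTO customer (id, name, city) VALUES (?, ?, ?)",
        [(1, "Ann", "Chicago"), (2, "Bob", "Boston")],
    )

# A failing transaction is rolled back as a whole: the second insert
# violates the primary key, so the first insert is undone as well.
try:
    with conn:
        conn.execute("INSERT INTO customer (id, name, city) VALUES (3, 'Cy', 'Chicago')")
        conn.execute("INSERT INTO customer (id, name, city) VALUES (1, 'Dup', 'Boston')")
except sqlite3.IntegrityError:
    pass

# A read-only query expressed in a query language (SQL).
rows = conn.execute(
    "SELECT name FROM customer WHERE city = ? ORDER BY name", ("Chicago",)
).fetchall()
count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]

print(rows)   # customers in Chicago after the rollback
print(count)  # number of rows that survived
```

The sketch only shows the concepts side by side; a production OLTP system would add concurrency control, durability, and query optimization that an in-memory example cannot demonstrate.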

After the establishment of database management systems, database technology moved toward the development of advanced database systems, data warehousing, and data mining for advanced data analysis and web-based databases. Advanced database systems, for example, resulted from an upsurge of research from the mid-1980s onward. These systems incorporate new and powerful data models such as extended-relational, object-oriented, object-relational, and deductive models. Application-oriented database systems have flourished, including spatial, temporal, multimedia, active, stream and sensor, scientific and engineering databases, knowledge bases, and office information bases. Issues related to the distribution, diversification, and sharing of data have been studied extensively.

Advanced data analysis sprang up from the late 1980s onward. The steady and dazzling progress of computer hardware technology in the past three decades led to large supplies of powerful and affordable computers, data collection equipment, and storage media. This technology provides a great boost to the database and information industry, and it enables a huge number of databases and information repositories to be available for transaction management, information retrieval, and data analysis. Data can now be stored in many different kinds of databases and information repositories.

One emerging data repository architecture is the data warehouse (Section 1.3.2). This is a repository of multiple heterogeneous data sources organized under a unified schema at a single site to facilitate management decision making. Data warehouse technology includes data cleaning, data integration, and online analytical processing (OLAP), that is, analysis techniques with functionalities such as summarization, consolidation, and aggregation, as well as the ability to view information from different angles. Although OLAP tools support multidimensional analysis and decision making, additional data analysis tools are required for in-depth analysis, for example, data mining tools that provide data classification, clustering, outlier/anomaly detection, and the characterization of changes in data over time.
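The OLAP functionalities mentioned above (summarization, aggregation, and viewing information from different angles) can be sketched in a few lines of plain Python. This is only an illustrative toy, not how a data warehouse implements OLAP: the `SALES` records, the dimension names, and the `roll_up` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sales facts; dimensions and figures are invented for illustration.
SALES = [
    {"region": "East", "quarter": "Q1", "item": "TV", "amount": 100},
    {"region": "East", "quarter": "Q2", "item": "TV", "amount": 150},
    {"region": "West", "quarter": "Q1", "item": "Phone", "amount": 200},
    {"region": "West", "quarter": "Q2", "item": "TV", "amount": 50},
]

def roll_up(records, dimensions, measure="amount"):
    """Sum `measure` over every distinct combination of the given dimensions."""
    totals = defaultdict(int)
    for row in records:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)

# The same facts viewed from different angles.
by_region = roll_up(SALES, ["region"])
by_quarter = roll_up(SALES, ["quarter"])
by_region_quarter = roll_up(SALES, ["region", "quarter"])
grand_total = roll_up(SALES, [])  # no dimensions: a single overall summary

print(by_region)
print(by_quarter)
print(by_region_quarter)
print(grand_total)
```

Each call aggregates the same records along a different choice of dimensions, which is the sense in which OLAP lets an analyst change the viewing angle. Tasks such as classification or clustering go beyond this kind of aggregation, which is the gap the paragraph attributes to data mining tools.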

Huge volumes of data have been accumulated beyond databases and data warehouses. During the 1990s, the World Wide Web and web-based databases (e.g., XML databases) began to appear. Internet-based global information bases, such as the WWW and various kinds of interconnected, heterogeneous databases, have emerged and play a vital role in the information industry. The effective and efficient analysis of data in such different forms, by integrating information retrieval, data mining, and information network analysis technologies, is a challenging task.


[Figure 1.2: The world is data rich but information poor. Text in figure: "How can I analyze these data?"]

In summary, the abundance of data, coupled with the need for powerful data analysis tools, has been described as a data rich but information poor situation (Figure 1.2). The fast-growing, tremendous amount of data, collected and stored in large and numerous data repositories, has far exceeded our human ability for comprehension without powerful tools. As a result, data collected in large data repositories become “data tombs”: data archives that are seldom visited. Consequently, important decisions are often made based not on the information-rich data stored in data repositories but rather on a decision maker’s intuition, simply because the decision maker does not have the tools to extract the valuable knowledge embedded in the vast amounts of data. Efforts have been made to develop expert system and knowledge-based technologies, which typically rely on users or domain experts to manually input knowledge into knowledge bases. Unfortunately, the manual knowledge input procedure is prone to biases and errors and is extremely costly and time consuming. The widening gap between data and information calls for the systematic development of data mining tools that can turn data tombs into “golden nuggets” of knowledge.

1.2 What Is Data Mining?

It is no surprise that data mining, as a truly interdisciplinary subject, can be defined in many different ways. Even the term data mining does not really present all the major components in the picture. To refer to the mining of gold from rocks or sand, we say gold mining instead of rock or sand mining. Analogously, data mining should have been more
