Data Mining: Concepts and Techniques, 3rd Edition

HAN 08-ch01-001-038-9780123814791


HAN 08-ch01-001-038-9780123814791 2011/6/1 3:12 Page 2 #2

Chapter 1 Introduction

society, science and engineering, medicine, and almost every other aspect of daily life. This explosive growth of available data volume is a result of the computerization of our society and the fast development of powerful data collection and storage tools. Businesses worldwide generate gigantic data sets, including sales transactions, stock trading records, product descriptions, sales promotions, company profiles and performance, and customer feedback. For example, large stores, such as Wal-Mart, handle hundreds of millions of transactions per week at thousands of branches around the world. Scientific and engineering practices generate high orders of petabytes of data in a continuous manner, from remote sensing, process measuring, scientific experiments, system performance, engineering observations, and environment surveillance.

Global backbone telecommunication networks carry tens of petabytes of data traffic every day. The medical and health industry generates tremendous amounts of data from medical records, patient monitoring, and medical imaging. Billions of Web searches supported by search engines process tens of petabytes of data daily. Communities and social media have become increasingly important data sources, producing digital pictures and videos, blogs, Web communities, and various kinds of social networks. The list of sources that generate huge amounts of data is endless.

This explosively growing, widely available, and gigantic body of data makes our time truly the data age. Powerful and versatile tools are badly needed to automatically uncover valuable information from the tremendous amounts of data and to transform such data into organized knowledge. This necessity has led to the birth of data mining. The field is young, dynamic, and promising. Data mining has made and will continue to make great strides in our journey from the data age toward the coming information age.

Example 1.1

Data mining turns a large collection of data into knowledge. A search engine (e.g., Google) receives hundreds of millions of queries every day. Each query can be viewed as a transaction where the user describes her or his information need. What novel and useful knowledge can a search engine learn from such a huge collection of queries collected from users over time? Interestingly, some patterns found in user search queries can disclose invaluable knowledge that cannot be obtained by reading individual data items alone. For example, Google’s Flu Trends uses specific search terms as indicators of flu activity. It found a close relationship between the number of people who search for flu-related information and the number of people who actually have flu symptoms. A pattern emerges when all of the search queries related to flu are aggregated. Using aggregated Google search data, Flu Trends can estimate flu activity up to two weeks faster than traditional systems can. This example shows how data mining can turn a large collection of data into knowledge that can help meet a current global challenge.
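
The aggregation idea in Example 1.1 can be illustrated with a minimal sketch. It is not the method used by Flu Trends: the keyword set, the toy queries, and the case counts below are all invented for illustration, and the function names are hypothetical. The sketch counts flu-related queries per week and measures how closely those counts track reported cases using the Pearson correlation coefficient.

```python
from collections import Counter
from math import sqrt

# Hypothetical indicator terms; the text does not list the real ones.
FLU_TERMS = {"flu", "fever", "cough"}


def weekly_flu_query_counts(queries):
    """Count, per week, the queries that mention at least one flu term.

    `queries` is an iterable of (week_number, query_text) pairs.
    """
    counts = Counter()
    for week, text in queries:
        if FLU_TERMS & set(text.lower().split()):
            counts[week] += 1
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented toy data: individual queries say little, the weekly totals show a trend.
queries = [
    (1, "flu symptoms"),
    (1, "cheap flights"),
    (2, "fever and cough remedy"),
    (2, "flu shot near me"),
    (2, "weather tomorrow"),
    (3, "flu symptoms in children"),
    (3, "high fever"),
    (3, "cough medicine"),
]
reported_cases = {1: 10, 2: 20, 3: 30}  # invented numbers, not real surveillance data

counts = weekly_flu_query_counts(queries)
weeks = sorted(reported_cases)
r = pearson([counts[w] for w in weeks], [reported_cases[w] for w in weeks])
print(dict(counts), round(r, 3))
```

A correlation near 1 on real data would suggest that aggregated query volume could serve as an early indicator; a production system would need far more careful term selection and validation than this sketch shows.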

1.1.2 Data Mining as the Evolution of Information Technology

Data mining can be viewed as a result of the natural evolution of information technology. The database and data management industry evolved in the development of

(Footnote to Example 1.1: This is reported in [GMP+09].)


Figure 1.1 The evolution of database system technology.

Data Collection and Database Creation (1960s and earlier)
    Primitive file processing

Database Management Systems (1970s to early 1980s)
    Hierarchical and network database systems
    Relational database systems
    Data modeling: entity-relationship models, etc.
    Indexing and accessing methods
    Query languages: SQL, etc.
    User interfaces, forms, and reports
    Query processing and optimization
    Transactions, concurrency control, and recovery
    Online transaction processing (OLTP)

Advanced Database Systems (mid-1980s to present)
    Advanced data models: extended-relational, object relational, deductive, etc.
    Managing complex data: spatial, temporal, multimedia, sequence and structured, scientific, engineering, moving objects, etc.
    Data streams and cyber-physical data systems
    Web-based databases (XML, semantic web)
    Managing uncertain data and data cleaning
    Integration of heterogeneous sources
    Text database systems and integration with information retrieval
    Extremely large data management
    Database system tuning and adaptive systems
    Advanced queries: ranking, skyline, etc.
    Cloud computing and parallel data processing
    Issues of data privacy and security

Advanced Data Analysis (late 1980s to present)
    Data warehouse and OLAP
    Data mining and knowledge discovery: classification, clustering, outlier analysis, association and correlation, comparative summary, discrimination analysis, pattern discovery, trend and deviation analysis, etc.
    Mining complex types of data: streams, sequence, text, spatial, temporal, multimedia, Web, networks, etc.
    Data mining applications: business, society, retail, banking, telecommunications, science and engineering, blogs, daily life, etc.
    Data mining and society: invisible data mining, privacy-preserving data mining, mining social and information networks, recommender systems, etc.

Future Generation of Information Systems (Present to future)

several critical functionalities (Figure 1.1): data collection and database creation, data management (including data storage and retrieval and database transaction processing), and advanced data analysis (involving data warehousing and data mining). The early development of data collection and database creation mechanisms served as a prerequisite for the later development of effective mechanisms for data storage and retrieval, as well as query and transaction processing. Nowadays numerous database systems offer query and transaction processing as common practice. Advanced data analysis has naturally become the next step.
