Data Mining. Concepts and Techniques, 3rd Edition

HAN 08-ch01-001-038-9780123814791

Date: 08.10.2017

HAN 08-ch01-001-038-9780123814791 2011/6/1 3:12 Page 14 #14

Chapter 1 Introduction

of items that are frequently sold together. The mining of such frequent patterns from transactional data is discussed in Chapters 6 and 7.

1.3.4 Other Kinds of Data

Besides relational database data, data warehouse data, and transaction data, there are many other kinds of data that have versatile forms and structures and rather different semantic meanings. Such kinds of data can be seen in many applications: time-related or sequence data (e.g., historical records, stock exchange data, and time-series and biological sequence data), data streams (e.g., video surveillance and sensor data, which are continuously transmitted), spatial data (e.g., maps), engineering design data (e.g., the design of buildings, system components, or integrated circuits), hypertext and multimedia data (including text, image, video, and audio data), graph and networked data (e.g., social and information networks), and the Web (a huge, widely distributed information repository made available by the Internet). These applications bring about new challenges, like how to handle data carrying special structures (e.g., sequences, trees, graphs, and networks) and specific semantics (such as ordering, image, audio and video contents, and connectivity), and how to mine patterns that carry rich structures and semantics.

Various kinds of knowledge can be mined from these kinds of data. Here, we list just a few. Regarding temporal data, for instance, we can mine banking data for changing trends, which may aid in the scheduling of bank tellers according to the volume of customer traffic. Stock exchange data can be mined to uncover trends that could help you plan investment strategies (e.g., the best time to purchase AllElectronics stock). We could mine computer network data streams to detect intrusions based on anomalies in message flows, which may be discovered by clustering, by dynamic construction of stream models, or by comparing the current frequent patterns with those at a previous time. With spatial data, we may look for patterns that describe changes in metropolitan poverty rates based on city distances from major highways. The relationships among a set of spatial objects can be examined in order to discover which subsets of objects are spatially autocorrelated or associated. By mining text data, such as literature on data mining from the past ten years, we can identify the evolution of hot topics in the field. By mining user comments on products (which are often submitted as short text messages), we can assess customer sentiments and understand how well a product is embraced by a market. From multimedia data, we can mine images to identify objects and classify them by assigning semantic labels or tags. By mining video data of a hockey game, we can detect video sequences corresponding to goals. Web mining can help us learn about the distribution of information on the WWW in general, characterize and classify web pages, and uncover web dynamics and the association and other relationships among different web pages, users, communities, and web-based activities.
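The idea of comparing the current frequent patterns in a stream with those at a previous time can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions: it tracks frequent single items rather than itemsets, and the function names, the message-type data, and the 0.2 support threshold are all hypothetical choices made for illustration.

```python
from collections import Counter


def frequent_items(window, min_support):
    """Return the items whose relative frequency in the window is at least min_support."""
    total = len(window)
    if total == 0:
        return set()
    counts = Counter(window)
    return {item for item, n in counts.items() if n / total >= min_support}


def pattern_shift(previous, current, min_support=0.2):
    """Report which frequent items appeared or vanished between two windows."""
    before = frequent_items(previous, min_support)
    after = frequent_items(current, min_support)
    return {"new": after - before, "vanished": before - after}


# Hypothetical message types observed in two consecutive time windows.
previous_window = ["http", "http", "dns", "http", "dns", "ssh", "http", "dns"]
current_window = ["http", "ssh", "ssh", "ssh", "dns", "ssh", "ssh", "http"]

shift = pattern_shift(previous_window, current_window)
print(shift)
```

In this toy data, ssh traffic becomes frequent in the second window while dns traffic drops below the threshold. A change of this kind is what an analyst might flag for closer inspection; whether it indicates an intrusion depends on the application.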

It is important to keep in mind that, in many applications, multiple types of data are present. For example, in web mining, there often exist text data and multimedia data (e.g., pictures and videos) on web pages, graph data like web graphs, and map data on some web sites. In bioinformatics, genomic sequences, biological networks, and 3-D spatial structures of genomes may coexist for certain biological objects. Mining multiple data sources of complex data often leads to fruitful findings due to the mutual enhancement and consolidation of such multiple sources. On the other hand, it is also challenging because of the difficulties in data cleaning and data integration, as well as the complex interactions among the multiple sources of such data.

While such data require sophisticated facilities for efficient storage, retrieval, and updating, they also provide fertile ground and raise challenging research and implementation issues for data mining. Data mining on such data is an advanced topic. The methods involved are extensions of the basic techniques presented in this book.

1.4 What Kinds of Patterns Can Be Mined?

We have observed various types of data and information repositories on which data mining can be performed. Let us now examine the kinds of patterns that can be mined. There are a number of data mining functionalities. These include characterization and discrimination (Section 1.4.1); the mining of frequent patterns, associations, and correlations (Section 1.4.2); classification and regression (Section 1.4.3); clustering analysis (Section 1.4.4); and outlier analysis (Section 1.4.5). Data mining functionalities are used to specify the kinds of patterns to be found in data mining tasks. In general, such tasks can be classified into two categories: descriptive and predictive. Descriptive mining tasks characterize properties of the data in a target data set. Predictive mining tasks perform induction on the current data in order to make predictions.

Data mining functionalities, and the kinds of patterns they can discover, are described below. In addition, Section 1.4.6 looks at what makes a pattern interesting. Interesting patterns represent knowledge.

1.4.1 Class/Concept Description: Characterization and Discrimination

Data entries can be associated with classes or concepts. For example, in the AllElectronics store, classes of items for sale include computers and printers, and concepts of customers include bigSpenders and budgetSpenders. It can be useful to describe individual classes and concepts in summarized, concise, and yet precise terms. Such descriptions of a class or a concept are called class/concept descriptions. These descriptions can be derived using (1) data characterization, by summarizing the data of the class under study (often called the target class) in general terms, or (2) data discrimination, by comparison of the target class with one or a set of comparative classes (often called the contrasting classes), or (3) both data characterization and discrimination.
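The difference between characterization and discrimination can be illustrated with a minimal Python sketch. It is only one possible reading: the customer records, the two summary features (mean age and mean annual spend), and the function names are hypothetical, and real class descriptions would typically use richer generalizations than averages.

```python
from statistics import mean

# Hypothetical customer records: (concept, age, annual_spend).
customers = [
    ("bigSpenders", 45, 5200.0),
    ("bigSpenders", 38, 4800.0),
    ("bigSpenders", 52, 6100.0),
    ("budgetSpenders", 23, 900.0),
    ("budgetSpenders", 31, 1100.0),
    ("budgetSpenders", 27, 700.0),
]


def characterize(records, target):
    """Summarize the target class in general terms (here: count and two means)."""
    rows = [r for r in records if r[0] == target]
    return {
        "count": len(rows),
        "mean_age": mean(r[1] for r in rows),
        "mean_spend": mean(r[2] for r in rows),
    }


def discriminate(records, target, contrasting):
    """Compare the target class with one contrasting class, feature by feature."""
    t = characterize(records, target)
    c = characterize(records, contrasting)
    return {key: t[key] - c[key] for key in ("mean_age", "mean_spend")}


summary = characterize(customers, "bigSpenders")
difference = discriminate(customers, "bigSpenders", "budgetSpenders")
print(summary)
print(difference)
```

Here characterize describes the target class on its own, while discriminate reports how the target class differs from the contrasting class on the same features.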

Data characterization is a summarization of the general characteristics or features of a target class of data. The data corresponding to the user-specified class are typically collected by a query. For example, to study the characteristics of software products with sales that increased by 10% in the previous year, the data related to such products can be collected by executing an SQL query on the sales database.
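A query of this kind might look like the following sketch, which uses Python's built-in sqlite3 module with an in-memory database. The sales table, its columns, the product names, and the figures are all hypothetical, and the sketch assumes that "increased by 10%" means an increase of at least 10% over the preceding year.

```python
import sqlite3

# Hypothetical sales table; the schema and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (product TEXT, category TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("EditorPro", "software", 2009, 1000.0),
        ("EditorPro", "software", 2010, 1150.0),  # +15%
        ("PhotoFix", "software", 2009, 2000.0),
        ("PhotoFix", "software", 2010, 2100.0),   # +5%
        ("LaserJet", "printer", 2009, 500.0),
        ("LaserJet", "printer", 2010, 800.0),     # large increase, but not software
    ],
)

# Collect the target class: software products whose sales grew by at least
# 10% compared with the preceding year (self-join on consecutive years).
query = """
SELECT cur.product
FROM sales AS cur
JOIN sales AS prev
  ON prev.product = cur.product AND prev.year = cur.year - 1
WHERE cur.category = 'software'
  AND cur.year = ?
  AND cur.amount >= prev.amount * 1.10
ORDER BY cur.product
"""
target_products = [row[0] for row in conn.execute(query, (2010,))]
print(target_products)
```

The rows returned by the query form the target class, which can then be summarized in general terms.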
