Data Mining: Concepts and Techniques, 3rd Edition

HAN 22-ind-673-708-9780123814791


HAN

22-ind-673-708-9780123814791

2011/6/1

3:27

Page 686

#14

686

Index

dispersion of data, 44, 48–51

dissimilarity

asymmetric binary, 71

between attributes of mixed type, 76–77

between binary attributes, 71–72

measuring, 65–78, 79

between nominal attributes, 69

on numeric data, 72–74

between ordinal attributes, 75

symmetric binary, 70–71

dissimilarity matrix, 67, 68

data matrix versus, 67–68

n-by-n table representation, 68

as one-mode matrix, 68

distance measures, 461–462

Euclidean, 72–73

Manhattan, 72–73

Minkowski, 73

supremum, 73–74

types of, 72

distance-based cluster analysis, 445

distance-based outlier detection, 561–562

nested loop algorithm, 561, 562

See also outlier detection

distributed data mining, 615, 624

distributed privacy preservation, 622

distributions

boxplots for visualizing, 49–50

five-number summary, 49

distributive measures, 145

Divisive Analysis (DIANA), 459, 460

divisive hierarchical method, 459

agglomerative hierarchical clustering versus, 459–460

DIANA, 459, 460

DNA chips, 512

document classification, 430

documents

language model, 26

topic model, 26–27

drill-across operation, 148

drill-down operation, 11, 146–147

drill-through operation, 148

dynamic itemset counting, 256

E

eager learners, 423, 437

Eclat (Equivalence Class Transformation) algorithm, 260, 272

e-commerce, 609

editing method, 425

efficiency

Apriori algorithm, 255–256

backpropagation, 404

data mining algorithms, 31

elbow method, 486

email spam filtering, 435

engineering applications, 613

ensemble methods, 378–379, 386

bagging, 379–380

boosting, 380–382

for class imbalance problem, 385

random forests, 382–383

types of, 378, 386

enterprise warehouses, 132

entity identification problem, 94

entity-relationship (ER) data model, 9, 139

epoch updating, 404

equal-frequency histograms, 107, 116

equal-width histograms, 107, 116

equivalence classes, 427

error rates, 367

error-correcting codes, 431–432

Euclidean distance, 72

mathematical properties, 72–73

weighted, 74

See also distance measures

evaluation metrics, 364–370

evolution, of database system technology, 3–5

evolutionary searches, 579

exception-based, discovery-driven exploration, 231–234, 235

exceptions, 231

exhaustive rules, 358

expectation-maximization (EM) algorithm, 505–508, 538

expectation step (E-step), 505

fuzzy clustering with, 505–507

maximization step (M-step), 505

for mixture models, 507–508

for probabilistic model-based clustering, 507–508

steps, 505

See also probabilistic model-based clustering

expected values, 97

cell, 234

exploratory data mining. See multidimensional data mining

extraction

data, 134

rule, from decision tree, 357–359

extraction/transformation/loading (ETL) tools, 93

extractors, 151


F

fact constellation, 141

example, 141–142

illustrated, 142

fact tables, 136

summary, 165

factor analysis, 600

facts, 136

false negatives, 365

false positives, 365

farthest-neighbor clustering algorithm, 462

field overloading, 92

financial data analysis, 607–609

credit policy analysis, 608–609

crimes detection, 609

data warehouses, 608

loan payment prediction, 608–609

targeted marketing, 609

FindCBLOF algorithm, 569–570

five-number summary, 49

fixed-width clustering, 570

FOIL, 359, 363, 418

Forest-RC, 383

forward algorithm, 591

FP-growth, 257–259, 272

algorithm illustration, 260

example, 257–258

performance, 259

FP-trees, 257

conditional pattern base, 258

construction, 257–258

main memory-based, 259

mining, 258, 259

Frag-Shells, 212, 213

fraudulent analysis, 610–611

frequency patterns

approximate, 281, 307–312

compressed, 281, 307–312

constraint-based, 281

near-match, 281

redundancy-aware top-k, 281

top-k, 281

frequent itemset mining, 18, 272, 282

Apriori algorithm, 248–253

closed patterns, 262–264

market basket analysis, 244–246

max patterns, 262–264

methods, 248–264

pattern-growth approach, 257–259

with vertical data format, 259–262, 272

frequent itemsets, 243, 246, 272

association rule generation from, 253, 254

closed, 247, 248, 262–264, 308

finding, 247

finding by confined candidate generation, 248–253

maximal, 247, 248, 262–264, 308

subsets, 309

frequent pattern mining, 279

advanced forms of patterns, 320

application domain-specific semantics, 282

applications, 317–319, 321

approximate patterns, 307–312

classification criteria, 280–283

colossal patterns, 301–307

compressed patterns, 307–312

constraint-based, 294–301, 320

data analysis usages, 282

for data cleaning, 318

direct discriminative, 422

high-dimensional data, 301–307

in high-dimensional space, 320

in image data analysis, 319

for indexing structures, 319

kinds of data and features, 282

multidimensional associations, 287–289

in multilevel, multidimensional space, 283–294

multilevel associations, 283–294

in multimedia data analysis, 319

negative patterns, 291–294

for noise filtering, 318

Pattern-Fusion, 302–307

quantitative association rules, 289–291

rare patterns, 291–294

in recommender systems, 319

road map, 279–283

scalable computation and, 319

scope of, 319–320

in sequence or structural data analysis, 319

in spatiotemporal data analysis, 319

for structure and cluster discovery, 318

for subspace clustering, 318–319

in time-series data analysis, 319

top-k, 310

in video data analysis, 319

See also frequent patterns

frequent pattern-based classification, 415–422, 437

associative, 415, 416–419

discriminative, 416, 419–422

framework, 422

frequent patterns, 17, 243

abstraction levels, 281

association rule mapping, 280

basic, 280
