Data Mining. Concepts and Techniques, 3rd Edition


HAN 22-ind-673-708-9780123814791




2011/6/1 3:27 Page 686 #14

Index

dispersion of data, 44, 48–51

dissimilarity

asymmetric binary, 71

between attributes of mixed type, 76–77

between binary attributes, 71–72

measuring, 65–78, 79

between nominal attributes, 69

on numeric data, 72–74

between ordinal attributes, 75

symmetric binary, 70–71

dissimilarity matrix, 67, 68

data matrix versus, 67–68

n-by-n table representation, 68

as one-mode matrix, 68

distance measures, 461–462

Euclidean, 72–73

Manhattan, 72–73

Minkowski, 73

supremum, 73–74

types of, 72

distance-based cluster analysis, 445

distance-based outlier detection, 561–562

nested loop algorithm, 561, 562

See also outlier detection

distributed data mining, 615, 624

distributed privacy preservation, 622

distributions

boxplots for visualizing, 49–50

five-number summary, 49

distributive measures, 145

Divisive Analysis (DIANA), 459, 460

divisive hierarchical method, 459

agglomerative hierarchical clustering versus, 459–460

DIANA, 459, 460



DNA chips, 512

document classification, 430

documents

language model, 26

topic model, 26–27

drill-across operation, 148

drill-down operation, 11, 146–147

drill-through operation, 148

dynamic itemset counting, 256

E

eager learners, 423, 437

Eclat (Equivalence Class Transformation) algorithm, 260, 272


e-commerce, 609

editing method, 425

efficiency

Apriori algorithm, 255–256

backpropagation, 404

data mining algorithms, 31

elbow method, 486

email spam filtering, 435

engineering applications, 613

ensemble methods, 378–379, 386

bagging, 379–380

boosting, 380–382

for class imbalance problem, 385

random forests, 382–383

types of, 378, 386

enterprise warehouses, 132

entity identification problem, 94

entity-relationship (ER) data model, 9, 139

epoch updating, 404

equal-frequency histograms, 107, 116

equal-width histograms, 107, 116

equivalence classes, 427

error rates, 367

error-correcting codes, 431–432

Euclidean distance, 72

mathematical properties, 72–73

weighted, 74

See also distance measures

evaluation metrics, 364–370

evolution, of database system technology, 3–5

evolutionary searches, 579

exception-based, discovery-driven exploration, 231–234, 235

exceptions, 231

exhaustive rules, 358

expectation-maximization (EM) algorithm, 505–508, 538

expectation step (E-step), 505

fuzzy clustering with, 505–507

maximization step (M-step), 505

for mixture models, 507–508

for probabilistic model-based clustering, 507–508


steps, 505

See also probabilistic model-based clustering

expected values, 97

cell, 234

exploratory data mining. See multidimensional data mining

extraction



data, 134

rule, from decision tree, 357–359

extraction/transformation/loading (ETL) tools, 93

extractors, 151





F

fact constellation, 141

example, 141–142

illustrated, 142

fact tables, 136

summary, 165

factor analysis, 600

facts, 136

false negatives, 365

false positives, 365

farthest-neighbor clustering algorithm, 462

field overloading, 92

financial data analysis, 607–609

credit policy analysis, 608–609

crimes detection, 609

data warehouses, 608

loan payment prediction, 608–609

targeted marketing, 609

FindCBLOF algorithm, 569–570

five-number summary, 49

fixed-width clustering, 570

FOIL, 359, 363, 418

Forest-RC, 383

forward algorithm, 591

FP-growth, 257–259, 272

algorithm illustration, 260

example, 257–258

performance, 259

FP-trees, 257

conditional pattern base, 258

construction, 257–258

main memory-based, 259

mining, 258, 259

Frag-Shells, 212, 213

fraudulent analysis, 610–611

frequency patterns

approximate, 281, 307–312

compressed, 281, 307–312

constraint-based, 281

near-match, 281

redundancy-aware top-k, 281

top-k, 281

frequent itemset mining, 18, 272, 282

Apriori algorithm, 248–253

closed patterns, 262–264

market basket analysis, 244–246

max patterns, 262–264

methods, 248–264

pattern-growth approach, 257–259

with vertical data format, 259–262, 272

frequent itemsets, 243, 246, 272

association rule generation from, 253, 254

closed, 247, 248, 262–264, 308

finding, 247

finding by confined candidate generation, 248–253


maximal, 247, 248, 262–264, 308

subsets, 309

frequent pattern mining, 279

advanced forms of patterns, 320

application domain-specific semantics, 282

applications, 317–319, 321

approximate patterns, 307–312

classification criteria, 280–283

colossal patterns, 301–307

compressed patterns, 307–312

constraint-based, 294–301, 320

data analysis usages, 282

for data cleaning, 318

direct discriminative, 422

high-dimensional data, 301–307

in high-dimensional space, 320

in image data analysis, 319

for indexing structures, 319

kinds of data and features, 282

multidimensional associations, 287–289

in multilevel, multidimensional space, 283–294

multilevel associations, 283–294

in multimedia data analysis, 319

negative patterns, 291–294

for noise filtering, 318

Pattern-Fusion, 302–307

quantitative association rules, 289–291

rare patterns, 291–294

in recommender systems, 319

road map, 279–283

scalable computation and, 319

scope of, 319–320

in sequence or structural data analysis, 319

in spatiotemporal data analysis, 319

for structure and cluster discovery, 318

for subspace clustering, 318–319

in time-series data analysis, 319

top-k, 310

in video data analysis, 319

See also frequent patterns

frequent pattern-based classification, 415–422, 437

associative, 415, 416–419

discriminative, 416, 419–422

framework, 422

frequent patterns, 17, 243

abstraction levels, 281

association rule mapping, 280

basic, 280


