HAN
22-ind-673-708-9780123814791
2011/6/1
3:27
Page 686
#14
686
Index
dispersion of data, 44, 48–51
dissimilarity
asymmetric binary, 71
between attributes of mixed type, 76–77
between binary attributes, 71–72
measuring, 65–78, 79
between nominal attributes, 69
on numeric data, 72–74
between ordinal attributes, 75
symmetric binary, 70–71
dissimilarity matrix, 67, 68
data matrix versus, 67–68
n-by-n table representation, 68
as one-mode matrix, 68
distance measures, 461–462
Euclidean, 72–73
Manhattan, 72–73
Minkowski, 73
supremum, 73–74
types of, 72
distance-based cluster analysis, 445
distance-based outlier detection, 561–562
nested loop algorithm, 561, 562
See also outlier detection
distributed data mining, 615, 624
distributed privacy preservation, 622
distributions
boxplots for visualizing, 49–50
five-number summary, 49
distributive measures, 145
Divisive Analysis (DIANA), 459, 460
divisive hierarchical method, 459
agglomerative hierarchical clustering versus, 459–460
DIANA, 459, 460
DNA chips, 512
document classification, 430
documents
language model, 26
topic model, 26–27
drill-across operation, 148
drill-down operation, 11, 146–147
drill-through operation, 148
dynamic itemset counting, 256
E
eager learners, 423, 437
Eclat (Equivalence Class Transformation) algorithm, 260, 272
e-commerce, 609
editing method, 425
efficiency
Apriori algorithm, 255–256
backpropagation, 404
data mining algorithms, 31
elbow method, 486
email spam filtering, 435
engineering applications, 613
ensemble methods, 378–379, 386
bagging, 379–380
boosting, 380–382
for class imbalance problem, 385
random forests, 382–383
types of, 378, 386
enterprise warehouses, 132
entity identification problem, 94
entity-relationship (ER) data model, 9, 139
epoch updating, 404
equal-frequency histograms, 107, 116
equal-width histograms, 107, 116
equivalence classes, 427
error rates, 367
error-correcting codes, 431–432
Euclidean distance, 72
mathematical properties, 72–73
weighted, 74
See also distance measures
evaluation metrics, 364–370
evolution, of database system technology, 3–5
evolutionary searches, 579
exception-based, discovery-driven exploration, 231–234, 235
exceptions, 231
exhaustive rules, 358
expectation-maximization (EM) algorithm, 505–508, 538
expectation step (E-step), 505
fuzzy clustering with, 505–507
maximization step (M-step), 505
for mixture models, 507–508
for probabilistic model-based clustering, 507–508
steps, 505
See also probabilistic model-based clustering
expected values, 97
cell, 234
exploratory data mining. See multidimensional data mining
extraction
data, 134
rule, from decision tree, 357–359
extraction/transformation/loading (ETL) tools, 93
extractors, 151
F
fact constellation, 141
example, 141–142
illustrated, 142
fact tables, 136
summary, 165
factor analysis, 600
facts, 136
false negatives, 365
false positives, 365
farthest-neighbor clustering algorithm, 462
field overloading, 92
financial data analysis, 607–609
credit policy analysis, 608–609
crimes detection, 609
data warehouses, 608
loan payment prediction, 608–609
targeted marketing, 609
FindCBLOF algorithm, 569–570
five-number summary, 49
fixed-width clustering, 570
FOIL, 359, 363, 418
Forest-RC, 383
forward algorithm, 591
FP-growth, 257–259, 272
algorithm illustration, 260
example, 257–258
performance, 259
FP-trees, 257
conditional pattern base, 258
construction, 257–258
main memory-based, 259
mining, 258, 259
Frag-Shells, 212, 213
fraudulent analysis, 610–611
frequency patterns
approximate, 281, 307–312
compressed, 281, 307–312
constraint-based, 281
near-match, 281
redundancy-aware top-k, 281
top-k, 281
frequent itemset mining, 18, 272, 282
Apriori algorithm, 248–253
closed patterns, 262–264
market basket analysis, 244–246
max patterns, 262–264
methods, 248–264
pattern-growth approach, 257–259
with vertical data format, 259–262, 272
frequent itemsets, 243, 246, 272
association rule generation from, 253, 254
closed, 247, 248, 262–264, 308
finding, 247
finding by confined candidate generation, 248–253
maximal, 247, 248, 262–264, 308
subsets, 309
frequent pattern mining, 279
advanced forms of patterns, 320
application domain-specific semantics, 282
applications, 317–319, 321
approximate patterns, 307–312
classification criteria, 280–283
colossal patterns, 301–307
compressed patterns, 307–312
constraint-based, 294–301, 320
data analysis usages, 282
for data cleaning, 318
direct discriminative, 422
high-dimensional data, 301–307
in high-dimensional space, 320
in image data analysis, 319
for indexing structures, 319
kinds of data and features, 282
multidimensional associations, 287–289
in multilevel, multidimensional space, 283–294
multilevel associations, 283–294
in multimedia data analysis, 319
negative patterns, 291–294
for noise filtering, 318
Pattern-Fusion, 302–307
quantitative association rules, 289–291
rare patterns, 291–294
in recommender systems, 319
road map, 279–283
scalable computation and, 319
scope of, 319–320
in sequence or structural data analysis, 319
in spatiotemporal data analysis, 319
for structure and cluster discovery, 318
for subspace clustering, 318–319
in time-series data analysis, 319
top-k, 310
in video data analysis, 319
See also frequent patterns
frequent pattern-based classification, 415–422, 437
associative, 415, 416–419
discriminative, 416, 419–422
framework, 422
frequent patterns, 17, 243
abstraction levels, 281
association rule mapping, 280
basic, 280