HAN
22-ind-673-708-9780123814791
2011/6/1
3:27
Page 696
#24
696
Index
partitioning (Continued)
recursive, 335
tuples, 334
Partitioning Around Medoids (PAM) algorithm, 455–457
partitioning methods, 448, 451–457, 491
centroid-based, 451–454
global optimality, 449
iterative relocation techniques, 448
k-means, 451–454
k-medoids, 454–457
k-modes, 454
object-based, 454–457
See also cluster analysis
path-based similarity, 594
pattern analysis, in recommender systems, 282
pattern clustering, 308–310
pattern constraints, 297–300
pattern discovery, 601
pattern evaluation, 8
pattern evaluation measures, 267–271
all confidence, 268
comparison, 269–270
cosine, 268
Kulczynski, 268
max confidence, 268
null-invariant, 270–271
See also measures
pattern space pruning, 295
pattern-based classification, 282, 318
pattern-based clustering, 282, 516
Pattern-Fusion, 302–307
characteristics, 304
core pattern, 304–305
initial pool, 306
iterative, 306
merging subpatterns, 306
shortcuts identification, 304
See also colossal patterns
pattern-guided mining, 30
patterns
actionable, 22
co-location, 319
colossal, 301–307, 320
combined significance, 312
constraint-based generation, 296–301
context modeling of, 314–315
core, 304–305
distance, 309
evaluation methods, 264–271
expected, 22
expressed, 309
frequent, 17
hidden meaning of, 314
interesting, 21–23, 33
metric space, 306–307
negative, 280, 291–294, 320
negatively correlated, 292, 293
rare, 280, 291–294, 320
redundancy between, 312
relative significance, 312
representative, 309
search space, 303
strongly negatively correlated, 292
structural, 282
type specification, 15–23
unexpected, 22
See also frequent patterns
pattern-trees, 264
Pearson’s correlation coefficient, 222
percentiles, 48
perception-based classification (PBC), 348
illustrated, 349
as interactive visual approach, 607
pixel-oriented approach, 348–349
split screen, 349
tree comparison, 350
phylogenetic trees, 590
pivot (rotate) operation, 148
pixel-oriented visualization, 57
planning and analysis tools, 153
point queries, 216, 217, 220
pool-based approach, 433
positive correlation, 55, 56
positive tuples, 364
positively skewed data, 47
possibility theory, 428
posterior probability, 351
postpruning, 344–345, 346
power law distribution, 592
precision measure, 368–369
predicate sets
frequent, 288–289
k, 289
predicates
repeated, 288
variables, 295
prediction, 19
classification, 328
link, 593–594
loan payment, 608–609
with naive Bayesian classification, 353–355
numeric, 328, 385
prediction cubes, 227–230, 235
example, 228–229
Probability-Based Ensemble, 229–230
predictive analysis, 18–19
predictive mining tasks, 15
predictive statistics, 24
predictors, 328
prepruning, 344, 346
prime relations
contrasting classes, 175, 177
deriving, 174
target classes, 175, 177
principal components analysis (PCA), 100, 102–103
application of, 103
correlation-based clustering with, 511
illustrated, 103
in lower-dimensional space extraction, 578
procedure, 102–103
prior probability, 351
privacy-preserving data mining, 33, 621, 626
distributed, 622
k-anonymity method, 621–622
l-diversity method, 622
as mining trend, 624–625
randomization methods, 621
results effectiveness, downgrading, 622
probabilistic clusters, 502–503
probabilistic hierarchical clustering, 467–470
agglomerative clustering framework, 467, 469
algorithm, 470
drawbacks of using, 469–470
generative model, 467–469
interpretability, 469
understanding, 469
See also hierarchical methods
probabilistic model-based clustering, 497–508, 538
expectation-maximization algorithm, 505–508
fuzzy clusters and, 499–501
product reviews example, 498
user search intent example, 498
See also cluster analysis
probability
estimation techniques, 355
posterior, 351
prior, 351
probability and statistical theory, 601
Probability-Based Ensemble (PBE), 229–230
PROCLUS, 511
profiles, 614
proximity measures, 67
for binary attributes, 70–72
for nominal attributes, 68–70
for ordinal attributes, 74–75
proximity-based methods, 552, 560–567, 581
density-based, 564–567
distance-based, 561–562
effectiveness, 552
example, 552
grid-based, 562–564
types of, 552, 560
See also outlier detection
pruning
cost complexity algorithm, 345
data space, 300–301
decision trees, 331, 344–347
in k-nearest neighbor classification, 425
network, 406–407
pattern space, 295, 297–300
pessimistic, 345
postpruning, 344–345, 346
prepruning, 344, 346
rule, 363
search space, 263, 301
sets, 345
shared dimensions, 205
sub-itemset, 263
pyramid algorithm, 101
Q
quality control, 600
quantile plots, 51–52
quantile-quantile plots, 52
example, 53–54
illustrated, 53
See also graphic displays
quantitative association rules, 281, 283, 288, 320
clustering-based mining, 290–291
data cube-based mining, 289–290
exceptional behavior disclosure, 291
mining, 289
quartiles, 48
first, 49
third, 49
queries, 10
intercuboid expansion, 223–225
intracuboid expansion, 221–223
language, 10
OLAP, 129, 130
point, 216, 217, 220
processing, 163–164, 218–227
range, 220
relational operations, 10