HAN
22-ind-673-708-9780123814791
2011/6/1
3:27
Page 678
#6
678
Index
classification (Continued)
rule-based, 355–363, 386
scalability, 369
semi-supervised, 432–433, 437
sentiment, 434
spatial, 595
speed, 369
support vector machines (SVMs), 393, 408–415, 437
transfer learning, 434–436
tree pruning, 344–347, 385
web-document, 435
Classification Based on Associations (CBA), 417
Classification based on Multiple Association Rules (CMAR), 417–418
Classification based on Predictive Association Rules (CPAR), 418–419
classification-based outlier detection, 571–573, 582
one-class model, 571–572
semi-supervised learning, 572
See also outlier detection
classifiers, 328
accuracy, 330, 366
bagged, 379–380
Bayesian, 350, 353
case-based reasoning, 425–426
comparing with ROC curves, 373–377
comparison aspects, 369
decision tree, 331
error rate, 367
k-nearest-neighbor, 423–425
Naive Bayesian, 351–352
overfitting data, 330
performance evaluation metrics, 364–370
recognition rate, 366–367
rule-based, 355
Clementine, 603, 606
CLIQUE, 481–483
clustering steps, 481–482
effectiveness, 483
strategy, 481
See also cluster analysis; grid-based methods
closed data cubes, 192
closed frequent itemsets, 247, 308
example, 248
mining, 262–264
shortcomings for compression, 308–309
closed graphs, 591
closed patterns, 280
top-k most frequent, 307
closure checking, 263–264
cloud computing, 31
cluster analysis, 19–20, 443–495
advanced, 497–541
agglomerative hierarchical clustering, 459–461
applications, 444, 490
attribute types and, 446
as automatic classification, 445
biclustering, 511, 512–519
BIRCH, 458, 462–466
Chameleon, 458, 466–467
CLIQUE, 481–483
clustering quality measurement, 484, 487–490
clustering tendency assessment, 484–486
constraint-based, 447, 497, 532–538
correlation-based, 511
as data redundancy technique, 108
as data segmentation, 445
DBSCAN, 471–473
DENCLUE, 476–479
density-based methods, 449, 471–479, 491
in derived space, 519–520
dimensionality reduction methods, 519–522
discretization by, 116
distance measures, 461–462
distance-based, 445
divisive hierarchical clustering, 459–461
evaluation, 483–490, 491
example, 20
expectation-maximization (EM) algorithm, 505–508
graph and network data, 497, 522–532
grid-based methods, 450, 479–483, 491
heterogeneous networks, 593
hierarchical methods, 449, 457–470, 491
high-dimensional data, 447, 497, 508–522
homogeneous networks, 593
in image recognition, 444
incremental, 446
interpretability, 447
k-means, 451–454
k-medoids, 454–457
k-modes, 454
in large databases, 445
as learning by observation, 445
low-dimensional, 509
methods, 448–451
multiple-phase, 458–459
number of clusters determination, 484, 486–487
OPTICS, 473–476
orthogonal aspects, 491
for outlier detection, 445
outlier detection and, 543
partitioning methods, 448, 451–457, 491
pattern, 282, 308–310
probabilistic hierarchical clustering, 467–470
probability model-based, 497–508
PROCLUS, 511
requirements, 445–448, 490–491
scalability, 446
in search results organization, 444
spatial, 595
spectral, 519–522
as standalone tool, 445
STING, 479–481
subspace, 318–319, 448
subspace search methods, 510–511
taxonomy formation, 20
techniques, 443, 444
as unsupervised learning, 445
usability, 447
use of, 444
cluster computing, 31
cluster samples, 108–109
cluster-based local outlier factor (CBLOF), 569–570
clustering. See cluster analysis
clustering features, 462, 463, 464
Clustering Large Applications based upon Randomized Search (CLARANS), 457
Clustering Large Applications (CLARA), 456–457
clustering quality measurement, 484, 487–490
cluster completeness, 488
cluster homogeneity, 487–488
extrinsic methods, 487–489
intrinsic methods, 487, 489–490
rag bag, 488
silhouette coefficient, 489–490
small cluster preservation, 488
clustering space, 448
clustering tendency assessment, 484–486
homogeneous hypothesis, 486
Hopkins statistic, 484–485
nonhomogeneous hypothesis, 486
nonuniform distribution of data, 484
See also cluster analysis
clustering with obstacles problem, 537
clustering-based methods, 552, 567–571
example, 553
See also outlier detection
clustering-based outlier detection, 567–571, 582
approaches, 567
distance to closest cluster, 568–569
fixed-width clustering, 570
intrusion detection by, 569–570
objects not belonging to a cluster, 568
in small clusters, 570–571
weakness of, 571
clustering-based quantitative associations, 290–291
clusters, 66, 443, 444, 490
arbitrary shape, discovery of, 446
assignment rule, 497–498
completeness, 488
constraints on, 533
cuts and, 529–530
density-based, 472
determining number of, 484, 486–487
discovery of, 318
fuzzy, 499–501
graph clusters, finding, 528–529
on high-dimensional data, 509
homogeneity, 487–488
merging, 469, 470
ordering, 474–475, 477
pattern-based, 516
probabilistic, 502–503
separation of, 447
shapes, 471
small, preservation, 488
CMAR. See Classification based on Multiple Association Rules
CN2, 359, 363
collaborative recommender systems, 610, 617, 618
collective outlier detection, 548, 582
categories of, 576
contextual outlier detection versus, 575
on graph data, 576
structure discovery, 575
collective outliers, 575, 581
mining, 575–576
co-location patterns, 319, 595
colossal patterns, 302, 320
core descendants, 305, 306
core patterns, 304–305
illustrated, 303
mining challenge, 302–303
Pattern-Fusion mining, 302–307
combined significance, 312
complete-linkage algorithm, 462
completeness
data, 84–85
data mining algorithm, 22
complex data types, 166
biological sequence data, 586, 590–591
graph patterns, 591–592
mining, 585–598, 625
networks, 591–592
in science applications, 612