HAN
22-ind-673-708-9780123814791
2011/6/1
3:27
Page 688
#16
688
Index
frequent patterns (Continued)
closed, 262–264, 280
concepts, 243–244
constraint-based, 281
dimensions, 281
diversity, 280
exploration, 313–319
growth, 257–259, 272
max, 262–264, 280
mining, 243–244, 279–325
mining constraints or criteria, 281
number of dimensions involved in, 281
semantic annotation of, 313–317
sequential, 243
strong associations, 437
structured, 243
trees, 257–259
types of values in, 281
frequent subgraphs, 591
front-end client layer, 132
full materialization, 159, 179, 234
fuzzy clustering, 499–501, 538
data set for, 506
with EM algorithm, 505–507
example, 500
expectation step (E-step), 505
flexibility, 501
maximization step (M-step), 506–507
partition matrix, 499
as soft clusters, 501
fuzzy logic, 428
fuzzy sets, 428–429, 437, 499
evaluation, 500–501
example, 499
G
gain ratio, 340
C4.5 use of, 340
formula, 341
maximum, 341
gateways, 131
gene expression, 513–514
generalization
attribute, 169–170
attribute, control, 170
attribute, threshold control, 170
in multimedia data mining, 596
process, 172
results presentation, 174
synchronous, 175
generalized linear models, 599–600
generalized relations
attribute-oriented induction, 172
presentation of, 174
threshold control, 170
generative model, 467–469
genetic algorithms, 426–427, 437
genomes, 15
geodesic distance, 525–526, 539
diameter, 525
eccentricity, 525
measurements based on, 526
peripheral vertex, 525
radius, 525
geographic data warehouses, 595
geometric projection visualization, 58–60
Gini index, 341
binary enforcement, 332
binary indexes, 341
CART use of, 341
decision tree induction using, 342–343
minimum, 342
partitioning and, 342
global constants, for missing values, 88
global outliers, 545, 581
detection, 545
example, 545
Google
Flu Trends, 2
popularity of, 619–620
gradient descent strategy, 396–397
algorithms, 397
greedy hill-climbing, 397
as iterative, 396–397
graph and network data clustering, 497, 522–532, 539
applications, 523–525
bipartite graph, 523
challenges, 523–525, 530
cuts and clusters, 529–530
generic method, 530–531
geodesic distance, 525–526
methods, 528–532
similarity measures, 525–528
SimRank, 526–528
social network, 524–525
web search engines, 523–524
See also cluster analysis
graph cuts, 539
graph data, 14
graph index structures, 591
graph pattern mining, 591–592, 612–613
graphic displays
data presentation software, 44–45
histogram, 54, 55
quantile plot, 51–52
quantile-quantile plot, 52–54
scatter plot, 54–56
greedy hill-climbing, 397
greedy methods, attribute subset selection, 104–105
grid-based methods, 450, 479–483, 491
CLIQUE, 481–483
STING, 479–481
See also cluster analysis
grid-based outlier detection, 562–564
CELL method, 562, 563
cell properties, 562
cell pruning rules, 563
See also outlier detection
group-based support, 286
group-by
clause, 231
grouping attributes, 231
grouping variables, 231
Grubbs’ test, 555
H
Hamming distance, 431
hard constraints, 534, 539
example, 534
handling, 535–536
harmonic mean, 369
hash-based technique, 255
heterogeneous networks, 592
classification of, 593
clustering of, 593
ranking of, 593
heterogeneous transfer learning, 436
hidden Markov model (HMM), 590, 591
hierarchical methods, 449, 457–470, 491
agglomerative, 459–461
algorithmic, 459, 461–462
Bayesian, 459
BIRCH, 458, 462–466
Chameleon, 458, 466–467
complete linkages, 462, 463
distance measures, 461–462
divisive, 459–461
drawbacks, 449
merge or split points and, 458
probabilistic, 459, 467–470
single linkages, 462, 463
See also cluster analysis
hierarchical visualization, 63
treemaps, 63, 65
Worlds-within-Worlds, 63, 64
high-dimensional data, 301
clustering, 447
data distribution of, 560
frequent pattern mining, 301–307
outlier detection in, 576–580, 582
row enumeration, 302
high-dimensional data clustering, 497, 508–522, 538, 553
biclustering, 512–519
dimensionality reduction methods, 510, 519–522
example, 508–509
problems, challenges, and methodologies, 508–510
subspace clustering methods, 509, 510–511
See also cluster analysis
HilOut algorithm, 577–578
histograms, 54, 106–108, 116
analysis by discretization, 115–116
attributes, 106
binning, 106
construction, 559
equal-frequency, 107
equal-width, 107
example, 54
illustrated, 55, 107
multidimensional, 108
as nonparametric model, 559
outlier detection using, 558–560
holdout method, 370, 386
holistic measures, 145
homogeneous networks, 592
classification of, 593
clustering of, 593
Hopkins statistic, 484–485
horizontal data format, 259
hybrid OLAP (HOLAP), 164–165, 179
hybrid-dimensional association rules, 288
I
IBM Intelligent Miner, 603, 606
iceberg condition, 191
iceberg cubes, 160, 179, 190, 235
BUC construction, 201
computation, 160, 193–194, 319
computation and storage, 210–211
computation with Star-Cubing algorithm, 204–210
materialization, 319
specification of, 190–191
See also data cubes
icon-based visualization, 60
Chernoff faces, 60–61