HAN
22-ind-673-708-9780123814791
2011/6/1
3:27
Page 676
#4
676
Index
biclusters, 511
with coherent values, 516
with coherent values on rows, 516
with constant values, 515
with constant values on columns, 515
with constant values on rows, 515
as submatrix, 515
types of, 515–516
bimodal, 47
bin boundaries, 89
binary attributes, 41, 79
asymmetric, 42, 70
as Boolean, 41
contingency table for, 70
dissimilarity between, 71–72
example, 41–42
proximity measures, 70–72
symmetric, 42, 70–71
See also attributes
binning
discretization by, 115
equal-frequency, 89
smoothing by bin boundaries, 89
smoothing by bin means, 89
smoothing by bin medians, 89
biological sequences, 586, 624
alignment of, 590–591
analysis, 590
BLAST, 590
hidden Markov model, 591
as mining trend, 624
multiple sequence alignment, 590
pairwise alignment, 590
phylogenetic tree, 590
substitution matrices, 590
bipartite graphs, 523
BIRCH, 458, 462–466
CF-trees, 462–463, 464, 465–466
clustering feature, 462, 463, 464
effectiveness, 465
multiphase clustering technique, 464–465
See also hierarchical methods
bitmap indexing, 160–161, 179
bitmapped join indexing, 163, 179
bivariate distribution, 40
BLAST. See Basic Local Alignment Search Tool
BOAT. See Bootstrapped Optimistic Algorithm for Tree construction
Boolean association rules, 281
Boolean attributes, 41
boosting, 380
accuracy, 382
AdaBoost, 380–382
bagging versus, 381–382
weight assignment, 381
bootstrap method, 371, 386
bottom-up design approach, 133, 151–152
bottom-up subspace search, 510–511
boxplots, 49
computation, 50
example, 50
five-number summary, 49
illustrated, 50
in outlier visualization, 555
BUC, 200–204, 235
for 3-D data cube computation, 200
algorithm, 202
Apriori property, 201
bottom-up construction, 201
iceberg cube construction, 201
partitioning snapshot, 203
performance, 204
top-down processing order, 200, 201
business intelligence (BI), 27
business metadata, 135
business query view, 151
C
C4.5, 332, 385
class-based ordering, 358
gain ratio use, 340
greedy approach, 332
pessimistic pruning, 345
rule extraction, 358
See also decision tree induction
cannot-link constraints, 533
CART, 332, 385
cost complexity pruning algorithm, 345
Gini index use, 341
greedy approach, 332
See also decision tree induction
case updating, 404
case-based reasoning (CBR), 425–426
challenges, 426
categorical attributes, 41
CBA. See Classification Based on Associations
CBLOF. See cluster-based local outlier factor
CELL method, 562, 563
cells, 10–11
aggregate, 189
ancestor, 189
base, 189
descendant, 189
dimensional, 189
exceptions, 231
residual value, 234
central tendency measures, 39, 44, 45–47
mean, 45–46
median, 46–47
midrange, 47
for missing values, 88
models, 47
centroid distance, 108
CF-trees, 462–463, 464
nodes, 465
parameters, 464
structure illustration, 464
CHAID, 343
Chameleon, 459, 466–467
clustering illustration, 466
relative closeness, 467
relative interconnectivity, 466–467
See also hierarchical methods
Chernoff faces, 60
asymmetrical, 61
illustrated, 62
ChiMerge, 117
chi-square test, 95
chunking, 195
chunks, 195
2-D, 197
3-D, 197
computation of, 198
scanning order, 197
CLARA. See Clustering Large Applications
CLARANS. See Clustering Large Applications based upon Randomized Search
class comparisons, 166, 175, 180
attribute-oriented induction for, 175–178
mining, 176
presentation of, 175–176
procedure, 175–176
class conditional independence, 350
class imbalance problem, 384–385, 386
ensemble methods for, 385
on multiclass tasks, 385
oversampling, 384–385, 386
threshold-moving approach, 385
undersampling, 384–385, 386
class label attributes, 328
class-based ordering, 357
class/concept descriptions, 15
classes, 15, 166
contrasting, 15
equivalence, 427
target, 15
classification, 18, 327–328, 385
accuracy, 330
accuracy improvement techniques, 377–385
active learning, 433–434
advanced methods, 393–442
applications, 327
associative, 415, 416–419, 437
automatic, 445
backpropagation, 393, 398–408, 437
bagging, 379–380
basic concepts, 327–330
Bayes methods, 350–355
Bayesian belief networks, 393–397, 436
boosting, 380–382
case-based reasoning, 425–426
of class-imbalanced data, 383–385
confusion matrix, 365–366, 386
costs and benefits, 373–374
decision tree induction, 330–350
discriminative frequent pattern-based, 437
document, 430
ensemble methods, 378–379
evaluation metrics, 364–370
example, 19
frequent pattern-based, 393, 415–422, 437
fuzzy set approaches, 428–429, 437
general approach to, 328
genetic algorithms, 426–427, 437
heterogeneous networks, 593
homogeneous networks, 593
IF-THEN rules for, 355–357
interpretability, 369
k-nearest-neighbor, 423–425
lazy learners, 393, 422–426
learning step, 328
model representation, 18
model selection, 364, 370–377
multiclass, 430–432, 437
in multimedia data mining, 596
neural networks for, 19, 398–408
pattern-based, 282, 318
perception-based, 348–350
precision measure, 368–369
as prediction problem, 328
process, 328
process illustration, 329
random forests, 382–383
recall measure, 368–369
robustness, 369
rough set approach, 427–428, 437