HAN
22-ind-673-708-9780123814791
2011/6/1
3:27
Page 674
#2
674
Index
applications (Continued)
targeted, 27–28
telecommunications industry, 611
Web search engines, 28
application-specific outlier detection, 548–549
approximate patterns, 281
mining, 307–312
Apriori algorithm, 248–253, 272
dynamic itemset counting, 256
efficiency, improving, 254–256
example, 250–252
hash-based technique, 255
join step, 249
partitioning, 255–256
prune step, 249–250
pseudocode, 253
sampling, 256
transaction reduction, 255
Apriori property, 194, 201, 249
antimonotonicity, 249
in Apriori algorithm, 298
Apriori pruning method, 194
arrays
3-D for dimensions, 196
sparse compression, 198–199
association analysis, 17–18
association rules, 245
approximate, 281
Boolean, 281
compressed, 281
confidence, 21, 245, 246, 416
constraint-based, 281
constraints, 296–297
correlation, 265, 272
discarded, 17
fittest, 426
frequent patterns and, 280
generation from frequent itemsets, 253, 254
hybrid-dimensional, 288
interdimensional, 288
intradimensional, 287
metarule-guided mining of, 295–296
minimum confidence threshold, 18, 245
minimum support threshold, 245
mining, 272
multidimensional, 17, 287–289, 320
multilevel, 281, 283–287, 320
near-match, 281
objective measures, 21
offspring, 426
quantitative, 281, 289, 320
redundancy-aware top-k, 281
single-dimensional, 17, 287
spatial, 595
strong, 264–265, 272
support, 21, 245, 246, 417
top-k, 281
types of values in, 281
associative classification, 415, 416–419, 437
CBA, 417
CMAR, 417–418
CPAR, 418–419
rule confidence, 416
rule support, 417
steps, 417
asymmetric binary dissimilarity, 71
asymmetric binary similarity, 71
attribute construction, 112
accuracy and, 105
multivariate splits, 344
attribute selection measures, 331, 336–344
CHAID, 343
gain ratio, 340–341
Gini index, 341–343
information gain, 336–340
Minimum Description Length (MDL), 343–344
multivariate splits, 343–344
attribute subset selection, 100, 103–105
decision tree induction, 105
forward selection/backward elimination combination, 105
greedy methods, 104–105
stepwise backward elimination, 105
stepwise forward selection, 105
attribute vectors, 40, 328
attribute-oriented induction (AOI), 166–178, 180
algorithm, 173
for class comparisons, 175–178
for data characterization, 167–172
data generalization by, 166–178
generalized relation, 172
implementation of, 172–174
attributes, 9, 40
abstraction level differences, 99
behavioral, 546, 573
binary, 41–42, 79
Boolean, 41
categorical, 41
class label, 328
contextual, 546, 573
continuous, 44
correlated, 54–56
dimension correspondence, 10
discrete, 44
generalization, 169–170
generalization control, 170
generalization threshold control, 170
grouping, 231
interval-scaled, 43, 79
of mixed type, 75–77
nominal, 41, 79
numeric, 43–44, 79
ordered, 103
ordinal, 41, 79
qualitative, 41
ratio-scaled, 43–44, 79
reducts of, 427
removal, 169
repetition, 346
set of, 118
splitting, 333
terminology for, 40
type determination, 41
types of, 39
unordered, 103
audio data mining, 604–607, 624
automatic classification, 445
AVA. See all-versus-all
AVC-group, 347
AVC-set, 347
average(), 215
B
background knowledge, 30–31
backpropagation, 393, 398–408, 437
activation function, 402
algorithm illustration, 401
biases, 402, 404
case updating, 404
efficiency, 404
epoch updating, 404
error, 403
functioning of, 400–403
hidden layers, 399
input layers, 399
input propagation, 401–402
interpretability and, 406–408
learning, 400
learning rate, 403–404
logistic function, 402
multilayer feed-forward neural network, 398–399
network pruning, 406–407
neural network topology definition, 400
output layers, 399
sample learning calculations, 404–406
sensitivity analysis, 408
sigmoid function, 402
squashing function, 403
terminating conditions, 404
unknown tuple classification, 406
weights initialization, 401
See also classification
bagging, 379–380
algorithm illustration, 380
boosting versus, 381–382
in building random forests, 383
bar charts, 54
base cells, 189
base cuboids, 111, 137–138, 158
Basic Local Alignment Search Tool (BLAST), 591
Baum-Welch algorithm, 591
Bayes’ theorem, 350–351
Bayesian belief networks, 393–397, 436
algorithms, 396
components of, 394
conditional probability table (CPT), 394, 395
directed acyclic graph, 394–395
gradient descent strategy, 396–397
illustrated, 394
mechanisms, 394–396
problem modeling, 395–396
topology, 396
training, 396–397
See also classification
Bayesian classification
basis, 350
Bayes’ theorem, 350–351
class conditional independence, 350
naive, 351–355, 385
posterior probability, 351
prior probability, 351
BCubed precision metric, 488, 489
BCubed recall metric, 489
behavioral attributes, 546, 573
believability, data, 85
BI (business intelligence), 27
biases, 402, 404
biclustering, 512–519, 538
application examples, 512–515
enumeration methods, 517, 518–519
gene expression example, 513–514
methods, 517–518
optimization-based methods, 517–518
recommender system example, 514–515
types of, 538