HAN
22-ind-673-708-9780123814791
2011/6/1
3:27
Page 698
#26
698
Index
queries (Continued)
subcube, 216, 217–218
top-k, 225–227
query languages, 31
query models, 149–150
query-driven approach, 128
querying function, 433
R
rag bag criterion, 488
RainForest, 385
random forests, 382–383
random sampling, 370, 386
random subsampling, 370
random walk, 526
similarity based on, 527
randomization methods, 621
range, 48
interquartile, 49
range queries, 220
ranking
cubes, 225–227, 235
dimensions, 225
function, 225
heterogeneous networks, 593
rare patterns, 280, 283, 320
example, 291–292
mining, 291–294
ratio-scaled attributes, 43–44, 79
reachability density, 566
reachability distance, 565
recall measure, 368–369
recognition rate, 366–367
recommender systems, 282, 615
advantages, 616
biclustering for, 514–515
challenges, 617
collaborative, 610, 615, 616, 617, 618
content-based approach, 615, 616
data mining and, 615–618
error types, 617–618
frequent pattern mining for, 319
hybrid approaches, 618
intelligent query answering, 618
memory-based methods, 617
use scenarios, 616
recursive partitioning, 335
reduced support, 285, 286
redundancy
in data integration, 94
detection by correlation analysis, 94–98
redundancy-aware top-k patterns, 281, 311, 320
extracting, 310–312
finding, 312
strategy comparison, 311–312
trade-offs, 312
refresh, in back-end tools/utilities, 134
regression, 19, 90
coefficients, 105–106
example, 19
linear, 90, 105–106
in statistical data mining, 599
regression analysis, 19, 328
in time-series data, 587–588
relational databases, 9
components of, 9
mining, 10
relational schema for, 10
relational OLAP (ROLAP), 132, 164, 165, 179
relative significance, 312
relevance analysis, 19
repetition, 346
replication, 347
illustrated, 346
representative patterns, 309
retail industry, 609–611
RIPPER, 359, 363
robustness, classification, 369
ROC curves, 374, 386
classification models, 377
classifier comparison with, 373–377
illustrated, 376, 377
plotting, 375
roll-up operation, 11, 146
rough set approach, 428–429, 437
row enumeration, 302
rule ordering, 357
rule pruning, 363
rule quality measures, 361–363
rule-based classification, 355–363, 386
IF-THEN rules, 355–357
rule extraction, 357–359
rule induction, 359–363
rule pruning, 363
rule quality measures, 361–363
rules for constraints, 294
S
sales campaign analysis, 610
samples, 218
cluster, 108–109
data, 219
simple random, 108
stratified, 109–110
sampling
in Apriori efficiency, 256
as data reduction technique, 108–110
methods, 108–110
oversampling, 384–385
random, 386
with replacement, 380–381
uncertainty, 433
undersampling, 384–385
sampling cubes, 218–220, 235
confidence interval, 219–220
framework, 219–220
query expansion with, 221
SAS Enterprise Miner, 603, 604
scalability
classification, 369
cluster analysis, 446
cluster methods, 445
data mining algorithms, 31
decision tree induction and, 347–348
dimensionality and, 577
k-means, 454
scalable computation, 319
SCAN. See Structural Clustering Algorithm for Networks
core vertex, 531
illustrated, 532
scatter plots, 54
2-D data set visualization with, 59
3-D data set visualization with, 60
correlations between attributes, 54–56
illustrated, 55
matrix, 56, 59
schemas
integration, 94
snowflake, 140–141
star, 139–140
science applications, 611–613
search engines, 28
search space pruning, 263, 301
second guess heuristic, 369
selection dimensions, 225
self-training, 432
semantic annotations
applications, 317, 313, 320–321
with context modeling, 316
from DBLP data set, 316–317
effectiveness, 317
example, 314–315
of frequent patterns, 313–317
mutual information, 315–316
task definition, 315
Semantic Web, 597
semi-offline materialization, 226
semi-supervised classification, 432–433, 437
alternative approaches, 433
cotraining, 432–433
self-training, 432
semi-supervised learning, 25
outlier detection by, 572
semi-supervised outlier detection, 551
sensitivity analysis, 408
sensitivity measure, 367
sentiment classification, 434
sequence data analysis, 319
sequences, 586
alignment, 590
biological, 586, 590–591
classification of, 589–590
similarity searches, 587
symbolic, 586, 588–590
time-series, 586, 587–588
sequential covering algorithm, 359
general-to-specific search, 360
greedy search, 361
illustrated, 359
rule induction with, 359–361
sequential pattern mining, 589
constraint-based, 589
in symbolic sequences, 588–589
shapelets method, 590
shared dimensions, 204
pruning, 205
shared-sorts, 193
shared-partitions, 193
shell cubes, 160
shell fragments, 192, 235
approach, 211–212
computation algorithm, 212, 213
computation example, 214–215
precomputing, 210
shrinking diameter, 592
sigmoid function, 402
signature-based detection, 614
significance levels, 373
significance measure, 312
significance tests, 372–373, 386
silhouette coefficient, 489–490
similarity
asymmetric binary, 71
cosine, 77–78