HAN
22-ind-673-708-9780123814791
2011/6/1
3:27
Page 694
#22
694
Index
neighborhoods
density, 471
distance-based outlier detection, 560
k-distance, 565
nested loop algorithm, 561, 562
networked data, 14
networks, 592
heterogeneous, 592, 593
homogeneous, 592, 593
information, 592–594
mining in science applications, 612–613
social, 592
statistical modeling of, 592–594
neural networks, 19, 398
backpropagation, 398–408
as black boxes, 406
for classification, 19, 398
disadvantages, 406
fully connected, 399, 406–407
learning, 398
multilayer feed-forward, 398–399
pruning, 406–407
rule extraction algorithms, 406, 407
sensitivity analysis, 408
three-layer, 399
topology definition, 400
two-layer, 399
neurodes, 399
Ng-Jordan-Weiss algorithm, 521, 522
no materialization, 159
noise filtering, 318
noisy data, 89–91
nominal attributes, 41
concept hierarchies for, 284
correlation analysis, 95–96
dissimilarity between, 69
example, 41
proximity measures, 68–70
similarity computation, 70
values of, 79, 288
See also attributes
nonlinear SVMs, 413–415
nonparametric statistical methods, 553–558
nonvolatile data warehouses, 127
normalization, 112, 120
data transformation by, 113–115
by decimal scaling, 115
min-max, 114
z-score, 114–115
null rules, 92
null-invariant measures, 270–271, 272
null-transactions, 270, 272
number of, 270
problem, 292–293
numeric attributes, 43–44, 79
covariance analysis, 98
interval-scaled, 43, 79
ratio-scaled, 43–44, 79
numeric data, dissimilarity on, 72–74
numeric prediction, 328, 385
classification, 328
support vector machines (SVMs) for, 408
numerosity reduction, 86, 100, 120
techniques, 100
O
object matching, 94
objective interestingness measures, 21–22
one-class model, 571–572
one-pass cube computation, 198
one-versus-all (OVA), 430
online analytical mining (OLAM), 155, 227
online analytical processing (OLAP), 4, 33, 128, 179
access patterns, 129
data contents, 128
database design, 129
dice operation, 148
drill-across operation, 148
drill-down operation, 11, 135–136, 146
drill-through operation, 148
example operations, 147
functionalities of, 154
hybrid OLAP, 164–165, 179
indexing, 125, 160–163
in information networks, 594
in knowledge discovery process, 125
market orientation, 128
multidimensional (MOLAP), 132, 164, 179
OLTP versus, 128–129, 130
operation integration, 125
operations, 146–148
pivot (rotate) operation, 148
queries, 129, 130, 163–164
query processing, 125, 163–164
relational OLAP, 132, 164, 165, 179
roll-up operation, 11, 135–136, 146
sample data effectiveness, 219
server architectures, 164–165
servers, 132
slice operation, 148
spatial, 595
statistical databases versus, 148–149
user-control versus automation, 167
view, 129
online transaction processing (OLTP), 128
access patterns, 129
customer orientation, 128
data contents, 128
database design, 129
OLAP versus, 128–129, 130
view, 129
operational metadata, 135
OPTICS, 473–476
cluster ordering, 474–475, 477
core-distance, 475
density estimation, 477
reachability-distance, 475
structure, 476
terminology, 476
See also cluster analysis; density-based methods
ordered attributes, 103
ordering
class-based, 358
dimensions, 210
rule, 357
ordinal attributes, 42, 79
dissimilarity between, 75
example, 42
proximity measures, 74–75
outlier analysis, 20–21
clustering-based techniques, 66
example, 21
in noisy data, 90
spatial, 595
outlier detection, 543–584
angle-based (ABOD), 580
application-specific, 548–549
categories of, 581
CELL method, 562–563
challenges, 548–549
clustering analysis and, 543
clustering for, 445
clustering-based methods, 552–553, 560–567
collective, 548, 575–576
contextual, 546–547, 573–575
distance-based, 561–562
extending, 577–578
global, 545
handling noise in, 549
in high-dimensional data, 576–580, 582
with histograms, 558–560
intrusion detection, 569–570
methods, 549–553
mixture of parametric distributions, 556–558
multivariate, 556
novelty detection relationship, 545
proximity-based methods, 552, 560–567, 581
semi-supervised methods, 551
statistical methods, 552, 553–560, 581
supervised methods, 549–550
understandability, 549
univariate, 554
unsupervised methods, 550
outlier subgraphs, 576
outliers
angle-based, 20, 543, 544, 580
collective, 547–548, 581
contextual, 545–547, 573, 581
density-based, 564
distance-based, 561
example, 544
global, 545, 581
high-dimensional, modeling, 579–580
identifying, 49
interpretation of, 577
local proximity-based, 564–565
modeling, 548
in small clusters, 571
types of, 545–548, 581
visualization with boxplot, 555
oversampling, 384, 386
example, 384–385
P
pairwise alignment, 590
pairwise comparison, 372
PAM. See Partitioning Around Medoids algorithm
parallel and distributed data-intensive mining algorithms, 31
parallel coordinates, 59, 62
parametric data reduction, 105–106
parametric statistical methods, 553–558
Pareto distribution, 592
partial distance method, 425
partial materialization, 159–160, 179, 234
strategies, 192
partition matrix, 538
partitioning
algorithms, 451–457
in Apriori efficiency, 255–256
bootstrapping, 371, 386
criteria, 447
cross-validation, 370–371, 386
Gini index and, 342
holdout method, 370, 386
random sampling, 370, 386