HAN
22-ind-673-708-9780123814791
2011/6/1
3:27
Page 682
#10
682
Index
data characterization, 15, 166
attribute-oriented induction, 167–172
data mining query, 167–168
example, 16
methods, 16
output, 16
data classification. See classification
data cleaning, 6, 85, 88–93, 120
in back-end tools/utilities, 134
binning, 89–90
discrepancy detection, 91–93
by information network analysis, 592–593
missing values, 88–89
noisy data, 89
outlier analysis, 90
pattern mining for, 318
as process, 91–93
regression, 90
See also data preprocessing
data constraints, 294
antimonotonic, 300
pruning data space with, 300–301
succinct, 300
See also constraints
data cube aggregation, 110–111
data cube computation, 156–160, 214–215
aggregation and, 193
average(), 215
BUC, 200–204, 235
cube operator, 157–159
cube shells, 211
full, 189–190, 195–199
general strategies for, 192–194
iceberg, 160, 193–194
memory allocation, 199
methods, 194–218, 235
multiway array aggregation, 195–199
one-pass, 198
preliminary concepts, 188–194
shell fragments, 210–218, 235
Star-Cubing, 204–210, 235
data cubes, 10, 136, 178, 188
3-D, 138
4-D, 138, 139
apex cuboid, 111, 138, 158
base cuboid, 111, 137–138, 158
closed, 192
cube shell, 192
cuboids, 137
curse of dimensionality, 158
discovery-driven exploration, 231–234
example, 11–13
full, 189–190, 196–197
gradient analysis, 321
iceberg, 160, 190–191, 201, 235
lattice of cuboids, 157, 234, 290
materialization, 159–160, 179, 234
measures, 145
multidimensional, 12, 136–139
multidimensional data mining and, 26
multifeature, 227, 230–231, 235
multimedia, 596
prediction, 227–230, 235
qualitative association mining, 289–290
queries, 230
query processing, 218–227
ranking, 225–227, 235
sampling, 218–220, 235
shell, 160, 211
shell fragments, 192, 210–218, 235
sparse, 190
spatial, 595
technology, 187–242
data discretization. See discretization
data dispersion, 44, 48–51
boxplots, 49–50
five-number summary, 49
quartiles, 48–49
standard deviation, 50–51
variance, 50–51
data extraction, in back-end tools/utilities, 134
data focusing, 168
data generalization, 179–180
by attribute-oriented induction, 166–178
data integration, 6, 85–86, 93–99, 120
correlation analysis, 94–98
detection/resolution of data value conflicts, 99
entity identification problem, 94
by information network analysis, 592–593
object matching, 94
redundancy and, 94–98
schema, 94
tuple duplication, 98–99
See also data preprocessing
data marts, 132, 142
data warehouses versus, 142
dependent, 132
distributed, 134
implementation, 132
independent, 132
data matrix, 67–68
dissimilarity matrix versus, 67–68
relational table, 67–68
rows and columns, 68
as two-mode matrix, 68
data migration tools, 93
data mining, 5–8, 33, 598, 623
ad hoc, 31
applications, 607–618
biological data, 624
complex data types, 585–598, 625
cyber-physical system data, 596
data streams, 598
data types for, 8
data warehouses for, 154
database types and, 32
descriptive, 15
distributed, 615, 624
efficiency, 31
foundations, views on, 600–601
functionalities, 15–23, 34
graphs and networks, 591–594
incremental, 31
as information technology evolution, 2–5
integration, 623
interactive, 30
as interdisciplinary effort, 29–30
invisible, 33, 618–620, 625
issues in, 29–33, 34
in knowledge discovery, 7
as knowledge search through data, 6
machine learning similarities, 26
methodologies, 29–30, 585–607
motivation for, 1–5
multidimensional, 11–13, 26, 33–34, 155–156, 179, 227–230
multimedia data, 596
OLAP and, 154
as pattern/knowledge discovery process, 8
predictive, 15
presentation/visualization of results, 31
privacy-preserving, 32, 621–622, 624–625, 626
query languages, 31
relational databases, 10
scalability, 31
sequence data, 586
social impacts, 32
society and, 618–622
spatial data, 595
spatiotemporal data and moving objects, 595–596, 623–624
statistical, 598
text data, 596–597, 624
trends, 622–625, 626
ubiquitous, 618–620, 625
user interaction and, 30–31
visual and audio, 602–607, 624, 625
Web data, 597–598, 624
data mining systems, 10
data models
entity-relationship (ER), 9, 139
multidimensional, 135–146
data objects, 40, 79
similarity, 40
terminology for, 40
data preprocessing, 83–124
cleaning, 88–93
forms illustration, 87
integration, 93–99
overview, 84–87
quality, 84–85
reduction, 99–111
in science applications, 612
summary, 87
tasks in, 85–87
transformation, 111–119
data quality, 84, 120
accuracy, 84
believability, 85
completeness, 84–85
consistency, 85
interpretability, 85
timeliness, 85
data reduction, 86, 99–111, 120
attribute subset selection, 103–105
clustering, 108
compression, 100, 120
data cube aggregation, 110–111
dimensionality, 86, 99–100, 120
histograms, 106–108
numerosity, 86, 100, 120
parametric, 105–106
principal components analysis, 102–103
sampling, 108
strategies, 99–100
theory, 601
wavelet transforms, 100–102
See also data preprocessing
data rich but information poor, 5
data scrubbing tools, 92
data security-enhancing techniques, 621
data segmentation, 445
data selection, 8
data source view, 151
data streams, 14, 598, 624
data transformation, 8, 87, 111–119, 120
aggregation, 112