HAN
03-toc-ix-xviii-9780123814791
2011/6/1
3:32
Page x
#2
x
Contents
1.6
Which Kinds of Applications Are Targeted?
27
1.6.1
Business Intelligence
27
1.6.2
Web Search Engines
28
1.7
Major Issues in Data Mining
29
1.7.1
Mining Methodology
29
1.7.2
User Interaction
30
1.7.3
Efficiency and Scalability
31
1.7.4
Diversity of Database Types
32
1.7.5
Data Mining and Society
32
1.8
Summary
33
1.9
Exercises
34
1.10
Bibliographic Notes
35
Chapter 2 Getting to Know Your Data
39
2.1
Data Objects and Attribute Types
40
2.1.1
What Is an Attribute?
40
2.1.2
Nominal Attributes
41
2.1.3
Binary Attributes
41
2.1.4
Ordinal Attributes
42
2.1.5
Numeric Attributes
43
2.1.6
Discrete versus Continuous Attributes
44
2.2
Basic Statistical Descriptions of Data
44
2.2.1
Measuring the Central Tendency: Mean, Median, and Mode
45
2.2.2
Measuring the Dispersion of Data: Range, Quartiles, Variance, Standard Deviation, and Interquartile Range
48
2.2.3
Graphic Displays of Basic Statistical Descriptions of Data
51
2.3
Data Visualization
56
2.3.1
Pixel-Oriented Visualization Techniques
57
2.3.2
Geometric Projection Visualization Techniques
58
2.3.3
Icon-Based Visualization Techniques
60
2.3.4
Hierarchical Visualization Techniques
63
2.3.5
Visualizing Complex Data and Relations
64
2.4
Measuring Data Similarity and Dissimilarity
65
2.4.1
Data Matrix versus Dissimilarity Matrix
67
2.4.2
Proximity Measures for Nominal Attributes
68
2.4.3
Proximity Measures for Binary Attributes
70
2.4.4
Dissimilarity of Numeric Data: Minkowski Distance
72
2.4.5
Proximity Measures for Ordinal Attributes
74
2.4.6
Dissimilarity for Attributes of Mixed Types
75
2.4.7
Cosine Similarity
77
2.5
Summary
79
2.6
Exercises
79
2.7
Bibliographic Notes
81
Chapter 3 Data Preprocessing
83
3.1
Data Preprocessing: An Overview
84
3.1.1
Data Quality: Why Preprocess the Data?
84
3.1.2
Major Tasks in Data Preprocessing
85
3.2
Data Cleaning
88
3.2.1
Missing Values
88
3.2.2
Noisy Data
89
3.2.3
Data Cleaning as a Process
91
3.3
Data Integration
93
3.3.1
Entity Identification Problem
94
3.3.2
Redundancy and Correlation Analysis
94
3.3.3
Tuple Duplication
98
3.3.4
Data Value Conflict Detection and Resolution
99
3.4
Data Reduction
99
3.4.1
Overview of Data Reduction Strategies
99
3.4.2
Wavelet Transforms
100
3.4.3
Principal Components Analysis
102
3.4.4
Attribute Subset Selection
103
3.4.5
Regression and Log-Linear Models: Parametric Data Reduction
105
3.4.6
Histograms
106
3.4.7
Clustering
108
3.4.8
Sampling
108
3.4.9
Data Cube Aggregation
110
3.5
Data Transformation and Data Discretization
111
3.5.1
Data Transformation Strategies Overview
112
3.5.2
Data Transformation by Normalization
113
3.5.3
Discretization by Binning
115
3.5.4
Discretization by Histogram Analysis
115
3.5.5
Discretization by Cluster, Decision Tree, and Correlation Analyses
116
3.5.6
Concept Hierarchy Generation for Nominal Data
117
3.6
Summary
120
3.7
Exercises
121
3.8
Bibliographic Notes
123
Chapter 4 Data Warehousing and Online Analytical Processing
125
4.1
Data Warehouse: Basic Concepts
125
4.1.1
What Is a Data Warehouse?
126
4.1.2
Differences between Operational Database Systems and Data Warehouses
128
4.1.3
But, Why Have a Separate Data Warehouse?
129