Data Mining: Concepts and Techniques, 3rd Edition

HAN 22-ind-673-708-9780123814791

HAN

22-ind-673-708-9780123814791

2011/6/1

3:27

Page 700

#28

700

Index

similarity (Continued)

measuring, 65–78, 79

nominal attributes, 70

similarity measures, 447–448, 525–528

constraints on, 533

geodesic distance, 525–526

SimRank, 526–528

similarity searches, 587

in information networks, 594

in multimedia data mining, 596

simple random sample with replacement (SRSWR), 108

simple random sample without replacement (SRSWOR), 108

SimRank, 526–528, 539

computation, 527–528

random walk, 526–528

structural context, 528

simultaneous aggregation, 195

single-dimensional association rules, 17, 287

single-linkage algorithm, 460, 461

singular value decomposition (SVD), 587

skewed data

balanced, 271

negatively, 47

positively, 47

wavelet transforms on, 102

slice operation, 148

small-world phenomenon, 592

smoothing, 112

by bin boundaries, 89

by bin means, 89

by bin medians, 89

for data discretization, 90

snowflake schema, 140

example, 141

illustrated, 141

star schema versus, 140

social networks, 524–525, 526–528

densification power law, 592

evolution of, 594

mining, 623

small-world phenomenon, 592

See also networks

social science/social studies data mining, 613

soft clustering, 501

soft constraints, 534, 539

example, 534

handling, 536–537

space-filling curve, 58

sparse data, 102

sparse data cubes, 190

sparsest cuts, 539

sparsity coefficient, 579

spatial data, 14

spatial data mining, 595

spatiotemporal data analysis, 319

spatiotemporal data mining, 595, 623–624

specialized SQL servers, 165

specificity measure, 367

spectral clustering, 520–522, 539

effectiveness, 522

framework, 521

steps, 520–522

speech recognition, 430

speed, classification, 369

spiral method, 152

split-point, 333, 340, 342

splitting attributes, 333

splitting criterion, 333, 342

splitting rules. See attribute selection measures

splitting subset, 333

SQL, as relational query language, 10

square-error function, 454

squashing function, 403

standard deviation, 51

example, 51

function of, 50

star schema, 139

example, 139–140

illustrated, 140

snowflake schema versus, 140

Star-Cubing, 204–210, 235

algorithm illustration, 209

bottom-up computation, 205

example, 207

for full cube computation, 210

ordering of dimensions and, 210

performance, 210

shared dimensions, 204–205

starnet query model, 149

example, 149–150

star-nodes, 205

star-trees, 205

compressed base table, 207

construction, 205

statistical data mining, 598–600

analysis of variance, 600

discriminant analysis, 600

factor analysis, 600

generalized linear models, 599–600

mixed-effect models, 600

quality control, 600

regression, 599

survival analysis, 600

statistical databases (SDBs), 148

OLAP systems versus, 148–149

statistical descriptions, 24, 79

graphic displays, 44–45, 51–56

measuring the dispersion, 48–51

statistical hypothesis test, 24

statistical models, 23–24

of networks, 592–594

statistical outlier detection methods, 552, 553–560, 581

computational cost of, 560

for data analysis, 625

effectiveness, 552

example, 552

nonparametric, 553, 558–560

parametric, 553–558

See also outlier detection

statistical theory, in exceptional behavior disclosure, 291

statistics, 23

inferential, 24

predictive, 24

StatSoft, 602, 603

stepwise backward elimination, 105

stepwise forward selection, 105

stick figure visualization, 61–63

STING, 479–481

advantages, 480–481

as density-based clustering method, 480

hierarchical structure, 479, 480

multiresolution approach, 481

See also cluster analysis; grid-based methods

stratified cross-validation, 371

stratified samples, 109–110

stream data, 598, 624

strong association rules, 272

interestingness and, 264–265

misleading, 265

Structural Clustering Algorithm for Networks (SCAN), 531–532

structural context-based similarity, 526

structural data analysis, 319

structural patterns, 282

structure similarity search, 592

structures

as contexts, 575

discovery of, 318

indexing, 319

substructures, 243

Student’s t-test, 372

subcube queries, 216, 217–218

sub-itemset pruning, 263

subjective interestingness measures, 22

subject-oriented data warehouses, 126

subsequence, 589

matching, 587

subset checking, 263–264

subset testing, 250

subspace clustering, 448

frequent patterns for, 318–319

subspace clustering methods, 509, 510–511, 538

biclustering, 511

correlation-based, 511

examples, 538

subspace search methods, 510–511

subspaces

bottom-up search, 510–511

cube space, 228–229

outliers in, 578–579

top-down search, 511

substitution matrices, 590

substructures, 243

sum of the squared error (SSE), 501

summary fact tables, 165

superset checking, 263

supervised learning, 24, 330

supervised outlier detection, 549–550

challenges, 550

support, 21

association rule, 21

group-based, 286

reduced, 285, 286

uniform, 285–286

support, rule, 245, 246

support vector machines (SVMs), 393, 408–415, 437

interest in, 408

maximum marginal hyperplane, 409, 412

nonlinear, 413–415

for numeric prediction, 408

with sigmoid kernel, 415

support vectors, 411

for test tuples, 412–413

training/testing speed improvement, 415

support vectors, 411, 437

illustrated, 411

SVM finding, 412

supremum distance, 73–74

surface web, 597

survival analysis, 600

SVMs. See support vector machines

symbolic sequences, 586, 588

applications, 589

sequential pattern mining in, 588–589

symmetric binary dissimilarity, 70

synchronous generalization, 175

tables, 9

attributes, 9

contingency, 95

dimension, 136

fact, 165

tuples, 9

Document Outline

Front Cover
Data Mining: Concepts and Techniques
Copyright
Dedication
Table of Contents
Foreword
Foreword to Second Edition
Preface
Acknowledgments
About the Authors
Chapter 1. Introduction
- 1.1 Why Data Mining?
- 1.2 What Is Data Mining?
- 1.3 What Kinds of Data Can Be Mined?
- 1.4 What Kinds of Patterns Can Be Mined?
- 1.5 Which Technologies Are Used?
- 1.6 Which Kinds of Applications Are Targeted?
- 1.7 Major Issues in Data Mining
- 1.8 Summary
- 1.9 Exercises
- 1.10 Bibliographic Notes
Chapter 2. Getting to Know Your Data
- 2.1 Data Objects and Attribute Types
- 2.2 Basic Statistical Descriptions of Data
- 2.3 Data Visualization
- 2.4 Measuring Data Similarity and Dissimilarity
- 2.5 Summary
- 2.6 Exercises
- 2.7 Bibliographic Notes
Chapter 3. Data Preprocessing
- 3.1 Data Preprocessing: An Overview
- 3.2 Data Cleaning
- 3.3 Data Integration
- 3.4 Data Reduction
- 3.5 Data Transformation and Data Discretization
- 3.6 Summary
- 3.7 Exercises
- 3.8 Bibliographic Notes
Chapter 4. Data Warehousing and Online Analytical Processing
- 4.1 Data Warehouse: Basic Concepts
- 4.2 Data Warehouse Modeling: Data Cube and OLAP
- 4.3 Data Warehouse Design and Usage
- 4.4 Data Warehouse Implementation
- 4.5 Data Generalization by Attribute-Oriented Induction
- 4.6 Summary
- 4.7 Exercises
- 4.8 Bibliographic Notes
Chapter 5. Data Cube Technology
- 5.1 Data Cube Computation: Preliminary Concepts
- 5.2 Data Cube Computation Methods
- 5.3 Processing Advanced Kinds of Queries by Exploring Cube Technology
- 5.4 Multidimensional Data Analysis in Cube Space
- 5.5 Summary
- 5.6 Exercises
- 5.7 Bibliographic Notes
Chapter 6. Mining Frequent Patterns, Associations, and Correlations: Basic Concepts and Methods
- 6.1 Basic Concepts
- 6.2 Frequent Itemset Mining Methods
- 6.3 Which Patterns Are Interesting?—Pattern Evaluation Methods
- 6.4 Summary
- 6.5 Exercises
- 6.6 Bibliographic Notes
Chapter 7. Advanced Pattern Mining
- 7.1 Pattern Mining: A Road Map
- 7.2 Pattern Mining in Multilevel, Multidimensional Space
- 7.3 Constraint-Based Frequent Pattern Mining
- 7.4 Mining High-Dimensional Data and Colossal Patterns
- 7.5 Mining Compressed or Approximate Patterns
- 7.6 Pattern Exploration and Application
- 7.7 Summary
- 7.8 Exercises
- 7.9 Bibliographic Notes
Chapter 8. Classification: Basic Concepts
- 8.1 Basic Concepts
- 8.2 Decision Tree Induction
- 8.3 Bayes Classification Methods
- 8.4 Rule-Based Classification
- 8.5 Model Evaluation and Selection
- 8.6 Techniques to Improve Classification Accuracy
- 8.7 Summary
- 8.8 Exercises
- 8.9 Bibliographic Notes
Chapter 9. Classification: Advanced Methods
- 9.1 Bayesian Belief Networks
- 9.2 Classification by Backpropagation
- 9.3 Support Vector Machines
- 9.4 Classification Using Frequent Patterns
- 9.5 Lazy Learners (or Learning from Your Neighbors)
- 9.6 Other Classification Methods
- 9.7 Additional Topics Regarding Classification
- 9.8 Summary
- 9.9 Exercises
- 9.10 Bibliographic Notes
Chapter 10. Cluster Analysis: Basic Concepts and Methods
- 10.1 Cluster Analysis
- 10.2 Partitioning Methods
- 10.3 Hierarchical Methods
- 10.4 Density-Based Methods
- 10.5 Grid-Based Methods
- 10.6 Evaluation of Clustering
- 10.7 Summary
- 10.8 Exercises
- 10.9 Bibliographic Notes
Chapter 11. Advanced Cluster Analysis
- 11.1 Probabilistic Model-Based Clustering
- 11.2 Clustering High-Dimensional Data
- 11.3 Clustering Graph and Network Data
- 11.4 Clustering with Constraints
- 11.5 Summary
- 11.6 Exercises
- 11.7 Bibliographic Notes
Chapter 12. Outlier Detection
- 12.1 Outliers and Outlier Analysis
- 12.2 Outlier Detection Methods
- 12.3 Statistical Approaches
- 12.4 Proximity-Based Approaches
- 12.5 Clustering-Based Approaches
- 12.6 Classification-Based Approaches
- 12.7 Mining Contextual and Collective Outliers
- 12.8 Outlier Detection in High-Dimensional Data
- 12.9 Summary
- 12.10 Exercises
- 12.11 Bibliographic Notes
Chapter 13. Data Mining Trends and Research Frontiers
- 13.1 Mining Complex Data Types
- 13.2 Other Methodologies of Data Mining
- 13.3 Data Mining Applications
- 13.4 Data Mining and Society
- 13.5 Data Mining Trends
- 13.6 Summary
- 13.7 Exercises
- 13.8 Bibliographic Notes
Bibliography
Index
