Data Mining: Concepts and Techniques, 3rd Edition

HAN 22-ind-673-708-9780123814791

HAN

22-ind-673-708-9780123814791

2011/6/1

3:27

Page 688

#16

688

Index

frequent patterns (Continued)

closed, 262–264, 280

concepts, 243–244

constraint-based, 281

dimensions, 281

diversity, 280

exploration, 313–319

growth, 257–259, 272

max, 262–264, 280

mining, 243–244, 279–325

mining constraints or criteria, 281

number of dimensions involved in, 281

semantic annotation of, 313–317

sequential, 243

strong associations, 437

structured, 243

trees, 257–259

types of values in, 281

frequent subgraphs, 591

front-end client layer, 132

full materialization, 159, 179, 234

fuzzy clustering, 499–501, 538

data set for, 506

with EM algorithm, 505–507

example, 500

expectation step (E-step), 505

flexibility, 501

maximization step (M-step), 506–507

partition matrix, 499

as soft clusters, 501

fuzzy logic, 428

fuzzy sets, 428–429, 437, 499

evaluation, 500–501

example, 499

G

gain ratio, 340

C4.5 use of, 340

formula, 341

maximum, 341

gateways, 131

gene expression, 513–514

generalization

attribute, 169–170

attribute, control, 170

attribute, threshold control, 170

in multimedia data mining, 596

process, 172

results presentation, 174

synchronous, 175

generalized linear models, 599–600

generalized relations

attribute-oriented induction, 172

presentation of, 174

threshold control, 170

generative model, 467–469

genetic algorithms, 426–427, 437

genomes, 15

geodesic distance, 525–526, 539

diameter, 525

eccentricity, 525

measurements based on, 526

peripheral vertex, 525

radius, 525

geographic data warehouses, 595

geometric projection visualization, 58–60

Gini index, 341

binary enforcement, 332

binary indexes, 341

CART use of, 341

decision tree induction using, 342–343

minimum, 342

partitioning and, 342

global constants, for missing values, 88

global outliers, 545, 581

detection, 545

example, 545

Google

Flu Trends, 2

popularity of, 619–620

gradient descent strategy, 396–397

algorithms, 397

greedy hill-climbing, 397

as iterative, 396–397

graph and network data clustering, 497, 522–532, 539

applications, 523–525

bipartite graph, 523

challenges, 523–525, 530

cuts and clusters, 529–530

generic method, 530–531

geodesic distance, 525–526

methods, 528–532

similarity measures, 525–528

SimRank, 526–528

social network, 524–525

web search engines, 523–524

See also cluster analysis

graph cuts, 539

graph data, 14

graph index structures, 591

graph pattern mining, 591–592, 612–613

graphic displays

data presentation software, 44–45

histogram, 54, 55

quantile plot, 51–52

quantile-quantile plot, 52–54

scatter plot, 54–56

greedy hill-climbing, 397

greedy methods, attribute subset selection, 104–105

grid-based methods, 450, 479–483, 491

CLIQUE, 481–483

STING, 479–481

See also cluster analysis

grid-based outlier detection, 562–564

CELL method, 562, 563

cell properties, 562

cell pruning rules, 563

See also outlier detection

group-based support, 286

group-by

clause, 231

grouping attributes, 231

grouping variables, 231

Grubbs’ test, 555

H

Hamming distance, 431

hard constraints, 534, 539

example, 534

handling, 535–536

harmonic mean, 369

hash-based technique, 255

heterogeneous networks, 592

classification of, 593

clustering of, 593

ranking of, 593

heterogeneous transfer learning, 436

hidden Markov model (HMM), 590, 591

hierarchical methods, 449, 457–470, 491

agglomerative, 459–461

algorithmic, 459, 461–462

Bayesian, 459

BIRCH, 458, 462–466

Chameleon, 458, 466–467

complete linkages, 462, 463

distance measures, 461–462

divisive, 459–461

drawbacks, 449

merge or split points and, 458

probabilistic, 459, 467–470

single linkages, 462, 463

See also cluster analysis

hierarchical visualization, 63

treemaps, 63, 65

Worlds-within-Worlds, 63, 64

high-dimensional data, 301

clustering, 447

data distribution of, 560

frequent pattern mining, 301–307

outlier detection in, 576–580, 582

row enumeration, 302

high-dimensional data clustering, 497, 508–522, 538, 553

biclustering, 512–519

dimensionality reduction methods, 510, 519–522

example, 508–509

problems, challenges, and methodologies, 508–510

subspace clustering methods, 509, 510–511

See also cluster analysis

HilOut algorithm, 577–578

histograms, 54, 106–108, 116

analysis by discretization, 115–116

attributes, 106

binning, 106

construction, 559

equal-frequency, 107

equal-width, 107

example, 54

illustrated, 55, 107

multidimensional, 108

as nonparametric model, 559

outlier detection using, 558–560

holdout method, 370, 386

holistic measures, 145

homogeneous networks, 592

classification of, 593

clustering of, 593

Hopkins statistic, 484–485

horizontal data format, 259

hybrid OLAP (HOLAP), 164–165, 179

hybrid-dimensional association rules, 288

I

IBM Intelligent Miner, 603, 606

iceberg condition, 191

iceberg cubes, 160, 179, 190, 235

BUC construction, 201

computation, 160, 193–194, 319

computation and storage, 210–211

computation with Star-Cubing algorithm, 204–210

materialization, 319

specification of, 190–191

See also data cubes

icon-based visualization, 60

Chernoff faces, 60–61
