Data Mining: Concepts and Techniques, 3rd Edition

HAN 22-ind-673-708-9780123814791

HAN

22-ind-673-708-9780123814791

2011/6/1

3:27

Page 696

#24

696

Index

partitioning (Continued)

recursive, 335

tuples, 334

Partitioning Around Medoids (PAM) algorithm, 455–457

partitioning methods, 448, 451–457, 491

centroid-based, 451–454

global optimality, 449

iterative relocation techniques, 448

k-means, 451–454

k-medoids, 454–457

k-modes, 454

object-based, 454–457

See also cluster analysis

path-based similarity, 594

pattern analysis, in recommender systems, 282

pattern clustering, 308–310

pattern constraints, 297–300

pattern discovery, 601

pattern evaluation, 8

pattern evaluation measures, 267–271

all confidence, 268

comparison, 269–270

cosine, 268

Kulczynski, 268

max confidence, 268

null-invariant, 270–271

See also measures

pattern space pruning, 295

pattern-based classification, 282, 318

pattern-based clustering, 282, 516

Pattern-Fusion, 302–307

characteristics, 304

core pattern, 304–305

initial pool, 306

iterative, 306

merging subpatterns, 306

shortcuts identification, 304

See also colossal patterns

pattern-guided mining, 30

patterns

actionable, 22

co-location, 319

colossal, 301–307, 320

combined significance, 312

constraint-based generation, 296–301

context modeling of, 314–315

core, 304–305

distance, 309

evaluation methods, 264–271

expected, 22

expressed, 309

frequent, 17

hidden meaning of, 314

interesting, 21–23, 33

metric space, 306–307

negative, 280, 291–294, 320

negatively correlated, 292, 293

rare, 280, 291–294, 320

redundancy between, 312

relative significance, 312

representative, 309

search space, 303

strongly negatively correlated, 292

structural, 282

type specification, 15–23

unexpected, 22

See also frequent patterns

pattern-trees, 264

Pearson’s correlation coefficient, 222

percentiles, 48

perception-based classification (PBC), 348

illustrated, 349

as interactive visual approach, 607

pixel-oriented approach, 348–349

split screen, 349

tree comparison, 350

phylogenetic trees, 590

pivot (rotate) operation, 148

pixel-oriented visualization, 57

planning and analysis tools, 153

point queries, 216, 217, 220

pool-based approach, 433

positive correlation, 55, 56

positive tuples, 364

positively skewed data, 47

possibility theory, 428

posterior probability, 351

postpruning, 344–345, 346

power law distribution, 592

precision measure, 368–369

predicate sets

frequent, 288–289

k, 289

predicates

repeated, 288

variables, 295

prediction, 19

classification, 328

link, 593–594

loan payment, 608–609

with naive Bayesian classification, 353–355

numeric, 328, 385

prediction cubes, 227–230, 235

example, 228–229

Probability-Based Ensemble, 229–230

predictive analysis, 18–19

predictive mining tasks, 15

predictive statistics, 24

predictors, 328

prepruning, 344, 346

prime relations

contrasting classes, 175, 177

deriving, 174

target classes, 175, 177

principal components analysis (PCA), 100, 102–103

application of, 103

correlation-based clustering with, 511

illustrated, 103

in lower-dimensional space extraction, 578

procedure, 102–103

prior probability, 351

privacy-preserving data mining, 33, 621, 626

distributed, 622

k-anonymity method, 621–622

l-diversity method, 622

as mining trend, 624–625

randomization methods, 621

results effectiveness, downgrading, 622

probabilistic clusters, 502–503

probabilistic hierarchical clustering, 467–470

agglomerative clustering framework, 467, 469

algorithm, 470

drawbacks of using, 469–470

generative model, 467–469

interpretability, 469

understanding, 469

See also hierarchical methods

probabilistic model-based clustering, 497–508, 538

expectation-maximization algorithm, 505–508

fuzzy clusters and, 499–501

product reviews example, 498

user search intent example, 498

See also cluster analysis

probability

estimation techniques, 355

posterior, 351

prior, 351

probability and statistical theory, 601

Probability-Based Ensemble (PBE), 229–230

PROCLUS, 511

profiles, 614

proximity measures, 67

for binary attributes, 70–72

for nominal attributes, 68–70

for ordinal attributes, 74–75

proximity-based methods, 552, 560–567, 581

density-based, 564–567

distance-based, 561–562

effectiveness, 552

example, 552

grid-based, 562–564

types of, 552, 560

See also outlier detection

pruning

cost complexity algorithm, 345

data space, 300–301

decision trees, 331, 344–347

in k-nearest neighbor classification, 425

network, 406–407

pattern space, 295, 297–300

pessimistic, 345

postpruning, 344–345, 346

prepruning, 344, 346

rule, 363

search space, 263, 301

sets, 345

shared dimensions, 205

sub-itemset, 263

pyramid algorithm, 101

Q

quality control, 600

quantile plots, 51–52

quantile-quantile plots, 52

example, 53–54

illustrated, 53

See also graphic displays

quantitative association rules, 281, 283, 288, 320

clustering-based mining, 290–291

data cube-based mining, 289–290

exceptional behavior disclosure, 291

mining, 289

quartiles, 48

first, 49

third, 49

queries, 10

intercuboid expansion, 223–225

intracuboid expansion, 221–223

language, 10

OLAP, 129, 130

point, 216, 217, 220

processing, 163–164, 218–227

range, 220

relational operations, 10
