HAN
22-ind-673-708-9780123814791
2011/6/1
3:27
Page 674
#2
674
Index
applications (Continued)
targeted, 27–28
telecommunications industry, 611
Web search engines, 28
application-specific outlier detection, 548–549
approximate patterns, 281
mining, 307–312
Apriori algorithm, 248–253, 272
dynamic itemset counting, 256
efficiency, improving, 254–256
example, 250–252
hash-based technique, 255
join step, 249
partitioning, 255–256
prune step, 249–250
pseudocode, 253
sampling, 256
transaction reduction, 255
Apriori property, 194, 201, 249
antimonotonicity, 249
in Apriori algorithm, 298
Apriori pruning method, 194
arrays
3-D for dimensions, 196
sparse compression, 198–199
association analysis, 17–18
association rules, 245
approximate, 281
Boolean, 281
compressed, 281
confidence, 21, 245, 246, 416
constraint-based, 281
constraints, 296–297
correlation, 265, 272
discarded, 17
fittest, 426
frequent patterns and, 280
generation from frequent itemsets, 253, 254
hybrid-dimensional, 288
interdimensional, 288
intradimensional, 287
metarule-guided mining of, 295–296
minimum confidence threshold, 18, 245
minimum support threshold, 245
mining, 272
multidimensional, 17, 287–289, 320
multilevel, 281, 283–287, 320
near-match, 281
objective measures, 21
offspring, 426
quantitative, 281, 289, 320
redundancy-aware top-k, 281
single-dimensional, 17, 287
spatial, 595
strong, 264–265, 272
support, 21, 245, 246, 417
top-k, 281
types of values in, 281
associative classification, 415, 416–419, 437
CBA, 417
CMAR, 417–418
CPAR, 418–419
rule confidence, 416
rule support, 417
steps, 417
asymmetric binary dissimilarity, 71
asymmetric binary similarity, 71
attribute construction, 112
accuracy and, 105
multivariate splits, 344
attribute selection measures, 331, 336–344
CHAID, 343
gain ratio, 340–341
Gini index, 341–343
information gain, 336–340
Minimum Description Length (MDL), 343–344
multivariate splits, 343–344
attribute subset selection, 100, 103–105
decision tree induction, 105
forward selection/backward elimination combination, 105
greedy methods, 104–105
stepwise backward elimination, 105
stepwise forward selection, 105
attribute vectors, 40, 328
attribute-oriented induction (AOI), 166–178, 180
algorithm, 173
for class comparisons, 175–178
for data characterization, 167–172
data generalization by, 166–178
generalized relation, 172
implementation of, 172–174
attributes, 9, 40
abstraction level differences, 99
behavioral, 546, 573
binary, 41–42, 79
Boolean, 41
categorical, 41
class label, 328
contextual, 546, 573
continuous, 44
correlated, 54–56
dimension correspondence, 10
discrete, 44
generalization, 169–170
generalization control, 170
generalization threshold control, 170
grouping, 231
interval-scaled, 43, 79
of mixed type, 75–77
nominal, 41, 79
numeric, 43–44, 79
ordered, 103
ordinal, 41, 79
qualitative, 41
ratio-scaled, 43–44, 79
reducts of, 427
removal, 169
repetition, 346
set of, 118
splitting, 333
terminology for, 40
type determination, 41
types of, 39
unordered, 103
audio data mining, 604–607, 624
automatic classification, 445
AVA. See all-versus-all
AVC-group, 347
AVC-set, 347
average(), 215
B
background knowledge, 30–31
backpropagation, 393, 398–408, 437
activation function, 402
algorithm illustration, 401
biases, 402, 404
case updating, 404
efficiency, 404
epoch updating, 404
error, 403
functioning of, 400–403
hidden layers, 399
input layers, 399
input propagation, 401–402
interpretability and, 406–408
learning, 400
learning rate, 403–404
logistic function, 402
multilayer feed-forward neural network, 398–399
network pruning, 406–407
neural network topology definition, 400
output layers, 399
sample learning calculations, 404–406
sensitivity analysis, 408
sigmoid function, 402
squashing function, 403
terminating conditions, 404
unknown tuple classification, 406
weights initialization, 401
See also classification
bagging, 379–380
algorithm illustration, 380
boosting versus, 381–382
in building random forests, 383
bar charts, 54
base cells, 189
base cuboids, 111, 137–138, 158
Basic Local Alignment Search Tool (BLAST), 591
Baum-Welch algorithm, 591
Bayes’ theorem, 350–351
Bayesian belief networks, 393–397, 436
algorithms, 396
components of, 394
conditional probability table (CPT), 394, 395
directed acyclic graph, 394–395
gradient descent strategy, 396–397
illustrated, 394
mechanisms, 394–396
problem modeling, 395–396
topology, 396
training, 396–397
See also classification
Bayesian classification
basis, 350
Bayes’ theorem, 350–351
class conditional independence, 350
naive, 351–355, 385
posterior probability, 351
prior probability, 351
BCubed precision metric, 488, 489
BCubed recall metric, 489
behavioral attributes, 546, 573
believability, data, 85
BI (business intelligence), 27
biases, 402, 404
biclustering, 512–519, 538
application examples, 512–515
enumeration methods, 517, 518–519
gene expression example, 513–514
methods, 517–518
optimization-based methods, 517–518
recommender system example, 514–515
types of, 538