Data Mining: Concepts and Techniques, 3rd Edition


HAN 22-ind-673-708-9780123814791 2011/6/1 3:27 Page 674 #2

Index

applications (Continued)

targeted, 27–28

telecommunications industry, 611

Web search engines, 28

application-specific outlier detection, 548–549

approximate patterns, 281

mining, 307–312

Apriori algorithm, 248–253, 272

dynamic itemset counting, 256

efficiency, improving, 254–256

example, 250–252

hash-based technique, 255

join step, 249

partitioning, 255–256

prune step, 249–250

pseudocode, 253

sampling, 256

transaction reduction, 255

Apriori property, 194, 201, 249

antimonotonicity, 249

in Apriori algorithm, 298

Apriori pruning method, 194

arrays


3-D for dimensions, 196

sparse compression, 198–199

association analysis, 17–18

association rules, 245

approximate, 281

Boolean, 281

compressed, 281

confidence, 21, 245, 246, 416

constraint-based, 281

constraints, 296–297

correlation, 265, 272

discarded, 17

fittest, 426

frequent patterns and, 280

generation from frequent itemsets, 253, 254

hybrid-dimensional, 288

interdimensional, 288

intradimensional, 287

metarule-guided mining of, 295–296

minimum confidence threshold, 18, 245

minimum support threshold, 245

mining, 272

multidimensional, 17, 287–289, 320

multilevel, 281, 283–287, 320

near-match, 281

objective measures, 21

offspring, 426

quantitative, 281, 289, 320

redundancy-aware top-k, 281

single-dimensional, 17, 287

spatial, 595

strong, 264–265, 272

support, 21, 245, 246, 417

top-k, 281

types of values in, 281

associative classification, 415, 416–419, 437

CBA, 417

CMAR, 417–418

CPAR, 418–419

rule confidence, 416

rule support, 417

steps, 417

asymmetric binary dissimilarity, 71

asymmetric binary similarity, 71

attribute construction, 112

accuracy and, 105

multivariate splits, 344

attribute selection measures, 331, 336–344

CHAID, 343

gain ratio, 340–341

Gini index, 341–343

information gain, 336–340

Minimum Description Length (MDL), 343–344

multivariate splits, 343–344

attribute subset selection, 100, 103–105

decision tree induction, 105

forward selection/backward elimination

combination, 105

greedy methods, 104–105

stepwise backward elimination, 105

stepwise forward selection, 105

attribute vectors, 40, 328

attribute-oriented induction (AOI), 166–178, 180

algorithm, 173

for class comparisons, 175–178

for data characterization, 167–172

data generalization by, 166–178

generalized relation, 172

implementation of, 172–174

attributes, 9, 40

abstraction level differences, 99

behavioral, 546, 573

binary, 41–42, 79

Boolean, 41

categorical, 41

class label, 328

contextual, 546, 573

continuous, 44

correlated, 54–56

dimension correspondence, 10




discrete, 44

generalization, 169–170

generalization control, 170

generalization threshold control, 170

grouping, 231

interval-scaled, 43, 79

of mixed type, 75–77

nominal, 41, 79

numeric, 43–44, 79

ordered, 103

ordinal, 41, 79

qualitative, 41

ratio-scaled, 43–44, 79

reducts of, 427

removal, 169

repetition, 346

set of, 118

splitting, 333

terminology for, 40

type determination, 41

types of, 39

unordered, 103

audio data mining, 604–607, 624

automatic classification, 445

AVA. See all-versus-all

AVC-group, 347

AVC-set, 347

average(), 215


B

background knowledge, 30–31

backpropagation, 393, 398–408, 437

activation function, 402

algorithm illustration, 401

biases, 402, 404

case updating, 404

efficiency, 404

epoch updating, 404

error, 403

functioning of, 400–403

hidden layers, 399

input layers, 399

input propagation, 401–402

interpretability and, 406–408

learning, 400

learning rate, 403–404

logistic function, 402

multilayer feed-forward neural network, 398–399


network pruning, 406–407

neural network topology definition, 400

output layers, 399

sample learning calculations, 404–406

sensitivity analysis, 408

sigmoid function, 402

squashing function, 403

terminating conditions, 404

unknown tuple classification, 406

weights initialization, 401



See also classification

bagging, 379–380

algorithm illustration, 380

boosting versus, 381–382

in building random forests, 383

bar charts, 54

base cells, 189

base cuboids, 111, 137–138, 158

Basic Local Alignment Search Tool (BLAST), 591

Baum-Welch algorithm, 591

Bayes’ theorem, 350–351

Bayesian belief networks, 393–397, 436

algorithms, 396

components of, 394

conditional probability table (CPT), 394, 395

directed acyclic graph, 394–395

gradient descent strategy, 396–397

illustrated, 394

mechanisms, 394–396

problem modeling, 395–396

topology, 396

training, 396–397

See also classification

Bayesian classification

basis, 350

Bayes’ theorem, 350–351

class conditional independence, 350

naive, 351–355, 385

posterior probability, 351

prior probability, 351

BCubed precision metric, 488, 489

BCubed recall metric, 489

behavioral attributes, 546, 573

believability, data, 85

BI (business intelligence), 27

biases, 402, 404

biclustering, 512–519, 538

application examples, 512–515

enumeration methods, 517, 518–519

gene expression example, 513–514

methods, 517–518

optimization-based methods, 517–518

recommender system example, 514–515

types of, 538



