Data Mining: Concepts and Techniques, 3rd Edition


HAN 22-ind-673-708-9780123814791




HAN

22-ind-673-708-9780123814791

2011/6/1

3:27

Page 698

#26

698

Index

queries (Continued)

subcube, 216, 217–218

top-k, 225–227

query languages, 31

query models, 149–150

query-driven approach, 128

querying function, 433



R

rag bag criterion, 488

RainForest, 385

random forests, 382–383

random sampling, 370, 386

random subsampling, 370

random walk, 526

similarity based on, 527

randomization methods, 621

range, 48

interquartile, 49

range queries, 220

ranking

cubes, 225–227, 235



dimensions, 225

function, 225

heterogeneous networks, 593

rare patterns, 280, 283, 320

example, 291–292

mining, 291–294

ratio-scaled attributes, 43–44, 79

reachability density, 566

reachability distance, 565

recall measure, 368–369

recognition rate, 366–367

recommender systems, 282, 615

advantages, 616

biclustering for, 514–515

challenges, 617

collaborative, 610, 615, 616, 617, 618

content-based approach, 615, 616

data mining and, 615–618

error types, 617–618

frequent pattern mining for, 319

hybrid approaches, 618

intelligent query answering, 618

memory-based methods, 617

use scenarios, 616

recursive partitioning, 335

reduced support, 285, 286

redundancy

in data integration, 94

detection by correlation analysis, 94–98

redundancy-aware top-k patterns, 281, 311, 320

extracting, 310–312

finding, 312

strategy comparison, 311–312

trade-offs, 312

refresh, in back-end tools/utilities, 134

regression, 19, 90

coefficients, 105–106

example, 19

linear, 90, 105–106

in statistical data mining, 599

regression analysis, 19, 328

in time-series data, 587–588

relational databases, 9

components of, 9

mining, 10

relational schema for, 10

relational OLAP (ROLAP), 132, 164, 165, 179

relative significance, 312

relevance analysis, 19

repetition, 346

replication, 347

illustrated, 346

representative patterns, 309

retail industry, 609–611

RIPPER, 359, 363

robustness, classification, 369

ROC curves, 374, 386

classification models, 377

classifier comparison with, 373–377

illustrated, 376, 377

plotting, 375

roll-up operation, 11, 146

rough set approach, 428–429, 437

row enumeration, 302

rule ordering, 357

rule pruning, 363

rule quality measures, 361–363

rule-based classification, 355–363, 386

IF-THEN rules, 355–357

rule extraction, 357–359

rule induction, 359–363

rule pruning, 363

rule quality measures, 361–363

rules for constraints, 294



S

sales campaign analysis, 610

samples, 218

cluster, 108–109

data, 219




simple random, 108

stratified, 109–110

sampling


in Apriori efficiency, 256

as data reduction technique, 108–110

methods, 108–110

oversampling, 384–385

random, 386

with replacement, 380–381

uncertainty, 433

undersampling, 384–385

sampling cubes, 218–220, 235

confidence interval, 219–220

framework, 219–220

query expansion with, 221

SAS Enterprise Miner, 603, 604

scalability

classification, 369

cluster analysis, 446

cluster methods, 445

data mining algorithms, 31

decision tree induction and, 347–348

dimensionality and, 577



k-means, 454

scalable computation, 319

SCAN. See Structural Clustering Algorithm for Networks


core vertex, 531

illustrated, 532

scatter plots, 54

2-D data set visualization with, 59

3-D data set visualization with, 60

correlations between attributes, 54–56

illustrated, 55

matrix, 56, 59

schemas

integration, 94



snowflake, 140–141

star, 139–140

science applications, 611–613

search engines, 28

search space pruning, 263, 301

second guess heuristic, 369

selection dimensions, 225

self-training, 432

semantic annotations

applications, 313, 317, 320–321

with context modeling, 316

from DBLP data set, 316–317

effectiveness, 317

example, 314–315

of frequent patterns, 313–317

mutual information, 315–316

task definition, 315

Semantic Web, 597

semi-offline materialization, 226

semi-supervised classification, 432–433, 437

alternative approaches, 433



cotraining, 432–433

self-training, 432

semi-supervised learning, 25

outlier detection by, 572

semi-supervised outlier detection, 551

sensitivity analysis, 408

sensitivity measure, 367

sentiment classification, 434

sequence data analysis, 319

sequences, 586

alignment, 590

biological, 586, 590–591

classification of, 589–590

similarity searches, 587

symbolic, 586, 588–590

time-series, 586, 587–588

sequential covering algorithm, 359

general-to-specific search, 360

greedy search, 361

illustrated, 359

rule induction with, 359–361

sequential pattern mining, 589

constraint-based, 589

in symbolic sequences, 588–589

shapelets method, 590

shared dimensions, 204

pruning, 205

shared-sorts, 193

shared-partitions, 193

shell cubes, 160

shell fragments, 192, 235

approach, 211–212

computation algorithm, 212, 213

computation example, 214–215

precomputing, 210

shrinking diameter, 592

sigmoid function, 402

signature-based detection, 614

significance levels, 373

significance measure, 312

significance tests, 372–373, 386

silhouette coefficient, 489–490

similarity

asymmetric binary, 71

cosine, 77–78


