488
INDEX
False negative, 88, 99, 227
False positive, 88, 99, 140, 227
Family of functions, 100
Fang, M., 238
Fayyad, U.M., 280
Feature, 266, 312–314
Feature selection, 446
Feature vector, 440, 480
Fetterly, D., 71
Fikes, A., 70
File, 23, 24, 209, 227
Filtering, 139
Fingerprint, 113
First-price auction, 293
Fixedpoint, 102, 192
Flajolet, P., 162
Flajolet-Martin Algorithm, 143, 395
Flow graph, 41
Fortunato, S., 402
Fotakis, D., 402
French, J.C., 280
Frequent bucket, 219, 221
Frequent itemset, 4, 202, 212, 214, 358,
439
Frequent pairs, 213
Frequent-items table, 214
Freund, Y., 484
Friends, 344
Friends relation, 52
Frieze, A.M., 129
Frobenius norm, 409, 423
Furnas, G.W., 436
Gaber, M.M., 18
Ganti, V., 129, 280
Garcia-Molina, H., 18, 200, 238, 280,
403
Garofalakis, M., 162
Gaussian elimination, 168
Gehrke, J., 162, 280
Generalization, 445
Generated subgraph, 357
Genre, 312, 324, 338
GFS, see Google file system
Ghemawat, S., 70, 71
Gibbons, P.B., 162, 403
Gionis, A., 129, 162
Girvan, M., 403
Girvan-Newman Algorithm, 351
Global minimum, 330
GN Algorithm, see Girvan-Newman Al-
gorithm
Gobioff, H., 71
Golub, G.H., 436
Google, 164, 175, 290
Google file system, 24
Google+, 344
Gradient descent, 17, 336, 373, 467
Granzow, M., 437
Graph, 45, 57, 343, 344, 380, 387
Greedy algorithm, 284, 285, 288, 292
GRGPF Algorithm, 266
Grouping, 26, 34, 37
Grouping attribute, 34
Groupon, 347
Gruber, R.E., 70
Guha, S., 280
Gunda, P.K., 71
Gyongi, Z., 200
Hadoop, 24, 71
Hadoop distributed file system, 24
Hamming distance, 67, 96, 104
Harris, M., 338
Harshman, R., 436
Hash function, 9, 79, 83, 88, 137, 140,
143
Hash key, 9, 300
Hash table, 9, 10, 12, 211, 218, 221,
222, 300, 302, 381
Haveliwala, T.H., 200
HDFS, see Hadoop distributed file sys-
tem
Head, 392
Heavy hitter, 381
Henzinger, M., 129
Hierarchical clustering, 243, 245, 263,
326, 349
Hinge loss, 466
HITS, 192
INDEX
489
Hive, 70, 71
Hopcroft, J.E., 393
Horn, H., 71
Howe, B., 70
Hsieh, W.C., 70
Hub, 192
Hyperlink-induced topic search, see HITS
Hyperplane, 461
Hyracks, 41
Identical documents, 118
Identity matrix, 407
IDF, see Inverse document frequency
Image, 133, 313, 314
IMDB, see Internet Movie Database
Imielinski, T., 238
Immediate subset, 230
Immorlica, N., 129
Important page, 164
Impression, 282
In-component, 169
Inaccessible page, 187
Independent rows or columns, 419
Index, 10, 381
Indyk, P., 129, 162
Initialize clusters, 255
Input, 57
Insertion, 95
Instance-based learning, 443
Interest, 206
Internet Movie Database, 312, 338
Interpolation, 476
Intersection, 33, 36, 40, 77
Into Thin Air, 311
Inverse document frequency, see TF.IDF,
8
Inverted index, 164, 282
Ioannidis, Y.E., 403
IP packet, 133
Isard, M., 71
Isolated component, 170
Item, 202, 204, 205, 308, 324, 325
Item profile, 312, 315
Itemset, 202, 210, 212
Jaccard distance, 92, 94, 100, 313, 479
Jaccard similarity, 74, 82, 92, 187
Jacobsen, H.-A., 70
Jagadish, H.V., 162
Jahrer, M., 341
Jeh, G., 403
Joachims, T., 484
Join, see Natural join, see Multiway
join, see Star join, 383
Join task, 43
K-means, 254
K-partite graph, 347
Kahan, W., 436
Kalyanasundaram, B., 306
Kamm, D., 338
Kang, U., 403
Kannan, R., 436
Karlin, A., 286
Kaushik, R., 129
Kautz, W.H., 162
Kernel function, 473, 477
Key component, 137
Key-value pair, 25–27
Keyword, 291, 319
Kleinberg, J.M., 200
Knuth, D.E., 18
Koren, Y., 341
Kosmix, 24
Krioukov, A., 71
Kumar, R., 18, 71, 200, 403
Kumar, V., 18
Label, 344, 440
Lagrangean multipliers, 51
Landauer, T.K., 436
Lang, K.J., 403
Laplacian matrix, 364
LCS, see Longest common subsequence
Leaf, 352
Learning-rate parameter, 448
Leiser, N, 71
Length, 146, 387
Length indexing, 119
Leskovec, J., 402–404
Leung, S.-T., 71
490
INDEX
Likelihood, 369
Lin, S., 129
Linden, G., 341
Linear equations, 168
Linear separability, 447, 451
Link, 33, 164, 178
Link matrix of the Web, 193
Link spam, 183, 187
Littlestone, N., 484
Livny, M., 280
Local minimum, 330
Locality, 344
Locality-sensitive family, 104
Locality-sensitive function, 99
Locality-sensitive hashing, 88, 99, 314,
479
Log likelihood, 374
Logarithm, 12
Long tail, 204, 308, 309
Longest common subsequence, 96
Lower bound, 61
LSH, see Locality-sensitive hashing
Machine learning, 2, 17, 318, 439
Maggioni, M., 436
Maghoul, F., 18, 200
Mahalanobis distance, 261
Mahoney, M.W., 403, 436
Main memory, 209, 210, 218, 243
Malewicz, G, 71
Malik, J., 403
Manber, U., 129
Manhattan distance, 93
Manning, C.P., 18
Many-many matching, 113
Many-many relationship, 57, 202
Many-one matching, 113
Map task, 25, 27
Map worker, 28, 29
Mapping schema, 58
MapReduce, 15, 21, 24, 30, 177, 179,
229, 275, 383, 390, 458
Margin, 461
Market basket, 4, 16, 201, 202, 209
Markov process, 167, 170, 377
Martin, G.N., 162
Master controller, 25, 26, 28
Master node, 24
Matching, 287
Matias, Y., 162
Matrix, 31, see Transition matrix of
the Web, see Stochastic ma-
trix, see Substochastic matrix,
177, 192, see Utility matrix,
328, see Adjacency matrix, see
Degree matrix, see Laplacian
matrix, see Symmetric matrix
Matrix multiplication, 38, 39, 62
Matrix of distances, 417
Matthew effect, 14
Maximal itemset, 212
Maximal matching, 287
Maximum-likelihood estimation, 369
McAuley, J., 404
Mean, see Average
Mechanical Turk, 446
Median, 144
Mehta, A., 306
Melnik, S., 403
Merging clusters, 246, 249, 260, 264,
269, 273
Merton, P., 18
Miller, G.L., 403
Minhashing, 81, 91, 94, 101, 314
Minicluster, 258
Minsky, M., 484
Minutiae, 113
Mirrokni, V.S., 129
Mirror page, 75
Mitzenmacher, M., 129
ML, see Machine learning
MLE, see Maximum-likelihood estima-
tion
Model, 369
Moments, 145
Monotonicity, 212
Montavon, G., 483
Moore-Penrose pseudoinverse, 429
Most-common elements, 157
Motwani, R., 129, 162, 238, 280
INDEX
491
Mueller, K.-R., 483
Multiclass classification, 440, 455
Multidimensional index, 478
Multihash Algorithm, 222
Multiplication, 31, see Matrix multi-
plication, 177, 192
Multiset, see Bag
Multistage Algorithm, 220
Multiway join, 49, 383
Mumick, I.S., 162
Mutation, 98
Name node, see Master node
Natural join, 34, 37, 38, 48
Naughton, J.F., 71
Navathe, S.B., 238
Near-neighbor search, see Locality-sens-
itive hashing
Nearest neighbor, 17, 444, 472, 481
Negative border, 230
Negative example, 447
Neighbor, 376
Neighborhood, 387, 395
Neighborhood profile, 387
Netflix challenge, 2, 310, 337
Network, see Social network
Neural net, 443
Newman, M.E.J., 403
Newspaper articles, 115, 301, 310
Non-Euclidean distance, 252, see Co-
sine distance, see Edit distance,
see Hamming distance, see Jac-
card distance
Non-Euclidean space, 266, 268
Norm, 93
Normal distribution, 257
Normalization, 321, 323, 334
Normalized cut, 363
NP-complete problem, 357
Numerical feature, 440, 480
O’Callaghan, L., 280
Off-line algorithm, 284
Olston, C., 71
Omiecinski, E., 238
On-line advertising, see Advertising
On-line algorithm, 16, 284
On-line learning, 445
On-line retailer, 204, 282, 308, 309
Open directory, 184, 446
OR-construction, 101
Orr, G.B., 483
Orthogonal vectors, 244, 410
Orthonormal matrix, 419, 424
Orthonormal vectors, 411, 414
Out-component, 169
Outlier, 243
Output, 57
Overfitting, 319, 336, 443, 444, 457,
481
Overlapping Communities, 369
Overture, 291
Own pages, 188
Paepcke, A., 129
Page, L., 163, 200
PageRank, 3, 16, 31, 32, 42, 163, 165,
177
Pairs, see Frequent pairs
Palmer, C.R., 403
Pan, J.-Y., 403
Papert, S., 484
Parent, 351
Park, J.S., 238
Partition, 361
Pass, 210, 213, 221, 226
Path, 387
Paulson, E., 71
PCA, see Principal-component analy-
sis
PCY Algorithm, 218, 221, 222
Pedersen, J., 200
Perceptron, 17, 439, 443, 447, 481
Perfect matching, 287
Permutation, 82, 87
PIG, 70
Pigeonhole principle, 357
Piotte, M., 341
Pivotal condensation, 407
Plagiarism, 75, 205
492
INDEX
Pnuts, 70
Point, 241, 271
Point assignment, 243, 254, 350
Polyzotis, A., 70
Position indexing, 121, 122
Positive example, 447
Positive integer, 156
Powell, A.L., 280
Power Iteration, 407
Power iteration, 408
Power law, 13
Predicate, 318
Prefix indexing, 119, 121, 122
Pregel, 45
Principal eigenvector, 167, 407
Principal-component analysis, 405, 412
Priority queue, 249
Priors, 371
Privacy, 284
Probe string, 121
Profile, see Item profile, see User pro-
file
Projection, 33, 36
Pruhs, K.R., 306
Pseudoinverse, see Moore-Penrose pseu-
doinverse
Puz, N., 70
Quadratic programming, 467
Query, 134, 153, 275
Query example, 473
R-tree, 280
Rack, 22
Radius, 251, 253, 387
Raghavan, P., 18, 200, 403
Rahm, E., 403
Rajagopalan, S., 18, 200, 403
Ramakrishnan, R., 70, 280
Ramsey, W., 305
Random hyperplanes, 105, 314
Random surfer, 164, 165, 170, 184, 376
Randomization, 226
Rank, 418
Rarest-first order, 301
Rastogi, R., 162, 280
Rating, 308, 311
Reachability, 389
Recommendation system, 16, 307
Recursion, 42
Recursive doubling, 391
Reduce task, 25, 27
Reduce worker, 28, 30
Reducer, 27
Reducer size, 54, 60
Reed, B., 71
Reflexive and transitive closure, 389
Regression, 440, 477, 481
Regularization parameter, 466
Reichsteiner, A., 437
Reina, C., 280
Relation, 33
Relational algebra, 32, 33
Replication, 24
Replication rate, 54, 61
Representation, 266
Representative point, 263
Representative sample, 137
Reservoir sampling, 162
Restart, 377
Retained set, 258
Revenue, 292
Ripple-carry adder, 156
RMSE, see Root-mean-square error
Robinson, E., 71
Rocha, L.M., 437
Root-mean-square error, 310, 329, 423
Rosa, M., 402
Rosenblatt, F., 484
Rounding data, 323
Row, see Tuple
Row-orthonormal matrix, 424
Rowsum, 266
Royalty, J., 71
S-curve, 89, 99
Saberi, A., 306
Salihoglu, S., 70
Sample, 226, 230, 233, 235, 255, 263,
267
INDEX
493
Sampling, 136, 150
Savasere, A., 238
SCC, see Strongly connected compo-
nent
Schapire, R.E., 484
Schema, 33
Schutze, H., 18
Score, 111
Search ad, 282
Search engine, 175, 191
Search query, 133, 164, 186, 282, 300
Second-price auction, 293
Secondary storage, see Disk
Selection, 33, 35
Sensor, 133
Sentiment analysis, 447
Set, 81, 118, see Itemset
Set difference, see Difference
Shankar, S., 71
Shawe-Taylor, J., 483
Shi, J., 403
Shim, K., 280
Shingle, 77, 91, 116
Shivakumar, N., 238
Shopping cart, 204
Shortest paths, 45
Siddharth, J., 129
Signature, 80, 83, 91
Signature matrix, 83, 88
Silberschatz, A., 162
Silberstein, A., 70
Similarity, 4, 15, 74, 201, 314, 322
Similarity join, 55, 61
Simrank, 376
Singleton, R.C., 162
Singular value, 419, 423, 424
Singular-value decomposition, 328, 405,
418, 428
Six degrees of separation, 389
Sketch, 106
Skew, 28
Sliding window, 134, 150, 157, 271
Smart transitive closure, 392
Smith, B., 341
SNAP, 402
Social Graph, 344
Social network, 17, 343, 344, 405
SON Algorithm, 228
Source, 386
Space, 92, 93, 241
Spam, see Term spam, see Link spam,
346, 445
Spam farm, 187, 190
Spam mass, 190, 191
Sparse matrix, 31, 81, 83, 177, 178, 308
Spectral partitioning, 361
Spider trap, 170, 173, 193
Splitting clusters, 269
SQL, 22, 33, 70
Squares, 385
Srikant, R., 238
Srivastava, U., 70, 71
Standard deviation, 259, 261
Standing query, 134
Stanford Network Analysis Platform,
see SNAP
Star join, 53
Stata, R., 18, 200
Statistical model, 1
Status, 301
Steinbach, M., 18
Stochastic gradient descent, 336, 471
Stochastic matrix, 167, 407
Stop clustering, 247, 251, 253
Stop words, 8, 79, 116, 205, 313
Stream, see Data stream
Strength of membership, 374
String, 118
Striping, 32, 177, 179
Strong edge, 346
Strongly connected component, 169, 393
Strongly connected graph, 167, 388
Substochastic matrix, 170
Suffix length, 123
Summarization, 3
Summation, 156
Sun, J., 437
Supercomputer, 21
Superimposed code, see Bloom filter,
161
494
INDEX
Supermarket, 204, 226
Superstep, 46
Supervised learning, 439, 441
Support, 202, 227, 228, 230, 232
Support vector, 462
Support-vector machine, 17, 439, 444,
461, 481
Supporting page, 188
Suri, S., 403
Surprise number, 146
SVD, see Singular-value decomposition
SVM, see Support-vector machine
Swami, A., 238
Symmetric matrix, 365, 406
Szegedy, M., 162
Tag, 314, 347
Tail, 392
Tail length, 143, 395
Tan, P.-N., 18
Target, 386
Target page, 188
Tarjan, R.E., 393
Task, 23
Taxation, 170, 173, 188, 193
Taylor expansion, 12
Taylor, M., 305
Telephone call, 346
Teleport set, 184, 185, 190, 377
Teleportation, 174
Tendril, 169
Term, 164
Term frequency, see TF.IDF, 8
Term spam, 164, 187
Test set, 444, 451
TF, see Term frequency
TF.IDF, 7, 8, 313, 443
Theobald, M., 129
Thrashing, 179, 218
Threshold, 89, 159, 202, 228, 232, 447,
453
TIA, see Total Information Awareness
Timestamp, 151, 272
Toivonen’s Algorithm, 230
Toivonen, H., 238
Tomkins, A., 18, 71, 200, 403
Tong, H., 403
Topic-sensitive PageRank, 183, 190
Toscher, A., 341
Total Information Awareness, 5
Touching the Void, 311
Training example, 440
Training rate, 451
Training set, 439, 440, 446, 456
Transaction, see Basket
Transition matrix, 377
Transition matrix of the Web, 166, 177,
178, 180, 405
Transitive closure, 43, 389
Transitive reduction, 393
Transpose, 193
Transposition, 98
Tree, 248, 266, 267, see Decision tree
Triangle, 380
Triangle inequality, 93
Triangular matrix, 211, 220
Tripartite graph, 347
Triples method, 211, 220
TrustRank, 190
Trustworthy page, 190
Tsourakakis, C.E., 403
Tube, 170
Tuple, 33
Tuzhilin, A., 340
Twitter, 17, 301, 344
Ullman, J.D., 18, 70, 71, 238, 280, 402
Undirected graph, see Graph
Union, 33, 36, 40, 77
Unit vector, 406, 411
Universal set, 118
Unsupervised learning, 439
User, 308, 324, 325
User profile, 316
Utility matrix, 308, 311, 328, 405
UV-decomposition, 328, 338, 405, 471
VA file, 478
Valduriez, P., 403
Van Loan, C.F., 436
INDEX
495
Vapnik, V.N., 484
Variable, 146
Vassilvitskii, S., 403
Vazirani, U., 306
Vazirani, V., 306
Vector, 31, 93, 97, 167, 177, 192, 193,
242
Vigna, S., 402
Vitter, J., 162
Volume (of a set of nodes), 363
von Ahn, L., 315, 341
von Luxburg, U., 403
Voronoi diagram, 473
Wall, M.E., 437
Wall-clock time, 49
Wallach, D.A., 70
Wang, J., 338
Wang, W., 129
Weak edge, 346
Weaver, D., 70
Web structure, 169
Weight, 447
Weiner, J., 18, 200
Whizbang Labs, 2
Widom, J., 18, 71, 162, 280, 403
Wikipedia, 346, 446
Window, see Sliding window, see De-
caying window
Windows, 12
Winnow Algorithm, 451
Word, 205, 242, 313
Word count, 26
Worker process, 28
Workflow, 41, 43, 47
Working store, 132
Xiao, C., 129
Xie, Y., 437
Yahoo, 291, 314
Yang, J., 403, 404
Yerneni, R., 70
York, J., 341
Yu, J.X., 129
Yu, P.S., 238
Yu, Y., 71
Zhang, H., 437
Zhang, T., 280
Zipf’s law, 15, see Power law
Zoeter, O., 305
Dostları ilə paylaş: |