Modeling and Optimizing Hypertextual Search Engines
Based on the Research of Larry Page and Sergey Brin
Michael E. Karpeles
Department of Computer Science
University of Vermont
mkarpeles@acm.org
April 15, 2009
Abstract Overview
As the volume of information available to the public increases exponentially,
it is crucial that data storage, management, classification, ranking, and
reporting techniques improve as well.
The purpose of this paper is to discuss how search engines work and what
modifications can potentially be made to make the engines work more quickly
and accurately.
In order to do this, we must consider the properties of the search domain (the
web) that make it difficult to mine.
Finally, we want to ensure that the optimizations we introduce will be scalable,
affordable, maintainable, and reasonable to implement.
Background - Section I - Outline
Larry Page and Sergey Brin
Their Main Ideas
Networking Background
Mathematical Background
Larry Page and Sergey Brin
Larry Page was Google’s founding CEO
and grew the company to more than 200
employees and profitability before
moving into his role as president of
products in April 2001.
Brin, a native of Moscow, received a
B.S. degree with honors in mathematics
and CS from the University of Maryland
at College Park. During his graduate
program at Stanford, Sergey met Larry
Page and worked on the project that
became Google.
Main Idea
This presentation is based on the research paper "The Anatomy of a Large-Scale
Hypertextual Web Search Engine", written by Larry Page and Sergey Brin at
Stanford University.
"The Anatomy of a Large-Scale Hypertextual Web Search Engine"
The paper by Larry Page and Sergey Brin focuses mainly on:
Design Goals of the Google Search Engine
The Infrastructure of Search Engines
Crawling, Indexing, and Searching the Web
Link Analysis and the PageRank Algorithm
Results and Performance
Future Work
These topics will be mentioned in this presentation.
Networking Background
This presentation requires that you know:
The difference between the Web and the Internet.
The notion of TCP and UDP connections, sockets, and ports.
How web servers are configured to be publicly viewable.
How CGI (the Common Gateway Interface) is used to transmit data via
POST and GET.
I planned on leaving myself extra time, so if someone is unfamiliar with one of
the areas above, I encourage you to ask away!
Mathematical Background
The PageRank Algorithm requires previous knowledge of many key topics in
Linear Algebra, such as:
Matrix Addition and Subtraction
Eigenvectors and Eigenvalues
Power iterations
Dot Products and Cross Products
I will try to avoid linear algebra when possible but some of these components are
crucial to understanding PageRank.
Introduction - Section II - Outline
Terms and Definitions
Brief Search Engine History
How Search Engines Work
Search Engine Design Goals
Terms and Definitions
Hypertext
Hypertext is interactive text which is encoded with a destination address. When
a hypertext link is activated (usually by a mouse click, forward arrow, or enter
key), the user is redirected to the link's specified destination.
Hypertextual Search Engine
A hypertextual search engine is a system wherein inquiries for data are parsed
into sets of rules, criteria, and constraints, which are then algorithmically
applied to a population of data in order to ultimately isolate the data candidates
(search results) that best fit the original search criteria. These results are then
'reported' or displayed back to the user in hypertextual format.
Terms and Definitions, Cont’d
SERP - Search Engine Results Page
SERP is an acronym for Search Engine Results Page. It can refer either to the
look and feel of the results page (generated by the engine) or to the process by
which search engine results are categorized.
SEO - Search Engine Optimization
SEO stands for Search Engine Optimization. SEO can relate either to search
accuracy or to search speed (efficiency).
Brief Search Engine History
In 1990, Alan Emtage, Bill Heelan, and Peter Deutsch developed an engine
called Archie for indexing FTP (File Transfer Protocol) archives (”The First
Search Engine, Archie”, Wei Li).
While Archie was a crucial first step in creating a directory listing of FTP
sites, Archie did not index the contents of these sites. Archie was followed by
two search engines built on the Gopher protocol (port 70): Veronica
(1992) and Jughead (1993).
JumpStation (December 1993) used a web crawling agent for both searching
and indexing web pages, and provided a hypertextual web form as an interface
for queries. JumpStation is considered the "first WWW resource-discovery
tool to combine the three essential features of a web search engine (crawling,
indexing, and searching)" (Wikipedia).
In 1994, Brian Pinkerton (University of Washington) released WebCrawler;
the first search engine that provided searching of entire document content.
How Search Engines Work
First, the user inputs a query for data. The search is submitted to a back-end
server via the CGI (Common Gateway Interface) protocol. Google uses GET,
as opposed to POST, for reasons that will be explained...
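For illustration, a GET request simply encodes the query into the URL itself. A minimal sketch (the q parameter mirrors Google's public search form; treat the details as illustrative):

import urllib.parse

def build_search_url(query, base="http://www.google.com/search"):
    # GET encodes the query directly in the request URL, so searches
    # are bookmarkable, cacheable, and easy to log: one practical
    # reason to prefer GET over POST for a search form.
    params = urllib.parse.urlencode({"q": query})
    return base + "?" + params

print(build_search_url("anatomy of a search engine"))
# http://www.google.com/search?q=anatomy+of+a+search+engine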
How Search Engines Work, Cont’d
The server uses regular expressions (regex) to parse the user's query. The
submitted strings can be permuted and re-arranged to test for spelling errors
and for pages containing closely related content. (Specifics on Google's
querying will be shown later.)
The search engine searches its database for documents which closely relate to
the user's input.
In order to generate meaningful results, the search engine utilizes a variety of
algorithms which work together to describe the relative importance of any
specific search result.
Finally, the engine returns results back to the user.
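As a hypothetical illustration of the parsing step (the tokenization and stop-word rules here are my own, not Google's):

import re

def parse_query(raw):
    # Lowercase, strip punctuation, and split on word characters so
    # that "Search-Engines!" and "search engines" yield the same terms.
    terms = re.findall(r"[a-z0-9]+", raw.lower())
    # Drop common stop words that carry little ranking signal.
    stop = {"a", "an", "the", "of", "and", "or"}
    return [t for t in terms if t not in stop]

print(parse_query("The Anatomy of a Large-Scale Search Engine"))
# ['anatomy', 'large', 'scale', 'search', 'engine']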
Google Query Evaluation
1. Query is parsed
2. Words are converted into wordIDs
3. Seek to the start of the doclist in the short barrel for every word.
4. Scan through the doclists until there is a document that matches all the
search terms.
5. Compute the rank of that document for the query.
6. If we are in the short barrels and at the end of any doclist, seek to the
start of the doclist in the full barrel for every word and go to step 4.
7. If we are not at the end of any doclist go to step 4.
8. Sort the documents that have matched by rank and return the top k.
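A schematic sketch of steps 3-8, assuming each doclist is a sorted list of docIDs and glossing over the short/full barrel switching:

def matching_documents(doclists):
    # Each doclist is a sorted list of docIDs for one query word.
    # Advance one cursor per list until all cursors agree on a docID.
    cursors = [0] * len(doclists)
    while all(c < len(dl) for c, dl in zip(cursors, doclists)):
        current = [dl[c] for c, dl in zip(cursors, doclists)]
        if min(current) == max(current):
            yield current[0]                  # document matches every term
            cursors = [c + 1 for c in cursors]
        else:
            # Advance the cursor that is furthest behind (step 4).
            i = current.index(min(current))
            cursors[i] += 1

def evaluate(doclists, rank, k=10):
    # Rank each matching document (step 5), then sort by rank and
    # return the top k (step 8).
    scored = [(rank(d), d) for d in matching_documents(doclists)]
    return sorted(scored, reverse=True)[:k]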
Single Word Query Ranking
Hitlist is retrieved for single word
Each hit can be one of several types: title, anchor, URL, large font, small
font, etc.
Each hit type is assigned its own weight
Type-weights make up vector of weights
Number of hits of each type is counted to form count-weight vector
Dot product of type-weight and count-weight vectors is used to compute IR
score
IR score is combined with PageRank to compute final rank
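A toy sketch of this dot-product scoring (the weights are invented for illustration; the paper does not publish the real values):

# Hypothetical type-weights: hits in prominent positions count more.
TYPE_WEIGHTS = {"title": 8.0, "anchor": 6.0, "url": 5.0,
                "large_font": 3.0, "small_font": 1.0}

def ir_score(hits):
    # hits: list of hit-type strings for one word in one document.
    # Count hits of each type (the count-weight vector) ...
    counts = {t: hits.count(t) for t in TYPE_WEIGHTS}
    # ... then take the dot product with the type-weight vector.
    return sum(TYPE_WEIGHTS[t] * counts[t] for t in TYPE_WEIGHTS)

def final_rank(hits, pagerank, alpha=0.5):
    # The paper combines the IR score with PageRank but does not give
    # the combination function; a weighted sum stands in here.
    return alpha * ir_score(hits) + (1 - alpha) * pagerank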
Multi-Word Query Ranking
Similar to single-word ranking except now must analyze proximity of words in
a document
Hits occurring closer together are weighted higher than those farther apart
Each proximity relation is classified into 1 of 10 bins ranging from a "phrase
match" to "not even close"
Each type and proximity pair has a type-prox weight
Counts converted into count-weights
Take dot product of count-weights and type-prox weights to compute the IR
score
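A sketch of how word distance might map onto the 10 proximity bins (bin boundaries and weights are assumptions; the paper does not list them):

def proximity_bin(pos_a, pos_b):
    # Distance between two word positions in a document, clamped so
    # that bin 0 means a phrase match (adjacent words) and bin 9
    # means "not even close".
    return max(0, min(abs(pos_a - pos_b) - 1, 9))

# Hypothetical type-prox weights, indexed by bin: closer is better.
PROX_WEIGHTS = [10.0, 8.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.5, 1.0, 0.5]

def prox_score(pairs):
    # pairs: (position of word 1, position of word 2) co-occurrences.
    return sum(PROX_WEIGHTS[proximity_bin(a, b)] for a, b in pairs)

print(prox_score([(3, 4), (10, 40)]))   # the adjacent pair dominates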
Search Engine Design Goals
Scalability with web growth
Improved Search Quality
Decrease number of irrelevant results
Incorporate feedback systems to account for user approval
Too many pages for people to view – some heuristic must be used to rank
sites’ importance for the users.
Improved Search Speed
Even as the domain space rapidly increases
Take into consideration the types of documents hosted
Search Engine Infrastructure - Section III - Outline
Resolving and Web Crawling
Indexing and Searching
Google’s Infrastructural Model
URL Resolving and Web Crawling
Before a search engine can respond to user inquiries, it must first generate a
database of URLs (or Uniform Resource Locators) which describe where web
servers (and their files) are located. URLs or web addresses are pieces of data that
specify the location of a file and the service that can be used to access it.
The URL Server's job is to keep track of URLs that have been crawled and those
that still need to be crawled. In order to obtain a current mapping of web servers
and their file trees, Google's URL Server routinely invokes a series of web
crawling agents called Googlebots. Web users can also manually request for their
URLs to be added to Google's URL Server.
URL Resolving and Web Crawling
Web Crawlers: When a web page is ’crawled’ it has been effectively downloaded.
Googlebots are Google’s web crawling agents/scripts (written in python) which
spawn hundreds of connections (approximately 300 parallel connections at once)
to different well connected servers in order to, ”build a searchable index for
Google’s search engine” (wikipedia).
Brin and Page commented that DNS (Domain Name System) lookups were
expensive, so they gave the crawling agents DNS caching abilities.
Googlebot is known as a well-behaved spider: sites can avoid being crawled by
adding <meta name="Googlebot" content="nofollow"> to the head of the
document (or by adding a robots.txt file)
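For instance, a minimal robots.txt at the site root asking Googlebot to skip one directory might look like this (the path is made up; the directives follow the standard robots exclusion format):

User-agent: Googlebot
Disallow: /private/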
Indexing
Indexing the Web involves three main things:
Parsing – parsers must handle typos in meta-tags and missing data in
transactions. Instead of using YACC to generate a CFG (Context Free
Grammar) parser, Brin and Page used flex to perform lexical analysis.
Indexing Documents into Barrels – After each document is parsed, every
word is assigned a wordID. These words and wordID pairs are used to
construct an in-memory hash table (the lexicon).
Barrelling Bottleneck – the indexing phase is difficult to parallelize
because the lexicon needs to be shared. They solved this by writing a log of
the 14 million words that were not in their base lexicon (removing the need for
a shared lexicon). This allowed multiple indexers to run in parallel. The log
file of 14 million words is then indexed separately.
Sorting – the sorter takes each of the forward barrels and sorts it by wordID
to produce an inverted barrel for title and anchor hits and a full text inverted
barrel. This process happens one barrel at a time, thus requiring little
temporary storage.
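A minimal sketch of the in-memory lexicon with an out-of-lexicon log, the trick that let indexers run in parallel (data structures simplified to plain Python dicts and lists):

class Lexicon:
    def __init__(self, base_words):
        # In-memory hash table: word -> wordID, seeded with a fixed
        # base vocabulary shared (read-only) by every indexer.
        self.word_to_id = {w: i for i, w in enumerate(base_words)}
        self.extra_log = []   # words not found in the base lexicon

    def word_id(self, word):
        wid = self.word_to_id.get(word)
        if wid is None:
            # Unknown word: append to a per-indexer log instead of
            # mutating a shared table, so indexers stay independent.
            self.extra_log.append(word)
            return None
        return wid

lex = Lexicon(["search", "engine", "web"])
print(lex.word_id("web"), lex.word_id("hypertext"), lex.extra_log)
# 2 None ['hypertext']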
Searching
The article didn't specify any speed efficiency issues with searching. Instead,
the authors focused on making searches more accurate. At the time the paper was
written, Google queries returned 40,000 results. I only got approximately 39,000
results when I Google-searched their paper... but the results fetch took only
0.13 seconds. (Not bad!)
Google’s Infrastructure Overview
Google's architecture includes 14 major components: a URL Server, multiple
Web Crawlers, a Store Server, a Hypertextual Document Repository, an Anchors
database, a URL Resolver, a Hypertextual Document Indexer, a Lexicon, multiple
short and long Barrels, a Sorter Service, a Searcher Service, and a PageRank
Service. These systems were implemented in C and C++ on Linux and Solaris
systems.
Infrastructure Part I
[architecture diagram]
Infrastructure Part II
[architecture diagram]
Infrastructure Part III
[architecture diagram]
Search Engine Optimizations - Section IV - Outline
Significance of SEOs
Elementary Ranking Schemes
What Makes Ranking Optimization Hard?
The Significance of SEOs
Too many sites for humans to maintain rankings
Humans are biased – they have different ideas of what "good" and "bad" are.
With a search space as large as the web, optimizing order of operations and
data structures has huge consequences.
Concise and well developed heuristics lead to more accurate and quicker
results
Different methods and algorithms can be combined (see: RANK MERGING)
to increase overall efficiency.
Elementary SEO’s for Ranking
Word Frequency Analysis within Pages
Implicit Rating Systems - The search engine considers how many times a
page has been visited or how long a user has remained on a site.
Explicit Rating Systems - The search engine asks for your feedback after
visiting a site.
Most feedback systems have severe flaws (but can be useful if implemented
correctly and used with other methods)
More sophisticated: Weighted Heuristic Page Analysis, Rank Merging, and
Manipulation Prevention Systems
What Makes Ranking Optimization Hard?
Link Spamming
Keyword Spamming
Page hijacking and URL redirection
Intentionally inaccurate or misleading anchor text
Accurately targeting people’s expectations
PageRank - Section V - Outline
Link Analysis and Anchors
Introduction to PageRank
Calculating Naive PR
Example
Calculating PR using Linear Algebra
Implementing PageRank
Problems with PR
Improving PageRank
Link Analysis and Anchors
Hypertextual Links are convenient to users and represent physical citations on
the Web.
Anchor Text Analysis:
<a href="http://www.google.com">Anchor Text</a>
Can be a more accurate description of the target site than the target site's
own text
Can point at non-HTTP or non-text resources, such as images, videos,
databases, PDFs, PostScript files, etc.
Also, anchors make it possible for non-crawled pages to be discovered.
Introduction to PageRank
Rights belong to Google, patent belongs to Stanford University
Top 10 IEEE ICDM data mining algorithm
An algorithm used to rank the relative importance of pages within a network.
The PageRank idea is based on the elements of democratic voting and citations.
The PR Algorithm uses logarithmic scaling; the total PR of a network is 1.
Introduction to PageRank
PageRank is a link analysis algorithm that ranks the relative importance of
all web pages within a network. It does this by looking at three web page
features:
1. Outgoing Links - the number of links found in a page
2. Incoming Links - the number of times other pages have cited this page
3. Rank - A value representing the page’s relative importance in the network
Calculating Naive PageRank
The Page Rank Equation
$$PR(A) = (1-d) + d\left(\frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)}\right) \qquad (1)$$
PR(A) = the PageRank of page A
C(A) or L(A) = the total number of outgoing links from page A
d = the damping factor. It induces randomness to prevent certain pages from
gaining too much rank. The (1-d) term adds back the value lost by multiplying
by the damping factor, so that the sum of PR over all web pages in the network
is 1. The damping factor also enforces a random surfing model which is
comparable to Markov chains.
Calculating Naive PageRank, Cont’d
The PageRank of a page A, denoted PR(A), is decided by the quality and
quantity of sites linking to or citing it. Every page T_i that links to page A is
essentially casting a vote, deeming page A important. By doing this, T_i
propagates some of its PR to page A.
How can we determine how much rank an individual page T_i gives to A?
T_i may contain many links – not just a single link to page A.
T_i must propagate its PageRank equally to its citations. Thus, we only want to
give page A a fraction of PR(T_i).
The amount of PR that T_i gives to A is expressed as the damping value times
PR(T_i) divided by the total number of outgoing links from T_i.
Naive Example
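A minimal sketch of the naive computation on a hypothetical three-page network (the link structure and d = 0.85 are illustrative choices, not values from the paper):

# links[p] = pages that p links to (a tiny made-up network).
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
d = 0.85

# Invert the graph once: who links *to* each page?
incoming = {p: [q for q in links if p in links[q]] for p in links}

pr = {p: 1.0 / len(links) for p in links}        # uniform start
for _ in range(50):                              # iterate equation (1)
    pr = {p: (1 - d) + d * sum(pr[q] / len(links[q])
                               for q in incoming[p])
          for p in pr}

print({p: round(r, 3) for p, r in pr.items()})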
Calculating PageRank using Linear Algebra
Typically PageRank computation is done by finding the principal eigenvector of
the Markov chain transition matrix. The vector is solved using the iterative power
method. The naive PageRank setup from the previous example can be expressed
as a link matrix.
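A sketch of that power iteration using numpy; the column-stochastic matrix below encodes the same hypothetical three-page network from the previous example:

import numpy as np

# Column j holds page j's outgoing rank: M[i, j] = 1/C(j) if j links
# to i, else 0, so every column sums to 1 (column-stochastic).
# Pages A, B, C with links A->B, A->C, B->C, C->A, as before.
M = np.array([[0.0, 0.0, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0]])
d, n = 0.85, 3

v = np.full(n, 1.0 / n)                 # uniform initial vector
for _ in range(100):                    # power iteration
    v = (1 - d) / n + d * M @ v         # damped transition step
print(v.round(3), v.sum())              # principal eigenvector; sums to 1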
More examples can be found at:
http://www.ianrogers.net/google-page-rank/
http://www.webworkshop.net/pagerank.html
http://www.math.uwaterloo.ca/~hdesterc/websiteW/ [...]
Data/presentations/pres2008/ChileApr2008.pdf
Calculating PageRank using Linear Algebra, Cont’d
For those interested in the actual PageRank Calculation and Implementation
process (involving heavier linear algebra), I invite you to view my ”Additional
Resources” slide.
Implementing PageRank
Convert all page URLs into unique integers (for IDs)
Sort link structure by Parent ID
Dangling links removed from link db
Several strategies exist; they chose to iterate until convergence
Assign initial values to nodes (based on assumed importance)
Run the model until it converges. To ensure accuracy, repeat the process with
correctly initialized weights.
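As a sketch of the first two steps, here is one way the URL-to-ID conversion and parent-ID sort might look (the storage format is an assumption):

def build_link_structure(edges):
    # edges: iterable of (parent_url, child_url) pairs.
    ids = {}
    def url_id(u):
        # Assign each URL the next unused integer the first time it is
        # seen: compact IDs in place of long URL strings.
        return ids.setdefault(u, len(ids))
    links = [(url_id(p), url_id(c)) for p, c in edges]
    links.sort()                    # sort link structure by parent ID
    return ids, links

ids, links = build_link_structure([("a.com", "b.com"),
                                   ("b.com", "a.com"),
                                   ("a.com", "c.com")])
print(ids)    # {'a.com': 0, 'b.com': 1, 'c.com': 2}
print(links)  # [(0, 1), (0, 2), (1, 0)]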
Disadvantages and Problems
Rank Sinks: Occur when pages get caught in infinite link cycles.
Spider Traps: A group of pages is a spider trap if there are no links from
within the group to outside the group.
Dangling Links: A page contains a dangling link if the hypertext points to a
page with no outgoing links.
Dead Ends: are simply pages with no outgoing links.
- Solution to all of the above: By introducing a damping factor, the figurative
random surfer stops trying to traverse the sunk page(s) and will either follow
a link randomly or teleport to a random node in the network.
Improving PageRank
Based on "The PageRank Citation Ranking: Bringing Order to the Web",
choosing initial values for pages in a network will not affect the final values,
just the rate of convergence.
By choosing initial values wisely, the convergence of the algorithm will be
quicker.
Local Graph Partitioning using PageRank Vectors *complicated*
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=04031383
Distributed PR computation based on Iterative Aggregation-Disaggregation
(IAD) methods.
Alternate Infrastructure to avoid costly data-seeks (lookups)
Competition and Future Research - Section VI - Outline
Jon Kleinberg's HITS
Lempel and Moran’s SALSA
Ask.com’s ExpertRank
Microsoft's BrowseRank
Yahoo!’s TrustRank (also from Stanford!)
Jon Kleinberg's HITS Algorithm
Hyperlink-Induced Topic Search (HITS) (also known as Hubs and authorities) is a
link analysis algorithm that rates Web pages, developed by Jon Kleinberg. It
determines two values for a page: its authority, which estimates the value of the
content of the page, and its hub value, which estimates the value of its links to
other pages.
It is processed on a small subset of "relevant" documents, not all documents as
was the case with PageRank.
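A compact sketch of the hub/authority iteration on a tiny made-up graph (the normalization step follows the usual statement of HITS):

def hits(links, iterations=50):
    # links[p] = pages that p points to, for the small "relevant" subset.
    pages = list(links)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A page's authority is the sum of the hub scores pointing at it.
        auth = {p: sum(hub[q] for q in pages if p in links[q])
                for p in pages}
        # A page's hub score is the sum of the authorities it points to.
        hub = {p: sum(auth[c] for c in links[p]) for p in pages}
        # Normalize so the scores stay bounded.
        a_norm = sum(v * v for v in auth.values()) ** 0.5
        h_norm = sum(v * v for v in hub.values()) ** 0.5
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth

hub, auth = hits({"A": ["B"], "B": ["C"], "C": ["A", "B"]})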
Teoma aka Ask.com’s ExpertRank
Teoma was unique because of its link popularity algorithm. Unlike Google's
PageRank, Teoma's technology (Subject-Specific Popularity) analyzed links in
context to rank a web page’s importance within its specific subject. For instance,
a web page about ’baseball’ would rank higher if other web pages about ’baseball’
link to it.
Lempel and Moran's SALSA
SALSA is a stochastic approach for link-structure analysis which examines
random walks on graphs derived from the link structure. Read more at:
http://portal.acm.org/citation.cfm?id=383041
Microsoft’s BrowseRank
BrowseRank is based on the time a user spends on a site.
Yahoo!’s TrustRank (also from Stanford!)
The TrustRank method calls for selecting a small set of seed pages to be evaluated by
an expert. Once the reputable seed pages are manually identified, a crawl
extending outward from the seed set seeks out similarly reliable and trustworthy
pages. TrustRank’s reliability diminishes as documents become further removed
from the seed set.
Conclusion - Section VII - Outline
Experimental Results (Benchmarking)
The Future of PageRank and Other Applications
Exam Questions
Bibliography
Benchmarking Convergence
Convergence of the Power Method is FAST! 322 million links converge almost
as quickly as 161 million.
Doubling the size has very little effect on the convergence time.
Experimental Results
Data structures obviously highly optimized for space
Infrastructure setup for high parallelization.
Benchmarking of Advanced PageRank Implementations
Parallel PageRank with PageRank Iterations versus Jacobi, GMRES, BiCG,
and BiCGSTAB:
http://www.stanford.edu/~dgleich/projects/ppagerank/index.html
Compares convergence and time for different implementations:
http://www.stanford.edu/~dgleich/publications/gleich-zhukov-sc-pprank-abstract.pdf
http://www.cs.cmu.edu/~yangboz/cikm05_pagerank.ppt
Future of PR
A version of PageRank has recently been proposed as a replacement for the
traditional Institute for Scientific Information (ISI) impact factor, and has
been implemented at eigenfactor.org. Instead of merely counting total citations
to a journal, the "importance" of each citation is determined in a PageRank
fashion.
Because PageRank roughly corresponds to a random web surfer model, PR
can be used to estimate web traffic.
Final Exam Questions
(1) Please state the PageRank formula and describe its components
The Page Rank Equation
$$PR(A) = (1-d) + d\left(\frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)}\right) \qquad (2)$$
PR(A) = The PageRank of page A
C(A) or L(A) = the total number of outgoing links from page A
d = The damping factor.
Final Exam Questions
(2) Given a network of five node:rank pairs [a:.2, b:.3, c:.1, d:.3, e:.1], explain
how to generate the corresponding Naive PR Link Matrix. Assume pages
can link to themselves (the diagonal need not be all zeros).
The answer will be a 5-by-5 matrix where the labels for the rows and columns are
the pages in the network. The first element of every row belongs to the first page,
..., and the fifth and final element in every row belongs to the fifth page.
Likewise, the first element in every column belongs to the first page and the fifth
element of every column belongs to the fifth page. Each item in the matrix is thus
correlated with two pages – its row represents one page and its column
represents another. In this model, the value within any entry represents the
fraction of rank the column page provides the row page. To fill in any particular
matrix entry, find the node in the graph associated with the entry's column and
count its number of outgoing links. The entry's value is the column page's rank
over this outgoing link count.
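A sketch of that construction with numpy; the link structure below is invented for illustration, since the question specifies only the ranks:

import numpy as np

pages = ["a", "b", "c", "d", "e"]
rank = {"a": .2, "b": .3, "c": .1, "d": .3, "e": .1}
# Hypothetical outgoing links; self-links are allowed per the question.
links = {"a": ["a", "b"], "b": ["c"], "c": ["d", "e"],
         "d": ["a"], "e": ["b", "e"]}

n = len(pages)
M = np.zeros((n, n))
for j, col in enumerate(pages):           # column = source page
    share = rank[col] / len(links[col])   # its rank split over its links
    for child in links[col]:
        M[pages.index(child), j] = share  # row = destination page

print(M.round(3))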
Final Exam Questions
(3) Explain the significance of the damping factor. What values does it lie
between? What problems does it solve? What factors does it account for?
The damping factor induces randomness to prevent certain pages from gaining too
much rank. The value (1-d) from the PR equation ensures that the values lost by
multiplying by the damping factor are added back to the system (such that the
network maintains an accumulative PR of 1). The damping factor also enforces a
random surfing model which is comparable to Markov Chains. The damping factor
handles dead ends and infinite loops by offering a probability of a random teleport
or link traversal.
Questions?
Additional Resources
http://cis.poly.edu/~suel/papers/pagerank.pdf - PR via the Split-Accumulate
Algorithm, Merge-Sort, etc.
http://nlp.stanford.edu/~manning/papers/PowerExtrapolation.pdf - PR via
Power Extrapolation; includes benchmarking
http://www.webworkshop.net/pagerank_calculator.php - neat little tool for
PR calculation with a matrix
http://www.miislita.com/information-retrieval-tutorial/ [...]
matrix-tutorial-3-eigenvalues-eigenvectors.html
Bibliography
http://www.math.uwaterloo.ca/~hdesterc/websiteW/Data/presentations/pres2008/ChileApr2008.pdf
Infrastructure Diagram and explanations from last year's slides
Google Query Steps from last year's slides
http://portal.acm.org/citation.cfm?id=1099705
http://www.springerlink.com/content/60u6j88743wr5460/fulltext.pdf?page=1
http://www.ianrogers.net/google-page-rank/
http://www.seobook.com/microsoft-search-browserank-research-reviewed
http://www.webworkshop.net/pagerank.html
http://en.wikipedia.org/wiki/PageRank
http://pr.efactory.de/e-pagerank-distribution.shtml
http://www.cs.helsinki.fi/u/linden/teaching/irr06/drafts/petteri_huuhka_google_draft.pdf
http://www-db.stanford.edu/~backrub/pageranksub.ps