The arXiv project is a repository of physics, mathematics, and computer
science articles with roughly 250,000 documents and a user community of
over 40,000 researchers. We have run very preliminary experiments
applying Kleinberg's burst detection algorithm [1] to word occurrences
in arXiv titles, with tantalizing results, and wish to extend this work
into an online navigational tool.

This project would involve a slight refinement of the basic burst
algorithm for the textual case at hand, and its extension for use in
conjunction with citation tree data. It will also involve developing
visualization methods that produce interactive output for the web
interface. The burst detection algorithm will be used to compare a
large number (tens of thousands) of time series in parallel, identifying
clusters of scientific works in time that can be assembled into a
narrative description of progress in a field and used to facilitate
navigation of it. The intent is to identify the most important temporal
patterns and to implement visualization methods that are fun, intuitive,
and informative.

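
To make the idea concrete, here is a minimal sketch of two-state burst
detection on one word's time series, in the spirit of the batched model
in [1]. It is not the project's implementation. The function names
(two_state_bursts, burst_intervals), the parameters s and gamma, and the
choice of gamma * ln(n) as the cost of entering the burst state are all
assumptions made for illustration. Each batch (for example, one month of
titles) supplies the number of titles containing the word and the total
number of titles. It is assumed that no count exceeds its batch total.

```python
import math


def log_binom(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def two_state_bursts(relevant, totals, s=2.0, gamma=1.0):
    """Label each batch 0 (baseline) or 1 (burst) by minimizing total cost.

    relevant[t] is the number of titles containing the word in batch t and
    totals[t] is the number of titles in batch t. The burst state expects
    the word at s times the overall rate. Entering the burst state costs
    gamma * ln(n) (an assumed form); leaving it is free.
    """
    n = len(relevant)
    if n == 0 or n != len(totals):
        raise ValueError("relevant and totals must be non-empty and equal length")
    p0 = sum(relevant) / sum(totals)      # overall (baseline) rate
    if not 0.0 < p0 < 1.0:
        raise ValueError("baseline rate must lie strictly between 0 and 1")
    p1 = min(s * p0, 0.9999)              # elevated rate, kept below 1

    def fit_cost(p, r, d):
        # Negative log-likelihood of r hits in d titles at rate p.
        return -(log_binom(d, r) + r * math.log(p) + (d - r) * math.log(1 - p))

    up_cost = gamma * math.log(n)
    cost = [0.0, math.inf]                # the sequence starts in state 0
    back = []                             # back[t][j]: best predecessor of state j
    for r, d in zip(relevant, totals):
        prev0 = 0 if cost[0] <= cost[1] else 1
        prev1 = 0 if cost[0] + up_cost < cost[1] else 1
        c0 = min(cost[0], cost[1]) + fit_cost(p0, r, d)
        c1 = min(cost[0] + up_cost, cost[1]) + fit_cost(p1, r, d)
        back.append((prev0, prev1))
        cost = [c0, c1]

    # Trace the cheapest state sequence backwards from the last batch.
    state = 0 if cost[0] <= cost[1] else 1
    states = [0] * n
    for t in range(n - 1, -1, -1):
        states[t] = state
        state = back[t][state]
    return states


def burst_intervals(states):
    """Turn a 0/1 state sequence into inclusive (start, end) batch ranges."""
    intervals, start = [], None
    for t, st in enumerate(states):
        if st == 1 and start is None:
            start = t
        elif st == 0 and start is not None:
            intervals.append((start, t - 1))
            start = None
    if start is not None:
        intervals.append((start, len(states) - 1))
    return intervals


# Made-up counts: a word that is rare except in batches 4 through 6.
counts = [1, 1, 1, 1, 10, 12, 11, 1, 1, 1]
sizes = [100] * 10
print(burst_intervals(two_state_bursts(counts, sizes)))
```

Running the same routine independently over tens of thousands of words
is what would make the comparison "in parallel" tractable, since each
series costs time linear in the number of batches. Larger values of
gamma suppress short bursts, and larger values of s require a sharper
rise before a burst is declared.
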
Contact Person: Paul Houle (

Interested Faculty: Paul Ginsparg, Jon Kleinberg

Credit Hours: 3-6 (Negotiable)
