| Problem | Root Cause | Configuration Details | # of Traces Collected |
|---|---|---|---|
| Internal corporate sites fail to load with Opera. | Unsupported authentication protocol. | Opera under certain OSes fails to perform NTLM authentication correctly. | 13 bad traces with Opera only. |
| Internal corporate sites unreachable by any browser. | Wrong browser configuration. | The browser tries to use the proxy to reach internal sites. | 60 bad traces. |
| Certain websites display error “Forbidden: You don't have permission to access this server.” But accessing them via different proxies loads the website fine. | Certain corporate proxies are blocked by these websites, possibly due to excessive requests. Accessing the websites via these blocked proxies displays the error. | e.g. Yelp.com had blocked some subset of proxies in our setting. But accessing Yelp.com via other proxies worked. | 64 traces via good proxies, and 64 via blocked proxies. |
| Certain websites silently fail to load; no error is displayed to the user. | Flash or Ad blockers installed in the browser prevent loading some components, which are critical for loading these sites. | e.g. Pandora.com silently fails to load in Firefox when Flashblock 1.5.13 add-on is enabled. | 92 traces with flash blocking and 92 without blocking. |
| Websites with popular third-party scripts fail to load in IE, making these websites unusable. | IE's InPrivate Filtering, when enabled, blocks loading of third-party scripts that are commonly found in websites. | Components of the websites built using scripts such as Google Analytics, or recaptcha.net fail to load. | 10 good and 10 bad traces only with IE. |
| Internal corporate sites fail to load except in IE8. | VBScript not supported in all browsers except IE8. | Sites used to manage internal information heavily use VBScript. | 10 good and 33 bad traces. |
| Some websites fail to load silently. | Wrong HTTPS proxy server configuration | Pandora.com fails to load some flash component, which fails the entire website. | 55 bad traces. |
| None of the websites load in the browser. | Firewall on the client is blocking web browsing. | Firewall blocks port 80. | 64 good and 64 bad traces. |
| None of the websites load in the browser. | Wrong proxy configuration. | Configured a wrong proxy server in the browser. | 15 good and 64 bad traces. |
| Some websites fail to load. | User types in a wrong URL into the address bar. | Entered wrong URL into the browser. | 64 bad traces. |
| All websites fail to load. | Wrong DNS server configuration. | Manually configured wrong DNS server address. | 52 good and 52 bad traces in Win 7 only. |

Table 2: Browser trace details.

Figure 2 and Figure 3 show the root causes for our
email and browser datasets, and the signatures that
Deja vu and the classifier learned for each root cause
(depicted through arrows pointing out from the root
cause boxes). The label beside each signature gives
the serial number by which we refer to that signature
in this section, followed by the number of BAD
traces that contributed to learning it. Note
that one root cause can have multiple signatures; for
instance, an incorrect proxy setting can lead to
distinct signatures with different browsers. Likewise,
multiple root causes can share the same signature; for
instance, the absence of SYNACKs could be due to
either a wrong server address or a wrong port
number.
We do not expect the reader to parse each signature
in detail. The graphics are only meant to convey the
similarities and differences between the two signature
sets, which we touch on through specific examples in
Section 5.2.1. Similar parts of Deja vu’s signatures and
the corresponding classifier signatures are in bold. Note
that the browser signatures are more detailed (involving
a larger number of features) and also more diverse
(one-to-many mapping between root causes and signatures)
than the email signatures. This holds for both Deja vu
and the classifier, partly because the browser dataset
contained data from 5 different browsers.
Equivalence: To measure how equivalent the
signatures learned by Deja vu and the classifier are, we
compute a difference metric between the signature sets.
The intuition behind this metric is that even if a Deja
vu signature and the corresponding classifier signature
look very different, they might still be equivalent,
depending on how they categorize the traces. For every
pair of BAD traces in the training set, we check whether
both traces share a Deja vu signature and, separately,
whether they share a classifier signature. If a pair of
traces shares a signature in both cases, or in neither
case, then the Deja vu signatures and the classifier
signatures are equivalent in terms of
how they categorize the two traces. However, if the
signature is shared in one case but not in the other, there
is a mismatch, and so we increment the difference
metric. Finally, we normalize the metric by the total number
of BAD trace pairs considered.
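
The pairwise comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in our evaluation: the function name `difference_metric` and the toy trace ids and labels are hypothetical, each trace is assumed to map to exactly one signature per method, and traces missing from either mapping are treated as noisy and skipped.

```python
from itertools import combinations


def difference_metric(deja_vu_sig, classifier_sig):
    """Fraction of BAD-trace pairs that the two signature sets group differently.

    Each argument maps a trace id to the signature label assigned to it
    (assumption: one signature per trace). Traces absent from either
    mapping are treated as noisy and are not paired.
    """
    traces = sorted(set(deja_vu_sig) & set(classifier_sig))
    mismatches = 0
    total = 0
    for a, b in combinations(traces, 2):
        share_deja_vu = deja_vu_sig[a] == deja_vu_sig[b]
        share_classifier = classifier_sig[a] == classifier_sig[b]
        # A mismatch is a pair that shares a signature under one
        # method but not under the other.
        if share_deja_vu != share_classifier:
            mismatches += 1
        total += 1
    return mismatches / total if total else 0.0


# Hypothetical toy data: four BAD traces, labels chosen for illustration.
deja_vu = {"t1": "A", "t2": "A", "t3": "B", "t4": "B"}
classifier = {"t1": "X", "t2": "X", "t3": "X", "t4": "Y"}
print(difference_metric(deja_vu, classifier))  # 0.5 (3 of 6 pairs differ)
```

Because only the grouping of traces is compared, the metric is insensitive to how the signatures themselves are written: two signature sets with entirely different labels but the same partition of traces score 0.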
For the email traces, there were 120 BAD traces. From
these we consider a total of 6329 trace pairs rather than
all possible pairs, because some of the traces were deemed
noisy by either Deja vu or the classifier and did not
contribute to a signature. Of these 6329 pairs, only 189
differed in the sense described above, yielding a
normalized difference metric of 3%. For the browser
dataset, there were a total of 307 BAD traces, from which
we consider 40186 pairs. Of these, only 1806 differed,
giving a normalized difference metric of 4.5%. Thus,
despite operating with just
coarse-grained GOOD/BAD labels, Deja vu is able to