The algorithm therefore splits the first BAD category
from Iteration 3 into two sub-categories: the first has no
IP traffic, and the second has IP traffic, but no suc-
cessful DNS exchange. With these two sub-categories,
we repeat the procedure we described in Iteration 3 of
splitting the BAD traces into two sets, removing the
discerning features from them, removing the same fea-
tures from all GOOD traces, and inputting this new
data to C4.5.
Both of these runs of C4.5 yield trivial, one-node trees with no branches, which is the stopping condition for the Deja vu algorithm. The run of C4.5 in Iteration 4 that corresponds to the second BAD category obtained in Iteration 3 also yields a trivial, one-node tree. Hence, the signature tree is complete.
Having completed the process of growing the signa-
ture tree, we now prune the signatures to make them
more concise. Note that this step is not necessary for
correctness. However, in general, several features in the signature tree constructed above could be redundant. For
instance, in the example in Figure 1, the inclusion of
NO HTTP304 RESPONSE immediately following NO
HTTP200 RESPONSE is redundant since it does not
help sub-classify BAD. Hence, we remove redundant
features such as HTTP304 RESPONSE from all signa-
tures.
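One way to implement such pruning is sketched below in Python. The criterion used here — drop a conjunct if the signature still matches exactly the same set of traces — is our own simplification of the "does not help sub-classify BAD" rule, and the dictionary encoding of signatures is an assumption made only for illustration.

def matches(signature, feature_set):
    # a signature is a conjunction of feature/value pairs
    return all(feature_set.get(f) == v for f, v in signature.items())

def prune_signature(signature, traces):
    """signature: dict {feature: value}; traces: list of feature-set dicts."""
    pruned = dict(signature)
    baseline = [matches(pruned, t) for t in traces]
    for feature in list(pruned):
        trial = {f: v for f, v in pruned.items() if f != feature}
        if [matches(trial, t) for t in traces] == baseline:
            pruned = trial   # dropping the feature changes nothing: redundant
    return pruned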
Our final list of problem signatures corresponding to
Figure 1 is:
1. NO HTTP200 RESPONSE AND (TCPSYN80 → TCPSYNACK80) AND HTTP502 RESPONSE
2. NO HTTP200 RESPONSE AND NO (TCPSYN80 → TCPSYNACK80) AND NO IP TRAFFIC
3. NO HTTP200 RESPONSE AND NO (TCPSYN80 → TCPSYNACK80) AND IP TRAFFIC AND NO (DNS QUERY → DNS SUCCESS RESPONSE)
These problem signatures are such that every BAD trace matches exactly one signature. This characteristic
is important in that it helps disambiguate between the
seemingly large number of error conditions that occur
in real-world applications.
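To make this disjointness property concrete, the hypothetical snippet below encodes the three signatures above as feature/value conjunctions (the feature names and the encoding are our own illustration, not Deja vu's internal representation) and reports which signatures a given BAD feature set matches.

SIGNATURES = [
    {"HTTP200_RESPONSE": False, "TCPSYN80->TCPSYNACK80": True,  "HTTP502_RESPONSE": True},
    {"HTTP200_RESPONSE": False, "TCPSYN80->TCPSYNACK80": False, "IP_TRAFFIC": False},
    {"HTTP200_RESPONSE": False, "TCPSYN80->TCPSYNACK80": False, "IP_TRAFFIC": True,
     "DNS_QUERY->DNS_SUCCESS_RESPONSE": False},
]

def matching_signatures(feature_set):
    return [i + 1 for i, sig in enumerate(SIGNATURES)
            if all(feature_set.get(f) == v for f, v in sig.items())]

# A trace with IP traffic, no TCP handshake on port 80, and no successful
# DNS exchange matches signature 3 and no other.
bad_trace = {"HTTP200_RESPONSE": False, "TCPSYN80->TCPSYNACK80": False,
             "IP_TRAFFIC": True, "DNS_QUERY->DNS_SUCCESS_RESPONSE": False}
print(matching_signatures(bad_trace))    # prints [3]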
If, instead of crafting these signatures through the Deja vu algorithm, we used a simpler, rule-based problem-signature matching scheme that included, among others, rules such as:
1. HTTP502 RESPONSE
2. HTTP401 RESPONSE
3. NO IP BACKGROUND TRAFFIC
we might not get the level of discernment that Deja vu
gives us. For instance, a web browser trace could con-
tain an HTTP 401 error (Unauthorized), at which point
the browser asks the user for a password. Once the user
enters the password, the connection succeeds. This case
demonstrates that just the presence of an HTTP 401 er-
ror does not necessarily indicate a problem. Our evalu-
ation in Section 5.2.1 describes instances where Deja vu
captured accurate problem signatures that a basic rule-
based engine would not have captured. In fact, even a
classifier-based approach with more fine-grained prob-
lem labels was unable to learn some problem signatures
that Deja vu learned.
4.2 Summary of Algorithm Steps
We summarize the Deja vu algorithm in the following four steps; a brief sketch of the resulting loop follows the list.
• Step 1: Extract all feature sets from the network traces, and assign a label – either GOOD or BAD – to each feature set.
• Step 2: Input the labeled feature sets to the C4.5 algorithm, which yields a decision tree.
• Step 3: If C4.5 yields only a trivial, one-node tree, stop. Else, find all BAD leaf nodes in the tree.
• Step 4: For each BAD leaf, find all the features and their values on the path from the root to the leaf. Remove these features from every BAD feature set that has these features with the same values, and from all GOOD feature sets. With these reduced feature sets, start again from Step 2.
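The sketch below expresses this loop in Python. The helpers learn_tree() (assumed to return None when C4.5 produces only a trivial, one-node tree) and bad_leaf_paths() (assumed to return one {feature: value} conjunction per BAD leaf) are hypothetical wrappers around a C4.5-style learner, not part of the Deja vu implementation; the dictionary representation of feature sets is likewise an assumption made for illustration.

def deja_vu(feature_sets, labels, learn_tree, bad_leaf_paths):
    """feature_sets: list of dicts {feature name: value}; labels: parallel
    list of 'GOOD'/'BAD' strings. Returns a list of signatures, each a
    conjunction of feature/value pairs."""
    signatures = []
    # Each work item carries the conjunction accumulated so far for one BAD
    # category, plus the reduced data on which C4.5 is re-run (Step 2).
    pending = [({}, feature_sets, labels)]
    while pending:
        prefix, fs, lb = pending.pop()
        tree = learn_tree(fs, lb)            # Step 2: run the C4.5-style learner
        if tree is None:                     # Step 3: trivial one-node tree
            if prefix:
                signatures.append(prefix)    # the accumulated conjunction is final
            continue
        for path in bad_leaf_paths(tree):    # Step 4: one path per BAD leaf
            reduced_fs, reduced_lb = [], []
            for f, l in zip(fs, lb):
                # Keep every GOOD feature set and the BAD feature sets that
                # match this leaf's path, dropping the discerning features.
                if l == 'GOOD' or all(f.get(k) == v for k, v in path.items()):
                    reduced_fs.append({k: v for k, v in f.items() if k not in path})
                    reduced_lb.append(l)
            pending.append(({**prefix, **path}, reduced_fs, reduced_lb))
    return signatures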
5. EVALUATION
Our evaluation focuses on the effectiveness of the
signatures learned by Deja vu along two dimensions:
(a) how Deja vu’s signatures, learned just using the
coarse-grained GOOD/BAD labels, compare with those
learned by a classifier that has the benefit of fine-
grained, problem-specific labels, and (b) how effective
Deja vu is in categorizing data in a test set and learning
new signatures.
5.1 Data Collection
Since obtaining network traces from the field, to-
gether with ground truth information on the presence
and nature of failures, is challenging, we have evaluated
Deja vu by recreating real network problems in two dif-
ferent live environments – a corporate network and a
university network. This fault injection based strategy
is similar to evaluation of algorithms in previous net-
work diagnostics research [2, 6]. The failures that we
recreated (described in the following) were selected by browsing popular user forums and by discussing with network administrators.
We have evaluated Deja vu with web browsers run-
ning in the corporate network, and with email clients
running in a university network. We chose web browsers
and email clients as applications to evaluate since these
represent significant applications in today's enter-
prises. For instance, web browsers are used not just to
access the public web but also to access myriad intranet
services (e.g. HR, payroll).
For each application, we recreated a mix of problem
scenarios ranging from obvious and well-known prob-
lems to more subtle issues, and collected network traces
of these scenarios. We manually injected the failures by
either misconfiguring the applications, operating sys-
tems, or network components. For each application run,
we recorded a network packet trace and labeled it as
GOOD or BAD, depending on whether the application
run was successful or not. Note that for a number of
these problems, the root cause is not obvious to the user
just from the message that the application provides, jus-
tifying the coarse-grained labeling of GOOD and BAD.
In addition, to enable comparison with a classifier, we
recorded fine-grained labels indicating the root cause
of each failure. Note that these labels were not made
available to Deja vu.
Next, we describe the specifics of the data collection
procedure we used for the browser and email datasets.
5.1.1 Browser
We used five different browsers – Google Chrome
5.0.3, Safari 5.0.1, Firefox 3.6, Opera 10.53, and IE 8
– to collect traces for various browser-related problems
from within a corporate network. To collect the traces,
we ran these browsers on three machines, each with a
different OS — Windows 7, Ubuntu Linux 9.10, and
Mac OSX 10.5.8 — subject to the availability of each
browser on these OSes.
We reproduced each of the problems listed in Table 2
in our setting on each available browser + OS combina-
tion. For each such combination, we collected a set of
GOOD traces and BAD traces for each of the problem
scenarios listed in Table 2. In some cases, the problem
was applicable to only a specific browser, so we collected
BAD traces only for that particular browser. In all, we
collected 878 traces for this dataset: 307 GOOD traces
and 571 BAD traces. Note that, for lack of space, in the following we discuss signatures only for 7 of the 11 cases listed in Table 2.
Features: We use our domain knowledge to determine
which protocols are relevant to browsers, and extract
features pertinent to this set of protocols. The feature
extractor summarizes each browser trace using features
specific to HTTP, TCP on port 80, all name resolution
protocols in use (DNS, Netbios, LLMNR), and generic
features that can help capture low-level problems, such
as the presence or absence of background IP traffic.
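As an illustration of the kind of boolean features such an extractor can derive, the sketch below parses a packet trace with scapy. The feature names and derivation rules here are simplified assumptions of ours, not Deja vu's actual extractor.

from scapy.all import rdpcap, IP, TCP, DNS, Raw

def browser_features(pcap_path):
    """Derive a few illustrative boolean features from a browser trace."""
    pkts = rdpcap(pcap_path)
    feats = {"IP_TRAFFIC": False, "TCPSYN80": False, "TCPSYN80->TCPSYNACK80": False,
             "HTTP200_RESPONSE": False, "DNS_QUERY": False}
    dns_success = False
    for p in pkts:
        if p.haslayer(IP):
            feats["IP_TRAFFIC"] = True
        if p.haslayer(TCP):
            flags = int(p[TCP].flags)
            if p[TCP].dport == 80 and flags & 0x02 and not flags & 0x10:
                feats["TCPSYN80"] = True                    # SYN sent to port 80
            if p[TCP].sport == 80 and (flags & 0x12) == 0x12:
                feats["TCPSYN80->TCPSYNACK80"] = True       # SYN+ACK from port 80
        if p.haslayer(DNS):
            if p[DNS].qr == 0:
                feats["DNS_QUERY"] = True
            elif p[DNS].rcode == 0 and p[DNS].ancount > 0:
                dns_success = True                          # successful DNS response
        if p.haslayer(Raw):
            payload = bytes(p[Raw].load)
            if payload.startswith(b"HTTP/1.") and b" 200 " in payload[:20]:
                feats["HTTP200_RESPONSE"] = True
    feats["DNS_QUERY->DNS_SUCCESS_RESPONSE"] = feats["DNS_QUERY"] and dns_success
    return feats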
5.1.2 Email
We collected email-related problem traces in a univer-
sity network using the Thunderbird 3.1.2 client running
on three machines, each with a different client OS: Mac
OSX, Windows XP, and Ubuntu 9.10. The Windows
XP and Ubuntu traces are from the same university net-
work, while the Mac traces are from a machine connected
to the university’s residential network via a wireless ac-
cess point. The clients connected to one of two email
servers, each of which supported IMAP and SMTP over
SSL.
We used these configurations to collect network traces
for problem scenarios where the client was correctly
sending and receiving emails, and then for the faulty
scenarios listed in Table 3. We reproduced each of the
problems listed in the table on each OS by manually
configuring the email client to reproduce the problem.
We collected 5 samples for each of the problems on each
OS. Thus, we collected a total of 15 samples for each
problem. In all, we collected 150 traces of email prob-
lems: 30 GOOD and 120 BAD.
Note that we cleared the DNS cache before each run
to capture the complete network activity of the email
client. Had we not done so, the small size of our
experimental setup (3 clients and 2 servers) would have
meant that DNS queries would have been largely ab-
sent, having been filtered out by the cache. However,
in a real setting, with a large number of clients and
servers, there would be DNS queries associated with at
least a fraction of the successful transactions. Clearing the DNS cache before each run lets us recreate such instances despite the small size of our setup.
Features: The university uses IMAP and SMTP over SSL; hence, the feature extractor summarizes each
email trace using features specific to SMTP over SSL,
IMAP over SSL, TCP on the respective ports, DNS
(this is the only name resolution protocol in use in the
university network), and generic features that can help
capture low-level problems, such as the presence or ab-
sence of background ARP and IP packets. Since all
traces involve the client machines connecting to one of
two mail servers, we also record aggregate features, as
discussed in Section 3, to capture server-specific behav-
ior.
5.2 Comparison with a Classifier
Our first evaluation concentrates on comparing the
signatures learned by Deja vu with those learned by a
conventional classifier. While we input traces labeled
as either GOOD or BAD to Deja vu, the classifier had
the added benefit of having each faulty trace labeled
with the root cause of the problem instead of just the
generic BAD label. To perform classification, we used
the C4.5 decision tree classifier, which has been used
often in prior work [2, 5].
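For readers who want to reproduce a comparable setup, the toy snippet below trains scikit-learn's decision-tree implementation with the entropy criterion on fine-grained labels. Note that this is CART rather than a true C4.5 implementation, and the feature matrix and labels shown are invented purely for illustration.

from sklearn.tree import DecisionTreeClassifier, export_text

# Toy binary feature matrix: [HTTP200_RESPONSE, TCPSYN80->TCPSYNACK80, HTTP502_RESPONSE]
X = [[1, 1, 0], [1, 1, 0], [0, 1, 1], [0, 0, 0]]
y = ["good", "good", "proxy_error", "no_connectivity"]   # fine-grained labels
clf = DecisionTreeClassifier(criterion="entropy").fit(X, y)
print(export_text(clf, feature_names=["HTTP200", "SYNACK80", "HTTP502"]))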