in the feature set for the trace.
For example,
if a packet header indicates that it is an HTTP
200 Response (HTTP OK), we set the binary fea-
ture HTTP200 RESPONSE to 1.
If no HTTP
200 Response is present in the trace, this fea-
ture is set to 0.
We also set a special fea-
ture, NO HTTP RESPONSE, to 1 if no HTTP
response of any kind exists in the trace.
2. Sequence: The presence or absence of a partic-
ular sequence of packets in a trace could also be
symptomatic of a network problem. For example,
a faulty trace might contain a TCP SYN packet
but no corresponding TCP SYN/ACK packet,
which might be indicative of a problem. To cap-
ture such effects, we define composite binary fea-
tures, for example, one corresponding to the se-
quence TCP SYN → TCP SYNACK. We set this
feature to 1 if and only if a TCP SYN is followed
by a TCP SYNACK on the same connection. In
all other cases, including when no TCP SYN is
seen, it is set to 0.
We use our domain knowledge to trim the set of
such composite features.
For example, the se-
quence TCP RST → TCP SYN is not meaningful
and therefore we do not record it.
3. Aggregate features: As we manually debugged
site-specific problems by looking at traffic ex-
changed with a small set of websites, we found
that the network traces for each site have very uni-
form characteristics in terms of aggregate quanti-
ties such as number of successful TCP connections,
number of successful HTTP connections, number
of bytes transferred, etc. While these features may
be unimportant for generic networking problems,
they are particularly useful in the context of
site-specific problem signatures for certain (pop-
ular) remote sites or services. For example, the
www.cnn.com webpage has specific characteristics
in terms of its layout and the number of objects
on the page. The aggregate features can thus help
capture the problem symptoms specific to this site.
Also, the popularity of this site might make it
worthwhile to learn site-specific signatures for it.
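As an illustration of the three feature types above, a minimal Python sketch of per-trace feature computation might look as follows. The packet fields (proto, flags, conn, status) and the helper name extract_features are illustrative assumptions, not the actual extractor code:

def extract_features(packets):
    """Sketch: compute binary, sequence, and aggregate features for one trace.
    Each packet is assumed to be a dict, e.g. {"proto": "TCP", "flags": "SYN",
    "conn": 3} or {"proto": "HTTP", "status": 200}."""
    feats = {}

    # 1. Binary features: presence/absence of a particular packet type.
    statuses = [p["status"] for p in packets
                if p.get("proto") == "HTTP" and "status" in p]
    feats["HTTP200_RESPONSE"] = int(200 in statuses)
    feats["NO_HTTP_RESPONSE"] = int(not statuses)

    # 2. Sequence features: a TCP SYN followed by a SYN/ACK on the same connection.
    syns = {p["conn"] for p in packets
            if p.get("proto") == "TCP" and p.get("flags") == "SYN"}
    synacks = {p["conn"] for p in packets
               if p.get("proto") == "TCP" and p.get("flags") == "SYN/ACK"}
    feats["TCP_SYN_SYNACK"] = int(bool(syns & synacks))

    # 3. Aggregate features: counts over the whole trace.
    feats["NUM_SUCCESSFUL_TCP_CONNECTIONS"] = len(syns & synacks)
    feats["NUM_HTTP_RESPONSES"] = len(statuses)

    return feats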
In our implementation, we have written feature ex-
tractors for 12 protocols using Microsoft Network Mon-
itor’s Parser API [1]. Table 1 shows the set of protocols
and the number of features we capture from each pro-
tocol. For each application, we choose a subset of these
protocols to determine what features to extract. Fea-
ture extractors for some protocols take as input a port
number so that they extract features specific to traffic to
and from that port. For example, when analyzing traces
from browsers, we extract TCP-level features specific
to port 80.

Table 1: Number of features for each protocol.

    Protocol       No. of Features
    ARP                  3
    DNS                  4
    HTTP                42
    IKE/AUTHIP           4
    IP                   1
    LLMNR                4
    NBNS                 4
    RWS                  4
    SMB/SMB2            45
    TCP                 48
    TLS                 10
    UDP                 52

We extract more features for higher-level
protocols such as HTTP and SMB because these pro-
tocols provide richer error information in their headers
than lower-level protocols like TCP and IP. The feature extraction algorithm traverses each trace once to extract all the features and is therefore not compute-intensive.
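A sketch of how such single-pass, port-parameterized extractors could be organized is shown below. The class and field names are illustrative assumptions; the real extractors are built on Network Monitor's Parser API, which is not shown here:

class TcpExtractor:
    """Sketch of a port-parameterized TCP feature extractor."""
    def __init__(self, port=None):
        self.port = port                 # only consider traffic on this port
        self.saw_syn = False
        self.saw_synack = False

    def consume(self, pkt):
        if pkt.get("proto") != "TCP":
            return
        if self.port is not None and pkt.get("port") != self.port:
            return
        self.saw_syn = self.saw_syn or pkt.get("flags") == "SYN"
        self.saw_synack = self.saw_synack or pkt.get("flags") == "SYN/ACK"

    def features(self):
        suffix = f"_PORT{self.port}" if self.port else ""
        return {f"TCP_SYN_SYNACK{suffix}": int(self.saw_syn and self.saw_synack)}

def extract(trace, extractors):
    """Feed every packet to every extractor in a single pass over the trace."""
    for pkt in trace:
        for ex in extractors:
            ex.consume(pkt)
    feats = {}
    for ex in extractors:
        feats.update(ex.features())
    return feats

# For a browser trace one might call, e.g., extract(trace, [TcpExtractor(port=80)])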
4. DEJA VU ALGORITHM
The Deja vu algorithm is used to identify signatures
corresponding to the various problems experienced by
an application. The algorithm takes as input the sets of
features from multiple application runs, each of which is
labeled as GOOD (if the run was successful) or BAD (if
the run was unsuccessful). The goal, and the key chal-
lenge, is to work with these coarse labels to decompose
the set of BAD runs into meaningful categories corre-
sponding to the different problems experienced by the
application.
This challenge sets the problem apart from stan-
dard problems of classification and clustering in ma-
chine learning.
Unlike with classification, the cate-
gories of faults are not known in advance and hence we
do not have (fine-grained) labels corresponding to the
categories. Unlike with clustering, the coarse-grained
GOOD/BAD labels do matter since our goal, specifi-
cally, is to sub-categorize the BAD runs. In contrast,
clustering on the BAD runs to find categories might
bunch together runs based on some extraneous features
(e.g., the presence or absence of background noise).
These extraneous features may be similarly present or
absent in GOOD runs too, which means they are, in
essence, inconsequential and should have no bearing on
categorizing different kinds of BAD runs.
Consider the following example. Say half the traces
for an application (both GOOD and BAD) have ARP
requests, and the other half do not since the end-host
caches ARP responses. Therefore, the presence or ab-
sence of an ARP request is not a symptom of a problem. However, a conventional clustering algorithm run on the BAD traces will divide these traces into two categories based on the presence or absence of this feature. On the other hand, including the GOOD traces in our learning procedure will help us learn that the presence of the ARP request is inconsequential, since half of the GOOD as well as the BAD traces have this feature and the other half do not.

Figure 1: A signature tree learnt by Deja vu
To address the challenge of learning the categories of
problems from coarse-grained labels, we have developed
a novel algorithm for Deja vu. We start by illustrating
the algorithm with an example in Section 4.1, and then
specify it in Section 4.2.
4.1 Example
We now illustrate the operation of the Deja vu al-
gorithm by demonstrating how an example signature
tree is constructed for a browser application. The main
intent is to extract multiple problem signatures from
the input feature sets. Toward this goal, the algorithm
makes use of the C4.5 decision tree classifier [12] in
multiple iterations. The signature tree captures all the
problem signatures for the application. Figure 1 shows
an example signature tree.
In the first iteration, the Deja vu algorithm inputs the
feature sets corresponding to all runs, labeled GOOD or
BAD, to the C4.5 decision tree classifier. The classifier
outputs the tree shown within the box titled “Iteration
1” in Figure 1. This tree tells us that:
If a trace does not have an HTTP 200 Response, it is
BAD.
While this statement tells us something about a
generic symptom that all the bad traces have, it does
not help categorize the bad traces into different sub-
categories or problems. So, we force the decision tree
classifier to learn an alternative way of classifying the
bad traces by removing the HTTP200 RESPONSE fea-
ture from all good and bad traces, and inputting the
pruned feature sets to C4.5 in Iteration 2. The output
of this iteration is shown in the box marked “Iteration
2” in Figure 1. This tells us that:
If the trace does not have an HTTP 304 Response, it
is BAD.
However, in this iteration too, the decision tree
classifies all bad traces into one category because it
found yet another (novel) feature that is common to
all bad traces. Hence, for iteration 3, we remove the
HTTP304 RESPONSE feature from all good and bad
feature sets and learn a decision tree again. In iteration
3, the tree has 2 leaf nodes labeled BAD:
If the trace (a) has no TCP SYN/SYNACK exchange
on port 80, OR (b) has a TCP SYN/SYNACK
exchange on port 80 AND an HTTP 502 Response
(proxy error), it is BAD.
Hence, by iteratively removing features that exist in
all BAD traces, the algorithm has succeeded in learn-
ing two different problem signatures and sub-categorizes
the bad traces across them. The absence of a TCP con-
nection characterizes the first signature (marked (a)),
while an HTTP 502 response (proxy error) character-
izes the second (marked (b)).
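The core loop of these iterations, retraining the tree after removing the features that explain all BAD runs at once, could be sketched as follows. This is a minimal illustration assuming each run is a dict of binary/aggregate features with a 0/1 (GOOD/BAD) label; scikit-learn's CART trees stand in for C4.5, so the split details differ from the paper, and the function names are illustrative rather than taken from the Deja vu implementation:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

GOOD, BAD = 0, 1

def vectorize(runs, feature_names):
    return np.array([[r.get(f, 0) for f in feature_names] for r in runs])

def bad_paths(clf, feature_names):
    """Return, for each BAD leaf, the set of features tested on its root-to-leaf path."""
    t, paths = clf.tree_, []
    def walk(node, seen):
        if t.children_left[node] == -1:                          # leaf node
            if clf.classes_[np.argmax(t.value[node][0])] == BAD:
                paths.append(set(seen))
        else:
            f = feature_names[t.feature[node]]
            walk(t.children_left[node], seen | {f})
            walk(t.children_right[node], seen | {f})
    walk(0, set())
    return paths

def learn_signatures(runs, labels, max_rounds=20):
    """Retrain the tree, removing features that lump all BAD runs into one leaf,
    until the BAD runs are split across more than one leaf."""
    all_feats, removed = sorted(set().union(*runs)), set()
    for _ in range(max_rounds):
        feats = [f for f in all_feats if f not in removed]
        clf = DecisionTreeClassifier(criterion="entropy").fit(
            vectorize(runs, feats), labels)
        paths = bad_paths(clf, feats)
        if len(paths) != 1:              # 0 or >1 BAD leaves: stop iterating
            return clf, feats, removed
        removed |= paths[0]              # force an alternative explanation
    return clf, feats, removed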
In iteration 4, we divide the BAD feature sets into two
categories: one each for the two leaves labeled BAD
in Iteration 3 above. The first category has no TCP
SYN/SYNACK exchange on port 80, whereas the sec-
ond category does but also includes an HTTP 502 re-
sponse. With each category of bad feature sets, we sep-
arately repeat the procedure described in the previous
iterations. Therefore, from the BAD feature sets in the
first category as well as from all GOOD feature sets,
we exclude the “TCP SYN → TCP SYNACK on port
80” feature and then rerun C4.5. On the other hand,
from the BAD feature sets in the second category as
well as from all GOOD feature sets, we exclude both
the “TCP SYN → TCP SYNACK on port 80” feature
and the “HTTP 502 Response” feature, and then rerun
C4.5.
Iteration 4 for the first BAD category obtained in
Iteration 3 gives us the tree shown in the box marked
“Iteration 4” in Figure 1. This tree tells us that:
If the trace (a) has no IP traffic, OR (b) has IP traffic
but no Successful DNS Query, it is BAD.
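Continuing the sketch given earlier (same assumptions, reusing vectorize() and BAD from it), the per-category split performed in this iteration can be expressed by grouping the BAD runs according to the leaf of the learnt tree they fall into; each group, together with all GOOD runs and with that leaf's path features excluded, is then fed back into the iterative procedure:

def split_bad_runs_by_leaf(clf, runs, labels, feature_names):
    """Partition the BAD runs by the tree leaf they land in, giving one
    BAD sub-category per leaf."""
    leaf_ids = clf.apply(vectorize(runs, feature_names))
    groups = {}
    for run, label, leaf in zip(runs, labels, leaf_ids):
        if label == BAD:
            groups.setdefault(leaf, []).append(run)
    return groups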