Figure 2:
Root causes and corresponding Deja vu signatures and classifier signatures for email traces.
993 (IMAP over SSL) [TCPSYN993-TCPSYNACK993
= 0] whereas the classifier signature does not give us
this information.
4. Sometimes, Deja vu has multiple signatures corre-
sponding to the same root cause, whereas the classifier
does not. Since Deja vu does not have access to fine-
grained labels, it sometimes creates noisy splits in the
signature which the classifier avoids. Such noisy splits
can be avoided to some extent by techniques noted in
Section 7.
5.3
Signature Stability and Adaptability
Next, we turn to the question of how stable Deja
vu’s signatures are, and how effective the algorithm is
at learning new signatures on-the-fly. Would the signa-
tures learned from a training set still apply to a test set
gathered at a later time? Does the algorithm learn new
signatures when required?
To answer these questions, we collected a test dataset
in the corporate network for two browsers – IE and Fire-
fox – approximately 2 months after we had collected the
training dataset. We collected training data on Win-
dows 7, Mac OSX, and Ubuntu systems, and the test
dataset on a Windows XP machine. The test dataset
included 10 BAD traces (5 each for IE and Firefox) for
each of 6 root causes, giving us a total of 60 bad traces.
We could not collect data for the “Misconfigured outgo-
ing firewall” root cause because Windows XP does not
allow the configuration of outgoing firewall rules.
For 5 out of the 6 root causes, an overwhelming ma-
jority (95%) of the traces in the test dataset matched
the signatures of the same root cause that had been
learnt earlier from the training dataset. This demon-
strates the stability of the Deja vu signatures for these
5 root causes. However, Deja vu misclassified all 10
traces for the “Wrong proxy” root cause, marking them
either as “internal site authentication error” or “name
resolution error”.
To investigate this, we relearned the Deja vu signa-
tures by adding these 10 BAD traces to the initial train-
ing set of 878 traces. We found that Deja vu learned an
additional, new signature for the “Wrong proxy” root
cause, which all 10 new traces contributed to:
[HTTPR200 = 0] AND [NHTTPQ = 0] AND
[HTTPR502 = 0] AND [HTTPR500 = 0] AND
[HTTPR504 = 1]
To create the “Wrong proxy” root cause, we always
set the proxy to a non-existent IP address that we
were confident would not respond (e.g., 5.1.1.1). How-
ever, the new signature noted above indicates that not
only did requests to this IP address complete a success-
ful TCP handshake, but a server even responded with an HTTP
gateway error! We communicated this to the relevant
network administrators, who investigated the matter
and then informed us that this strange behavior was
the result of some recent routing configuration changes
made on the corporate network that directed traffic to
some non-existent IP addresses to a set of misconfig-
ured servers that were responding to the requests with
a gateway error.
Figure 3:
Root causes and corresponding Deja vu signatures and classifier signatures for browser traces.
This interesting anecdote shows that Deja vu signa-
tures are not just useful for failure classification, but
can also be a component of a network problem diag-
nosis tool. Whenever Deja vu learns a new problem
signature, the tool can alert the administrators so they
can investigate it to see if the signature reveals anything
untoward.
6.
APPLICATIONS
Deja vu provides a way to characterize network prob-
lems with a compact fingerprint.
Such a fingerprint
has several applications. A fingerprint could be used
to search through a large dataset to find instances of
a particular problem. It could also be used to recall
and match against previously seen instances of a prob-
lem. We briefly describe applications in each of these
categories.
6.1
Search Tool
Packet tracing tools such as tcpdump and netmon pro-
vide a way to apply filters to find specific packet types of
interest, either from a live capture or from a recorded
trace. However, what if we are interested in search-
ing for problem events rather than for specific pack-
ets? For instance, we might wish to find instances in
a trace where a secure webpage access failed because
the firewall blocked port 443 traffic. To provide this ca-
pability, we have built a simple search tool using Deja
vu. The target trace is sliced into windows, either slid-
ing windows or jumping windows. Features extracted
from each slice are then fed into the signature tree con-
structed by Deja vu for the problem of interest.
One question is how wide a slice should be. Ideally,
the slice should be wide enough to accommodate the
problem event of interest but no wider. For instance,
consider a problem signature that comprises a successful
DNS request-response exchange followed by a success-
ful TCP SYN handshake followed, in turn, by an HTTP
request that fails to elicit a response. To be able to cap-
ture the full signature, the slice must be wide enough
to span all of the above packet exchanges. However, if
the slice were too wide, then it risks being polluted by
noise in the form of features from packets belonging to
an unrelated transaction.
In our implementation of the search tool, we use a
slice size of 30 seconds. We tested the tool on a 4MB
network trace collected over a period of 40 minutes.
We recreated 5 different problems from the set of root
causes shown in Figure 3. The search tool success-
fully found 3 of these problems. It missed one of the
remaining two because the window was too large and
background noise polluted it, and it missed the other
because the window was too small to capture
an important feature later in the trace. This indicates
that such a search tool should ideally use windows of
varying sizes to catch all problems in the trace.
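The slicing described in this section, together with the suggestion to vary the window size, can be sketched as follows. This is an illustration under stated assumptions rather than the tool we built: packets are modeled as `(timestamp, name)` tuples, and `extract_features` and `is_match` are hypothetical stand-ins for Deja vu's feature extractor and signature tree lookup.

```python
def slice_windows(timestamps, width, step):
    """Yield (start, end) bounds of windows covering the trace.

    step == width gives jumping windows; step < width gives sliding windows.
    """
    if not timestamps:
        return
    t, last = min(timestamps), max(timestamps)
    while t <= last:
        yield t, t + width
        t += step


def search(packets, widths, extract_features, is_match, overlap=0.5):
    """Return (start, width) for every window whose features match.

    packets: list of (timestamp_seconds, name) tuples.
    widths: window sizes in seconds; each size is tried in turn.
    """
    times = [ts for ts, _ in packets]
    hits = []
    for width in widths:
        step = width * overlap  # sliding windows; overlap=1.0 means jumping
        for start, end in slice_windows(times, width, step):
            window = [p for p in packets if start <= p[0] < end]
            if window and is_match(extract_features(window)):
                hits.append((start, width))
    return hits


# Toy trace: an HTTP request that never gets a response, then unrelated traffic.
demo_packets = [(0, "DNSQ"), (1, "DNSR"), (2, "SYN"), (3, "SYNACK"),
                (4, "HTTPQ"), (100, "SYN"), (101, "SYNACK")]


def demo_extract(window):
    # Hypothetical feature extractor: the set of packet names seen.
    return {name for _, name in window}


def demo_match(features):
    # Hypothetical signature: request seen, response not seen.
    return "HTTPQ" in features and "HTTPR" not in features


demo_hits = search(demo_packets, [30, 60], demo_extract, demo_match)
print(demo_hits)
```

Trying several widths trades extra computation for a lower risk of either polluting a slice with unrelated traffic or truncating the problem event.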
6.2
Help Desk Application
A second application of Deja vu is in the context of
a help desk tool. Our help desk application uses the
problem signatures generated by Deja vu to automati-
cally match the problem being experienced by the user
against a database of known issues, i.e., ones for which
there is a known fix. Whenever a failure is encountered
(e.g., a browser error), the Deja vu component on the
client machine extracts features from the packet trace
in the recent past (tracing is an ongoing background
activity) and sends these to the Deja vu server. At the
server, these features are fed into the application’s sig-
nature tree and thereby matched against a known cate-
gory of failures. The problem notes associated with this
category would then guide the diagnosis and resolution
procedure.
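The server-side lookup can be pictured as a walk down a tree of binary feature tests that ends in a failure category, which in turn indexes the problem notes. The sketch below is illustrative only: the `Node` and `Leaf` types, the feature names `HTTP_504` and `DNS_RESPONSE`, the tree shape, and the note text are all hypothetical and are not taken from the signature trees learned in our experiments.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    category: str  # failure category reached at this leaf


@dataclass
class Node:
    feature: str                     # binary feature tested at this node
    if_zero: Union["Node", Leaf]     # branch taken when the feature is 0
    if_one: Union["Node", Leaf]      # branch taken when the feature is 1


def classify(tree, features):
    """Walk the tree using the client's feature vector; return the category.

    Assumption: a feature that the client did not report counts as 0.
    """
    node = tree
    while isinstance(node, Node):
        node = node.if_one if features.get(node.feature, 0) else node.if_zero
    return node.category


# Hypothetical signature tree and problem notes for a browser application.
TREE = Node(
    feature="HTTP_504",
    if_one=Leaf("Wrong proxy"),
    if_zero=Node(
        feature="DNS_RESPONSE",
        if_one=Leaf("Unknown"),
        if_zero=Leaf("Name resolution error"),
    ),
)
NOTES = {
    "Wrong proxy": "Check the browser's proxy setting.",
    "Name resolution error": "Check the configured DNS server.",
}

category = classify(TREE, {"HTTP_504": 1})
print(category, "->", NOTES.get(category, "no known fix"))
```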
A more sophisticated version of the help desk appli-
cation could use Deja vu signatures to index WikiDo [9]
tasks instead of just indexing manually crafted notes.
7.
DISCUSSION
Noisy traces:
We discuss the impact of noisy traces
on Deja vu’s signatures. Noise refers to packets that are
extraneous to the application of interest. Such noise
could arise from the network communication of other
applications or even other hosts, depending on where
the packet trace is captured. Deja vu’s feature extrac-
tor would then extract features from such background
traffic and include these with the (correct) features cor-
responding to the traffic of interest.
Such noisy features could be problematic in two ways:
(a) these could lead Deja vu to learn incorrect signa-
tures for problems, and (b) these could cause an incor-
rect match when an attempt is made to match the noisy
features against the signatures generated by Deja vu.
Deja vu’s use of GOOD/BAD labels helps mitigate
problem (a) because the noisy features are likely to
be uncorrelated with the success (GOOD) or failure
(BAD) of the application of interest and hence are
likely to be disregarded by Deja vu’s signature con-
struction algorithm. However, a noisy feature extracted
from background traffic (e.g., a successful DNS request-
response exchange) could still cause problems, as ex-
plained above.
To alleviate the above problem, we could leverage
prior work on application traffic fingerprinting (e.g., [11,
7, 15]) to separate out just the subset of traffic in a
packet trace that corresponds to the application of in-
terest. Performing such separation thoroughly would
require the tracing to be performed on the end hosts,
so that traffic could be unambiguously tied to specific
applications.
Another source of inaccuracy in the traces is misla-
beling of GOOD and BAD traces. Previous work [2]
has shown that the C4.5 decision tree algorithm is ro-
bust to a certain degree of mislabeling in the context
of network diagnostics. However, no learning algorithm
can withstand large amounts of mislabeling. Applica-
tions that use Deja vu have to be designed so that
the chance of mislabeling stays low. A discussion
of such application-level techniques is out of the scope of
this paper.
Scalability:
In our experiments, the Deja vu algo-
rithm took less than one second to complete processing
all the traces. For the applications we have discussed,
we expect practitioners to run Deja vu with a frequency
of approximately once a day, and we believe the current
performance is suitable for this design point. It is, how-
ever, possible that as the problem traces become more
diverse, Deja vu may learn a considerable number of
problem signatures in a single run. In such cases, sig-
natures can be prioritized based on the confidence that
the C4.5 algorithm assigns to them. Signatures that
are seen more often can be bubbled to the top of the
priority list, thereby allowing an administrator or sup-
port engineer to look at the more predominant problems
first.
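One possible realization of this prioritization is a simple sort. The sketch below assumes each learned signature is summarized by a record with a `trace_count` (how many traces matched it) and a `confidence` value; both field names, the sample values, and the decision to rank by frequency first and break ties by confidence are assumptions made for illustration.

```python
def prioritize(signatures):
    """Order signatures so the most frequently seen come first.

    Ties on trace_count are broken by higher confidence.
    """
    return sorted(signatures,
                  key=lambda s: (s["trace_count"], s["confidence"]),
                  reverse=True)


# Made-up summary records for three learned signatures.
learned = [
    {"name": "A", "trace_count": 3, "confidence": 0.90},
    {"name": "B", "trace_count": 10, "confidence": 0.70},
    {"name": "C", "trace_count": 10, "confidence": 0.95},
]

ranked_names = [s["name"] for s in prioritize(learned)]
print(ranked_names)
```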
8.
RELATED WORK
8.1
Network Traffic Analysis
Analysis of network traffic has been used to fin-
gerprint applications and infer the behavior of proto-
cols [15, 11]. While such analysis has used supervised
learning on coarse features such as packet size and flow
length to distinguish between applications, Deja vu op-
erates on more fine-grained features (e.g., features spe-
cific to DNS, TCP, HTTP, etc.) but with coarse-grained
GOOD vs. BAD labels.
Such analysis has also been used to discover the
session-level structure of applications [7], e.g., to dis-
cover that in an FTP session, a control connection is of-
ten followed by one or more data connections. However,
to our knowledge, such session structure has not been
used for constructing signatures for network problems.
Furthermore, discovering session structure is only semi-
automated, requiring the involvement of a human ex-
pert to actually reconstruct the session structure. Hu-
man involvement in Deja vu is limited to labeling train-
ing runs as GOOD or BAD, a much less onerous task.
Finally, such analysis has also been used to perform
network anomaly detection (e.g., [18]). The typical ap-
proach has been to construct a model of normal behav-
ior based on past traffic history and then look for sig-
nificant changes in short-term behavior that are
inconsistent with the model. While anomaly detection
has focused on aggregate behavior, Deja vu focuses on
the network behavior of an individual application run.
8.2
Fingerprinting Problems
DebugAdvisor [3] is a tool to search through source
control systems and bug databases to aid debugging.
Unlike Deja vu, it uses a standard text search tool over
call stack information and bug reports. Deja vu is closer
in spirit to work on automating the diagnosis of system
problems, which involves extracting signatures from in-
formation such as system call traces (e.g., [16]). The
approach is to employ supervised learning (e.g., SVM)
on a fully labeled database of known problems. In a
similar vein, Clarify [5] is a system that improves error
reporting by classifying application behavior. Clarify
generates a behavior profile, i.e., a summary of the pro-
gram’s execution history, which is then labeled by a
human expert to enable learning-based classification.
In comparison with the above approaches, which re-
quire a human expert to perform full labeling, Deja vu
operates only with coarse-grained labels. Also, since
Deja vu focuses on network problems, there are a num-
ber of domain-specific choices it incorporates, including
for feature selection.
8.3
State-based Diagnosis
STRIDER [14] and PeerPressure [13] analyze state
information in the Windows registry, to identify fea-
tures (e.g., registry key settings) that are indicative of
a problem. Unlike Deja vu, the goal of this body
of work is not to develop problem-specific signatures
based on the behavior of the system. Rather, it is to de-
tect anomalous state by performing state differencing
between a healthy machine and a sick machine. Also,
the features (e.g., registry key settings) were treated
as opaque entities whereas Deja vu uses networking
domain-specific knowledge to define features.
Similarly, NetPrints [2] analyzes network configura-
tion information to diagnose home network problems.
While being largely state-based, NetPrints also made
limited use of network problem signatures to address
the issue of hidden configurations that are not available
to the state-based analysis.
Compared to the above, Deja vu is not intrusive since
it operates on network traffic and hence does not require
any tracing to be performed on the end system itself.
8.4
Active Probing
While Deja vu seeks to extract network problem
signatures from existing application traffic, there is a
large body of work on characterizing network problems
through active probing [10, 17, 4, 8].
Active prob-
ing with a carefully-crafted set of tests enables detailed
characterization of a range of problems, often enabling
diagnosis. In contrast, Deja vu strives to produce a
problem fingerprint based on the traffic that the appli-
cation generates anyway. These fingerprints may not
contain the detail to directly enable diagnostics. Nev-
ertheless, these provide a generic way to match a prob-
lem instance with a previously seen instance, thereby
enabling diagnostics, as noted in Section 6.2.
9.
CONCLUSION
Deja vu is a tool to associate a compact signature
with each category of network problem experienced
by an application. It uses a novel algorithm to learn
the signatures from coarse-grained GOOD/BAD la-
bels. Our experimental evaluation, including compar-
ison with a standard classifier (which has the benefit
of knowing fine-grained labels) and a user study, has
demonstrated the effectiveness of Deja vu signatures.
10.
REFERENCES
[1] Microsoft network monitor. URL
“http://www.microsoft.com/downloads/en/netmon”.
[2] B. Aggarwal, R. Bhagwan, T. Das, S. Eswaran,
V. Padmanabhan, and G. Voelker. NetPrints: Diagnosing
Home Network Misconfigurations using Shared Knowledge. In
NSDI, 2009.
[3] B. Ashok, J. Joy, H. Liang, S. Rajamani, G. Srinivasa, and
V. Vangala. DebugAdvisor: A Recommender System for
Debugging. In FSE, 2009.
[4] M. Dischinger, M. Marcon, S. Guha, K. P. Gummadi,
R. Mahajan, and S. Saroiu. Glasnost: Enabling End Users to
Detect Traffic Differentiation. In NSDI, 2010.
[5] J. Ha, C. J. Rossbach, J. V. Davis, I. Roy, H. E. Ramadan,
D. E. Porter, D. L. Chen, and E. Witchel. Improved Error
Reporting for Software that Uses Black-box Components. In
PLDI, 2007.
[6] S. Kandula, R. Mahajan, P. Verkaik, S. Agarwal, and
J. Padhye. Detailed Diagnosis in Computer Networks. In
SIGCOMM, 2010.
[7] J. Kannan, J. Jung, V. Paxson, and C. E. Koksal.
Semi-Automated Discovery of Application Session Structure. In
IMC, 2006.
[8] C. Kreibich, N. Weaver, B. Nechaev, and V. Paxson. Netalyzr:
Illuminating The Edge Network. In IMC, 2010.
[9] N. Kushman, M. Brodsky, S. Branavan, D. Katabi, R. Barzilay,
and M. Rinard. WikiDo. In HotNets, 2009.
[10] R. Mahajan, N. Spring, D. Wetherall, and T. Anderson.
User-level Internet Path Diagnosis. In SOSP, October 2003.
[11] A. Moore and K. Papagiannaki. Toward the Accurate
Identification of Network Applications. In PAM, 2005.
[12] J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan
Kaufmann, 1993.
[13] H. Wang, J. Platt, Y. Chen, R. Zhang, and Y. Wang.
Automatic Misconfiguration Troubleshooting with
PeerPressure. In OSDI, 2004.
[14] Y.-M. Wang, C. Verbowski, J. Dunagan, Y. Chen, H. J. Wang,
C. Yuan, and Z. Zhang. STRIDER: A Black-box, State-based
Approach to Change and Configuration Management and
Support. In LISA, 2003.
[15] C. V. Wright, F. Monrose, and G. M. Masson. On Inferring
Application Protocol Behaviors in Encrypted Network Traffic.
J. Machine Learning Research, Dec 2006.
[16] C. Yuan, N. Lao, J.-R. Wen, J. Li, Z. Zhang, Y.-M. Wang, and
W.-Y. Ma. Automated Known Problem Diagnosis with Event
Traces. In EuroSys, 2006.
[17] Y. Zhang, Z. M. Mao, and M. Zhang. Effective Diagnosis of
Routing Disruptions from End Systems. In NSDI, 2008.
[18] Y. Zhang, S. Singh, S. Sen, N. Duffield, and C. Lund.
Sketch-based Change Detection: Methods, Evaluation, and
Applications. In IMC, 2004.