Deja vu: Fingerprinting Network Problems



Bhavish Aggarwal, Ranjita Bhagwan∗, Lorenzo De Carli†, Venkat Padmanabhan∗, Krishna Puttaswamy‡

∗ Microsoft Research India
† University of California, Santa Barbara
‡ University of Wisconsin, Madison
Olacabs.com

ABSTRACT

We ask the question: can network problems experienced by applications be identified based on symptoms contained in a network packet trace? An answer in the affirmative would open the door to many opportunities, including non-intrusive monitoring of such problems on the network and matching a problem with past instances of the same problem.

To this end, we present Deja vu, a tool to condense the manifestation of a network problem into a compact signature, which can then be used to match multiple instances of the same problem. Deja vu takes as input a network-level packet trace of an application’s communication and extracts from it a set of features. During the training phase, each application run is manually labeled as GOOD or BAD, depending on whether the run was successful or not. Deja vu then employs a novel learning technique to build a signature tree that not only distinguishes between GOOD and BAD runs but also sub-classifies the BAD runs, revealing the different classes of failures. The novelty lies in performing the sub-classification without requiring any failure class-specific labels.

We evaluate Deja vu in the context of multiple web browsers in a corporate environment and an email application in a university environment, with promising results. The signature generated by Deja vu based on the limited GOOD/BAD labels is as effective as one generated using full-blown classification with knowledge of the actual problem types.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

ACM CoNEXT 2011, December 6–9 2011, Tokyo, Japan.

INTRODUCTION

Network communication is an integral part of many applications. Therefore, network problems often impact application behavior. The impact on network communication depends on the nature of the problem. If the local name server is down, DNS requests will be sent but no responses will be received. On the other hand, if the firewall at the edge of a corporate network is blocking the https port, then SYN packets would be seen but not any SYNACKs.

We ask the question: can network problems experienced by applications be identified based on symptoms contained in the application’s network packet trace? There are several advantages to looking for symptoms of network problems in a network packet trace. First, it is non-intrusive, unlike tracing on an end system itself (e.g., system call tracing). We could therefore monitor the health of applications running on several hosts without requiring access to the hosts themselves. Second, network communication represents the “narrow waist” of network applications. Many versions of an application (e.g., browser) and even OSes running on the end systems could exhibit consistent behavior at the level of network protocol messages, thereby leading to similar symptoms of problems at the network layer.

To answer the above question, we develop Deja vu, a tool to condense the manifestation of a network problem into a compact signature. Each signature encapsulates the symptoms corresponding to a particular problem. For instance, for a browser application that might encounter the problems noted above, there would be one signature corresponding to the local name server problem and a different one corresponding to the firewall problem. Although it might be tempting, based on these simple examples, to employ a rule-based approach to constructing signatures, such an approach suffers from the limitation of not being general enough to accommodate new applications or even existing applications whose behavior is not fully understood or documented.
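To make the limitation concrete, here is a minimal sketch of what hand-written rules for the two example problems might look like. The feature names (`dns_request`, `dns_reply`, `tcp_syn`, `tcp_synack`) and the problem labels are hypothetical and chosen only for illustration; they are not taken from the Deja vu implementation.

```python
def rule_based_diagnosis(features):
    """Map a dict of binary trace features to a hand-coded problem label.

    The feature names and labels are illustrative assumptions.
    """
    # DNS queries went out but nothing came back.
    if features["dns_request"] and not features["dns_reply"]:
        return "name server unreachable"
    # Connection attempts were made but never answered.
    if features["tcp_syn"] and not features["tcp_synack"]:
        return "connection blocked"
    # Any failure without a hand-written rule falls through.
    return "unknown"


dns_down = {"dns_request": 1, "dns_reply": 0, "tcp_syn": 0, "tcp_synack": 0}
https_blocked = {"dns_request": 1, "dns_reply": 1, "tcp_syn": 1, "tcp_synack": 0}
unforeseen = {"dns_request": 1, "dns_reply": 1, "tcp_syn": 1, "tcp_synack": 1}

for run in (dns_down, https_blocked, unforeseen):
    print(rule_based_diagnosis(run))
```

The third run fails for a reason nobody wrote a rule for, so it is reported as "unknown". Every new application or failure mode would need another manually written rule.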

Therefore, Deja vu uses a learning-based approach to constructing signatures. We extract a set of features from packet traces, using our domain knowledge to guide the choice of features. The features extracted correspond to protocols such as DNS, IP, TCP, HTTP, etc. For instance, there are features corresponding to the presence of a DNS request, DNS reply, HTTP error code, etc.
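As a rough illustration of such presence features, the sketch below reduces a trace to a small dictionary of binary values. The packet representation (a list of dicts with `proto`, `is_response`, `flags` and `status` keys) is an assumption made for this example; a real extractor would parse captured packets rather than pre-digested summaries.

```python
def extract_features(packets):
    """Reduce a list of packet summaries to binary presence features.

    The packet dict layout and the feature names are illustrative assumptions.
    """
    features = {
        "dns_request": 0,
        "dns_reply": 0,
        "tcp_syn": 0,
        "tcp_synack": 0,
        "http_error": 0,
    }
    for pkt in packets:
        proto = pkt.get("proto")
        if proto == "DNS":
            key = "dns_reply" if pkt.get("is_response") else "dns_request"
            features[key] = 1
        elif proto == "TCP":
            flags = pkt.get("flags", set())
            if "SYN" in flags and "ACK" in flags:
                features["tcp_synack"] = 1
            elif "SYN" in flags:
                features["tcp_syn"] = 1
        elif proto == "HTTP":
            status = pkt.get("status")
            # HTTP status codes of 400 and above indicate an error.
            if status is not None and status >= 400:
                features["http_error"] = 1
    return features


# A run where name resolution worked but the connection was never accepted.
trace = [
    {"proto": "DNS", "is_response": False},
    {"proto": "DNS", "is_response": True},
    {"proto": "TCP", "flags": {"SYN"}},
]
print(extract_features(trace))
```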

Once these features have been extracted, designing an algorithm to learn signatures is a key challenge. A standard classification approach, such as decision trees, would require labeled training data. Generating a training set with problem type-specific labels is onerous and could even be infeasible when the failure cause for a training run is unknown (e.g., a failure could occur in a remote network component). At the same time, an unsupervised learning approach, such as clustering, would be vulnerable to noisy data. For instance, features extracted from unrelated background traffic might still get picked for clustering.

To address this challenge, Deja vu employs a novel approach. For training, we only assume coarse-grained labels: GOOD when the training run of an application was successful and BAD otherwise. These labels can be determined based on the exhibited behavior of an application, without the need to know, in the case of BAD, the problem category. Then, by iteratively applying a decision-tree learning algorithm, Deja vu automatically learns different problem signatures for different categories of problems.
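The following sketch shows one way coarse GOOD/BAD labels could yield sub-classes of BAD. It repeatedly picks the binary feature that best separates the GOOD runs from the current group of BAD runs (by information gain), splits the BAD group on that feature, and recurses with the feature removed. This is a deliberately simplified illustration written for this text, not the authors' signature-construction algorithm, which this excerpt does not specify; all names and the toy data are assumptions.

```python
from math import log2


def entropy(good_count, bad_count):
    """Binary entropy of a GOOD/BAD mix (0.0 when the mix is pure or empty)."""
    if good_count == 0 or bad_count == 0:
        return 0.0
    p = good_count / (good_count + bad_count)
    return -(p * log2(p) + (1 - p) * log2(1 - p))


def best_feature(good, bad, features):
    """Return the feature with the highest information gain, or None."""
    total = len(good) + len(bad)
    base = entropy(len(good), len(bad))
    best, best_gain = None, 1e-9  # ignore features with no real gain
    for name in sorted(features):  # sorted for deterministic tie-breaking
        remainder = 0.0
        for value in (0, 1):
            g = sum(1 for run in good if run[name] == value)
            b = sum(1 for run in bad if run[name] == value)
            remainder += (g + b) / total * entropy(g, b)
        gain = base - remainder
        if gain > best_gain:
            best, best_gain = name, gain
    return best


def split_bad(good, bad, features, path=()):
    """Recursively sub-classify BAD runs; each leaf path is a candidate signature."""
    name = best_feature(good, bad, features)
    if name is None:
        return [(path, bad)]
    leaves = []
    for value in (0, 1):
        subset = [run for run in bad if run[name] == value]
        if subset:
            leaves += split_bad(good, subset, features - {name},
                                path + ((name, value),))
    return leaves


ok = {"dns_request": 1, "dns_reply": 1, "tcp_syn": 1, "tcp_synack": 1}
dns_down = {"dns_request": 1, "dns_reply": 0, "tcp_syn": 0, "tcp_synack": 0}
blocked = {"dns_request": 1, "dns_reply": 1, "tcp_syn": 1, "tcp_synack": 0}
GOOD = [ok, ok, ok]
BAD = [dns_down, dns_down, blocked, blocked]  # labeled only as BAD

signatures = split_bad(GOOD, BAD, set(ok))
for path, runs in signatures:
    print(path, len(runs))
```

On this toy data the BAD runs end up in two groups, one per underlying problem, even though the input never said which BAD run had which cause.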

We evaluate the effectiveness of Deja vu in generating problem signatures for two classes of applications: multiple web browsers and an email client. For each application, we generate a training set by creating various error conditions. We generate a test set in the same way. We find that the problem signatures constructed by Deja vu based on the training set are able to classify the traces in the test set with 95% accuracy. In fact, the accuracy of the classification performed by Deja vu using just the GOOD and BAD labels is within 4.5% of that of a decision tree classifier operating with the benefit of problem category labels attached to traces. We also show how Deja vu learns new non-trivial problem signatures on-the-fly, which a rule-based approach would have missed. Finally, we show the effectiveness of Deja vu’s signatures in helping a human administrator match network packet traces to problems.

DESIGN OVERVIEW AND SCOPE

The input to Deja vu is a set of network packet traces, each coarsely labeled as GOOD or BAD. A GOOD trace corresponds to a working application run while a BAD trace corresponds to a non-working run. We believe that not assuming more fine-grained labeling is the right choice because we have found that applications often fail with the same error message for different networking problems, which prevents a user from correctly differentiating between different BAD runs. In our work, the GOOD/BAD labeling is performed by us in the lab, but we touch on alternative strategies in Section 7.

The coarsely-labeled traces are fed to Deja vu’s feature extractor, which uses domain knowledge to extract a set of features, as discussed in Section 3. These feature sets, together with the GOOD/BAD labels, are then fed to Deja vu’s signature construction algorithm discussed in Section 4. The novelty of this algorithm is that, although it is just given the coarse GOOD/BAD labels as input, it infers a sub-categorization of BAD corresponding to the different categories of problems that an application encounters.

Once Deja vu has learnt and associated signatures with problems, these could be used in a range of applications, helping to match the problems in a test trace to ones that have previously been seen and assigned signatures. We discuss two simple applications in Section 6.
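As a minimal sketch of the matching step, assume a signature is a set of (feature, value) conditions and a test trace has already been reduced to a feature dictionary. The representation, the labels, and the function names are hypothetical and only meant to illustrate the idea.

```python
def matches(signature, features):
    """True if the trace features satisfy every condition in the signature."""
    return all(features.get(name) == value for name, value in signature)


def match_problems(signatures, features):
    """Return the labels of all previously seen problems the trace matches."""
    return [label for label, sig in signatures.items() if matches(sig, features)]


# Hypothetical signatures learned earlier and named by an administrator.
known_signatures = {
    "local name server down": (("dns_request", 1), ("dns_reply", 0)),
    "https port blocked": (("tcp_syn", 1), ("tcp_synack", 0)),
}

test_trace = {"dns_request": 1, "dns_reply": 1, "tcp_syn": 1, "tcp_synack": 0}
print(match_problems(known_signatures, test_trace))
```

An empty result would indicate a problem that has not been seen before, which is the case where a new signature would need to be learned.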

Note that the extracted signatures can only be as good as the data input to the algorithm. The quality of the signatures therefore depends significantly on the choice of features and on the accuracy of the feature values. Also, the scope of Deja vu is limited to problems that manifest themselves in network traces. Applications experience several problems that may not show up as abnormalities in network traces. Deja vu does not address these problems. Consequently, the input features to our algorithm are extracted only from network traces, as we discuss in the next section.

FEATURES

In this section we describe what information we extract from the raw network traces and input to the Deja vu algorithm. As with any machine learning algorithm, Deja vu requires as input a set of features. The feature set extractor reduces a network packet trace to a compact set of features that summarizes the essential characteristics of the trace. This process also makes the input to Deja vu less noisy (e.g., features corresponding to unrelated background traffic are excluded) and strips it of privacy-sensitive information. For example, the actual packet payload is discarded except for some specific header fields in protocols such as HTTP and SMB.
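The sketch below illustrates the two effects described above: dropping unrelated traffic and discarding payload while keeping selected header fields. How Deja vu actually identifies background traffic is not stated in this section; filtering by a set of application ports is an assumption made for this example, as are the field names.

```python
# Header fields retained after sanitization (illustrative choice).
ALLOWED_FIELDS = {"proto", "src_port", "dst_port", "flags", "http_status"}


def sanitize(packets, app_ports):
    """Keep only the application's packets and only whitelisted header fields."""
    kept = []
    for pkt in packets:
        ports = {pkt.get("src_port"), pkt.get("dst_port")}
        if not ports & app_ports:
            continue  # assumed to be unrelated background traffic
        kept.append({k: v for k, v in pkt.items() if k in ALLOWED_FIELDS})
    return kept


raw = [
    {"proto": "HTTP", "src_port": 50000, "dst_port": 80,
     "http_status": 404, "payload": b"<html>private page</html>"},
    {"proto": "NTP", "src_port": 123, "dst_port": 123, "payload": b"\x1b"},
]
clean = sanitize(raw, app_ports={80, 443})
print(clean)
```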

The choice of features is key. Features that are too detailed often suffer from a lack of generality. To determine what kind of features to extract, we manually scrutinized and debugged traces for several networking problems. Using our domain knowledge and experience, we settled on the following broad categories of features to extract:

1. Packet types: Often, problems manifest themselves as the presence or absence of packets of a certain type. To capture this, we use binary features to record the presence or absence of certain packet types, where type is determined based on the packet header fields. By examining the headers of the packets contained in a trace, we set the corresponding binary features
