[Figure: Root causes and corresponding Deja vu signatures and classifier signatures for email traces.]
… port 993 (IMAP over SSL) [TCPSYN993-TCPSYNACK993 = 0], whereas the classifier signature does not give us this information.
4. Sometimes, Deja vu has multiple signatures corresponding to the same root cause, whereas the classifier does not. Since Deja vu does not have access to fine-grained labels, it sometimes creates noisy splits in the signature, which the classifier avoids. Such noisy splits can be avoided to some extent by techniques noted in …
Signature Stability and Adaptability
vu’s signatures are, and how eﬀective the algorithm is
at learning new signatures on-the-ﬂy. Would the signa-
tures learned from a training set still apply to a test set
gathered at a later time? Does the algorithm learn new
signatures when required?
To answer these questions, we collected a test dataset in the corporate network for two browsers, IE and Firefox, approximately 2 months after we had collected the training dataset. We collected training data on Windows 7, Mac OS X, and Ubuntu systems, and the test dataset on a Windows XP machine. The test dataset included 10 BAD traces (5 each for IE and Firefox) for each of 6 root causes, giving us a total of 60 BAD traces. We could not collect data for the "Misconfigured outgoing firewall" root cause because Windows XP does not allow the configuration of outgoing firewall rules.
For 5 of the 6 root causes, an overwhelming majority (95%) of the traces in the test dataset matched the signatures of the same root cause that had been learned earlier from the training dataset. This demonstrates the stability of the Deja vu signatures for these 5 root causes. However, Deja vu misclassified all 10 traces for the "Wrong proxy" root cause, marking them either as "internal site authentication error" or "name resolution error".
To investigate this, we relearned the Deja vu signatures by adding these 10 BAD traces to the initial training set of 878 traces. We found that Deja vu learned an additional, new signature for the "Wrong proxy" root cause, which all 10 new traces contributed to:
    [HTTPR200 = 0] AND [NHTTPQ = 0] AND [HTTPR502 = 0] AND
    [HTTPR500 = 0] AND [HTTPR504 = 1]
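To make the matching concrete, here is a minimal sketch of how such a signature could be represented and evaluated as a conjunction of feature tests. The feature names come from the signature above; the dict encoding and the match_signature helper are our illustrative assumptions, not Deja vu's actual implementation.

    # Sketch: a Deja vu-style signature as a conjunction of equality tests
    # over features extracted from a trace (encoding is an assumption).
    WRONG_PROXY_SIGNATURE = {
        "HTTPR200": 0,
        "NHTTPQ": 0,
        "HTTPR502": 0,
        "HTTPR500": 0,
        "HTTPR504": 1,
    }

    def match_signature(features: dict, signature: dict) -> bool:
        """Return True iff every feature test in the conjunction holds."""
        return all(features.get(name, 0) == value
                   for name, value in signature.items())

    # Example: a (hypothetical) feature vector from one failed page load.
    trace_features = {"HTTPR200": 0, "NHTTPQ": 0, "HTTPR502": 0,
                      "HTTPR500": 0, "HTTPR504": 1, "TCPSYN80": 1}
    assert match_signature(trace_features, WRONG_PROXY_SIGNATURE)

Features not mentioned in the signature (such as the extra TCPSYN80 above) are simply ignored by the conjunction.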
To create the "Wrong proxy" root cause, we always set the proxy to a non-existent IP address that we were confident would not respond (e.g., 188.8.131.52). However, the new signature noted above indicates that not only did requests to this IP address complete a successful TCP handshake, but the other end even responded with an HTTP gateway error! We communicated this to the relevant network administrators, who investigated the matter and then informed us that this strange behavior was the result of recent routing configuration changes on the corporate network that directed traffic destined for some non-existent IP addresses to a set of misconfigured servers, which responded to the requests with a gateway error.
[Figure: Root causes and corresponding Deja vu signatures and classifier signatures for browser traces.]
This interesting anecdote shows that Deja vu signatures are not just useful for failure classification, but can also be a component of a network problem diagnosis tool. Whenever Deja vu learns a new problem signature, the tool can alert the administrators, who can then investigate whether the signature reveals anything of interest.

Deja vu associates network problems with a compact fingerprint. Such a fingerprint has several applications. A fingerprint could be used to search through a large dataset to find instances of a particular problem. It could also be used to recall and match against previously seen instances of a problem. We briefly describe applications in each of these contexts.
Packet tracing tools such as tcpdump and netmon provide a way to apply filters to find specific packet types of interest, either in a live capture or in a recorded trace. However, what if we are interested in searching for problem events rather than for specific packets? For instance, we might wish to find instances in a trace where a secure webpage access failed because the firewall blocked port 443 traffic. To provide this capability, we have built a simple search tool using Deja vu. The target trace is sliced into windows, either sliding windows or jumping windows. Features extracted from each slice are then fed into the signature tree constructed by Deja vu for the problem of interest.
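A minimal sketch of this search loop follows, under assumed types: the Packet record, extract_features, and matches are placeholders standing in for the real feature extractor and signature-tree lookup.

    # Sketch: slice a timestamped trace into windows and test each slice
    # against a problem signature (types and helpers are assumptions).
    from dataclasses import dataclass
    from typing import Callable, Iterator

    @dataclass
    class Packet:
        timestamp: float  # seconds from the start of the capture
        # ... protocol fields elided ...

    def slices(packets: list[Packet], width: float,
               step: float) -> Iterator[list[Packet]]:
        """Yield windows of `width` seconds. step == width gives jumping
        windows; step < width gives sliding (overlapping) windows."""
        if not packets:
            return
        t, end = packets[0].timestamp, packets[-1].timestamp
        while t <= end:
            window = [p for p in packets if t <= p.timestamp < t + width]
            if window:
                yield window
            t += step

    def search(packets: list[Packet],
               extract_features: Callable[[list[Packet]], dict],
               matches: Callable[[dict], bool],
               width: float = 30.0) -> list[float]:
        """Return start times of windows whose features match the signature."""
        return [w[0].timestamp
                for w in slices(packets, width=width, step=width)  # jumping
                if matches(extract_features(w))]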
One question is how wide a slice should be. Ideally, the slice should be wide enough to accommodate the problem event of interest but no wider. For instance, consider a problem signature that comprises a successful DNS request-response exchange, followed by a successful TCP SYN handshake, followed in turn by an HTTP request that fails to elicit a response. To capture the full signature, the slice must be wide enough to span all of the above packet exchanges. However, if the slice were too wide, it would risk being polluted by noise in the form of features from packets belonging to an unrelated transaction.
In our implementation of the search tool, we use a slice size of 30 seconds. We tested the tool on a 4MB network trace collected over a period of 40 minutes, in which we recreated 5 different problems from the set of root causes shown in Figure 3. The search tool succeeded in finding 3 of these problems. It missed one problem because the window size was too large, and another because the window size was too small to capture an important feature later in the trace. This indicates that such a search tool should ideally use windows of varying sizes to catch all problems in the trace.
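As a quick illustration, the earlier search sketch could simply be run at several widths and the hits merged; the width values here are illustrative, and the names come from the sketch above.

    # Run the (assumed) search sketch at several window widths and merge hits.
    hits = sorted({t for w in (10.0, 30.0, 60.0)
                     for t in search(packets, extract_features, matches, width=w)})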
A second application of Deja vu is in the context of a help desk tool. Our help desk application uses the problem signatures generated by Deja vu to automatically match the problem being experienced by the user against a database of known issues, i.e., ones for which there is a known fix. Whenever a failure is encountered (e.g., a browser error), the Deja vu component on the client machine extracts features from the packet trace in the recent past (tracing is an ongoing background activity) and sends these to the Deja vu server. At the server, these features are fed into the application's signature tree and thereby matched against a known category of failures. The problem notes associated with this category would then guide the diagnosis and resolution of the problem.
A more sophisticated version of the help desk application could use Deja vu signatures to index WikiDo tasks instead of just indexing manually crafted notes.
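The following sketch illustrates the server-side matching step under assumed names. The KnownIssue record and the example signatures are hypothetical, and the real tool walks the application's signature tree rather than doing a linear scan.

    # Sketch: match client-reported features against known issues
    # (record layout, feature names, and lookup are assumptions).
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class KnownIssue:
        signature: dict  # conjunction of feature tests
        notes: str       # problem notes guiding diagnosis and resolution

    KNOWN_ISSUES = [
        KnownIssue({"HTTPR504": 1, "HTTPR200": 0},
                   "Check the browser's proxy setting."),
        KnownIssue({"TCPSYN443": 1, "TCPSYNACK443": 0},
                   "Firewall may be blocking port 443."),
    ]

    def diagnose(features: dict) -> Optional[str]:
        """Return the notes of the first matching known issue, if any."""
        for issue in KNOWN_ISSUES:
            if all(features.get(k, 0) == v
                   for k, v in issue.signature.items()):
                return issue.notes
        return None  # no match: a candidate for learning a new signature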
We now discuss the impact of noisy traces on Deja vu's signatures. Noise refers to packets that are extraneous to the application of interest. Such noise could arise from the network communication of other applications or even other hosts, depending on where the packet trace is captured. Deja vu's feature extractor would then extract features from such background traffic and include them alongside the (correct) features corresponding to the traffic of interest.
Such noisy features could be problematic in two ways: (a) they could lead Deja vu to learn incorrect signatures for problems, and (b) they could cause an incorrect match when an attempt is made to match the noisy features against the signatures generated by Deja vu.

Deja vu's use of GOOD/BAD labels helps mitigate problem (a): the noisy features are likely to be uncorrelated with the success (GOOD) or failure (BAD) of the application of interest, and hence are likely to be disregarded by Deja vu's signature construction algorithm. However, a noisy feature extracted from background traffic (e.g., a successful DNS request-response exchange) could still cause problems, as explained in (b) above.
To alleviate the above problem, we could leverage prior work on application traffic fingerprinting (e.g., [11, 7, 15]) to separate out just the subset of traffic in a packet trace that corresponds to the application of interest. Performing such separation thoroughly would require the tracing to be performed on the end hosts, so that traffic could be unambiguously tied to specific applications.
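The claim about problem (a) is easy to illustrate with an off-the-shelf decision tree. Here scikit-learn's CART implementation stands in for C4.5 (an assumption, since the two algorithms differ in detail), and the two features are synthetic.

    # Sketch: a feature uncorrelated with the GOOD/BAD label is ignored
    # by decision-tree learning (synthetic data; CART stands in for C4.5).
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    n = 1000
    informative = rng.integers(0, 2, n)  # e.g., "HTTP 200 seen" for the app
    noise = rng.integers(0, 2, n)        # e.g., a background DNS exchange
    labels = informative                 # GOOD exactly when informative fires

    X = np.column_stack([informative, noise])
    tree = DecisionTreeClassifier(random_state=0).fit(X, labels)
    print(tree.feature_importances_)     # ~[1.0, 0.0]: noise is never split on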
Another source of inaccuracy in the traces is mislabeling of GOOD and BAD traces. Previous work has shown that the C4.5 decision tree algorithm is robust to a certain degree of mislabeling in the context of network diagnostics. However, no learning algorithm can withstand large amounts of mislabeling. Applications that use Deja vu have to be designed so that the chance of mislabeling stays low. A discussion of such application-level techniques is out of the scope of this paper.
In our experiments, the Deja vu algorithm took less than one second to process all the traces. For the applications we have discussed, we expect practitioners to run Deja vu approximately once a day, and we believe the current performance is suitable for this design point. It is, however, possible that as the problem traces become more diverse, Deja vu may learn a considerable number of problem signatures in a single run. In such cases, signatures can be prioritized based on the confidence that the C4.5 algorithm assigns to them. Signatures that are seen more often can be bubbled to the top of the priority list, thereby allowing an administrator or support engineer to look at the more predominant problems first.
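A minimal sketch of this prioritization follows, under an assumed Signature record; the ordering key simply combines match frequency and learner confidence.

    # Sketch: order learned signatures by match frequency, then confidence
    # (record layout is an assumption for illustration).
    from dataclasses import dataclass

    @dataclass
    class Signature:
        name: str
        confidence: float  # confidence assigned by the C4.5 algorithm
        match_count: int   # how many traces matched this signature

    def prioritize(signatures: list[Signature]) -> list[Signature]:
        """Most frequently matched, highest-confidence signatures first."""
        return sorted(signatures,
                      key=lambda s: (s.match_count, s.confidence),
                      reverse=True)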
Network Traffic Analysis
Network traffic analysis has been used to fingerprint applications and infer the behavior of protocols [15, 11]. While such analysis has used supervised learning on coarse features such as packet size and flow length to distinguish between applications, Deja vu operates on more fine-grained features (e.g., features specific to DNS, TCP, HTTP, etc.) but with coarse-grained GOOD vs. BAD labels.
Such analysis has also been used to discover the session-level structure of applications, e.g., to discover that in an FTP session, a control connection is often followed by one or more data connections. However, to our knowledge, such session structure has not been used for constructing signatures for network problems. Furthermore, discovering session structure is only semi-automated, requiring the involvement of a human expert to actually reconstruct the session structure. Human involvement in Deja vu is limited to labeling training runs as GOOD or BAD, a much less onerous task.
Finally, such analysis has also been used to perform network anomaly detection. The typical approach has been to construct a model of normal behavior based on past traffic history and then look for significant short-term deviations from that model. While such work has focused on aggregate behavior, Deja vu focuses on the network behavior of an individual application run.
DebugAdvisor is a tool to search through source control systems and bug databases to aid debugging. Unlike Deja vu, it uses a standard text search tool over call stack information and bug reports. Deja vu is closer in spirit to work on automating the diagnosis of system problems, which involves extracting signatures from information such as system call traces. The approach is to employ supervised learning (e.g., SVM) on a fully labeled database of known problems. In a similar vein, Clarify is a system that improves error reporting by classifying application behavior. Clarify generates a behavior profile, i.e., a summary of the program's execution history, which is then labeled by a human expert to enable learning-based classification.
In comparison with the above approaches, which require a human expert to perform full labeling, Deja vu operates only with coarse-grained labels. Also, since Deja vu focuses on network problems, it incorporates a number of domain-specific choices, including in feature selection.
STRIDER and PeerPressure analyze state information in the Windows registry to identify features (e.g., registry key settings) that are indicative of a problem. Unlike Deja vu, the goal of this body of work is not to develop problem-specific signatures based on the behavior of the system; rather, it is to detect anomalous state by performing state differencing between a healthy machine and a sick machine. Also, the features (e.g., registry key settings) are treated as opaque entities, whereas Deja vu uses networking domain-specific knowledge to define features.
Similarly, NetPrints analyzes network configuration information to diagnose home network problems. While largely state-based, NetPrints also made limited use of network problem signatures to address the issue of hidden configurations that are not available to the state-based analysis.
Compared to the above, Deja vu is not intrusive, since it operates on network traffic and hence does not require any tracing to be performed on the end system itself.
While Deja vu seeks to extract network problem signatures from existing application traffic, there is a large body of work on characterizing network problems through active probing [10, 17, 4, 8]. Probing with a carefully crafted set of tests enables detailed characterization of a range of problems, often enabling diagnosis. In contrast, Deja vu strives to produce a problem fingerprint based on the traffic that the application generates anyway. These fingerprints may not contain enough detail to directly enable diagnosis. Nevertheless, they provide a generic way to match a problem instance with a previously seen instance, thereby enabling diagnosis, as noted in Section 6.2.
In summary, Deja vu associates a compact signature with each category of network problem experienced by an application. It uses a novel algorithm to learn the signatures from coarse-grained GOOD/BAD labels. Our experimental evaluation, including a comparison with a standard classifier (which has the benefit of knowing fine-grained labels) and a user study, has demonstrated the effectiveness of Deja vu signatures.
References

B. Aggarwal, R. Bhagwan, T. Das, S. Eswaran, V. Padmanabhan, and G. Voelker. NetPrints: Diagnosing Home Network Misconfigurations Using Shared Knowledge. In NSDI, 2009.

B. Ashok, J. Joy, H. Liang, S. Rajamani, G. Srinivasa, and V. Vangala. DebugAdvisor: A Recommender System for Debugging. In FSE, 2009.

M. Dischinger, M. Marcon, S. Guha, K. P. Gummadi, R. Mahajan, and S. Saroiu. Glasnost: Enabling End Users to Detect Traffic Differentiation. In NSDI, 2010.

J. Ha, C. J. Rossbach, J. V. Davis, I. Roy, H. E. Ramadan, D. E. Porter, D. L. Chen, and E. Witchel. Improved Error Reporting for Software that Uses Black-box Components. In PLDI, 2007.

S. Kandula, R. Mahajan, P. Verkaik, S. Agarwal, and J. Padhye. Detailed Diagnosis in Computer Networks. In SIGCOMM, 2010.

J. Kannan, J. Jung, V. Paxson, and C. E. Koksal. Semi-Automated Discovery of Application Session Structure. In IMC, 2006.

C. Kreibich, N. Weaver, B. Nechaev, and V. Paxson. Netalyzr: Illuminating the Edge Network. In IMC, 2010.

N. Kushman, M. Brodsky, S. Branavan, D. Katabi, R. Barzilay, and M. Rinard. WikiDo. In HotNets, 2009.

R. Mahajan, N. Spring, D. Wetherall, and T. Anderson. User-level Internet Path Diagnosis. In SOSP, 2003.

A. Moore and K. Papagiannaki. Toward the Accurate Identification of Network Applications. In PAM, 2005.

J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.

H. Wang, J. Platt, Y. Chen, R. Zhang, and Y. Wang. Automatic Misconfiguration Troubleshooting with PeerPressure. In OSDI, 2004.

Y.-M. Wang, C. Verbowski, J. Dunagan, Y. Chen, H. J. Wang, C. Yuan, and Z. Zhang. STRIDER: A Black-box, State-based Approach to Change and Configuration Management and Support. In LISA, 2003.

C. V. Wright, F. Monrose, and G. M. Masson. On Inferring Application Protocol Behaviors in Encrypted Network Traffic. Journal of Machine Learning Research, December 2006.

C. Yuan, N. Lao, J.-R. Wen, J. Li, Z. Zhang, Y.-M. Wang, and W.-Y. Ma. Automated Known Problem Diagnosis with Event Traces. In EuroSys, 2006.

Y. Zhang, Z. M. Mao, and M. Zhang. Effective Diagnosis of Routing Disruptions from End Systems. In NSDI, 2008.

Y. Zhang, S. Singh, S. Sen, N. Duffield, and C. Lund. Sketch-based Change Detection: Methods, Evaluation, and Applications. In IMC, 2004.