in the feature set for the trace.
For example,
if a packet header indicates that it is an HTTP
200 Response (HTTP OK), we set the binary fea-
ture HTTP200 RESPONSE to 1.
If no HTTP
200 Response is present in the trace, this fea-
ture is set to 0.
We also set a special fea-
ture, NO HTTP RESPONSE, to 1 if no HTTP
response of any kind exists in the trace.
2. Sequence: The presence or absence of a partic-
ular sequence of packets in a trace could also be
symptomatic of a network problem. For example,
a faulty trace might contain a TCP SYN packet
but no corresponding TCP SYN/ACK packet,
which might be indicative of a problem. To cap-
ture such effects, we define composite binary fea-
tures, for example, one corresponding to the se-
quence TCP SYN → TCP SYNACK. We set this
feature to 1 if and only if a TCP SYN is followed
by a TCP SYNACK on the same connection. In
all other cases, including when no TCP SYN is
seen, it is set to 0.
We use our domain knowledge to trim the set of
such composite features.
For example, the se-
quence TCP RST → TCP SYN is not meaningful
and therefore we do not record it.
3. Aggregate features: As we manually debugged
site-specific problems by looking at traffic ex-
changed with a small set of websites, we found
that the network traces for each site have very uni-
form characteristics in terms of aggregate quanti-
ties such as number of successful TCP connections,
number of successful HTTP connections, number
of bytes transferred, etc. While these features may
be unimportant for generic networking problems,
they are particularly useful in the context of
site-specific problem signatures for certain (pop-
ular) remote sites or services. For example, the
www.cnn.com webpage has specific characteristics
in terms of its layout and the number of objects
on the page. The aggregate features can thus help
capture the problem symptoms specific to this site.
Also, the popularity of this site might make it
worthwhile to learn site-specific signatures for it.
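As an illustration of the three feature types above, a minimal Python sketch of per-trace feature computation might look as follows. The packet fields (proto, flags, conn, status) and the helper name extract_features are illustrative assumptions, not the actual extractor code:

def extract_features(packets):
    """Sketch: compute binary, sequence, and aggregate features for one trace.
    Each packet is assumed to be a dict, e.g. {"proto": "TCP", "flags": "SYN",
    "conn": 3} or {"proto": "HTTP", "status": 200}."""
    feats = {}

    # 1. Binary features: presence/absence of a particular packet type.
    statuses = [p["status"] for p in packets
                if p.get("proto") == "HTTP" and "status" in p]
    feats["HTTP200_RESPONSE"] = int(200 in statuses)
    feats["NO_HTTP_RESPONSE"] = int(not statuses)

    # 2. Sequence features: a TCP SYN followed by a SYN/ACK on the same connection.
    syns = {p["conn"] for p in packets
            if p.get("proto") == "TCP" and p.get("flags") == "SYN"}
    synacks = {p["conn"] for p in packets
               if p.get("proto") == "TCP" and p.get("flags") == "SYN/ACK"}
    feats["TCP_SYN_SYNACK"] = int(bool(syns & synacks))

    # 3. Aggregate features: counts over the whole trace.
    feats["NUM_SUCCESSFUL_TCP_CONNECTIONS"] = len(syns & synacks)
    feats["NUM_HTTP_RESPONSES"] = len(statuses)

    return feats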
In our implementation, we have written feature ex-
tractors for 12 protocols using Microsoft Network Mon-
itor’s Parser API [1]. Table 1 shows the set of protocols
and the number of features we capture from each pro-
tocol. For each application, we choose a subset of these
protocols to determine what features to extract. Fea-
ture extractors for some protocols take as input a port
number so that they extract features specific to traffic to
and from that port. For example, when analyzing traces
from browsers, we extract TCP-level features specific
to port 80.

Table 1: Number of features for each protocol.

    Protocol       No. of Features
    ARP                  3
    DNS                  4
    HTTP                42
    IKE/AUTHIP           4
    IP                   1
    LLMNR                4
    NBNS                 4
    RWS                  4
    SMB/SMB2            45
    TCP                 48
    TLS                 10
    UDP                 52

We extract more features for higher-level
protocols such as HTTP and SMB because these pro-
tocols provide richer error information in their headers
than lower-level protocols like TCP and IP. The feature extraction algorithm traverses each trace once to extract all the features and is therefore not compute-intensive.
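A sketch of how such single-pass, port-parameterized extractors could be organized is shown below. The class and field names are illustrative assumptions; the real extractors are built on Network Monitor's Parser API, which is not shown here:

class TcpExtractor:
    """Sketch of a port-parameterized TCP feature extractor."""
    def __init__(self, port=None):
        self.port = port                 # only consider traffic on this port
        self.saw_syn = False
        self.saw_synack = False

    def consume(self, pkt):
        if pkt.get("proto") != "TCP":
            return
        if self.port is not None and pkt.get("port") != self.port:
            return
        self.saw_syn = self.saw_syn or pkt.get("flags") == "SYN"
        self.saw_synack = self.saw_synack or pkt.get("flags") == "SYN/ACK"

    def features(self):
        suffix = f"_PORT{self.port}" if self.port else ""
        return {f"TCP_SYN_SYNACK{suffix}": int(self.saw_syn and self.saw_synack)}

def extract(trace, extractors):
    """Feed every packet to every extractor in a single pass over the trace."""
    for pkt in trace:
        for ex in extractors:
            ex.consume(pkt)
    feats = {}
    for ex in extractors:
        feats.update(ex.features())
    return feats

# For a browser trace one might call, e.g., extract(trace, [TcpExtractor(port=80)])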
4. DEJA VU ALGORITHM
The Deja vu algorithm is used to identify signatures
corresponding to the various problems experienced by
an application. The algorithm takes as input the sets of
features from multiple application runs, each of which is
labeled as GOOD (if the run was successful) or BAD (if
the run was unsuccessful). The goal, and the key chal-
lenge, is to work with these coarse labels to decompose
the set of BAD runs into meaningful categories corre-
sponding to the different problems experienced by the
application.
This challenge sets the problem apart from stan-
dard problems of classification and clustering in ma-
chine learning.
Unlike with classification, the cate-
gories of faults are not known in advance and hence we
do not have (fine-grained) labels corresponding to the
categories. Unlike with clustering, the coarse-grained
GOOD/BAD labels do matter since our goal, specifi-
cally, is to sub-categorize the BAD runs. In contrast,
clustering on the BAD runs to find categories might
bunch together runs based on some extraneous features
(e.g., the presence or absence of background noise).
These extraneous features may be similarly present or
absent in GOOD runs too, which means they are, in
essence, inconsequential and should have no bearing on
categorizing different kinds of BAD runs.
Consider the following example. Say half the traces
for an application (both GOOD and BAD) have ARP
requests, and the other half do not since the end-host
caches ARP responses. Therefore, the presence or ab-
sence of an ARP request is not a symptom of a problem. However, a conventional clustering algorithm run on the BAD traces will divide these traces into two categories based on the presence or absence of this feature. On the other hand, including the GOOD traces in our learning procedure will help us learn that the presence of the ARP request is inconsequential, since half of the GOOD as well as the BAD traces have this feature and the other half do not.

Figure 1: A signature tree learnt by Deja vu
To address the challenge of learning the categories of
problems from coarse-grained labels, we have developed
a novel algorithm for Deja vu. We start by illustrating
the algorithm with an example in Section 4.1, and then
specify it in Section 4.2.
4.1 Example
We now illustrate the operation of the Deja vu al-
gorithm by demonstrating how an example signature
tree is constructed for a browser application. The main
intent is to extract multiple problem signatures from
the input feature sets. Toward this goal, the algorithm
makes use of the C4.5 decision tree classifier [12] in
multiple iterations. The signature tree captures all the
problem signatures for the application. Figure 1 shows
an example signature tree.
In the first iteration, the Deja vu algorithm inputs the
feature sets corresponding to all runs, labeled GOOD or
BAD, to the C4.5 decision tree classifier. The classifier
outputs the tree shown within the box titled “Iteration
1” in Figure 1. This tree tells us that:
If a trace does not have an HTTP 200 Response, it is
BAD.
While this statement tells us something about a
generic symptom that all the bad traces have, it does
not help categorize the bad traces into different sub-
categories or problems. So, we force the decision tree
classifier to learn an alternative way of classifying the
bad traces by removing the HTTP200 RESPONSE fea-
ture from all good and bad traces, and inputting the
pruned feature sets to C4.5 in Iteration 2. The output
of this iteration is shown in the box marked “Iteration
2” in Figure 1. This tells us that:
If the trace does not have an HTTP 304 Response, it
is BAD.
However, in this iteration too, the decision tree
classifies all bad traces into one category because it
found yet another (novel) feature that is common to
all bad traces. Hence, for iteration 3, we remove the
HTTP304 RESPONSE feature from all good and bad
feature sets and learn a decision tree again. In iteration
3, the tree has 2 leaf nodes labeled BAD:
If the trace (a) has no TCP SYN/SYNACK exchange
on port 80, OR (b) has a TCP SYN/SYNACK
exchange on port 80 AND an HTTP 502 Response
(proxy error), it is BAD.
Hence, by iteratively removing features that exist in
all BAD traces, the algorithm has succeeded in learn-
ing two different problem signatures and sub-categorizes
the bad traces across them. The absence of a TCP con-
nection characterizes the first signature (marked (a)),
while an HTTP 502 response (proxy error) character-
izes the second (marked (b)).
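The core loop of these iterations, retraining the tree after removing the features that explain all BAD runs at once, could be sketched as follows. This is a minimal illustration assuming each run is a dict of binary/aggregate features with a 0/1 (GOOD/BAD) label; scikit-learn's CART trees stand in for C4.5, so the split details differ from the paper, and the function names are illustrative rather than taken from the Deja vu implementation:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

GOOD, BAD = 0, 1

def vectorize(runs, feature_names):
    return np.array([[r.get(f, 0) for f in feature_names] for r in runs])

def bad_paths(clf, feature_names):
    """Return, for each BAD leaf, the set of features tested on its root-to-leaf path."""
    t, paths = clf.tree_, []
    def walk(node, seen):
        if t.children_left[node] == -1:                          # leaf node
            if clf.classes_[np.argmax(t.value[node][0])] == BAD:
                paths.append(set(seen))
        else:
            f = feature_names[t.feature[node]]
            walk(t.children_left[node], seen | {f})
            walk(t.children_right[node], seen | {f})
    walk(0, set())
    return paths

def learn_signatures(runs, labels, max_rounds=20):
    """Retrain the tree, removing features that lump all BAD runs into one leaf,
    until the BAD runs are split across more than one leaf."""
    all_feats, removed = sorted(set().union(*runs)), set()
    for _ in range(max_rounds):
        feats = [f for f in all_feats if f not in removed]
        clf = DecisionTreeClassifier(criterion="entropy").fit(
            vectorize(runs, feats), labels)
        paths = bad_paths(clf, feats)
        if len(paths) != 1:              # 0 or >1 BAD leaves: stop iterating
            return clf, feats, removed
        removed |= paths[0]              # force an alternative explanation
    return clf, feats, removed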
In iteration 4, we divide the BAD feature sets into two
categories: one each for the two leaves labeled BAD
in Iteration 3 above. The first category has no TCP
SYN/SYNACK exchange on port 80, whereas the sec-
ond category does but also includes an HTTP 502 re-
sponse. With each category of bad feature sets, we sep-
arately repeat the procedure described in the previous
iterations. Therefore, from the BAD feature sets in the
first category as well as from all GOOD feature sets,
we exclude the “TCP SYN → TCP SYNACK on port
80” feature and then rerun C4.5. On the other hand,
from the BAD feature sets in the second category as
well as from all GOOD feature sets, we exclude both
the “TCP SYN → TCP SYNACK on port 80” feature
and the “HTTP 502 Response” feature, and then rerun
C4.5.
Iteration 4 for the first BAD category obtained in
Iteration 3 gives us the tree shown in the box marked
“Iteration 4” in Figure 1. This tree tells us that:
If the trace (a) has no IP traffic, OR (b) has IP traffic
but no Successful DNS Query, it is BAD.
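Continuing the sketch given earlier (same assumptions, reusing vectorize() and BAD from it), the per-category split performed in this iteration can be expressed by grouping the BAD runs according to the leaf of the learnt tree they fall into; each group, together with all GOOD runs and with that leaf's path features excluded, is then fed back into the iterative procedure:

def split_bad_runs_by_leaf(clf, runs, labels, feature_names):
    """Partition the BAD runs by the tree leaf they land in, giving one
    BAD sub-category per leaf."""
    leaf_ids = clf.apply(vectorize(runs, feature_names))
    groups = {}
    for run, label, leaf in zip(runs, labels, leaf_ids):
        if label == BAD:
            groups.setdefault(leaf, []).append(run)
    return groups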