A Systematic Characterization of Application Sensitivity to Network Performance


Using the SPEC data and the results of this study, it is relatively straightforward to deduce the
parameters of the queuing model for a specific configuration from the published SFS curves. A word
of caution is needed when using this approach: mixing parameters from different hardware/software
configurations, particularly overhead, can produce quite inaccurate results.
Deconstructing the NetApp F630 and AlphaServer 4000 5/466 using the data from the
SPEC webpages is an instructive exercise. Both have roughly the same CPU (500 MHz Alpha),
but the AlphaServer has twice the main memory and disks of the NetApp box. The NetApp
box, however, has half the base response time, a much lower slope, and a higher saturation point.
Putting the results into the context of this work, we can conclude that Network Appliance was quite
successful in its bid to reduce overhead via a specialized operating system [51]. Another approach
to obtaining a higher saturation point is to add processors, as demonstrated by the 18-CPU Sun system.
Such an approach would not reduce the base response time, however, unless the operating system
can parallelize a single NFS operation.
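
The three quantities compared above (base response time, slope, and saturation point) can be read off a throughput/response-time curve with a few lines of code. The following is only a minimal sketch, not the queuing-model fit itself. The function name `summarize_curve`, the `linear_fraction` cut-off, and the sample points are all assumptions made for illustration; the sample points are invented and are not SPEC results.

```python
def summarize_curve(points, linear_fraction=0.5):
    """Summarize a throughput/response-time curve.

    points: iterable of (ops_per_sec, response_ms) pairs.
    Returns (base_response_ms, slope_ms_per_op, saturation_ops_per_sec).
    """
    pts = sorted(points)
    if not pts:
        raise ValueError("no points given")

    # Assumption: the saturation point is the highest delivered throughput.
    saturation = max(x for x, _ in pts)

    # Assumption: points at or below linear_fraction * saturation lie in the
    # lightly loaded, roughly linear part of the curve.
    low = [(x, y) for x, y in pts if x <= linear_fraction * saturation]
    if len(low) < 2:
        raise ValueError("need at least two low-load points")

    # Ordinary least-squares line through the low-load points.
    n = len(low)
    mean_x = sum(x for x, _ in low) / n
    mean_y = sum(y for _, y in low) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in low)
    if sxx == 0:
        raise ValueError("low-load points must have distinct throughputs")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in low)

    slope = sxy / sxx
    base = mean_y - slope * mean_x  # intercept: response time at zero load
    return base, slope, saturation


# Invented sample curve (hypothetical values, not taken from any SPEC report).
hypothetical_curve = [
    (200, 2.2), (400, 2.4), (600, 2.6), (800, 2.8), (1000, 3.0),
    (1600, 6.0), (2000, 20.0),
]
base, slope, saturation = summarize_curve(hypothetical_curve)
```

Comparing the resulting triples for two configurations mirrors the comparison made in the text: a lower intercept and slope suggest lower per-operation cost, while a higher saturation point can also come from added hardware.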

Chapter 6
Investigating Overhead Reduction
There is no such thing as a failed experiment, only more data. —Max Headroom.
The previous three chapters have shown that, of all the LogGP parameters, most applications
exhibit considerable sensitivity to overhead. That result points to overhead reduction as a
promising avenue for improving application performance. In this chapter, we present preliminary
work on a novel software architecture, called SPINE [40], which was constructed with overhead
reduction as a specific design goal. SPINE allows the application developer to reduce overhead by
partitioning the application between the host CPU and network interface. The potential advantage
of the SPINE approach is that the network interface may be able to reduce overall data movement
and control transfers, both of which impact o, at a cost of an inflated gap and latency. The key to this
overhead-reduction technique is to limit the inflation of the other parameters.
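
The trade described above can be made concrete with the usual LogP-style accounting for a burst of short messages. The sketch below is an illustration only: the class and function names are hypothetical, and every parameter value is invented rather than measured. It shows that lowering o pays off when the gap grows modestly, and backfires when the gap is inflated too far.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogGPParams:
    """Short-message parameters, all in microseconds (hypothetical values)."""
    L: float  # network latency
    o: float  # per-message overhead on each endpoint's CPU
    g: float  # minimum gap between consecutive message injections


def burst_time(p: LogGPParams, n: int) -> float:
    """Time from the first send until the n-th short message is received.

    Consecutive sends are spaced by max(g, o); the last message then costs
    send overhead + latency + receive overhead.
    """
    return (n - 1) * max(p.g, p.o) + p.o + p.L + p.o


def host_cpu_time(p: LogGPParams, n: int) -> float:
    """CPU time the sending host spends on overhead for n messages."""
    return n * p.o


# Invented configurations: a host-driven baseline and two offloaded variants.
baseline = LogGPParams(L=5.0, o=6.0, g=4.0)          # overhead-limited
modest_inflation = LogGPParams(L=8.0, o=2.0, g=5.0)  # o cut, g and L grow a little
heavy_inflation = LogGPParams(L=8.0, o=2.0, g=9.0)   # o cut, g grows a lot

t_base = burst_time(baseline, 100)            # 611.0
t_modest = burst_time(modest_inflation, 100)  # 507.0
t_heavy = burst_time(heavy_inflation, 100)    # 903.0
```

In both offloaded variants the host spends a third of the baseline CPU time on messaging, but only the variant with modest inflation also finishes the burst sooner.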
There are many potential ways to reduce overhead. Fortunately, they can be classified
into three general methods:
• Restructure the application. In this approach, the application is changed to reduce and/or
aggregate communication. An example of this approach can be seen in the successive versions
of the EM3D application presented in [28]. The simplest versions are quite sensitive to o, but a
series of progressively more complex transformations alters the application until it is primarily
sensitive to G.
• Change the communication protocol. A straightforward method of reducing overhead is
to use a faster communications layer. For example, NFS was initially built on UDP instead of
TCP for exactly this reason [70]. The tradeoff is that the application may have to re-implement
functionality provided by the higher-overhead layer in order to use the lower-overhead one.

[Figure 6.1, panels (a) and (b): block diagrams of a CPU with cache ($), memory, system interconnect,
and network interfaces, with the control and data paths marked.]
Figure 6.1: SPINE Approach to Overhead Reduction
This figure shows the basic overhead reduction technique used in SPINE. In a normal system, shown
in figure (a), the CPU must handle control and/or data from the network interface. Control and data
flow for messages in SPINE, as shown in figure (b), can avoid the main CPU entirely. A unique aspect
of SPINE is the ability to safely run arbitrary application code on the network processor.
• Add functional units. Familiar in the architecture community, this approach has
been spurned in the network community in recent years. The basic idea is to partition the problem
such that multiple hardware units can pipeline packet processing. As we reduce o, we hope
that the additional cost in terms of L, g and G will not be too high, or may even be lower. A
DMA engine is a well-known example of an added functional unit that reduces o and often
greatly improves G. Although exploiting parallelism in communication has been explored in
the context of Symmetric Multiprocessors [91], there has been surprisingly little work on more
specialized support.
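
To see how adding a functional unit shifts cost between parameters, consider a deliberately simple pipeline model: per-packet latency is the sum of the stage times, the achievable gap is set by the slowest stage, and host overhead is whatever work remains on the host stage. The sketch below rests on those assumptions and ignores queueing and synchronization costs; the function name, stage names, and stage times are hypothetical.

```python
def pipeline_metrics(stage_times, host_stage="host"):
    """Summarize a packet-processing pipeline.

    stage_times: dict mapping stage name -> per-packet service time (us).
    Returns a dict with the host overhead, the per-packet gap, and the latency.
    """
    if not stage_times:
        raise ValueError("pipeline needs at least one stage")
    return {
        # Work left on the host CPU for each packet.
        "overhead": stage_times.get(host_stage, 0.0),
        # A pipeline can accept a new packet no faster than its slowest stage.
        "gap": max(stage_times.values()),
        # Each packet passes through every stage in turn.
        "latency": sum(stage_times.values()),
    }


# Invented numbers: the host does all the work, versus a split where an
# assist unit on the network interface takes over most of it but runs slower.
host_only = pipeline_metrics({"host": 10.0})
with_assist = pipeline_metrics({"host": 3.0, "nic": 9.0})
```

With these invented numbers the assist unit cuts host overhead from 10 to 3 and slightly improves the gap (10 to 9), while latency rises from 10 to 12. Had the assist stage been slower than the original host stage, the gap would have grown as well.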
In this section, we will explore a combination of restructuring and adding functional units
to reduce overhead in an IP router. In the terminology of this thesis, we are trying to push the
application “work” into the other LogGP parameters. This approach has been tried in the past in many
different I/O contexts. Figure 6.1 shows the basic method behind this approach. In the context of
networking, most work has used off-board protocol processors. A few designs have added a
combination DMA/checksum engine [33]. More aggressive designs implemented demultiplexing and
segmentation/reassembly [12].
The dangers of adding functional units to assist the main processor, and thus reduce overhead,
are widely known [26, 50]. The strongest objections tend to be that the assist device is “slower”
in some manner. However, “slower” is often ill-defined. More precise definitions would include
increased latency, reduced throughput, or even increased overhead because of added synchronization
