A Systematic Characterization of Application Sensitivity to Network Performance

[Figure 4.6 plots: (a) slowdown vs. bandwidth (MB/s) and (b) slowdown vs. Gap (usec/byte), each with curves for FT, IS, and MG]
Figure 4.6: NPB Sensitivity to Bandwidth and Gap
This figure plots slowdown as a function of maximum available network bandwidth (a) and of Gap (b). Bandwidth is the more intuitive measure, but sensitivity to it is difficult to see in Figure (a): because we scale G linearly, the bandwidth axis traces a hyperbolic (1/G) curve. Figure (b) shows that FT and MG slow down in a hyper-linear fashion as we scale G linearly.
this problem would be to add delays into the MPI layer directly.
As a side effect of slowing down GAM instead of MPI, the apparatus introduces a somewhat artificial sensitivity to o. Many machines (e.g., Cray T3D, Intel Paragon, Meiko CS-2 [7, 31]) do not introduce extra overhead on a per-fragment basis. However, a number of TCP/IP stacks do exhibit per-fragment overheads [26, 60, 61]; such overheads form a visible “sawtooth” line in per-byte costs. Our per-fragment overhead is thus a reasonable, if somewhat exaggerated, approximation of this class of software overhead.
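The sawtooth follows directly from charging a fixed cost per fragment. As a minimal sketch, assuming a hypothetical fragment size `frag_size`, per-fragment overhead `o_frag_us`, and per-byte Gap `gap_us_per_byte` (illustrative names and values, not the apparatus's actual settings), the per-byte cost jumps each time a message spills into a new fragment and then decays until the next boundary:

```python
import math


def send_cost_us(n_bytes: int, frag_size: int, o_frag_us: float, gap_us_per_byte: float) -> float:
    """Cost (us) of sending n_bytes when each fragment pays a fixed overhead
    o_frag_us and each byte pays gap_us_per_byte (hypothetical cost model)."""
    if n_bytes <= 0 or frag_size <= 0:
        raise ValueError("n_bytes and frag_size must be positive")
    fragments = math.ceil(n_bytes / frag_size)
    return fragments * o_frag_us + n_bytes * gap_us_per_byte


def per_byte_cost_us(n_bytes: int, frag_size: int, o_frag_us: float, gap_us_per_byte: float) -> float:
    """Average cost per byte: jumps at each fragment boundary, then decays."""
    return send_cost_us(n_bytes, frag_size, o_frag_us, gap_us_per_byte) / n_bytes


if __name__ == "__main__":
    # Illustrative values only: 4 KB fragments, 10 us per fragment, 0.03 us/byte.
    FRAG, O_FRAG, GAP = 4096, 10.0, 0.03
    for n in (4096, 4097, 6144, 8192, 8193):
        print(f"{n:5d} bytes: {per_byte_cost_us(n, FRAG, O_FRAG, GAP):.5f} us/byte")
```

With these illustrative values, the per-byte cost rises from about 0.0324 us/byte at 4096 bytes to about 0.0349 us/byte at 4097 bytes, then falls back to about 0.0324 us/byte at 8192 bytes.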
4.3.2 Application Behavior
Turning our attention to application behavior, the NPB codes we examined in this chapter have a communication structure dominated by infrequent, large, bursty communication. The resulting sensitivity to G is quite intuitive. They are also somewhat sensitive to o, although our o results are inflated by the apparatus construction mentioned previously. Sensitivity to w was very low, and non-existent for FT.
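Why large, bursty messages make the per-byte Gap G the dominant parameter can be seen from the standard LogGP cost of a single k-byte message, o + (k - 1)G + L + o. The sketch below is a simplification: it assumes messages are sent back to back with no overlap, and the parameter values (L = 5 us, o = 3 us, G = 0.03 us/byte) and message sizes are hypothetical. It moves the same 1 MiB either as 8 bulk messages or as 8192 small ones and reports the share of total time due to the G term.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogGP:
    L: float  # network latency (us)
    o: float  # per-message send or receive overhead (us)
    G: float  # Gap per byte for long messages (us/byte)


def message_time_us(p: LogGP, k: int) -> float:
    """LogGP end-to-end time for one k-byte message: o + (k-1)G + L + o."""
    return p.o + (k - 1) * p.G + p.L + p.o


def term_breakdown_us(p: LogGP, num_msgs: int, k: int) -> dict:
    """Per-term totals for num_msgs k-byte messages sent serially (no overlap assumed)."""
    return {
        "o": num_msgs * 2 * p.o,
        "L": num_msgs * p.L,
        "G": num_msgs * (k - 1) * p.G,
    }


def g_share(p: LogGP, num_msgs: int, k: int) -> float:
    """Fraction of the serial transfer time attributable to the per-byte G term."""
    terms = term_breakdown_us(p, num_msgs, k)
    return terms["G"] / sum(terms.values())


if __name__ == "__main__":
    params = LogGP(L=5.0, o=3.0, G=0.03)  # hypothetical values
    volume = 1 << 20  # 1 MiB moved in total
    for num_msgs in (8, 8192):
        k = volume // num_msgs
        total = num_msgs * message_time_us(params, k)
        print(f"{num_msgs:5d} x {k:6d} B: {total:10.1f} us, G share {g_share(params, num_msgs, k):.3f}")
```

With these values, the G term accounts for over 99% of the bulk transfer's time but only about a quarter of the fine-grained transfer's time, where the o and L terms dominate.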
The structure of the NPB shows their origins quite clearly: the codes were developed on MPP machines such as the iPSC and Delta, which have very high message-passing costs [32]. The high communication costs, yet ample bandwidth, of these machines led to a design style in which communication is avoided as much as possible. When communication is necessary, it is packed into a few large messages. Given this history, the rather low sensitivities should not be too surprising.

BW (1/G)         FT                  IS                    MG
(MB/s)    measure  predict     measure  predict     measure  predict
37          173.3    173.3        18.7     18.7        17.8     17.8
19          177.0    181.6        22.1     19.7        23.5     18.5
15          171.8    186.2        19.6     18.8        19.5     18.9
11.2        186.2    193.5        23.0     21.2        19.9     19.5
4.6         225.6    235.2        22.6     26.4        25.5     23.0
1.2         660.9    437.6           -        -        60.8     40.2

Table 4.3: NPB Predicted vs. Measured Run Times Varying Bulk Gap
This table shows how well our model for sensitivity to Bulk Gap predicts observed slowdown for the 32-node runs. For each application, the column labeled “measure” is the measured run time, while the column labeled “predict” is the run time predicted by our model. The model accurately predicts measured run times for bandwidths greater than 10 MB/s.
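To compare the two columns of Table 4.3 at a glance, the sketch below transcribes the table and computes the signed relative error (predict - measure) / measure for each entry, along with the Gap implied by each bandwidth (G = 1/BW). Converting MB/s to us/byte assumes MB = 10^6 bytes, so that 1 MB/s is 1 byte/us; the function names are illustrative.

```python
# Run times transcribed from Table 4.3: bandwidth (MB/s) -> {app: (measure, predict)}.
# None marks the IS entries that are missing ("-") at 1.2 MB/s.
TABLE_4_3 = {
    37.0: {"FT": (173.3, 173.3), "IS": (18.7, 18.7), "MG": (17.8, 17.8)},
    19.0: {"FT": (177.0, 181.6), "IS": (22.1, 19.7), "MG": (23.5, 18.5)},
    15.0: {"FT": (171.8, 186.2), "IS": (19.6, 18.8), "MG": (19.5, 18.9)},
    11.2: {"FT": (186.2, 193.5), "IS": (23.0, 21.2), "MG": (19.9, 19.5)},
    4.6: {"FT": (225.6, 235.2), "IS": (22.6, 26.4), "MG": (25.5, 23.0)},
    1.2: {"FT": (660.9, 437.6), "IS": None, "MG": (60.8, 40.2)},
}


def gap_us_per_byte(bw_mb_s: float) -> float:
    """G = 1/BW; assumes MB = 10**6 bytes, so 1 MB/s is 1 byte/us."""
    return 1.0 / bw_mb_s


def relative_error(measured: float, predicted: float) -> float:
    """Signed error of the prediction relative to the measurement."""
    return (predicted - measured) / measured


def error_table(table):
    """Map each bandwidth to {app: relative error, or None if the entry is missing}."""
    return {
        bw: {app: None if pair is None else relative_error(*pair) for app, pair in apps.items()}
        for bw, apps in table.items()
    }


if __name__ == "__main__":
    for bw, errs in sorted(error_table(TABLE_4_3).items(), reverse=True):
        cells = "  ".join(
            f"{app} {'-':>7}" if e is None else f"{app} {e:+7.1%}" for app, e in errs.items()
        )
        print(f"{bw:5.1f} MB/s (G = {gap_us_per_byte(bw):.3f} us/byte): {cells}")
```

The errors are largest in the 1.2 MB/s row, where both the FT and MG predictions fall roughly a third below the measured run times.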
It is instructive to compare the structure and resulting sensitivity of the NPB codes to those of the Split-C/AM codes. The Split-C/AM codes were developed in the context of very low-overhead machines, e.g., the CM-5 and Berkeley NOW. Thus, these programs assume low overhead and so show quite a high sensitivity to it. This raises a classic engineering-analysis question: are our results merely the result of a historical accident, or is there something more fundamental going on? We explore this question in greater detail in Chapter 7.
4.3.3 Network Architecture
Our architectural conclusions for the NPB are rather meager. Primarily, these benchmarks are sensitive to per-byte network bandwidth. In particular, as we saw from the communication balance graphs, a machine’s bisection bandwidth will be an issue for two of these benchmarks. However, unlike in the Split-C/AM applications, communication in the NPB is so infrequent on the machine sizes studied that improvements in per-node processor performance, as opposed to network performance, will yield the largest benefits.
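The argument that processor improvements pay off more than network improvements when communication is infrequent is an instance of Amdahl's law. As a hedged sketch, assuming computation and communication do not overlap and using a hypothetical communication fraction (not a value measured for the NPB), the overall speedup from accelerating one component is:

```python
def overall_speedup(comm_fraction: float, cpu_factor: float = 1.0, net_factor: float = 1.0) -> float:
    """Amdahl-style speedup when the computation share of run time is accelerated by
    cpu_factor and the communication share by net_factor (no overlap assumed)."""
    if not 0.0 <= comm_fraction <= 1.0:
        raise ValueError("comm_fraction must lie in [0, 1]")
    if cpu_factor <= 0 or net_factor <= 0:
        raise ValueError("speedup factors must be positive")
    new_time = (1.0 - comm_fraction) / cpu_factor + comm_fraction / net_factor
    return 1.0 / new_time


if __name__ == "__main__":
    f = 0.1  # hypothetical: 10% of run time spent communicating
    print(f"2x faster network:   {overall_speedup(f, net_factor=2.0):.3f}")
    print(f"2x faster processor: {overall_speedup(f, cpu_factor=2.0):.3f}")
```

With 10% of run time in communication, doubling network performance yields only about a 5% speedup, while doubling processor performance yields about 1.8x.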

4.3.4 Modeling
In the modeling area, we find that the three benchmarks exhibit stronger sensitivities to low-performance networks than simple models describe. However, this should not cause much concern: the failure of the models for this class of low-performance networks is unsurprising, given that we are scaling the apparatus by an order of magnitude. The only conclusion one can draw is that, in spite of their highly optimized communication, the NPB are not suited to very cheap, low-performance networks.
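One way to see how a simple model can understate the low-bandwidth slowdown is to fit a hypothetical linear-in-G model, T(G) = T0 + V(G - G0), through two high-bandwidth FT points from Table 4.3 and extrapolate to 1.2 MB/s. This is an illustrative stand-in, not the model behind the table's “predict” column; G is derived as 1/BW, and the function names are assumptions.

```python
# FT measured run times from Table 4.3, keyed by bandwidth in MB/s.
FT_MEASURED = {37.0: 173.3, 11.2: 186.2, 1.2: 660.9}


def gap(bw_mb_s: float) -> float:
    """Per-byte Gap implied by a bandwidth: G = 1/BW."""
    return 1.0 / bw_mb_s


def fit_linear_in_gap(bw_a: float, t_a: float, bw_b: float, t_b: float):
    """Return (G0, T0, V) for the line T(G) = T0 + V * (G - G0) through two points."""
    g_a, g_b = gap(bw_a), gap(bw_b)
    v = (t_b - t_a) / (g_b - g_a)
    return g_a, t_a, v


def extrapolate(model, bw_mb_s: float) -> float:
    """Evaluate the fitted linear-in-G model at another bandwidth."""
    g0, t0, v = model
    return t0 + v * (gap(bw_mb_s) - g0)


model = fit_linear_in_gap(37.0, FT_MEASURED[37.0], 11.2, FT_MEASURED[11.2])
estimate = extrapolate(model, 1.2)
print(f"linear-in-G estimate at 1.2 MB/s: {estimate:.1f}; measured: {FT_MEASURED[1.2]}")
```

The fitted line passes through both fitting points but estimates roughly 340 at 1.2 MB/s, about half the measured 660.9, which illustrates how a model that is linear in G can understate the hyper-linear slowdown seen in Figure 4.6(b).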

Chapter 5
NFS Sensitivity
... but just running a lot of simulations and seeing what happens is a frustrating and
finally unproductive exercise unless you can somehow create a “model of the model”
that lets you understand what is going on. — Paul Krugman, from a talk given to the
European Association for Evolutionary Political Economy.
In this chapter, we examine the sensitivity of Sun’s Network File System (NFS) to network performance. Our motivation is that previous work shows that 60-70% of LAN traffic is file-system related [48, 76]. We apply the same basic methodology used in the previous two chapters. The NFS application parameter space, however, is much larger than that of the Split-C/AM programs or the NPB. In the previous two chapters, run time was a single, simple figure of merit. In the NFS case, there can be many different metrics, e.g., read bandwidth, write bandwidth, and response time.
To fix the class of inputs that traditionally characterize NFS workloads, e.g., the mix of reads, writes, and lookups, we use the SPECsfs benchmark [94], an industry-standard benchmark for evaluating NFS servers. The networking parameters are the same LogGP parameters used throughout this thesis.
The output of the SPECsfs benchmark is a two-dimensional curve of response time vs. throughput for a fixed mix of operations, as opposed to a point metric such as run time. Because of the two-dimensional nature of the SFS curve, our results are presented differently than in previous chapters. Instead of a single slowdown line, the results are three-dimensional: throughput vs. response time vs. change in network performance. While we could plot a single 3-D graph of these parameters, it is more informative to plot a series of 2-D graphs, because the SFS curve has important structures that would be difficult to discern in a single 3-D graph. We detail the important parts of the
