A Systematic Characterization of Application Sensitivity to Network Performance

Of the applications studied, the NPB are the least sensitive to all of the LogGP parameters. This is partly due to the combination of scaling rule (fixed problem size) and machine size used. On a 32-processor machine the communication-to-computation ratio is quite small, even with the larger class B problem size. Given fixed problem-size scaling, only on very large configurations, e.g., 512 processors, will any of the LogGP parameters have much impact on overall performance.
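The scaling argument above can be illustrated with a minimal surface-to-volume sketch. It assumes a hypothetical cubic grid split evenly into cubic subdomains, with computation proportional to the points a processor owns and communication proportional to the points on the six faces of its subdomain. The grid size, the function name, and the cost model are illustrative assumptions, not the structure of any particular NPB code.

```python
def comm_to_comp_ratio(total_points, procs):
    """Communication-to-computation ratio for one processor.

    Assumed model (hypothetical): each processor owns a cubic block of
    total_points / procs grid points, computes once per owned point,
    and exchanges one value per point on each of the block's six faces.
    """
    local_points = total_points / procs          # volume of the local block
    face_points = local_points ** (2.0 / 3.0)    # points on one face
    return 6.0 * face_points / local_points


total = 100 ** 3  # hypothetical fixed problem size
r32 = comm_to_comp_ratio(total, 32)
r512 = comm_to_comp_ratio(total, 512)
print(f"ratio on 32 processors:  {r32:.3f}")
print(f"ratio on 512 processors: {r512:.3f}")
print(f"growth factor:           {r512 / r32:.2f}")
```

Under these assumptions the ratio grows as the cube root of the processor count, so with the problem size held fixed, communication costs become visible only on much larger machines.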
A second reason the NPB are quite insensitive to communication is that these codes have been extensively studied at the algorithmic level. Over the past 10 years, much attention has been focused on how to minimize communication costs in these codes. Given that the early parallel machines on which the NPB were developed (e.g., the nCUBE/2 and the iPSC/1) had very large communication costs (5,000+ cycles), it is not too surprising that much attention was given to minimizing the costs of communication. In fact, literature from the 1980s often models all communication as pure overhead, because message-passing machines at the time provided little opportunity for overlap [42].
It is interesting to conjecture whether the application behavior observed is fundamental to the application or simply a historical accident. The developers of each suite certainly had an architectural model in mind when designing the applications measured in this thesis, and this is reflected in the structure of the applications. For the Split-C/AM benchmarks, the model was low-overhead parallel machines and clusters. The NPB were designed in the context of previous-generation high-overhead hypercubes. NFS was developed in the LAN context. Perhaps the only certain claim we can make about the “fundamental” properties of these applications is that with the passage of time, programmers will invent new ways to tolerate latency and avoid overhead. The clever application designer is rewarded for shifting the sensitivity of the application away from L, avoiding o, and towards g and G wherever possible. A common pattern is that as applications age, they first lose sensitivity to latency, then to overhead, and finally end with some sensitivity to a form of bandwidth (either g or G).
7.3 Architecture
The primary architectural result of this thesis is that software overheads for communication are still too high. Of all the LogGP parameters, the sensitivity to o cannot be overstated. This is because many of the latency-tolerating techniques are still sensitive to overhead. For example, work overlapping and communication pipelining techniques still incur a cost of o on every message, even though they can mask latency.
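The point that pipelining masks latency but not overhead can be sketched with a simplified LogP-style cost model. The function names and parameter values below are hypothetical and in arbitrary time units; the model assumes small one-way messages, each costing o at the sender, L in transit, and o at the receiver, and it is not the measurement methodology of the thesis.

```python
def unoverlapped_time(n, L, o):
    """Time for n one-way messages when each is sent only after the
    previous one has been fully delivered (no overlap)."""
    return n * (o + L + o)


def pipelined_time(n, L, o, g):
    """Time for n one-way messages issued back to back.

    The sender can start a new message every max(o, g) time units;
    only the last message exposes the full o + L + o delivery time.
    """
    return (n - 1) * max(o, g) + (o + L + o)


n = 100
base = pipelined_time(n, L=5, o=3, g=2)
double_latency = pipelined_time(n, L=10, o=3, g=2)
double_overhead = pipelined_time(n, L=5, o=6, g=2)

print(unoverlapped_time(n, L=5, o=3))  # 1100
print(base)                            # 308
print(double_latency)                  # 313
print(double_overhead)                 # 611
```

With these illustrative numbers, doubling L barely changes the pipelined time because latency is paid once, while doubling o roughly doubles it because overhead is paid on every message.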
Even for Split-C/AM applications, which were developed on low-overhead machines, overhead is still a limiting factor. Sensitivity slopes of 1-2 were common for the Split-C/AM programs. For NFS, we observed a sensitivity slope of -1.5 in overhead vs. throughput. These sensitivities also do not have flat regions, implying that further reductions in overhead will have immediate benefits. We observed some of the benefits of reduced overhead for NFS in the Network Appliance box; by using a specialized operating system, that machine can sustain a much higher throughput than a comparable box running OSF/1.
The NAS Parallel Benchmarks did not exhibit much sensitivity to overhead. This result is readily explained by the applications’ structure: few messages are sent for the machine size (32) and problem size (class B) used in this study. Our results might be different if we extrapolated to an order-of-magnitude change in machine size, i.e., a 512-node machine. However, such machines are the uncommon case. Much as they have in the past, “small” configurations of 32 and 64 nodes will continue to dominate the field of parallel computing.
Almost all of the latency-tolerating techniques used by applications shift the sensitivity from latency to some form of bandwidth, either per-message, as in g, or per-byte, G. The pressures that latency-tolerating techniques place on bandwidth are not unique to networking; they have been observed in the CPU regime as well [20]. Although maintaining a high per-byte bandwidth is quite tractable, obtaining high per-message bandwidths is still an architectural challenge.
Our results on application behavior lead us to the somewhat counterintuitive architectural conclusion that future architectures do not need to continue to optimize for network transit latency. Instead, designers should focus on reducing overhead. Programmers are adept at using latency-tolerating techniques, so architectures should focus on enabling programmers to better tolerate latency. From an architectural perspective, building machines to tolerate latency is easier than reducing the entire end-to-end path. In practice, this means designers should concentrate on improving access to the network interface while maintaining a high message rate; many latency-tolerating techniques are still sensitive to overhead and gap.
A host of novel techniques exist to reduce both overhead and gap in the network interface hardware/software combination. However, the problem with non-standard techniques is that they ignore a very large existing infrastructure which is unlikely to change for the foreseeable future. Powerful non-technical forces will continue to cause large software overheads. Intuitively, the network interface is where three vendors’ software and hardware must work together: the operating system vendor, the switch/hub vendor, and the network interface hardware vendor. Immutable standards for connecting all three are thus inevitable. For example, porting NFS to any of the alternative