A Systematic Characterization of Application Sensitivity to Network Performance

Platform             o (µs)   g (µs)   L (µs)   1/G (MB/s)
SPINE IP Router      0.0      95       155      17.0
USC/ISI IP Router    80       80       -        16.7
UltraSPARC GAM       2.9      5.8      5.0      38
Table 6.2: SPINE LogGP Parameters. This table shows the LogGP performance of the SPINE IP router, the USC/ISI IP router, and the Berkeley GAM system. The 1/G figure is for a 2 KB packet size; larger sizes are possible for the USC/ISI router. Both the USC/ISI and SPINE routers use identical hardware but very different software architectures. The gap of the USC/ISI router equals its overhead because the CPU is the bottleneck for small packets. The gap of the SPINE router is limited by the internal scheduling algorithm of the SPINE I/O run-time. Latency results were not reported for the USC/ISI router. The table shows that the SPINE safety and functionality services in the LANai significantly increase the g and L terms over the basic GAM parameters.
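The following Python sketch evaluates the standard LogGP cost expressions, which makes the parameters in Table 6.2 more concrete. A single n-byte message costs o + (n-1)G + L + o. A stream of small messages can be issued no faster than once every max(o, g).

Some assumptions apply:

- The class and method names are hypothetical.
- We assume 1 MB = 10^6 bytes, so that 1/G in MB/s is numerically bytes per µs.
- The model ignores contention and internal pipelining in the routers.

It is therefore an illustration under these assumptions, not a reproduction of our measurements.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogGP:
    """LogGP parameters for one platform (hypothetical helper class)."""
    o: float              # host overhead per message (µs)
    g: float              # minimum gap between consecutive messages (µs)
    L: Optional[float]    # network latency (µs); None if not reported
    bw_mb_s: float        # 1/G, peak bandwidth (MB/s, assuming 1 MB = 10**6 bytes)

    @property
    def G(self) -> float:
        # Under the 10**6 assumption 1 MB/s is 1 byte/µs, so G (µs per byte) = 1 / (1/G).
        return 1.0 / self.bw_mb_s

    def one_way_time(self, nbytes: int) -> float:
        """LogGP cost of one n-byte message: send overhead + (n-1)G + L + receive overhead."""
        if self.L is None:
            raise ValueError("latency not reported for this platform")
        return self.o + (nbytes - 1) * self.G + self.L + self.o

    def small_msg_rate(self) -> float:
        """Upper bound on small messages per second: one message every max(o, g) µs."""
        return 1e6 / max(self.o, self.g)


# Values transcribed from Table 6.2.
spine = LogGP(o=0.0, g=95.0, L=155.0, bw_mb_s=17.0)
usc_isi = LogGP(o=80.0, g=80.0, L=None, bw_mb_s=16.7)
gam = LogGP(o=2.9, g=5.8, L=5.0, bw_mb_s=38.0)

for name, p in [("SPINE", spine), ("USC/ISI", usc_isi), ("GAM", gam)]:
    print(f"{name:8s} max small-message rate: {p.small_msg_rate():10.0f} msg/s")
print(f"SPINE 1-byte one-way time: {spine.one_way_time(1):.1f} us")
```

The two routers reach their rate bounds differently:

- SPINE's host overhead is 0.0, so its small-message rate is set entirely by g.
- For the USC/ISI router g equals o, which is the CPU-bound case described in the caption.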
concurrent events is substantial. In terms of our pipeline model, these costs show up as the fixed cost per packet, i.e., the occupancy. Each packet requires 29 events to process, resulting in a long occupancy of 100 µs per packet. Many of these events are checks for events that never occur, for example, polls to empty queues. However, many are more insidious forms of occupancy, such as events to manage the concurrency of the host-DMA engine.
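A toy round-robin dispatcher, written in Python, illustrates how polling inflates occupancy. Averaged over the 29 events, the 100 µs occupancy works out to roughly 3.4 µs per event. The sketch shows how checks that find an empty queue add to that cost even though they do no useful work. The function name, the queue layout, and the per-poll and per-handler costs are all hypothetical; none of them is taken from the SPINE I/O run-time.

```python
from collections import deque


def drain(queues, poll_cost_us, handler_cost_us):
    """Round-robin over all queues until every queue is empty (hypothetical model).

    Every check of a queue costs poll_cost_us whether or not it finds work;
    a check that finds work also runs one handler costing handler_cost_us.
    Returns (elapsed_us, useful_events, empty_polls).
    """
    elapsed = 0.0
    useful = empty = 0
    while any(queues):          # a deque is truthy when it is non-empty
        for q in queues:
            elapsed += poll_cost_us
            if q:
                q.popleft()
                elapsed += handler_cost_us
                useful += 1
            else:
                empty += 1      # a check for an event that never occurred
    return elapsed, useful, empty


# Hypothetical example: three queues, one of which never receives work.
result = drain([deque(["a", "b"]), deque(), deque(["c"])], poll_cost_us=1.0, handler_cost_us=2.0)
print(result)   # (12.0, 3, 3)
```

In this example half of the six checks find nothing, and they account for 3 of the 12 µs of elapsed time.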
A somewhat disappointing result is that, in spite of aggressive overlap, the occupancies of the LANai processor greatly lengthen the period of packet processing. Observe how the box outlining "DMA packet 1" in Figure 6.4 is lengthened by 10 µs due to the polls to the input queues between 75 and 85 µs. If the LANai processor had better PCI messaging support, the period could be reduced.
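The steady-state pipeline rule shows why extra LANai occupancy matters. A packet can enter the pipeline only as often as its slowest stage frees up. The period therefore equals the largest per-stage occupancy, while one packet's latency is the sum of all the stages.

The Python sketch below assumes each stage runs on its own resource. Stages that share the LANai processor would instead have their occupancies add. Stage names and timings are hypothetical, not the Figure 6.4 measurements.

```python
def pipeline(stages):
    """Steady-state model of a pipeline of independent stages (hypothetical helper).

    stages maps stage name -> per-packet occupancy in µs.
    Returns (period_us, bottleneck_stage, latency_us).
    """
    bottleneck = max(stages, key=stages.get)   # the slowest stage sets the period
    period = stages[bottleneck]
    latency = sum(stages.values())             # one packet visits every stage
    return period, bottleneck, latency


# Hypothetical occupancies; the DMA stage is the bottleneck.
base = {"receive": 40.0, "dma_packet": 50.0, "send": 40.0}
# The same pipeline after 10 µs of input-queue polls land inside the DMA stage.
with_polls = dict(base, dma_packet=base["dma_packet"] + 10.0)

print(pipeline(base))        # (50.0, 'dma_packet', 130.0)
print(pipeline(with_polls))  # (60.0, 'dma_packet', 140.0)
```

Because the lengthened stage is the bottleneck, the full 10 µs shows up in the period. The same delay in a non-bottleneck stage would only raise latency.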
The net result is that the pipeline formed by the SPINE IP router has a g within 15 µs of, and a slightly better 1/G than, a router built from the same 200 MHz Pentium Pro processor, the same LANai cards, and a modified BSD TCP/IP stack [105].¹ Table 6.2 summarizes the LogGP parameters of the two routers. The SPINE architecture can obtain close to the same performance in terms of gap and latency with a weaker CPU. Even so, without additional architectural support, such as a faster embedded processor or additional messaging support, it is not clear whether the overhead-reducing techniques of the SPINE architecture are worth the additional software complexity.
In the server context, where high CPU utilization due to I/O results in unacceptable performance degradations, such as in Global Memory Systems [102], SPINE-like architectures make sense. However, in the more general case, advanced I/O architectures should improve gap and latency as well. Architectures more radical than SPINE are needed to deliver an order-of-magnitude performance improvement for single communication streams. A slew of novel software protocols can deliver this kind of overhead reduction [25, 72, 81, 103]. Unfortunately, obtaining the performance of these protocols means losing connectivity to a vast body of applications. An open question is thus whether new I/O architectures can obtain an order-of-magnitude performance improvement over traditional designs while maintaining connectivity to common protocol stacks.

¹ The device driver copies only the packet headers into host memory. Special code in the device driver and LCP does a direct LANai-to-LANai DMA after the regular host OS makes the forwarding decision.

Chapter 7
Conclusions
Caltrans spent $1 billion to replace the old Cypress freeway. It spent millions more
to widen Interstate 80. But ... the commute hasn’t gotten any better. The problem is
not the new Cypress freeway – it’s getting to it from Berkeley and beyond. —Catherine
Bowman, SF Chronicle, Feb. 8, 1999.
This 980 thing has been ridiculous. —David E. Culler, SF Chronicle, Oct. 1, 1998.
This chapter concludes the thesis. We organize our conclusions around the four areas of contribution: performance analysis, observed application behavior, architecture, and modeling. Each section also provides some perspective on how this thesis fits into the wider context of computer science and the sciences in general.
We conclude this chapter with a short analogy in the hope that it will help the reader remember our results. We then present some open questions and promising areas of research. We end with some final thoughts for the reader to contemplate.
7.1 Performance Analysis
This thesis demonstrates that performing application-centric sensitivity emulation experiments, validated with analytic modeling, is a powerful strategy for understanding complex computer systems. The fundamental premise of the method is that by introducing precision delays in key components we can understand their importance to overall system performance. Our perturbation method is surprisingly simple, almost to the point of seeming uninteresting. However, system designers use a similar style of analysis all the time in analytic modeling, so much so that the style has a name: bottleneck analysis. What makes the method in this thesis unique is that we have applied a similar methodology to real systems rather than to analytic models or simulations.
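The perturbation idea can be sketched in a few lines of Python:

1. Inject a series of added delays into one component.
2. Record the run time at each setting.
3. Take the least-squares slope as the application's sensitivity to that component.

An analytic model then predicts what that slope should be. In the sketch, `run` stands in for executing the real application with the delay applied. `toy_run` is a hypothetical linear workload in which each of 1,000 messages pays the added overhead once at the sender and once at the receiver. Its numbers are synthetic, not measured data.

```python
from typing import Callable, Iterable


def sensitivity(run: Callable[[float], float], delays: Iterable[float]) -> float:
    """Least-squares slope of run time versus injected delay (hypothetical helper)."""
    xs = list(delays)
    ys = [run(d) for d in xs]            # one experiment per delay setting
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical toy workload: every message pays the added overhead at both ends.
MSGS = 1_000
BASE_US = 50_000.0


def toy_run(delta_o_us: float) -> float:
    return BASE_US + 2 * MSGS * delta_o_us


slope = sensitivity(toy_run, [0.0, 5.0, 10.0, 20.0, 40.0])
print(f"runtime grows {slope:.0f} us per us of added overhead (model predicts {2 * MSGS})")
```

With a real system the measured points scatter. A slope that departs from the model's prediction is the kind of discrepancy that validating against an analytic model is intended to expose.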
