Chapter 5
Performance Evaluation
So far, we have presented the programming interface of the MapReduce skeleton as
well as its internal structure. Built on top of Skandium, our system completely hides
threading issues from the programmer. With the addition of the predefined muscles into
the Skandium library, the system also hides from the programmer complex function-
ality that is independent of the problem in hand. The user can focus on implementing
only the map and reduce muscles and optionally, the split and merge muscles. The
latter makes the skeleton more flexible: it can be used to process any input and it can
produce any output. At the same time the skeleton’s interface is similar to that of the
other Skandium skeletons so it can be used with minimum disruption to the Skandium
programmer. Moreover, the skeleton’s implementation naturally fits the existing Skad-
nium infrastructure and its modular design provides a clear solution.
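To give a flavour of what the user writes, the sketch below implements word-count map and reduce muscles in plain Java. The `MapMuscle` and `ReduceMuscle` interfaces are hypothetical stand-ins for illustration only, not the skeleton's actual API.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class WordCountMuscles {

    // Hypothetical muscle interfaces, for illustration only; the real
    // skeleton's signatures may differ.
    interface MapMuscle<K, V> {
        List<Map.Entry<K, V>> map(String record);
    }

    interface ReduceMuscle<K, V> {
        V reduce(K key, List<V> values);
    }

    // Word-count map muscle: emit a (word, 1) pair for every word
    // in the input record.
    static class WordCountMap implements MapMuscle<String, Integer> {
        public List<Map.Entry<String, Integer>> map(String record) {
            List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
            for (String word : record.split("\\s+")) {
                if (!word.isEmpty()) {
                    pairs.add(new SimpleEntry<>(word, 1));
                }
            }
            return pairs;
        }
    }

    // Word-count reduce muscle: sum the counts emitted for one word.
    static class WordCountReduce implements ReduceMuscle<String, Integer> {
        public Integer reduce(String key, List<Integer> values) {
            int sum = 0;
            for (int v : values) sum += v;
            return sum;
        }
    }

    public static void main(String[] args) {
        WordCountMap m = new WordCountMap();
        // "the cat sat on the mat" emits six (word, 1) pairs.
        System.out.println(m.map("the cat sat on the mat").size());
    }
}
```

The grouping of pairs by key and the invocation of the reduce muscle per distinct key are exactly the parts the skeleton handles internally, which is why the user code stays this small.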
The next question is: how well does our solution perform? Performance often conflicts with ease of use, so this chapter is dedicated to presenting the evaluation of the skeleton's performance. The chapter begins with an evaluation of the four implementation schemes described in the previous chapter. The schemes are evaluated against five different applications on two different multi-core platforms. The best performing scheme is then evaluated further and compared, in terms of programmability and performance, with traditional parallel programming techniques based on Java threads.
5.1 Experimental Method
This section describes the experimental methodology we used to evaluate the MapRe-
duce skeleton.
                   System 1                               System 2
Hardware Settings
CPU Type           Intel Xeon E7320                       Intel Xeon E5645
Clock Freq         2.13 GHz                               2.40 GHz
Processor Count    4                                      2
Cores/Processor    4                                      6
L1 Cache           32KB, 8-way SA, 64-byte line size      32KB, 8-way SA, 64-byte line size
L2 Cache           2 x 2MB, 8-way SA, 64-byte line size   6 x 256KB, 8-way SA, 64-byte line size
L3 Cache           -                                      12MB
Software Settings
Operating System   Scientific Linux 5.5                   Red Hat Enterprise Linux Server release 5.4
Java               OpenJDK 64-Bit Server VM               HotSpot 64-Bit Server VM

Table 5.1: The characteristics of the two systems that were used for the skeleton's evaluation
5.1.1 Shared Memory Systems
One of the most important factors that affect the model's performance on shared-memory platforms is the machine's memory hierarchy [5]. As we want to investigate how this parameter affects our implementation, we ran our experiments on two machines with considerably different memory sub-systems. The two systems are described in Table 5.1. System 1 is a 4-processor machine with four cores per processor and System 2 is a 2-processor machine with six cores per processor. The first system has two L2 caches per processor, shared among its four cores, while the second system has a separate L2 cache per core and a common L3 cache shared by all six cores.
5.1.2 Applications
In chapter 2, we presented some typical MapReduce applications that we used for the performance evaluation of the skeleton. Table 5.2 describes the data sets that were used for each application as well as the characteristics of each workload, i.e. the number and size of the emitted pairs and the number of distinct keys, which, as we will see in the next section, considerably affect the skeleton's performance.
                  Data Set                                No of Pairs   Average Pair Size (Bytes)   No of Distinct Keys
Word Count        text 100MB                              18 M          38                          150 K
Matrix Multiply   2 matrices 1600*1600                    2.5 M         24                          2.5 M
Inverted Index    text 100MB, 10 words to be searched     4 M           38                          10
Histogram         bitmap 100MB                            100 M         48                          768
KMeans            1M points, 3 dimensions, 100 clusters   1 M           28                          100

Table 5.2: The characteristics of the five applications that were used for the skeleton's evaluation
For our evaluation, we implemented the sequential algorithm for all benchmarks to serve as the baseline for speedups. In the first set of experiments, presented in the following sections, we set the initial heap size of the Java virtual machine to a very large value in order to eliminate all garbage collections. In this way, we can focus on the other parameters that affect the performance of our solution. The performance penalty caused by the garbage collector is evaluated separately in the last section of this chapter.
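Concretely, pinning the JVM heap can be done with the standard `-Xms`/`-Xmx` flags; the sizes and class name below are illustrative placeholders, not the settings used in our experiments:

```shell
# Illustrative placeholders: setting the initial heap equal to a large
# maximum heap means the heap never needs to grow and, for large enough
# values, no collection is triggered during the measured run.
# -verbose:gc logs any collections that do occur, confirming there were none.
java -Xms8g -Xmx8g -verbose:gc MapReduceBenchmark
```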
5.2 Evaluation of the Four Implementation Schemes
Figure 5.1 shows the efficiency of the four implementation schemes relative to the sequential algorithm for the two machines described in Table 5.1. Efficiency is calculated as the ratio of the speedup over the sequential algorithm to the number of cores in the machine. In this experiment, each implementation scheme uses, at its parallel stages, as many workers as there are cores in the machine, i.e. 16 for System 1 and 12 for System 2. There are three main observations we can make about these results.
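The efficiency metric used above can be computed directly from the measured running times; a minimal sketch, where the timings are made-up placeholders rather than measured values:

```java
public class Efficiency {
    // Efficiency = speedup / cores, where speedup = T_sequential / T_parallel.
    static double efficiency(double seqSeconds, double parSeconds, int cores) {
        double speedup = seqSeconds / parSeconds;
        return speedup / cores;
    }

    public static void main(String[] args) {
        // Placeholder timings: a run that is 12x faster than the
        // sequential baseline on 16 cores has an efficiency of 0.75.
        System.out.println(efficiency(48.0, 4.0, 16)); // prints 0.75
    }
}
```

An efficiency of 1.0 would mean perfect linear speedup, so the metric makes results on the 16-core and 12-core machines directly comparable.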
Our first observation is that some applications do not perform well, regardless of
the machine or the implementation scheme. These are the Histogram and KMeans
applications. The reason for this is that the MapReduce model is not an efficient fit for
these two applications [5].
Histogram is a good example of the model's inefficiency when the problem at hand is not naturally associated with any keys. Unlike that of applications such as word count, the output of Histogram is predictable. It consists of three fixed-width tables (i.e. 3 x 256)