Chapter 5. Performance Evaluation
Figure 5.8: The garbage collector's impact on the performance of the four MapReduce applications
Chapter 6
Optimization of the MapReduce Skeleton
From the results of the previous chapter, it is evident that, as far as performance is concerned, our solution has some clear weaknesses. From the comparison with manual Java threading we can conclude that, although our implementation protects the programmer from some pitfalls of parallelism, it is outperformed by far by the manual threading scheme. Our next consideration is how to narrow the performance gap between the two approaches, so that we can enjoy the benefits of our skeletal scheme, i.e. better programmability, without losing too much performance compared to manual threading. In this chapter, we identify the problems of our implementation and propose a set of optimizations that aim to make the skeleton's performance match, or at least be comparable to, that of the manual threading approach.
In addition, we will consider the potential for auto-tuning the skeleton, and we will present a simple auto-tuning mechanism that allows the implementation to bypass stages of the MapReduce pipeline that are unnecessary for certain types of applications.
6.1 Hash Table Improvements
6.1.1 Choosing an Optimal Hash Table Size
In our original MapReduce implementation, the size of the hash arrays used to store the intermediate pairs is fixed at 256 entries. This fixed size turns out to be ineffective for applications like word count, where the number of distinct keys is quite large (150,000 distinct keys for word count). The large number of distinct keys and the few
hash array entries mean that the sorted ArrayLists of Combiners in each entry of the hash array can grow large. This slows down the store process, since larger lists must be searched for an existing Combiner, and their elements must be shifted down to insert the Combiner if it is not already in the list.
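The store path described above can be sketched as follows. This is a minimal illustration, not our actual implementation: the class and method names are hypothetical, and the Combiner values are elided so that only the key lookup and insertion cost is visible.

```java
import java.util.ArrayList;
import java.util.Collections;

// Minimal sketch of a hash array with one sorted key list per entry.
class IntermediateStore {
    private final ArrayList<String>[] hashArray;

    @SuppressWarnings("unchecked")
    IntermediateStore(int size) {
        hashArray = new ArrayList[size];
        for (int i = 0; i < size; i++) {
            hashArray[i] = new ArrayList<>();
        }
    }

    // O(1) to hash to the entry, O(log k) binary search in the sorted list,
    // but an O(k) element shift when a new key has to be inserted.
    void store(String key) {
        ArrayList<String> bucket =
                hashArray[Math.floorMod(key.hashCode(), hashArray.length)];
        int pos = Collections.binarySearch(bucket, key);
        if (pos < 0) {
            bucket.add(-pos - 1, key); // insertion point keeps the list sorted
        }
        // else: key already present; a real store would combine the value here
    }

    // Total number of distinct keys held across all entries.
    int totalKeys() {
        int n = 0;
        for (ArrayList<String> bucket : hashArray) {
            n += bucket.size();
        }
        return n;
    }
}
```

With only 256 entries and 150,000 distinct keys, each bucket's list holds roughly 600 keys on average, so the O(k) shift on insertion dominates the store cost.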
Ideally, we want a large hash array whose size is proportional to the number of distinct keys in the application. This leads to short lists in each entry and fast storing of the pairs. At the same time, we do not want a hashArray much bigger than the number of distinct keys: first, because we would be wasting memory, and second, because the Combiners would then be more dispersed across the array, which makes the join stage more expensive.
Figure 6.1 shows the speedup of the word count application as we change the size of the hash array. We show the results on both System1 and System2.
Figure 6.1: The speedup with varying hash array size for word count on System1 (left) and System2 (right)
On System1, we can see that the speedup is very low for a small hash array and gradually increases until it reaches a maximum at 64K entries (i.e. about 3 Combiners per entry), after which it drops again. The hashArray used in our original version had 256 entries and is shown in red in the figure. Similar behavior is observed on System2.
Matrix multiplication benefits even more from increasing the size of the hash array. This is because the number of Combiners is significantly larger in this case (2.5M pairs), meaning that for a small hashArray the lists in each entry can grow too large. As illustrated in Figure 6.2, the performance benefits over our original size are substantial on both systems.
Figure 6.3 shows the same experiment for inverted index. The speedup remains
Figure 6.2: The speedup with varying hash array size for matrix multiply on System1 (left) and System2 (right)
almost invariant, and a significant decrease is observed on both systems only when the hash array grows too large. Since a small hashArray in this application performs as well as a larger one, it is more beneficial to use a small hash array and reduce the memory consumed by the structure.
Figure 6.3: The speedup with varying hash array size for inverted index on System1 (left) and System2 (right)
Figure 6.4 shows the same experiment for KMeans. As with inverted index, the speedup remains uniform as we increase the hash array size, and drops only when the array grows too large.
A careful look at these figures indicates that on System1 a significant speedup drop occurs for all applications when the hashArray is increased past 64K entries, while on System2 this significant drop occurs at 512K entries. The fact that the drop occurs at the same point for all applications, and the fact that this decrease occurs at different
Figure 6.4: The speedup with varying hash array size for KMeans on System1 (left) and System2 (right)
array sizes on the two machines, lead us to the assumption that the machine itself must impose a hard limit on the size of the data structure. This hard limit is most probably related to the cache hierarchy of each machine. For example, if the hashArray grows too large, it may not fit in the L1 cache, and the resulting L1 cache misses, either at the store or at the partition stage, degrade the performance of the application. Our two systems have L1 caches of equal size; nevertheless, the hierarchy of System2, with a private L2 cache per processor and a common L3 cache before main memory, apparently handles L1 cache misses better than the architecture of System1, which has an L2 cache shared between pairs of processors and no L3 cache.
To summarize, no single hash array size works best for all workloads. For applications that produce a small number of keys, the size of the hashArray should be kept to a minimum. On the other hand, for applications that produce a large number of pairs, the size of the hashArray should be increased, up to the point where it interacts negatively with the system's cache subsystem, or where the values become too dispersed in the hashArray, which would slow down the join stage.
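As a rough illustration of this sizing rule, the hash array size could be derived from an estimate of the distinct-key count and then clamped at the machine-dependent limit. This is only a sketch under assumed parameter names; the values passed in are illustrative, not tuned constants from our implementation.

```java
// Hypothetical sizing heuristic: grow the hash array with the expected
// number of distinct keys, but never past the cache-friendly hard limit.
class HashSizing {
    static int chooseHashArraySize(long estimatedKeys, int keysPerEntry, int maxEntries) {
        // Target entry count for the desired average list length per entry,
        // clamped to a sane upper bound before the int conversion.
        long target = Math.min(Math.max(1, estimatedKeys / keysPerEntry), 1 << 30);
        // Round up to the next power of two so bucket indexing stays cheap.
        int size = Integer.highestOneBit((int) target);
        if (size < target) {
            size <<= 1;
        }
        // Clamp at the machine-dependent limit observed experimentally
        // (e.g. 64K entries on System1, 512K on System2).
        return Math.min(size, maxEntries);
    }
}
```

For word count (150,000 keys, about three Combiners per entry) this yields 65,536 entries, in line with the 64K optimum observed on System1, while matrix multiplication's 2.5M pairs would be capped by the cache limit.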
6.1.2 Using a Binary Search Tree for an O(log n) Store
In this section, we present a second improvement to the data structure that holds the intermediate keys. This optimization has also been used by [14]. The cost for the store muscle to insert a Combiner into the hash array depends on whether the key is already present in the relevant hash array entry's list. If the key is present, and the number of stored Combiners in the list is k, the cost is O(1) to hash to the correct hashArray entry and O(log k) to find the existing key in the entry's list. If the key is