Chapter 5. Performance Evaluation
5.4 Comparison to manual Java threading
In this section, we compare our MapReduce implementation with traditional parallel
programming techniques. The comparison focuses on the performance of the two
approaches, but we also report on the expressiveness of each, although this assessment
is rather subjective.
For this set of experiments, we initially ported our sequential applications (word
count, matrix multiply, inverted index, KMeans) to a Java multi-threaded environment.
We then measured how the performance scales as we increase the number of threads for
the different applications, and how it compares to the performance of the MapReduce
implementation. Naturally, the parallelization of each application using low-level
threading can be achieved in many different ways; there are numerous approaches to
harvesting concurrency, each with different efficiency and requiring a different
amount of effort from the programmer. The next section presents our implementation
decisions in more detail.
5.4.1 Implementation using manual threading
Our manual threading applications are not optimized; we implemented them under
the assumption that the programmer is mainly focused on solving the problem at hand
(e.g. counting words): he has basic knowledge of threading techniques, but has put
only moderate effort into making the parallel execution efficient. All of our appli-
cations use explicit threading, in the sense that the threads are created, managed and
killed under the sole responsibility of the programmer, rather than using more abstract
frameworks that accommodate parallel solutions, such as the ThreadPoolExecutor or the
Fork/Join framework. Next, for each application, we briefly present how it was imple-
mented using Java threads and report its performance compared to the MapReduce
implementation.
5.4.1.1 Word Count
In the word count application, each thread processes a section of the original text and
stores the words in a ConcurrentHashMap along with a counter that indicates how many
times each word has occurred. All threads use the same hash map to store the words.
When a new word is found, the thread checks if the word is already present in the
ConcurrentHashMap. If so, its counter is incremented; otherwise a new entry for the word is
created.
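The scheme described above can be sketched as follows. This is an illustrative reconstruction, not the exact thesis code: the chunking logic and class name are our assumptions, and we use ConcurrentHashMap.merge, which performs the check-and-update atomically.

```java
import java.util.concurrent.ConcurrentHashMap;

public class WordCountThreads {
    // Shared map: word -> occurrence count (boxed Integer values).
    static final ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();

    public static void main(String[] args) throws InterruptedException {
        String text = "the quick brown fox jumps over the lazy dog the fox";
        String[] words = text.split("\\s+");
        int nThreads = 2;

        // Split the word array into roughly equal sections, one per thread.
        Thread[] workers = new Thread[nThreads];
        int chunk = (words.length + nThreads - 1) / nThreads;
        for (int t = 0; t < nThreads; t++) {
            final int from = t * chunk;
            final int to = Math.min(words.length, from + chunk);
            workers[t] = new Thread(() -> {
                for (int i = from; i < to; i++) {
                    // Atomically creates the entry (count 1) or replaces the
                    // old Integer with a newly boxed value (old + 1).
                    counts.merge(words[i], 1, Integer::sum);
                }
            });
            workers[t].start();  // explicit thread lifecycle management
        }
        for (Thread w : workers) w.join();

        System.out.println(counts.get("the"));  // 3
        System.out.println(counts.get("fox"));  // 2
    }
}
```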
Since a ConcurrentHashMap in Java cannot hold primitive data types (its generic type
parameters must be reference types), the counter is implemented as an object. The counter
was implemented in two different ways, both presented here to show how a seemingly minor
implementation decision can seriously affect the performance of an application in a
multi-threaded environment.
In the first, naive implementation, a Java Integer object is stored in the hash map
along with each word, indicating how many times the word has occurred. Since Integer
objects are immutable in Java, i.e. their state cannot be modified after the object is
created, each new occurrence of the word creates a new Integer object, with its value
equal to the value of the previous object plus one, which replaces the old Integer
object in the hash map entry.
In the second implementation, we created a Counter object that has a single field
of primitive integer type. The difference here is that the field of the same object is
updated, without allocating a new object.
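The two strategies can be contrasted in a small sketch. The Counter class and the per-counter locking below are our illustrative reconstruction of the second variant, not the exact thesis code:

```java
import java.util.concurrent.ConcurrentHashMap;

public class CounterVariants {
    // Naive variant: immutable Integer values; every increment boxes a new object.
    static final ConcurrentHashMap<String, Integer> naive = new ConcurrentHashMap<>();

    // Careful variant: a mutable holder whose primitive field is updated in place.
    static final class Counter { int value; }
    static final ConcurrentHashMap<String, Counter> careful = new ConcurrentHashMap<>();

    static void naiveIncrement(String word) {
        // Allocates a fresh Integer (old + 1) and swaps it into the map entry.
        naive.merge(word, 1, Integer::sum);
    }

    static void carefulIncrement(String word) {
        // The Counter object is allocated once per distinct word; afterwards
        // only its primitive field changes, so no further objects are created.
        Counter c = careful.computeIfAbsent(word, k -> new Counter());
        synchronized (c) { c.value++; }  // per-counter lock for thread safety
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            naiveIncrement("word");
            carefulIncrement("word");
        }
        System.out.println(naive.get("word"));          // 1000
        System.out.println(careful.get("word").value);  // 1000
    }
}
```

Both variants produce the same counts; the difference is purely in allocation behavior, which is what the measurements below expose.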
Figure 5.4 compares the performance of the two implementations with each other and
with the MapReduce word count implementation.
Figure 5.4: MapReduce vs Manual Threading for word count on System1 (left) and
System2 (right)
We can see that the naive implementation with Java threads exhibits very poor
performance on System1 and does not scale beyond four workers. This is because of
the large number of object allocations: as we increase the number of threads, the
allocation rate gets so high that it leads to a large amount of memory bus write
traffic, which limits the scalability and performance of the application [15]. On
System2, the L3 cache apparently absorbs much of the memory traffic, but the same
effect remains, albeit at a smaller scale.
On the other hand, the "careful" implementation achieves very good speedup and
scalability. The performance of the MapReduce skeleton lies between these two
threading approaches.
5.4.1.2 Matrix Multiplication
In this application, each thread is assigned a chunk of rows of matrix A and the whole
matrix B. It then simply computes the corresponding rows of the result matrix. The
manual threading scheme is expected to present significant performance benefits com-
pared to the MapReduce implementation since the threads directly manipulate the re-
sult matrix and the unnecessary pair manipulation of the MapReduce implementation
does not take place.
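The row-chunking scheme can be sketched as below; the method and class names are illustrative assumptions, not the thesis code:

```java
public class MatMulThreads {

    // Each thread is assigned a contiguous chunk of rows of A and computes the
    // corresponding rows of the result directly, with no intermediate pairs.
    static double[][] multiply(double[][] a, double[][] b, int nThreads)
            throws InterruptedException {
        int n = a.length, m = b[0].length, inner = b.length;
        double[][] c = new double[n][m];
        Thread[] workers = new Thread[nThreads];
        int chunk = (n + nThreads - 1) / nThreads;
        for (int t = 0; t < nThreads; t++) {
            final int from = t * chunk, to = Math.min(n, from + chunk);
            workers[t] = new Thread(() -> {
                for (int i = from; i < to; i++)
                    for (int j = 0; j < m; j++) {
                        double sum = 0;
                        for (int k = 0; k < inner; k++) sum += a[i][k] * b[k][j];
                        c[i][j] = sum;  // row chunks are disjoint: no locking needed
                    }
            });
            workers[t].start();  // explicit thread lifecycle management
        }
        for (Thread w : workers) w.join();
        return c;
    }

    public static void main(String[] args) throws InterruptedException {
        int n = 4;
        double[][] a = new double[n][n], b = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) { a[i][j] = 1.0; b[i][j] = 2.0; }
        double[][] c = multiply(a, b, 2);
        System.out.println(c[0][0]);  // 8.0
    }
}
```

Because each thread writes a disjoint set of result rows, no synchronization on the output is required, which is one reason this scheme outperforms the MapReduce version.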
Figure 5.5 shows the results on the two systems. As expected, manual Java
threading achieves far better performance than MapReduce for this application. The
manual threading scheme performs slightly better on System2; however, its perfor-
mance drops considerably when the number of workers is increased past the number
of cores, in contrast to System1, where performance is maintained at about the same
level.
Figure 5.5: MapReduce vs Manual Threading for matrix multiply on System1 (left) and
System2 (right)
5.4.1.3 Inverted Index
The inverted index application implemented with manual threading is very similar to
word count. A ConcurrentHashMap is used by all threads and stores the words along
with a list of their positions in the text. The threads update the position list when they