Chapter 5. Performance Evaluation
5.4 Comparison to manual Java threading
In this section, we compare our MapReduce implementation with traditional parallel
programming techniques. The comparison focuses on the performance of the two
approaches, but we also report on the expressiveness of each, although this assessment
is rather subjective.
For this set of experiments, we initially ported our sequential applications (word
count, matrix multiply, inverted index, KMeans) to a Java multi-threaded environment.
We then measured how the performance scales as we increase the number of threads for
the different applications, and how it compares to the performance of the MapReduce
implementation. Naturally, the parallelization of each application using low-level
threading can be achieved in many different ways; there are numerous approaches to
harvesting concurrency, each with different efficiency and requiring a different
amount of effort from the programmer. The next section presents our implementation
decisions in more detail.
5.4.1 Implementation using manual threading
Our manual threading applications are not optimized; we implemented them under
the assumption that the programmer is mainly focused on solving the problem at hand
(e.g. counting words): he has basic knowledge of threading techniques, but has put
only moderate effort into making the parallel execution efficient. All of our appli-
cations use explicit threading, in the sense that the threads are created, managed and
killed under the sole responsibility of the programmer, rather than using more abstract
frameworks that accommodate parallel solutions, such as the ThreadPoolExecutor or the
Fork/Join framework. Next, for each application, we briefly present how it was imple-
mented using Java threads and report its performance compared to the MapReduce
implementation.
5.4.1.1 Word Count
In the word count application, each thread processes a section of the original text and
stores the words in a ConcurrentHashMap along with a counter that indicates how many
times each word has occurred. All threads use the same hash map to store the words.
When a new word is found, the thread checks if the word is already present in the
ConcurrentHashMap. If so, its counter is incremented; otherwise a new entry for the word is
created.
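The scheme described above can be sketched as follows. This is an illustrative reconstruction, not the exact thesis code: the chunking logic and class name are our assumptions, and we use ConcurrentHashMap.merge, which performs the check-and-update atomically.

```java
import java.util.concurrent.ConcurrentHashMap;

public class WordCountThreads {
    // Shared map: word -> occurrence count (boxed Integer values).
    static final ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();

    public static void main(String[] args) throws InterruptedException {
        String text = "the quick brown fox jumps over the lazy dog the fox";
        String[] words = text.split("\\s+");
        int nThreads = 2;

        // Split the word array into roughly equal sections, one per thread.
        Thread[] workers = new Thread[nThreads];
        int chunk = (words.length + nThreads - 1) / nThreads;
        for (int t = 0; t < nThreads; t++) {
            final int from = t * chunk;
            final int to = Math.min(words.length, from + chunk);
            workers[t] = new Thread(() -> {
                for (int i = from; i < to; i++) {
                    // Atomically creates the entry (count 1) or replaces the
                    // old Integer with a newly boxed value (old + 1).
                    counts.merge(words[i], 1, Integer::sum);
                }
            });
            workers[t].start();  // explicit thread lifecycle management
        }
        for (Thread w : workers) w.join();

        System.out.println(counts.get("the"));  // 3
        System.out.println(counts.get("fox"));  // 2
    }
}
```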
Since a ConcurrentHashMap in Java cannot hold primitive data types (its generic type
parameters must be reference types), the counter is implemented as an object. The counter
was implemented in two different ways, both presented here to show how a seemingly minor
implementation decision can seriously affect the performance of an application in a
multi-threaded environment.
In the first, naive implementation, a Java Integer object is stored in the hash map
along with each word, indicating how many times the word has occurred. Since Integer
objects are immutable in Java, i.e. their state cannot be modified after the object is
created, each new occurrence of the word creates a new Integer object, with its value
equal to the value of the previous object plus one, which replaces the old Integer
object in the hash map entry.
In the second implementation, we created a Counter object that has a single field
of primitive integer type. The difference here is that the field of the same object is
updated, without allocating a new object.
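The two strategies can be contrasted in a small sketch. The Counter class and the per-counter locking below are our illustrative reconstruction of the second variant, not the exact thesis code:

```java
import java.util.concurrent.ConcurrentHashMap;

public class CounterVariants {
    // Naive variant: immutable Integer values; every increment boxes a new object.
    static final ConcurrentHashMap<String, Integer> naive = new ConcurrentHashMap<>();

    // Careful variant: a mutable holder whose primitive field is updated in place.
    static final class Counter { int value; }
    static final ConcurrentHashMap<String, Counter> careful = new ConcurrentHashMap<>();

    static void naiveIncrement(String word) {
        // Allocates a fresh Integer (old + 1) and swaps it into the map entry.
        naive.merge(word, 1, Integer::sum);
    }

    static void carefulIncrement(String word) {
        // The Counter object is allocated once per distinct word; afterwards
        // only its primitive field changes, so no further objects are created.
        Counter c = careful.computeIfAbsent(word, k -> new Counter());
        synchronized (c) { c.value++; }  // per-counter lock for thread safety
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            naiveIncrement("word");
            carefulIncrement("word");
        }
        System.out.println(naive.get("word"));          // 1000
        System.out.println(careful.get("word").value);  // 1000
    }
}
```

Both variants produce the same counts; the difference is purely in allocation behavior, which is what the measurements below expose.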
Figure 5.4 compares the performance of the two implementations with each other and
with the MapReduce word count implementation.
Figure 5.4: MapReduce vs Manual Threading for word count on System1 (left) and
System2 (right)
We can see that the naive implementation with Java threads exhibits very poor
performance on System1 and does not scale beyond four workers. This is because of
the large number of object allocations: as we increase the number of threads, the
allocation rate gets so high that it leads to a large amount of memory bus write
traffic, which limits the scalability and performance of the application [15]. On
System2, the L3 cache apparently absorbs much of the memory traffic, but the same
effect remains, albeit at a smaller scale.
On the other hand, the "careful" implementation achieves very good speedup and
scalability. The performance of the MapReduce skeleton lies between these two
threading approaches.
5.4.1.2 Matrix Multiplication
In this application, each thread is assigned a chunk of rows of matrix A and the whole
matrix B. It then simply computes the corresponding rows of the result matrix. The
manual threading scheme is expected to present significant performance benefits com-
pared to the MapReduce implementation since the threads directly manipulate the re-
sult matrix and the unnecessary pair manipulation of the MapReduce implementation
does not take place.
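The row-chunking scheme can be sketched as below; the method and class names are illustrative assumptions, not the thesis code:

```java
public class MatMulThreads {

    // Each thread is assigned a contiguous chunk of rows of A and computes the
    // corresponding rows of the result directly, with no intermediate pairs.
    static double[][] multiply(double[][] a, double[][] b, int nThreads)
            throws InterruptedException {
        int n = a.length, m = b[0].length, inner = b.length;
        double[][] c = new double[n][m];
        Thread[] workers = new Thread[nThreads];
        int chunk = (n + nThreads - 1) / nThreads;
        for (int t = 0; t < nThreads; t++) {
            final int from = t * chunk, to = Math.min(n, from + chunk);
            workers[t] = new Thread(() -> {
                for (int i = from; i < to; i++)
                    for (int j = 0; j < m; j++) {
                        double sum = 0;
                        for (int k = 0; k < inner; k++) sum += a[i][k] * b[k][j];
                        c[i][j] = sum;  // row chunks are disjoint: no locking needed
                    }
            });
            workers[t].start();  // explicit thread lifecycle management
        }
        for (Thread w : workers) w.join();
        return c;
    }

    public static void main(String[] args) throws InterruptedException {
        int n = 4;
        double[][] a = new double[n][n], b = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) { a[i][j] = 1.0; b[i][j] = 2.0; }
        double[][] c = multiply(a, b, 2);
        System.out.println(c[0][0]);  // 8.0
    }
}
```

Because each thread writes a disjoint set of result rows, no synchronization on the output is required, which is one reason this scheme outperforms the MapReduce version.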
Figure 5.5 shows the results on the two systems. As expected, manual Java
threading achieves far better performance than MapReduce for this application. The
manual threading scheme performs slightly better on System2; however, its perfor-
mance drops considerably when the number of workers is increased past the number
of cores, in contrast to System1, where performance is maintained at about the same
level.
Figure 5.5: MapReduce vs Manual Threading for matrix multiply on System1 (left) and
System2 (right)
5.4.1.3 Inverted Index
The inverted index application implemented with manual threading is very similar to
word count. A ConcurrentHashMap is used by all threads and stores the words along
with a list of their positions in the text. The threads update the position list when they