Chapter 1. Introduction
synchronization issues are completely hidden from the programmer.
MapReduce [8] is a popular structured parallel programming model that realizes
the skeleton idea and is currently used for application development on large-scale
clusters. In this model, the problem-specific code segments provided by the user
are just two functions, Map and Reduce: the Map function is applied to the input
and emits a list of intermediate key-value pairs, while the Reduce function
aggregates the values according to their keys. All other functionality, including
the specification of parallelism, the distribution of data, and fault tolerance, is
provided by the run-time and is completely hidden from the programmer. MapReduce
has already shown its power in solving data-intensive problems like large-scale
indexing, large-scale graph computations, processing of satellite imagery, and
large-scale machine learning [8].
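To make the two user-supplied functions concrete, consider the classic word-count example. The following is an illustrative sketch in plain Java; the method names and signatures are ours for exposition and do not belong to any particular framework:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.AbstractMap.SimpleEntry;
import java.util.Map.Entry;

public class WordCount {

    // Map: takes one input chunk and emits intermediate (word, 1) pairs.
    static List<Entry<String, Integer>> map(String chunk) {
        List<Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : chunk.split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Reduce: aggregates all intermediate values that share the same key.
    static Entry<String, Integer> reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return new SimpleEntry<>(key, sum);
    }
}
```

Everything else, such as splitting the input into chunks, running the map calls in parallel, and grouping the values by key before the reduce calls, is the run-time's responsibility.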
The current popularity of multi-cores and the prediction that tens to hundreds (even
thousands) of cores will appear on a single chip in the near future [4] make an
implementation of the MapReduce model for shared memory systems very attractive.
Such an implementation should take into account the key characteristics of shared
memory architectures that distinguish them from cluster architectures. In a shared
memory environment, the main performance concern is not reducing the communication
between the tasks, but making the best use of the memory hierarchy and carefully
selecting the in-memory data structures used to store and retrieve the intermediate
key-value pairs [5]. In addition, failures on multi-core architectures are not as
frequent, because the number of nodes is very small compared to clusters. This means
that fault tolerance is not as important as in the cluster version of the model.
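As an illustration of this data-structure concern, a shared-memory implementation might buffer intermediate pairs in per-thread hash maps to avoid lock contention, merging them into a shared map only after the map phase. This is a minimal sketch under our own naming, not code from any specific implementation:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a contention-avoiding intermediate store: each worker thread
// combines pairs in its own local HashMap, and the partial maps are merged
// into a shared ConcurrentHashMap once, after the map phase completes.
public class IntermediateStore {

    private final ConcurrentHashMap<String, Integer> global = new ConcurrentHashMap<>();

    // A private map for one worker thread; no synchronization needed.
    public static Map<String, Integer> newLocalMap() {
        return new HashMap<>();
    }

    // Combine a pair into the worker's local map (a local "combiner" step).
    public static void emitLocal(Map<String, Integer> local, String key, int value) {
        local.merge(key, value, Integer::sum);
    }

    // Merge a worker's local map into the global map; thread-safe.
    public void mergeInto(Map<String, Integer> local) {
        for (Map.Entry<String, Integer> e : local.entrySet()) {
            global.merge(e.getKey(), e.getValue(), Integer::sum);
        }
    }

    public Map<String, Integer> result() {
        return global;
    }
}
```

The trade-off between such designs (thread-local maps, a single concurrent map, sorted arrays, and so on) is exactly the kind of choice that depends on the memory hierarchy and the workload.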
1.1 Project Objectives
The goal of this project is to add the MapReduce skeleton to the Skandium library
and evaluate its performance. Skandium is a very recent framework, implemented as
a Java library, that targets multi-cores and provides the user with basic parallel
patterns such as Map, Divide and Conquer, Farm, and Pipe, which can be nested
and combined into more complex structures. This project is influenced by the Phoenix
project, which first demonstrated the applicability of the MapReduce model to shared
memory architectures. What makes this work novel is that we integrated the MapReduce
skeleton into an existing skeleton framework.
Another difference between the two implementations is that Phoenix is based on
C/C++, while this project is based on Java. The high level of abstraction in Java
gives us certain advantages for the implementation of an abstract programming model
like MapReduce. However, there can be performance penalties caused by Java's
automatic memory management [9]. It is our goal to further explore this problem and
how it affects our implementation.
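One typical source of such penalties, for example, is the allocation pressure created by autoboxing: storing counts as `Integer` objects can allocate on every update, whereas a mutable accumulator reuses one object per key. The following is a simplified illustration of the general issue, not code from our implementation:

```java
import java.util.HashMap;
import java.util.Map;

public class BoxingPressure {

    // A mutable counter: one object per key, no boxing on updates.
    static final class Counter {
        int value;
    }

    // Boxed variant: every merge may allocate a new Integer object,
    // creating garbage that the collector must later reclaim.
    static int boxedCount(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);
        }
        return counts.getOrDefault("the", 0);
    }

    // Mutable variant: updates happen in place, producing less garbage.
    static int mutableCount(String[] words) {
        Map<String, Counter> counts = new HashMap<>();
        for (String w : words) {
            counts.computeIfAbsent(w, k -> new Counter()).value++;
        }
        Counter c = counts.get("the");
        return c == null ? 0 : c.value;
    }
}
```

Both variants compute the same counts; they differ only in how much short-lived garbage they generate, which is what the collector's overhead depends on.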
Another important goal of the project is to investigate how the performance of our
implementation is affected by the workload characteristics. The performance of the
MapReduce skeleton is sensitive to the characteristics of the application, i.e. the
number of key-value pairs and the distribution of the distinct keys. This is because
the internal data structures and algorithms used by the skeleton may be fast and
efficient for some workloads but slow and expensive in terms of memory for others.
Finally, we will explore the potential for auto-tuning the skeleton, so that it can
adapt to the workload characteristics and achieve the best possible performance for
a given application.
1.2 Thesis Outline
This thesis is divided into seven chapters including this one. The rest of the thesis is
structured as follows:
Chapter 2: Discusses the general background of the project, the concept of algo-
rithmic skeletons, and the MapReduce model. The Phoenix system and the Skandium
library are also presented in this chapter. In addition, we briefly present some
typical MapReduce applications that we used for the skeleton's evaluation, and
finally we discuss the principles behind Java's garbage collection.
Chapter 3: Describes the skeleton’s programming interface. This chapter presents
how the user of the Skandium library can use the MapReduce skeleton. In addition, an
example of the skeleton’s use for a word count application is given.
Chapter 4: Provides insight into how the skeleton is implemented. The chapter
gives the implementation details of the skeleton's internal structures and algorithms
that are used to efficiently store and partition the intermediate key-value pairs. We
present four different implementation schemes that were considered during the project
and discuss the advantages and disadvantages of each.
Chapter 5: Evaluates the performance of the different implementation schemes
described in Chapter 4. The schemes are compared against five different MapReduce
applications on two different multi-core architectures. The best performing scheme
is then evaluated further and compared, in terms of programmability and performance,
with traditional parallel programming techniques based on Java threads. Finally,
the performance penalty caused by Java's automatic memory management mechanisms is
discussed and quantified in this chapter.
Chapter 6: Presents further optimizations that aim at improving the skeleton's
performance for different inputs and at reducing the impact of the garbage collector
on the skeleton's performance. In this chapter, we also consider how the skeleton
could adapt to the workload characteristics. We explore the potential of auto-tuning
and present a simple auto-tuning mechanism that we implemented in the context of
this work.
Chapter 7: Contains our conclusions along with a discussion about how this work
could be extended.