Chapter 2
Related Background
This chapter presents the related background material that defines fundamental notions
of the project. In section 2.1 the concept of algorithmic skeletons is presented. In
2.2 the background of the MapReduce model is covered and in 2.3 Phoenix, the first
implementation of MapReduce on shared memory architectures, is described. In 2.4
we introduce the details of the Skandium library. Finally in 2.5 we describe a set of
typical MapReduce applications that we use to evaluate the performance of the skeleton
and in 2.6 we present the principles behind Java's automatic memory management
which, as we will see in later chapters, has a significant impact on the performance of
our implementation.
2.1 Algorithmic Skeletons
Algorithmic skeletons [6] are a programming model that provides a high level of
abstraction to the programmer and aims at facilitating the efficient use of parallel
hardware. This model presents the user with a set of algorithmic skeletons, which are
independent algorithmic techniques that capture the high-level computational structure
of an algorithm, e.g. the pipeline or the task farm. The user selects the skeleton that best
describes the problem at hand and fills the gaps of the abstract structure with his own code
fragments that turn the abstract computation into a complete solution to the problem.
By selecting a skeleton, the user implicitly selects the problem decomposition (i.e.
the identification of processes that can operate concurrently), as it is inherent to the
structure of the skeleton. He also implicitly selects the problem distribution (i.e. the static
or dynamic mapping of the parallel processes to the computational resources), which
is provided by one of the carefully optimized implementations of the skeleton for the
particular machine. With the decomposition and the distribution defined, the commu-
nication and the synchronization of the processes can also be handled by the runtime
system, completely hiding the difficult aspects of parallel programming from the user.
The unique characteristic of this model is its fragmented nature, which contrasts with the
monolithic programming model of previous abstract systems that also aimed at hiding
parallel programming issues from the programmer. The skeletons are totally independent
of each other, each can more easily find an efficient implementation on a variety
of architectures, and new skeletons can be added to the system without affecting the
behaviour of the others. More importantly, whereas previous systems presented an abstract,
unrestricted programming model that was meant to solve any problem and inevitably
could not be supported by a realistic and efficient implementation, this model presents
restricted algorithmic structures that the system knows how to implement efficiently
and at the same time can guide the user towards good programming style.
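The idea can be illustrated with a minimal task-farm skeleton sketched in Java. This is not the API of any particular skeleton library; the class and method names (`Farm`, `execute`) are illustrative. It shows how the skeleton fixes the parallel structure (decomposition into independent tasks, distribution over a thread pool, synchronization on completion) while the user supplies only the sequential worker code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// A minimal task-farm skeleton: the parallel structure is fixed here,
// and the user supplies only the sequential worker function.
// All names are illustrative, not taken from any specific library.
class Farm<I, O> {
    private final Function<I, O> worker;   // user-supplied sequential code

    Farm(Function<I, O> worker) { this.worker = worker; }

    List<O> execute(List<I> inputs) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        try {
            // Decomposition: one independent task per input element.
            List<Future<O>> futures = new ArrayList<>();
            for (I input : inputs)
                futures.add(pool.submit(() -> worker.apply(input)));
            // Synchronization: collect results in input order.
            List<O> results = new ArrayList<>();
            for (Future<O> f : futures)
                results.add(f.get());
            return results;
        } finally {
            pool.shutdown();
        }
    }
}

public class FarmDemo {
    public static void main(String[] args) throws Exception {
        // The user writes only the computation (squaring here); the skeleton
        // handles decomposition, distribution and synchronization.
        Farm<Integer, Integer> farm = new Farm<>(x -> x * x);
        System.out.println(farm.execute(List.of(1, 2, 3, 4)));  // [1, 4, 9, 16]
    }
}
```

Note that the user code contains no threads, locks or futures; swapping in a different skeleton (e.g. a pipeline) would change the parallel structure without touching the worker fragments.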
2.2 The MapReduce Skeleton
The computation described by the MapReduce skeleton is the following: The input
is processed by a set of map functions that run in parallel. Each mapper is assigned
a section of the original input and its output is a set of intermediate key-value pairs.
The intermediate values are then passed to the reduce stage, where a set of reducers
run in parallel. All the pairs that have the same key are processed by the same reduce
function. Each reduce function applies a reduction operation to the values associated
with the same key to form a smaller set of values for each key, typically one.
The user-defined parts of the code in this skeleton are the map and the reduce functions,
while the runtime system automatically generates parallelism by creating several
instances of these functions and distributing them across the nodes of the parallel machine.
The computation is depicted in Figure 2.1.
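The data flow just described can be sketched in Java using the classic word-count example. The sketch below is sequential for clarity; a real skeleton would run the map and reduce invocations in parallel, but the grouping of intermediate pairs by key is the same. All names (`MapReduceSketch`, `map`, `reduce`) are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// A sequential sketch of the MapReduce data flow, using word counting
// as the example. Names are illustrative, not any library's actual API.
public class MapReduceSketch {

    // Map: one section of the input -> intermediate (key, value) pairs.
    static List<Map.Entry<String, Integer>> map(String section) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : section.split("\\s+"))
            pairs.add(Map.entry(word, 1));
        return pairs;
    }

    // Reduce: all values sharing one key -> typically a single value.
    static int reduce(List<Integer> values) {
        return values.stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        // Each mapper would be assigned one section of the original input.
        List<String> sections = List.of("a rose is", "a rose");

        // Intermediate stage: group all emitted pairs by key, so that each
        // key is handled by exactly one reduce invocation.
        Map<String, List<Integer>> grouped = sections.stream()
                .flatMap(s -> map(s).stream())
                .collect(Collectors.groupingBy(Map.Entry::getKey,
                        Collectors.mapping(Map.Entry::getValue, Collectors.toList())));

        Map<String, Integer> counts = new TreeMap<>();
        grouped.forEach((k, v) -> counts.put(k, reduce(v)));
        System.out.println(counts);  // {a=2, is=1, rose=2}
    }
}
```

The user-visible surface is exactly the two functions `map` and `reduce`; everything between them (emitting, grouping, scheduling) belongs to the runtime.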
In Google's implementation of the model [8], which targets large-scale clusters of
commodity machines, the runtime system also hides from the programmer issues of
fault tolerance, load balancing and inter-machine communication. The model has been
used across a wide range of domains within Google including large scale machine
learning problems and large scale graph computations. According to [8] the fact that
the programmer focuses on functionality and not on parallelization and develops a
simple computation that is automatically executed efficiently on a thousand machines
has sped up the development and prototyping process. The code of an
application is simpler and easier to understand, since all the details that obscure the
original simple computation are hidden in the library.

Figure 2.1: The MapReduce programming model
2.3 The Phoenix Implementation
Phoenix [5] [13] [10] was the first implementation of the MapReduce programming
model that targets multi-core and multi-processor platforms. The increasing number
of cores on a single chip, and the fact that traditional programming techniques fail
to exploit the high levels of parallelism in these machines effectively, were the main
motivations for applying a structured parallel programming model like MapReduce on
shared memory architectures.
Unlike Google's implementation, which targets large clusters and whose main
performance concern is to reduce network traffic and avoid remote file accesses, the Phoenix
implementation focuses on the careful selection of the in-memory data structures
that are used to store and retrieve the intermediate key-value pairs. The system
also aims at using the memory hierarchy of the machine as efficiently as possible.
Phoenix’s first implementation [5] provided an application programming interface
for C and C++. Apart from the Map and the Reduce functions, the programmer provides
a function that is used by the runtime for key comparison and a function that splits the
input data across the Map tasks. The programmer can optionally provide a function
that partitions the intermediate pairs for Reduce tasks based on their keys while the