Chapter 1. Introduction
synchronization issues are completely hidden from the programmer.
MapReduce [8] is a popular structured parallel programming model that realizes
the skeleton idea and is currently used for application development on large-scale
clusters. In this model, the problem-specific code segments provided by the user
are just two functions, Map and Reduce: the Map function is applied to the input
and emits a list of intermediate key-value pairs, while the Reduce function
aggregates the values according to their keys. All other functionality, including
the specification of parallelism, the distribution of data, and fault tolerance, is
provided by the run-time and is completely hidden from the programmer. MapReduce
has already shown its power in solving data-intensive problems like large-scale
indexing, large-scale graph computations, processing of satellite imagery, and
large-scale machine learning [8].
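To make the two user-supplied functions concrete, consider the classic word-count example. The following is an illustrative sketch in plain Java; the method names and signatures are ours for exposition and do not belong to any particular framework:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.AbstractMap.SimpleEntry;
import java.util.Map.Entry;

public class WordCount {

    // Map: takes one input chunk and emits intermediate (word, 1) pairs.
    static List<Entry<String, Integer>> map(String chunk) {
        List<Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : chunk.split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Reduce: aggregates all intermediate values that share the same key.
    static Entry<String, Integer> reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return new SimpleEntry<>(key, sum);
    }
}
```

Everything else, such as splitting the input into chunks, running the map calls in parallel, and grouping the values by key before the reduce calls, is the run-time's responsibility.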
The current popularity of multi-cores and the prediction that tens to hundreds (even
thousands) of cores will appear on a single chip in the near future [4] make an
implementation of the MapReduce model for shared memory systems very attractive.
Such an implementation should take into account the key characteristics of shared
memory architectures that distinguish them from cluster architectures. In a shared
memory environment, the main performance concern is not reducing the communication
between the tasks, but making the best use of the memory hierarchy and carefully
selecting the in-memory data structures used to store and retrieve the intermediate
key-value pairs [5]. In addition, failures on multi-core architectures are not as
frequent, because the number of nodes is very small compared to clusters. This means
that fault tolerance is not as important as in the cluster version of the model.
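As an illustration of this data-structure concern, a shared-memory implementation might buffer intermediate pairs in per-thread hash maps to avoid lock contention, merging them into a shared map only after the map phase. This is a minimal sketch under our own naming, not code from any specific implementation:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a contention-avoiding intermediate store: each worker thread
// combines pairs in its own local HashMap, and the partial maps are merged
// into a shared ConcurrentHashMap once, after the map phase completes.
public class IntermediateStore {

    private final ConcurrentHashMap<String, Integer> global = new ConcurrentHashMap<>();

    // A private map for one worker thread; no synchronization needed.
    public static Map<String, Integer> newLocalMap() {
        return new HashMap<>();
    }

    // Combine a pair into the worker's local map (a local "combiner" step).
    public static void emitLocal(Map<String, Integer> local, String key, int value) {
        local.merge(key, value, Integer::sum);
    }

    // Merge a worker's local map into the global map; thread-safe.
    public void mergeInto(Map<String, Integer> local) {
        for (Map.Entry<String, Integer> e : local.entrySet()) {
            global.merge(e.getKey(), e.getValue(), Integer::sum);
        }
    }

    public Map<String, Integer> result() {
        return global;
    }
}
```

The trade-off between such designs (thread-local maps, a single concurrent map, sorted arrays, and so on) is exactly the kind of choice that depends on the memory hierarchy and the workload.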
1.1 Project Objectives
The goal of this project is to add the MapReduce skeleton to the Skandium library
and evaluate its performance. Skandium is a very recent framework, implemented as
a Java library, that targets multi-cores and provides the user with basic parallel
patterns such as Map, Divide and Conquer, Farm, and Pipe, which can be nested
and combined into more complex structures. This project is influenced by the Phoenix
project, which first demonstrated the applicability of the MapReduce model to shared
memory architectures. What makes this work novel is that we integrated the MapReduce
skeleton into an existing skeleton framework.
Another difference between the two implementations is that Phoenix is based on
C/C++, while this project is based on Java. The high level of abstraction in Java
gives us certain advantages for the implementation of an abstract programming model
like MapReduce. However, there can be performance penalties caused by Java's
automatic memory management [9]. It is our goal to further explore this problem and
how it affects our implementation.
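One typical source of such penalties, for example, is the allocation pressure created by autoboxing: storing counts as `Integer` objects can allocate on every update, whereas a mutable accumulator reuses one object per key. The following is a simplified illustration of the general issue, not code from our implementation:

```java
import java.util.HashMap;
import java.util.Map;

public class BoxingPressure {

    // A mutable counter: one object per key, no boxing on updates.
    static final class Counter {
        int value;
    }

    // Boxed variant: every merge may allocate a new Integer object,
    // creating garbage that the collector must later reclaim.
    static int boxedCount(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);
        }
        return counts.getOrDefault("the", 0);
    }

    // Mutable variant: updates happen in place, producing less garbage.
    static int mutableCount(String[] words) {
        Map<String, Counter> counts = new HashMap<>();
        for (String w : words) {
            counts.computeIfAbsent(w, k -> new Counter()).value++;
        }
        Counter c = counts.get("the");
        return c == null ? 0 : c.value;
    }
}
```

Both variants compute the same counts; they differ only in how much short-lived garbage they generate, which is what the collector's overhead depends on.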
Another important goal of the project is to investigate how the performance of our
implementation is affected by the workload characteristics. The performance of the
MapReduce skeleton is sensitive to the characteristics of the application, i.e. the
number of key-value pairs and the distribution of the distinct keys. This is because
the internal data structures and algorithms used by the skeleton may be fast and
efficient for some workloads but slow and expensive in terms of memory for others.
Finally, we will explore the potential for auto-tuning the skeleton, so that it can
adapt to the workload characteristics and achieve the best possible performance for
a given application.
1.2 Thesis Outline
This thesis is divided into seven chapters including this one. The rest of the thesis is
structured as follows:
Chapter 2: Discusses the general background of the project, the concept of algo-
rithmic skeletons, and the MapReduce model. The Phoenix system and the Skandium
library are also presented in this chapter. In addition, we briefly present some
typical MapReduce applications that we used for the skeleton's evaluation, and
finally we discuss the principles behind Java's garbage collection.
Chapter 3: Describes the skeleton’s programming interface. This chapter presents
how the user of the Skandium library can use the MapReduce skeleton. In addition, an
example of the skeleton’s use for a word count application is given.
Chapter 4: Provides insight into how the skeleton is implemented. The chapter
gives the implementation details of the skeleton's internal structures and algorithms
that are used to efficiently store and partition the intermediate key-value pairs. We
present four different implementation schemes that were considered during the project
and discuss the advantages and disadvantages of each.
Chapter 5: Evaluates the performance of the different implementation schemes
described in Chapter 4. The schemes are compared against five different MapReduce
applications on two different multi-core architectures. The best performing scheme
is then evaluated further and compared, in terms of programmability and performance,
with traditional parallel programming techniques based on Java threads. Finally,
the performance penalty caused by Java's automatic memory management mechanisms is
discussed and quantified in this chapter.
Chapter 6: Presents further optimizations that aim at improving the skeleton's
performance for different inputs and at reducing the impact of the garbage collector
on the skeleton's performance. In this chapter, we also consider how the skeleton
could adapt to the workload characteristics. We explore the potential of auto-tuning
and present a simple auto-tuning mechanism that we implemented in the context of
this work.
Chapter 7: Contains our conclusions along with a discussion about how this work
could be extended.