Chapter 3. MapReduce for Skandium: The programming model
Similarly to the mapper-library interface, we use Java's Collection interface to pass the set of keys, each with its list of values, from the system to the reducer. This gives flexibility to the runtime system: we could substitute the system's internal data structures, either permanently because we found a better solution, or temporarily according to the application or the underlying hardware, without changing the interface exposed to the user.
```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;

// Word-count reducer: for each word, the number of collected values is its count.
public class Reducer implements Execute<Collection<Combiner>, Collection<KeyValuePair>> {

    public Collection<KeyValuePair> execute(Collection<Combiner> param) throws Exception {
        ArrayList<KeyValuePair> reducedPairs = new ArrayList<KeyValuePair>();
        Iterator<Combiner> i = param.iterator();
        int count = 0;
        while (i.hasNext()) {
            Combiner pair = i.next();              // a key with all the values emitted for it
            String word = (String) pair.key;
            if (pair.listOfValues != null) {
                count = pair.listOfValues.size();  // one value per occurrence of the word
            } else {
                count = 1;                         // no value list: treated as a single occurrence
            }
            reducedPairs.add(new KeyValuePair(word, count));
            count = 0;
        }
        return reducedPairs;
    }
}
```
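The benefit of programming against Collection can be seen in a small self-contained sketch. It re-creates the word-count logic of the reducer above against minimal stand-ins for Skandium's Execute interface and for the Combiner and KeyValuePair classes. These stand-ins, and the names WordCountReducer and ReducerDemo, are hypothetical: their fields are modeled only on how the listing uses them, not on the library's actual definitions. The same reducer processes an ArrayList and a LinkedList without any change to its code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedList;
import java.util.List;

// Hypothetical stand-in for Skandium's Execute muscle (P -> R).
interface Execute<P, R> {
    R execute(P param) throws Exception;
}

// Hypothetical stand-in: a key with all the values emitted for it.
class Combiner {
    final Object key;
    final List<Object> listOfValues;

    Combiner(Object key, List<Object> listOfValues) {
        this.key = key;
        this.listOfValues = listOfValues;
    }
}

// Hypothetical stand-in: one (key, value) output pair.
class KeyValuePair {
    final Object key;
    final Object value;

    KeyValuePair(Object key, Object value) {
        this.key = key;
        this.value = value;
    }
}

// Same logic as the reducer listing: the size of a word's value list is its count.
class WordCountReducer implements Execute<Collection<Combiner>, Collection<KeyValuePair>> {
    public Collection<KeyValuePair> execute(Collection<Combiner> param) {
        List<KeyValuePair> reduced = new ArrayList<KeyValuePair>();
        for (Combiner pair : param) { // works for any Collection implementation
            int count = (pair.listOfValues != null) ? pair.listOfValues.size() : 1;
            reduced.add(new KeyValuePair(pair.key, count));
        }
        return reduced;
    }
}

class ReducerDemo {
    public static void main(String[] args) {
        WordCountReducer reducer = new WordCountReducer();
        Combiner the = new Combiner("the", Arrays.<Object>asList(1, 1, 1));
        Combiner cat = new Combiner("cat", null);

        // The runtime may pick either collection type; the reducer is unchanged.
        Collection<Combiner> arrayBacked = new ArrayList<Combiner>(Arrays.asList(the, cat));
        Collection<Combiner> linked = new LinkedList<Combiner>(Arrays.asList(the, cat));
        for (Collection<Combiner> input : Arrays.asList(arrayBacked, linked)) {
            for (KeyValuePair kv : reducer.execute(input)) {
                System.out.println(kv.key + " " + kv.value); // "the 3", then "cat 1"
            }
        }
    }
}
```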
3.2.4 The Merge muscle
The merge muscle is responsible for merging the outputs of the reducers. This muscle falls into the category of partially generic muscles: similarly to the split muscle, the programmer can either choose one of the predefined generic muscles or define his own. The merge muscles defined in the library expect as input a collection of Key Value Pairs from each reducer and produce a single collection
of Key Value Pairs. This means that, in order to use a generic merge muscle, the programmer must restrict the output of his reducer to a Collection of Key Value Pairs.
There are currently two types of merge muscles in the repository. The first type simply combines the many collections coming from the reducers into a single collection of pairs and outputs it. The second type additionally sorts the resulting collection by the key of each pair.
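The two types of merge muscle can be sketched as follows. Merge is a hypothetical stand-in shaped, as we assume, like Skandium's merge muscle, which receives the partial results of the parallel reducers as an array. KeyValuePair is simplified to a String key and an int count, and MergeUnsortedSketch and MergeSortedSketch are illustrative names, not the library's MergeUnsorted or its sorted counterpart.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a merge muscle: combines the partial results
// of the parallel reducers (passed as an array) into a single result.
interface Merge<P, R> {
    R merge(P[] param) throws Exception;
}

// Simplified stand-in for a word-count pair.
class KeyValuePair {
    final String key;
    final int value;

    KeyValuePair(String key, int value) {
        this.key = key;
        this.value = value;
    }
}

// First type: put the pairs from all reducers into one collection.
class MergeUnsortedSketch implements Merge<Collection<KeyValuePair>, List<KeyValuePair>> {
    public List<KeyValuePair> merge(Collection<KeyValuePair>[] param) {
        List<KeyValuePair> all = new ArrayList<KeyValuePair>();
        for (Collection<KeyValuePair> fromOneReducer : param) {
            all.addAll(fromOneReducer);
        }
        return all;
    }
}

// Second type: merge as above, then sort the single collection by key.
class MergeSortedSketch implements Merge<Collection<KeyValuePair>, List<KeyValuePair>> {
    public List<KeyValuePair> merge(Collection<KeyValuePair>[] param) {
        List<KeyValuePair> all = new MergeUnsortedSketch().merge(param);
        Collections.sort(all, (a, b) -> a.key.compareTo(b.key));
        return all;
    }
}

class MergeDemo {
    @SuppressWarnings("unchecked")
    public static void main(String[] args) {
        Collection<KeyValuePair>[] fromReducers = new Collection[] {
            Arrays.asList(new KeyValuePair("the", 3), new KeyValuePair("cat", 1)),
            Arrays.asList(new KeyValuePair("sat", 1))
        };
        for (KeyValuePair kv : new MergeSortedSketch().merge(fromReducers)) {
            System.out.println(kv.key + " " + kv.value); // cat 1, sat 1, the 3
        }
    }
}
```

The sorted variant pays for an extra O(n log n) sort over all merged pairs, which is a reason to keep an unsorted variant for applications that do not care about key order.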
3.2.5 Creating the Skeleton's Instance
Let us now see how the programmer can create a skeleton instance, flesh it out with the muscle objects and pass it the actual input. The listing below shows part of the main method of the application, which starts the MapReduce computation.
First, an instance of the library-defined TextSplit is created in line 3 to serve as the splitter. The number of available processors is passed to its constructor; this number determines the number of splits, which in turn determines the number of mappers. Next, the programmer creates instances of his own map and reduce muscles (lines 4-5). Then, an instance of the library-defined MergeUnsorted is created in line 6 to serve as the merge muscle. In line 7, the instance of the MapReduce skeleton is created and the muscle objects are passed to its constructor. In line 9, the actual data is passed to the skeleton through its input method, which returns a Future, while in line 11 the main thread blocks until the MapReduce computation is complete. The result of the computation, which is a collection of pairs, is finally stored in an ArrayList.
```java
 1  public static void main(String[] args) throws Exception {
 2      // (omitted) the input text is read into 'out', assumed here to be a String
 3      TextSplit splitter = new TextSplit(Runtime.getRuntime().availableProcessors());
 4      Mapper mapper = new Mapper();
 5      Reducer reducer = new Reducer();
 6      MergeUnsorted finalMerge = new MergeUnsorted();
 7      MapReduce<String, Collection<KeyValuePair>> bbp = new MapReduce<String,
 8              Collection<KeyValuePair>>(splitter, mapper, reducer, finalMerge);
 9      Future<Collection<KeyValuePair>> future = bbp.input(out);
10      // get() blocks until the MapReduce computation is complete
11      ArrayList<KeyValuePair> result = (ArrayList<KeyValuePair>) future.get();
12  }
```
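How a splitter can turn the processor count into a number of splits is illustrated by the sketch below. TextSplitSketch is a hypothetical stand-in, not the library's TextSplit, whose splitting strategy is not described here. We assume the text arrives as a single String and cut it into at most n chunks of roughly equal length, moving each cut forward to the next whitespace so that no word is divided between two mappers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical splitter in the spirit of TextSplit (not its actual code):
// cuts a text into at most `parts` chunks of roughly equal length.
class TextSplitSketch {
    private final int parts;

    TextSplitSketch(int parts) {
        if (parts < 1) {
            throw new IllegalArgumentException("parts must be at least 1");
        }
        this.parts = parts;
    }

    String[] split(String text) {
        List<String> chunks = new ArrayList<String>();
        int target = Math.max(1, (text.length() + parts - 1) / parts); // ceiling division
        int start = 0;
        while (start < text.length()) {
            int end;
            if (chunks.size() == parts - 1) {
                end = text.length(); // the last allowed chunk takes the rest
            } else {
                end = Math.min(text.length(), start + target);
                // Move the cut forward to the next whitespace so no word is split.
                while (end < text.length() && !Character.isWhitespace(text.charAt(end))) {
                    end++;
                }
            }
            chunks.add(text.substring(start, end));
            start = end;
        }
        return chunks.toArray(new String[0]);
    }
}

class SplitDemo {
    public static void main(String[] args) {
        int n = Runtime.getRuntime().availableProcessors();
        for (String chunk : new TextSplitSketch(n).split("the cat sat on the mat")) {
            System.out.println("[" + chunk + "]"); // each chunk would go to one mapper
        }
    }
}
```

With this sketch, passing Runtime.getRuntime().availableProcessors() as n yields at most one chunk, and therefore at most one mapper, per available processor; long words can make the number of chunks smaller.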
Chapter 4
MapReduce for Skandium: Implementation details
In this chapter, the implementation details of the MapReduce skeleton are presented. We first describe what happens inside Skandium when the user creates a new MapReduce skeleton and how the runtime stitches together the user-defined and the library-defined code fragments. Next, we give the implementation details of the library-defined muscles.
4.1 The Skeleton's Instantiation
The code listing below shows the constructor of the MapReduce skeleton.
```java
1  public MapReduce(Split<P,X> split,
2                   Execute<X, Collection<KeyValuePair>> mapper,
3                   Execute<Collection<Combiner>,Y> reducer, Merge<Y,R> finalMerge)
4  {
5      mapStageSkeleton = new Map<>(split, mapper, new Store());               // stage 1: split, map, merge with Store
6      reduceStageSkeleton = new Map<>(new Partition(), reducer, finalMerge);  // stage 2: Partition, reduce, final merge
7      mapReducePipeline = new Pipe<P,R>(mapStageSkeleton, reduceStageSkeleton); // chain the two stages
8  }
```
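The constructor shows that the MapReduce skeleton is a composition of existing skeletons: a Pipe whose first stage is a Map skeleton built from the splitter, the mapper and the library's Store muscle, and whose second stage is a Map skeleton built from the library's Partition muscle, the reducer and the final merge. The plain-Java sketch below models that composition without Skandium, under stated assumptions. MapStage and WordCountPipeline are hypothetical names, a fixed thread pool stands in for Skandium's runtime, Function.andThen stands in for Pipe, and the bodies of store (concatenating the mappers' pairs) and partition (gathering the values of each key) are our guesses at the roles of Store and Partition, not their actual code.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical model of a Map skeleton: split the input, run the muscle on
// every part concurrently, then merge the partial results (in split order).
class MapStage<P, X, Y, R> implements Function<P, R> {
    private final Function<P, List<X>> split;
    private final Function<X, Y> execute;
    private final Function<List<Y>, R> merge;
    private final ExecutorService pool;

    MapStage(Function<P, List<X>> split, Function<X, Y> execute,
             Function<List<Y>, R> merge, ExecutorService pool) {
        this.split = split;
        this.execute = execute;
        this.merge = merge;
        this.pool = pool;
    }

    @Override
    public R apply(P input) {
        List<Future<Y>> futures = new ArrayList<>();
        for (X part : split.apply(input)) {
            futures.add(pool.submit(() -> execute.apply(part)));
        }
        List<Y> partials = new ArrayList<>();
        try {
            for (Future<Y> f : futures) {
                partials.add(f.get()); // wait for every muscle invocation
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return merge.apply(partials);
    }
}

class WordCountPipeline {
    // Hypothetical "Partition": gather the values of each key (keys in sorted order).
    static List<Map.Entry<String, List<Integer>>> partition(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return new ArrayList<>(groups.entrySet());
    }

    // Hypothetical "Store": concatenate the mappers' lists of pairs.
    static List<Map.Entry<String, Integer>> store(List<List<Map.Entry<String, Integer>>> lists) {
        List<Map.Entry<String, Integer>> all = new ArrayList<>();
        for (List<Map.Entry<String, Integer>> l : lists) {
            all.addAll(l);
        }
        return all;
    }

    // Map stage and reduce stage chained with andThen, mirroring Pipe(map, reduce).
    static Function<List<String>, List<Map.Entry<String, Integer>>> build(ExecutorService pool) {
        MapStage<List<String>, String, List<Map.Entry<String, Integer>>,
                 List<Map.Entry<String, Integer>>> mapStage = new MapStage<>(
            chunks -> chunks,            // the input is already split into chunks
            chunk -> {                   // mapper: emit (word, 1) for every word
                List<Map.Entry<String, Integer>> out = new ArrayList<>();
                for (String w : chunk.trim().split("\\s+")) {
                    if (!w.isEmpty()) out.add(new SimpleEntry<>(w, 1));
                }
                return out;
            },
            WordCountPipeline::store, pool);
        MapStage<List<Map.Entry<String, Integer>>, Map.Entry<String, List<Integer>>,
                 Map.Entry<String, Integer>, List<Map.Entry<String, Integer>>> reduceStage =
            new MapStage<>(
                WordCountPipeline::partition,
                group -> new SimpleEntry<>(group.getKey(), group.getValue().size()), // reducer
                results -> results,      // final merge: keep the reducers' outputs in order
                pool);
        return mapStage.andThen(reduceStage);
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        try {
            List<String> chunks = Arrays.asList("the cat sat", "on the mat");
            System.out.println(build(pool).apply(chunks)); // [cat=1, mat=1, on=1, sat=1, the=2]
        } finally {
            pool.shutdown();
        }
    }
}
```

In this sketch, the reduce stage starts only after the map stage's merge has received every mapper's output, so the boundary between the two pipeline stages doubles as the barrier between the map and reduce phases.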
The constructor takes as parameters the user-defined mapper and reducer muscles,
along with the splitter and the final merge muscle, which can be either user-defined or