Learning to Play Pac-Man: An Evolutionary, Rule-based Approach
Marcus Gallagher
marcusg@itee.uq.edu.au
Amanda Ryan
s354299@student.uq.edu.au
School of Information Technology and Electrical Engineering
University of Queensland 4072
Australia
Abstract- Pac-Man is a well-known, real-time computer
game that provides an interesting platform for research.
This paper describes an initial approach to developing
an artificial agent that replaces the human to play a sim-
plified version of Pac-Man. The agent is specified as
a simple finite state machine and ruleset, with param-
eters that control the probability of movement by the
agent given the constraints of the maze at some instant
of time. In contrast to previous approaches, the agent
represents a dynamic strategy for playing Pac-Man,
rather than a pre-programmed maze-solving method.
The agent adaptively “learns” through the application
of population-based incremental learning (PBIL) to ad-
just the agent’s parameters. Experimental results are
presented that give insight into some of the complexities
of the game, as well as highlighting the limitations and
difficulties of the representation of the agent.
1 Introduction
Pac-Man is a well-known, real-time arcade computer game
originally developed by Toru Iwatani for the Namco Com-
pany in 1980. Different versions of the game have been de-
veloped subsequently for a large number of home computer,
game-console, and hand-held systems.
The typical version of Pac-Man is a one-player game
where the human player maneuvers the Pac-Man character
around a maze, attempting to avoid four “ghost” characters
while eating dots initially distributed throughout the maze.
If Pac-Man collides with a ghost, he loses one of his three
lives and play resumes with the ghosts reassigned to their
initial starting location (the “ghost cage” in the centre of
the maze). Four “power pills” are initially positioned near
each corner of a maze: when Pac-Man eats a power pill he
is able to turn the tables and eat the ghosts for a few
seconds. The game ends when Pac-Man has lost all
(usually 3) of his lives. Figure 1 shows a screen-shot of
the starting position of the first maze in the game. Pac-Man
is a real-time computer game that resembles a simplified
version of many modern first-person environment computer
games. That is, the game is centered around navigating
the player character around a semi-structured world,
accumulating points, avoiding and (when appropriate)
attacking non-player game characters (alternatively, other
game characters might be controlled by other human
players in a multi-player setting).

Figure 1: The starting position of the Pac-Man game, show-
ing the maze structure, Pac-Man (lower-center), power pills
(large dots), dots (small dots) and ghosts (center).

This kind of game requires dynamic control of the game
agent by the human player and involves task prioritization,
planning, and risk assessment. While it is relatively easy
for a human to learn a basic
strategy for Pac-Man, the game has complex aspects that al-
low the possibility of developing more intelligent strategies
(in conjunction with other skills such as hand-eye coordina-
tion). It is also a challenge for a person to describe precisely
their Pac-Man-playing strategy, or to represent such a strat-
egy formally (e.g., as a set of rules).
This paper describes an initial approach to developing an
artificial agent that replaces the human playing Pac-Man.
For this work, the game has been significantly simplified
so that only a single ghost, the dots and the Pac-Man agent
are present in the mazes. The agent is specified as a simple
finite state machine and ruleset, with parameters that spec-
ify the state transition and the probabilities of movement
according to each rule. Section 2 provides an overview of
previous work relevant to Pac-Man and AI game playing,
while Section 3 describes the evolutionary, rule-based ap-
proach taken in this paper. Some experimental details are
discussed in Section 4. Results are presented in Section 5,
and Section 6 provides a summary and some discussion of
possible future work.
2 Artificial Intelligence Approaches to Pac-
Man
Relatively little previous research has been done on the
application of artificial intelligence to Pac-Man or similar
games. Koza [7] and Rosca [9] use Pac-Man
as an example problem domain to study the effectiveness of
genetic programming for task prioritization. Their approach
relies on a set of predefined control primitives for percep-
tion, action and program control (e.g., advance the agent on
the shortest path to the nearest uneaten power pill). The pro-
grams produced represent procedures that solve mazes of a
given structure, resulting in a sequence of primitives that
are followed. Genetic Programming has also been applied
to the computer game Tron [3], where coevolution was used
to produce agents that learn strategies by playing human op-
ponents over the internet.
Gugler [5] describes “Pac-Tape”, a project which at-
tempted to produce a self-playing Pac-Man based on the
original Pac-Man arcade game emulated on a desktop PC.
The approach appears to be based on brute force search,
but is not described or tested. Lawrence [8] builds on the
Pac-Tape work, but applies a genetic algorithm to evolve
a string of directions (north, south, east, west) to traverse a
maze. The aim was to evolve interesting patterns for solving
particular mazes. Unfortunately, success was limited, per-
haps due to unfavourable interactions between the genetic
operators (esp. crossover) and the representation used.
Kalyanpur and Simon [6] use a genetic algorithm to try
to improve the strategy of the ghosts in a Pac-Man-like
game. Here the solution produced is also a list of directions
to be traversed. A neural network is used to determine suit-
able crossover and mutation rates from experimental data.
Finally, De Bonet and Stauffer [2] describe a project using
reinforcement learning to develop strategies simultaneously
for Pac-Man and the ghosts, by starting with a small, simple
maze structure and gradually adding complexity.
The aim of our research is to investigate techniques to
develop an adaptive agent that learns inductively to play
Pac-Man. Our approach is aimed at producing agents that
can learn based only on the information that is available
to a human playing the game (or quantities that a human
could conceivably estimate in real-time). In particular, the
agent should not be able to exploit internal knowledge about
the game (e.g., the programmed behaviour of the ghosts);
the (software) agent should be considered external to the
game software. Furthermore, this agent should learn gen-
eralizable strategies for good game play, based not on the
exact structure of a given maze but rather on more general
principles of good strategies to play Pac-Man.
3 Designing an Artificially Intelligent Agent
that learns to play Pac-Man Inductively
Pac-Man is a simple game to describe and play. Human
players have little difficulty in learning the basics of game
play, which are to obtain as many points as possible (by
clearing mazes of dots and eating power pills and ghosts),
whilst avoiding the ghosts. However, there is no obviously
correct way of specifying such a strategy precisely, perhaps
because it is largely based on perception of the visual
presentation of the game and its dynamics in time.
Moving beyond this basic approach makes clear the full
complexity of the game domain. A large point incentive is
provided for eating power pills, followed by eating ghosts in
the few seconds that the power pill remains effective, but the
ghosts generally try to retreat from Pac-Man in this time to
avoid being eaten. Thus an effective strategy might involve
attracting ghosts to the area of the maze near a power pill
before eating it. The ghosts’ movements depend on
each other and typically contain an element of randomness,
making it difficult to predict their behaviour and plan around
it.
Our approach here is to initially remove much of the
complexity from the game to see if it is possible to produce
an effective agent in a simplified Pac-Man environment. In
the experiments discussed below, only the dots and a sin-
gle ghost are present in a maze with Pac-Man. While this
clearly removes several interesting aspects of game-play, we
believe that what remains (learning to avoid the ghost and
perhaps to include random exploration) is still a fundamen-
tal aspect of the full Pac-Man game.
We were motivated to begin by using a simple, trans-
parent representation for the agent. A representation that
was potentially capable of capturing the general aspects of
a basic human strategy provides a good platform on which
to test the feasibility of using an evolutionary approach to
learning, as well as serving as a benchmark for comparing
future work. It was also interesting to test if the evolutionary
algorithm would generate an agent with an obvious strategy.
These considerations led to the agent being represented as
a two-state finite state machine, with a set of probabilistic
rules for each state that control the movement of the agent.
At any given time, the state of the agent is determined by the
distance between Pac-Man and the ghost. If this distance is
greater than some given value, p_1, the agent is in the
Explore state, while if the distance is less than p_1, the
agent switches to the Retreat state.
A human player is able to see the entire maze during
game play, but dynamically is usually focusing their atten-
tion primarily on the immediate area of the maze where
Pac-Man is currently located. Planning a long sequence of
moves in advance would not only be very difficult for a hu-
man, but also ineffective, since the dynamics of the game
would be likely to make such a strategy poor and irrelevant.
Our initial approach for an artificial Pac-Man agent is there-
fore to view play as a small number of possible turn types,
together with basic global information such as the position
and movement of the ghost. Figure 2 categorizes the en-
tire maze in terms of turn types, including straight corridors.
Note that the representation used below considers turn types
from the current orientation of Pac-Man, so that the “true”
orientation of a turn type does not need to be directly con-
sidered (e.g., vertical versus horizontal corridors).
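To make the relative treatment of turn types concrete, the
sketch below shows one way the mapping might be implemented.
This is our own illustration, not code from the system described
here: the direction encoding, names and turn-type test are all
assumptions.

    # Minimal sketch (Python). Absolute directions are encoded 0..3
    # (north, east, south, west); relative directions come out as
    # 0 = forward, 1 = right, 2 = backward, 3 = left.

    def relative_direction(heading, absolute):
        # Rotate an absolute direction into Pac-Man's frame of reference.
        return (absolute - heading) % 4

    def turn_type(feasible):
        # Classify a cell by its set of feasible absolute directions.
        if len(feasible) == 4:
            return "intersection"
        if len(feasible) == 3:
            return "T-turn"
        a, b = feasible
        # Two exits: a corridor if they are opposite, otherwise an L-turn.
        return "corridor" if (a - b) % 4 == 2 else "L-turn"

    print(turn_type([0, 2]))         # corridor (north/south exits)
    print(relative_direction(1, 0))  # 3: north is "left" when facing east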
Figure 2: The first Pac-Man maze in terms of turn types.
Some types are more common than others.
At each time step (tick) of the game, the agent is located
in an instance of a certain turn-type, and needs to produce
an output that becomes the current direction of movement
for Pac-Man. Depending on the turn-type of the current po-
sition (corridor, L-turn, T-junction, intersection), there are
a number of different feasible directions to move, including
maintaining the current direction (but excluding running di-
rectly into walls). In the Explore state, a probabilistic de-
cision is made to choose one of the feasible directions de-
pending on the turn type. As a result, Pac-Man is able to
explore the maze in a pseudo-random fashion. Exploration
does not use any other game information to determine the
agent’s output. In the Retreat state, the agent additionally
considers the location of the ghost to determine the output.
This results in the need to consider a number of possible
turn-type and ghost position combinations. We classified
the ghost’s position as being either forward, back, left, right,
forward-left, forward-right, back-left, or back-right relative
Agent()
  while (game is in play)
    currentdistance = Distance(pacman, ghost)
    if currentdistance > p_1 then
      Explore()
    else
      Retreat()
    end
end

Explore()
  switch (turntype)
    case “corridor”:     newdir = Random(P, prevdir, turntype)
    case “L-turn”:       newdir = Random(P, prevdir, turntype)
    case “T-turn”:       newdir = Random(P, prevdir, turntype, orientation)
    case “Intersection”: newdir = Random(P, prevdir, turntype)
end

Retreat()
  switch (turntype)
    case “corridor”:     newdir = Random(P, prevdir, turntype, ghostpos)
    case “L-turn”:       newdir = Random(P, prevdir, turntype, orientation, ghostpos)
    case “T-turn”:       newdir = Random(P, prevdir, turntype, orientation, ghostpos)
    case “Intersection”: newdir = Random(P, prevdir, turntype, ghostpos)
end

Figure 3: Pseudo-code for the Pac-Man agent strategy.
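The Random(...) lookup in Figure 3 selects an output direction
according to the probabilities associated with the current
context. A minimal sketch of how this might be implemented
follows; the dictionary layout and key format are our
assumptions, not the actual data structures used.

    import random

    # Hypothetical layout: each context key maps to a distribution over
    # relative output directions. Only two illustrative entries are shown.
    P = {
        ("explore", "corridor"): {"forward": 0.8, "backward": 0.2},
        ("retreat", "corridor", "ghost-forward"): {"forward": 0.0, "backward": 1.0},
    }

    def random_direction(P, key):
        # Sample one direction from the categorical distribution P[key].
        r = random.random()
        acc = 0.0
        direction = None
        for direction, prob in P[key].items():
            acc += prob
            if r < acc:
                return direction
        return direction  # guard against floating-point rounding

    print(random_direction(P, ("explore", "corridor")))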
Note that the first four classifications are determined using
only one coordinate dimension. For example, if Pac-Man is
currently facing north (up) on the screen, the ghost would be
classified as being “back” if its current y-coordinate is less
than Pac-Man’s current y-coordinate. Adding the final four
classifications is one way of refining this information. For
the same example (Pac-Man facing north), the ghost’s
location will be classified as follows:
back if y_ghost < y_pac and x_ghost = x_pac;
back-left/left-back if y_ghost < y_pac and x_ghost < x_pac;
back-right/right-back if y_ghost < y_pac and x_ghost > x_pac;
forward if y_ghost > y_pac and x_ghost = x_pac;
forward-left/left-forward if y_ghost > y_pac and x_ghost < x_pac;
forward-right/right-forward if y_ghost > y_pac and x_ghost > x_pac;
left if y_ghost = y_pac and x_ghost < x_pac;
right if y_ghost = y_pac and x_ghost > x_pac.
Note also that, for example, back-left is equivalent to
left-back, as indicated above. Unfortunately, having eight
classes for the position of the ghost for each turn-type means
that the ruleset (and the number of parameters) also grows
larger. Hence, in our implementation, eight classes were
used to classify the ghost’s position at intersections, with
only the first four classes used for the other turn-types.
Pseudo-code for the agent’s strategy is shown in Figure 3.
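For concreteness, the eight-way classification might be
implemented as below. The sketch assumes Pac-Man is facing
north with y increasing up the screen, as in the example above;
for other headings the coordinates would first be rotated into
Pac-Man’s frame. Function and label names are ours.

    def classify_ghost(pac, ghost):
        # pac and ghost are (x, y) positions; Pac-Man is assumed facing north.
        dx = ghost[0] - pac[0]
        dy = ghost[1] - pac[1]
        fb = "forward" if dy > 0 else ("back" if dy < 0 else "")
        lr = "left" if dx < 0 else ("right" if dx > 0 else "")
        # Combine the two half-plane tests; a ghost on the same row or
        # column keeps the single-axis class (the first four classes).
        return "-".join(c for c in (fb, lr) if c) or "same-cell"

    print(classify_ghost((10, 10), (8, 6)))    # back-left
    print(classify_ghost((10, 10), (10, 14)))  # forward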
Implementation of the agent based on the specifications
above leads to a total of 85 adjustable parameters, collected
into a parameter vector P = (p_1, ..., p_85), that controls the
agent’s behaviour. A sub-component of P is used to decide
Pac-Man’s current (new) direction of movement, depending
on the previous direction (prevdir), turn type, orientation,
and (in retreat mode) the ghost’s location. Parameter p_1 is
the distance value at which the agent shifts from Explore
to Retreat (and vice-versa), calculated as the direct (i.e., ig-
noring maze walls) Manhattan distance between Pac-Man
and the ghost. All other parameters of P are probability val-
ues, p_i ∈ [0, 1], i = 2, ..., 85. Table 1 gives a complete
specification of these values.
Parameter(s)    Description
p_1             Distance to ghost
Explore:
p_2–p_3         Corridor: forward, backward
p_4–p_5         L-turn: forward, backward
p_6–p_8         T-turn (a): approach centre
p_9–p_11        T-turn (b): approach left
p_12–p_14       T-turn (c): approach right
p_15–p_18       Intersection
Retreat:
p_19–p_20       Corridor: ghost forward
p_21–p_22       Corridor: ghost behind
p_23–p_24       L-turn: ghost forward
p_25–p_26       L-turn: ghost behind
p_27–p_29       T-turn (a): ghost behind
p_30–p_32       T-turn (b): ghost behind
p_33–p_35       T-turn (c): ghost behind
p_36–p_38       T-turn (a): ghost on left
p_39–p_41       T-turn (b): ghost on left
p_42–p_44       T-turn (a): ghost on right
p_45–p_47       T-turn (b): ghost on right
p_48–p_50       T-turn (b): ghost forward
p_51–p_53       T-turn (c): ghost forward
p_54–p_57       Intersection: ghost forward
p_58–p_61       Intersection: ghost behind
p_62–p_65       Intersection: ghost left
p_66–p_69       Intersection: ghost right
p_70–p_73       Intersection: ghost forward/left
p_74–p_77       Intersection: ghost forward/right
p_78–p_81       Intersection: ghost behind/left
p_82–p_85       Intersection: ghost behind/right

Table 1: Description of the parameter values used in the Pac-
Man agent and their usage.
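The role of p_1 can be illustrated with a short sketch of the
state switch, using the direct Manhattan distance described
above. This is our own code, and the example threshold value
of 15 is only an illustration.

    def manhattan(pac, ghost):
        # Direct Manhattan distance, ignoring maze walls.
        return abs(pac[0] - ghost[0]) + abs(pac[1] - ghost[1])

    def agent_state(pac, ghost, p1=15):
        # Explore when the ghost is far away, retreat when it is near.
        return "Explore" if manhattan(pac, ghost) > p1 else "Retreat"

    print(agent_state((10, 10), (1, 1)))  # Explore (distance 18)
    print(agent_state((10, 10), (6, 8)))  # Retreat (distance 6)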
4 Simulation Methodology
4.1 Fitness Function
For our simplified Pac-Man game, a fitness function was
developed based primarily on the score obtained by Pac-
Man. The only way to score points in this simplified game
is to eat dots from the maze, and since there is a fixed initial
number of dots in a maze, the maximum possible score for
each maze is known a priori. It was considered that the
time Pac-Man manages to survive should also be a factor in
the fitness function. Note, however, that there is also a
possibility that Pac-Man may successfully avoid the ghost
indefinitely, but fail to clear the maze of dots. Because of
this, a term was added to reward the time survived by Pac-
Man, with a pre-chosen limit imposed on this factor. The
fitness function used is:
f = Σ_l [ score_l / max_score_l + min(time_l / max_time_l, 1) ]    (1)

where the sum runs over the levels l played in a game,
score_l is the score obtained on level l, time_l is the time
survived on level l, and max_score_l and max_time_l are the
corresponding maximum values.
The score and time factors in the fitness function are nor-
malized by their maximum (respectively, known and cho-
sen) values for each level. Hence, the maximum fitness for
each level is 2.
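Under this reading of Equation (1), the fitness of a single
game could be computed as in the sketch below. The function
names, the data layout and the exact form of the capping are
our assumptions.

    def level_fitness(score, max_score, time_survived, max_time):
        # Normalized score plus capped normalized survival time; at most 2.0.
        return score / max_score + min(time_survived / max_time, 1.0)

    def game_fitness(levels):
        # Sum the contributions of each level reached in the game.
        return sum(level_fitness(*level) for level in levels)

    # Toy example: level 1 cleared, level 2 partially cleared.
    print(game_fitness([(220, 220, 400, 500), (100, 220, 120, 500)]))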
The movement of the ghost in the game has a small
amount of randomness, and the agent developed above
clearly produces stochastic movement for Pac-Man. As a
consequence, the fitness value of each game given a fixed
P will also be stochastic. In such a situation it is typical
to conduct repeated evaluations of the fitness function with
a given parameter set, and produce an average fitness value
to be used by the evolutionary algorithm. In the simulations
below, 10 games were played per average fitness evaluation.
Recall that Pac-Man has three lives in a game, meaning that
Pac-Man “runs” in the maze 30 times for each fitness eval-
uation used in the EA.
4.2 Algorithm
The algorithm used to learn the parameters of an agent is
an implementation of population-based incremental learn-
ing (PBIL) for continuous search spaces [1, 10, 11]. PBIL
replaces the conventional genetic operators of mutation and
recombination with a probability vector, which is used to
generate each population and is updated via a learning rule
based on the single best individual in the current popula-
tion. Each component of the probability vector represents
the mean of a Gaussian distribution. In our experiments the
population size was set to 25 (computational time prevented
a larger value). For the standard deviations of the Gaussians
and the PBIL learning rate parameter, α, several different
values were tested. Note that for α = 1, this algorithm is
equivalent to a simple (1, λ) evolution strategy with a con-
stant standard deviation value for all variables [4].
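In outline, the continuous PBIL used here maintains a Gaussian
per parameter and shifts each mean toward the corresponding
component of the generation’s best individual. The following is
a minimal sketch under our assumptions; it omits the bounds
handling and probability renormalization described below.

    import random

    def pbil_continuous(fitness, mean, sigma, alpha=0.05, pop_size=25, gens=700):
        # fitness: callable on a parameter list (higher is better).
        # mean, sigma: per-parameter Gaussian means and (fixed) std. deviations.
        for _ in range(gens):
            pop = [[random.gauss(m, s) for m, s in zip(mean, sigma)]
                   for _ in range(pop_size)]
            best = max(pop, key=fitness)
            # Learning rule: move each mean a fraction alpha toward the best.
            mean = [(1 - alpha) * m + alpha * b for m, b in zip(mean, best)]
        return mean

    # Toy usage: maximize -sum((x - 0.3)^2); the means should approach 0.3.
    print(pbil_continuous(lambda x: -sum((v - 0.3) ** 2 for v in x),
                          mean=[0.5] * 5, sigma=[0.1] * 5, gens=200))

With α = 1 the means are simply replaced by the best individual,
which is the (1, λ) evolution strategy equivalence noted above.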
Vector   Mean    Std. Dev.   Min.    Max.
P'_1     0.215   0.080       0.104   0.511
P'_2     0.613   0.316       0.187   1.699
P'_3     1.372   0.522       0.276   2.062

Table 2: Results for the hand-coded parameter vectors tested,
summarizing fitness values from 50 games with each pa-
rameter vector.
The probability vector was initialized with p_1 = 15; this
value was chosen such that the agent initially spent
roughly equal amounts of time in the Explore and Retreat
states. p_1 was allowed to evolve as a continuous value, al-
though the Manhattan distance between Pac-Man and the
ghost is integer-valued. The remaining parameters repre-
sented probabilities for subsets of variables, so they were
initialized with uniform probability for each feasible output
direction of the agent (e.g., p_2 = p_3 = 0.5 for a corridor,
and p_15 = p_16 = p_17 = p_18 = 0.25 for an intersection).
These parameters were re-normalized after being updated at
each generation by the PBIL learning rule. Note that PBIL
evolves a probabilistic model of the search space which con-
verges towards a locally or globally optimal value. In this
paper we interpret the probability vector learnt by the al-
gorithm as the realization of our evolved “agent”, and a
population-based search is used to perform this task.
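The re-normalization step might look like the following sketch,
applied to each rule’s subset of probabilities after the PBIL
update. The grouping of parameters into subsets and the
handling of negative values are our assumptions.

    def renormalize(probs):
        # Clip negative values produced by the update, then rescale
        # so the subset again sums to one.
        clipped = [max(p, 0.0) for p in probs]
        total = sum(clipped)
        return [p / total for p in clipped]

    print(renormalize([0.7, 0.2, 0.2]))  # [0.636..., 0.181..., 0.181...]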
5 Results
5.1 Hand-coded Agent Parameters
A feature of the agent implementation described above is
that it is transparent: each parameter has a clear function in
the control of the output of the agent. It is therefore possible
to experiment with an agent of hand-coded parameters. The
rule sets allow for basic explore and retreat behaviour only,
so the hand-coded parameters were chosen to try and pro-
duce an agent with the ability to explore the maze widely in
the “Explore” state, and to retreat from the ghost to avoid
being eaten in the “Retreat” state. Results for the different
hand-coded parameter vectors tested are shown in Table 2.
Consider firstly an agent, P'_1, with equal probabilities
assigned to each parameter in each relevant subset of val-
ues. The behaviour of this agent was observed to be highly
erratic, because equal probabilities mean that Pac-Man is
just as likely to reverse his direction at any point in time as
he is to choose any other feasible direction. As a result, the
movement of Pac-Man is dominated by oscillations (e.g.,
forwards-backwards in a corridor) around the current point.
This is reflected in the low fitness values observed for agent
P'_1 in Table 2.
For the next parameter vector tested (P'_2), in the Ex-
plore state a lower probability was assigned to reversing the
current direction, with probabilities for all other feasible di-
rections assigned uniformly (e.g., for a corridor to move for-
wards and backwards respectively, p_2 = 0.8, p_3 = 0.2; for
an intersection to move forwards, backwards, left and right
respectively, p_15 = 0.3, p_16 = 0.1, p_17 = 0.3,
p_18 = 0.3). For the Retreat
state, zero probability was assigned to the direction that led
to the ghost, with all other feasible options given uniform
probability. This agent had reduced oscillations in its move-
ment and was observed to have some ability to retreat from
the ghost over “chase” sequences of turns (see Table 2 for
results). However, the agent only occasionally comes close
to clearing a maze (surviving long enough to do so) and still
contains a harmful degree of oscillation in its output.
Finally, a parameter vector (P'_3) was tested that refined
P'_2 by having very low probability values for reversing the
current direction. For example, for a corridor to move for-
wards and backwards respectively, p_2 = 0.99 and
p_3 = 0.01; for an intersection to move forwards, backwards,
left and right respectively, p_15 = 0.33, p_16 = 0.01,
p_17 = 0.33 and p_18 = 0.33. This agent
produced improved behaviour (Table 2), typically coming
close to clearing the maze and occasionally doing so (6 out
of the 50 trials produced fitness values above 2.0). More
importantly (since the Explore state for the agent does not
have any knowledge of the dots in the maze), the agent is
able to evade the ghost for long periods of time, due to the
Retreat state. Nevertheless, the limitations of the agent be-
came clear from observing the performance of this agent.
The ruleset decides the move direction based on the cur-
rent location only. In retreat mode, the position of the ghost
is considered but only crudely measured. As an example,
consider a situation where Pac-Man is located at one of the
extreme corners of the maze (ref. Figure 1 or Figure 2). For
say, the top-right corner, the ghost will usually be moving
towards Pac-Man from a below-left position. Because of
the limited way that the agent considers the position of the
ghost, moving left is undesirable because the ghost is to the
left, but moving downward is also undesirable because the
ghost is also downward. Oscillation results, until the ghost
becomes very close to Pac-Man. Other situations allow the
agent to be captured by the ghost because equiprobable al-
ternatives for retreating can lead to oscillatory behaviour.
5.2 PBIL-evolved Agent
Figure 4 shows the result of the PBIL algorithm evolving a
Pac-Man agent parameter vector, in terms of the minimum,
mean, and maximum fitness values in the population distri-
bution. Although the game code was modified to make play
as fast as possible, 250 games are played in each genera-
tion of the algorithm (population of 25, 10 games per fit-
ness evaluation), meaning several days of simulation time.
For this experiment, a standard deviation of 0.01 was used
for all parameters except
, which used a value of 1.0, and
the learning rate was α = 0.05.

Figure 4: Evolution of the minimum (crosses), mean (line)
and maximum (dots) fitness values in the population for
PBIL with α = 0.05.

The mean of the population
shows a smooth learning curve compared to other standard
deviation/learning rate values that we experimented with.
After 722 generations the population mean is around 0.8,
which is better than our hand-coded parameter sets P'_1 and
P'_2 but below the performance of P'_3. We were interested in
the influence of the PBIL learning rate on the results of the
algorithm. Figure 5 shows the result of a different experi-
ment, with α = 1.0 (note that fewer generations have been
performed). This is equivalent to a (1, 25)-evolution strat-
egy. It is evident that this algorithm is able to initially pro-
vide more rapid improvement in fitness, but after approxi-
mately 100 generations progress is much more noisy than
that of Figure 4. In preliminary experiments we observed
that a larger standard deviation value had a similar effect.
The results suggest that a kind of annealing scheme for ei-
ther of these parameters may allow the algorithm to con-
verge more reliably. Overall the performance of the algo-
rithms with different learning rates is similar in terms of the
kind of fitness values obtained during learning. The prob-
ability vectors produced by the two algorithms in the
experiments above were examined manually. Many of the
parameters appeared to be converging towards values simi-
lar to the hand-coded strategy P'_3 above. It is clear from
the results shown in Figures 4 and 5 that the probability
vector is still subject to significant perturbation, making it
difficult to interpret without extending the experiment over
more generations.
6 Discussion
This paper has developed an approach to constructing an
artificial agent that learns to play a simplified version of
Pac-Man.

Figure 5: Evolution of the minimum (crosses), mean (line)
and maximum (dots) fitness values in the population for
PBIL with learning rate α = 1.0 (i.e., a (1,25)-ES).

The agent is specified as a simple state machine
and parameterized ruleset, with the PBIL algorithm used to
learn suitable values for these parameters. Hand-coded pa-
rameters were also tested.
The results highlight the limitations of the representa-
tion used as well as some of the complexities of the game,
even in the highly simplified form considered in these exper-
iments. The ruleset could certainly be improved given the
difficulties observed in our simulations. For example, con-
sidering 8 classifications for the ghost’s position for each
turn type (Section 3) should reduce the problems observed
in Section 5.1. This would, however, necessarily result in a
larger ruleset. The representation also has redundancies in
the way that probability values are represented (e.g., replace
p_3 by (1 - p_2)). Nevertheless, the representation seems to
have serious limitations for scaling up the “intelligence” of
the agent. This may result in a very high dimensional op-
timization problem which would require a large amount of
computation time to allow enough generations for the algo-
rithm to produce a good result. Given the complexities of
the full game compared to the version considered here, it
seems that extending on a ruleset-based representation may
be difficult and impractical.
The methodology used in this paper is a first attempt at
this problem and there are several factors that might be im-
proved. Firstly, it is clear that the different strategies (hand-
coded and learned) used in this paper do not come close to
capturing the variety of possible effective strategies used by
humans in the full Pac-Man game. We hypothesize that the
basic explore/retreat approach is likely to have a role in an
effective strategy for the full Pac-Man game, but this has
not been verified. Secondly, our decision to use an agent
that can consider only “localized” information of the game
is a factor that deserves further consideration. Finally, the
fitness function used is based on both score achieved and
time taken. The impact of the time factor on our results is
not clear and it may be possible to remove this from the
fitness function with no adverse effects.
Nevertheless, we believe that the approach taken here
could be useful as a benchmark in considering different rep-
resentations and approaches to evolving a Pac-Man playing
agent. It is expected that future work would need to be
able to improve on the performance (fitness) of the agent.
However it will also be interesting to compare the com-
plexity (dimensionality, interpretability, computational re-
quirements) of alternative approaches, to the rule-based ap-
proach developed above. Finally, we conjecture that general
aspects of the approach taken here to developing an adap-
tive agent for Pac-Man may eventually lead to techniques
that can be applied successfully to other real-time computer
games.
Bibliography
[1] S. Baluja. Population-Based Incremental Learning: A
method for integrating genetic search based function
optimization and competitive learning. Technical Re-
port CMU-CS-94-163, School of Computer Science,
Carnegie Mellon University, 1994.
[2] J. S. De Bonet and C. P. Stauffer. Learning to play
pacman using incremental reinforcement learning. Retrieved
from http://www.ai.mit.edu/people/stauffer/Projects/PacMan/
(19/06/03), 1999.

[3] P. Funes, E. Sklar, H. Juillé, and J. Pollack. Animal-
animat coevolution: using the animal population as fitness
function. In R. Pfeifer et al., editors, From Animals to
Animats 5: Proceedings of the Fifth International Conference
on Simulation of Adaptive Behaviour, pages 525–533. MIT
Press, 1998.

[4] M. Gallagher. Multi-layer perceptron error surfaces:
visualization, structure and modelling. PhD thesis, Dept.
Computer Science and Electrical Engineering, University of
Queensland, 2000.

[5] S. Gugler. Pac-Tape. Retrieved from
http://www.geocities.com/SiliconValley/Bay/4458/
(21/05/01), 1997.

[6] A. Kalyanpur and M. Simon. Pacman using genetic
algorithms and neural networks. Retrieved from
http://www.ece.umd.edu/~adityak/Pacman.pdf (19/06/03), 2001.

[7] J. Koza. Genetic Programming: On the Programming of
Computers by Means of Natural Selection. MIT Press, 1992.

[8] S. Lawrence. Pac-Man autoplayer. Retrieved from
http://www.cis.rit.edu/~jerry/Software/autopac/ (21/05/01),
1999.

[9] J. P. Rosca. Generality versus size in genetic program-
ming. In J. Koza et al., editors, Genetic Programming (GP96)
Conference, pages 381–387. MIT Press, 1996.

[10] S. Rudlof and M. Köppen. Stochastic hill climbing with
learning by vectors of normal distributions. In 1st Online
Workshop on Soft Computing. Retrieved from
http://www.bioele.nuee.nagoya-u.ac.jp/wsc1/ (8/12/99), 1996.

[11] M. Sebag and A. Ducoulombier. Extending population-
based incremental learning to continuous search spaces. In
A. Eiben et al., editors, Parallel Problem Solving from Nature
- PPSN V, volume 1498 of Lecture Notes in Computer Science,
pages 418–427, Berlin/New York, 1998. Springer.