Marcus Gallagher

Yüklə 146,26 Kb.

Pdf görüntüsü

tarix	14.10.2017
ölçüsü	146,26 Kb.
	#4783

Learning to Play Pac-Man: An Evolutionary, Rule-based Approach

Marcus Gallagher

marcusg@itee.uq.edu.au

Amanda Ryan

s354299@student.uq.edu.au

School of Information Technology and Electrical Engineering

University of Queensland 4072

Australia

Abstract- Pac-Man is a well-known, real-time computer

game that provides an interesting platform for research.

This paper describes an initial approach to developing

an artiﬁcial agent that replaces the human to play a sim-

pliﬁed version of Pac-Man. The agent is speciﬁed as

a simple ﬁnite state machine and ruleset, with param-

eters that control the probability of movement by the

agent given the constraints of the maze at some instant

of time. In contrast to previous approaches, the agent

represents a dynamic strategy for playing Pac-Man,

rather than a pre-programmed maze-solving method.

The agent adaptively “learns” through the application

of population-based incremental learning (PBIL) to ad-

just the agents’ parameters. Experimental results are

presented that give insight into some of the complexities

of the game, as well as highlighting the limitations and

difﬁculties of the representation of the agent.

1 Introduction

Pac-Man is a well-known, real-time arcade computer game

originally developed by Toru Iwatani for the Namco Com-

pany in 1981. Different versions of the game have been de-

veloped subsequently for a large number of home computer,

game-console, and hand-held systems.

The typical version of Pac-Man is a one-player game

where the human player maneuvers the Pac-Man character

around a maze, attempting to avoid four “ghost” characters

while eating dots initially distributed throughout the maze.

If Pac-Man collides with a ghost, he loses one of his three

lives and play resumes with the ghosts reassigned to their

initial starting location (the “ghost cage” in the centre of

the maze). Four “power pills” are initially positioned near

each corner of a maze: when Pac-Man eats a power pill he

is able to turn the tables and eat the ghosts for a few sec-

onds of time. The game ends when Pac-Man has lost all

(usually 3) of his lives. Figure 1 shows a screen-shot of

the starting position of the ﬁrst maze in the game. Pac-Man

is a real-time computer game that resembles a simpliﬁed

version of many modern ﬁrst-person environment computer

games. That is, the game is centered around navigating

the player character around a semi-structured world, accu-

mulating points, avoiding and (when appropriate) attacking

Figure 1: The starting position of the Pac-Man game, show-

ing the maze structure, Pac-Man (lower-center), power pills

(large dots), dots (small dots) and ghosts (center).

non-player game characters

. This kind of game requires

dynamic control of the game agent by the human player

and involves task prioritization, planning, and risk assess-

ment. While it is relatively easy for a human to learn a basic

strategy for Pac-Man, the game has complex aspects that al-

low the possibility of developing more intelligent strategies

(in conjunction with other skills such as hand-eye coordina-

tion). It is also a challenge for a person to describe precisely

their Pac-Man-playing strategy, or to represent such a strat-

egy formally (e.g., as a set of rules).

This paper describes an initial approach to developing an

artiﬁcial agent that replaces the human playing Pac-Man.

For this work, the game has been signiﬁcantly simpliﬁed

to having only a single ghost, dots and the Pac-Man agent

present in the mazes. The agent is speciﬁed as a simple

ﬁnite state machine and ruleset, with parameters that spec-

ify the state transition and the probabilities of movement

according to each rule. Section 2 provides an overview of

previous work relevant to Pac-Man and AI game playing,

Alternatively, other game characters might be controlled by other hu-

man players in a multi-player setting.

while Section 3 describes the evolutionary, rule-based ap-

proach taken in this paper. Some experimental details are

discussed in Section 4. Results are presented in Section 5,

and Section 6 provides a summary and some discussion of

possible future work.

2 Artiﬁcial Intelligence Approaches to Pac-

Man

A relatively small amount of previous research has been

done toward the application of artiﬁcial intelligence to Pac-

Man or similar games. Koza [7] and Rosca [9] use Pac-Man

as an example problem domain to study the effectiveness of

genetic programming for task prioritization. Their approach

relies on a set of predeﬁned control primitives for percep-

tion, action and program control (e.g., advance the agent on

the shortest path to the nearest uneaten power pill). The pro-

grams produced represent procedures that solve mazes of a

given structure, resulting in a sequence of primitives that

are followed. Genetic Programming has also been applied

to the computer game Tron [3], where coevolution was used

to produce agents that learn strategies by playing human op-

ponents over the internet.

Gugler [5] describes “Pac-Tape”, a project which at-

tempted to produce a self-playing Pac-Man based on the

original Pac-Man arcade game emulated on a desktop PC.

The approach appears to be based on brute force search,

but is not described or tested. Lawrence [8] builds on the

Pac-Tape work, but applies a genetic algorithm to evolve

a string of directions (north, south, east, west) to traverse a

maze. The aim was to evolve interesting patterns for solving

particular mazes. Unfortunately, success was limited, per-

haps due to unfavourable interactions between the genetic

operators (esp. crossover) and the representation used.

Kalyanpur and Simon [6] use a genetic algorithm to try

to improve the strategy of the ghosts in a Pac-Man-like

game. Here the solution produced is also a list of directions

to be traversed. A neural network is used to determine suit-

able crossover and mutation rates from experimental data.

Finally, De Bonet and Stauffer [2] describe a project using

reinforcement learning to develop strategies simultaneously

for Pac-Man and the ghosts, by starting with a small, simple

maze structure and gradually adding complexity.

The aim of our research is to investigate techniques to

develop an adaptive agent that learns inductively to play

Pac-Man. Our approach is aimed at producing agents that

can learn based only on the information that is available

to a human playing the game (or quantities that a human

could conceivably estimate in real-time). In particular, the

agent should not be able to exploit internal knowledge about

the game (e.g, the programmed behaviour of the ghosts) -

the (software) agent should be considered external from the

game software. Furthermore, this agent should learn gen-

eralizable strategies for good game play, based not on the

exact structure of a given maze but rather on more general

principles of good strategies to play Pac-Man.

3 Designing an Artiﬁcially Intelligent Agent

that learns to play Pac-Man Inductively

Pac-Man is a simple game to describe and play. Human

players have little difﬁculty in learning the basics of game

play, which are to obtain as many points as possible (by

clearing mazes of dots and eating power pills and ghosts),

whilst avoiding the ghosts. However there is not obviously

one correct way of specifying such a strategy precisely, per-

haps because it is largely based on perception of the visual

presentation of the game and its dynamics in time.

Moving beyond this basic approach makes clear the full

complexity of the game domain. A large point incentive is

provided for eating power pills, followed by eating ghosts in

the few seconds that the power pill remains effective, but the

ghosts generally try to retreat from Pac-Man in this time to

avoid being eaten. Thus an effective strategy might involve

attracting ghosts to the area of the maze near a power pill

before eating it. The ghosts movements are dependent on

each other and typically contain an element of randomness,

making it difﬁcult to predict their behaviour and plan around

it.

Our approach here is to initially remove much of the

complexity from the game to see if it is possible to produce

an effective agent in a simpliﬁed Pac-Man environment. In

the experiments discussed below, only the dots and a sin-

gle ghost are present in a maze with Pac-Man. While this

clearly removes several interesting aspects of game-play, we

believe that what remains (learning to avoid the ghost and

perhaps to include random exploration) is still a fundamen-

tal aspect of the full Pac-Man game.

We were motivated to begin by using a simple, trans-

parent representation for the agent. A representation that

was potentially capable of capturing the general aspects of

a basic human strategy provides a good platform on which

to test the feasibility of using an evolutionary approach to

learning, as well as serving as a benchmark for comparing

future work. It was also interesting to test if the evolutionary

algorithm would generate an agent with an obvious strategy.

These considerations lead to the agent being represented as

a two-state ﬁnite state machine, with a set of probabilistic

rules for each state that control the movement of the agent.

At any given time, the state of the agent is determined by the

distance between Pac-Man and the ghost. If this distance is

greater than some given value,

, the agent is in the Ex-

plore state, while if the distance is less than

, the agent

switches to the Retreat state.

A human player is able to see the entire maze during

game play, but dynamically is usually focusing their atten-

tion primarily on the immediate area of the maze where

Pac-Man is currently located. Planning a long sequence of

moves in advance would not only be very difﬁcult for a hu-

man, but also ineffective, since the dynamics of the game

would be likely to make such a strategy poor and irrelevant.

Our initial approach for an artiﬁcial Pac-Man agent is there-

fore to view play as a small number of possible turn types,

together with basic global information such as the position

and movement of the ghost. Figure 2 categorizes the en-

tire maze in terms of turn types, including straight corridors.

Note that the representation used below considers turn types

from the current orientation of Pac-Man, so that the “true”

orientation of a turn type does not need to be directly con-

sidered (e.g., vertical versus horizontal corridors).

Figure 2: The ﬁrst Pac-Man maze in terms of turn types.

Some types are more common than others.

At each time step (tick) of the game, the agent is located

in an instance of a certain turn-type, and needs to produce

an output that becomes the current direction of movement

for Pac-Man. Depending on the turn-type of the current po-

sition (corridor, L-turn, T-junction, intersection), there are

a number of different feasible directions to move, including

maintaining the current direction (but excluding running di-

rectly into walls). In the Explore state, a probabilistic de-

cision is made to choose one of the feasible directions de-

pending on the turn type. As a result, Pac-Man is able to

explore the maze in a pseudo-random fashion. Exploration

does not use any other game information to determine the

agent’s output. In the Retreat state, the agent additionally

considers the location of the ghost to determine the output.

This results in the need to consider a number of possible

turn-type and ghost position combinations. We classiﬁed

the ghost’s position as being either forward, back, left, right,

forward-left, forward-right, back-left, or back-right relative

Agent()

while (game is in play)

currentdistance = Distance(pacman,ghost)

if currentdistance

¡£¢

¡

then

Explore

else

Retreat

end

Explore()

switch(turntype)

case “corridor”

newdir = Random(P,prevdir,turntype)

case “L-turn”

newdir = Random(P,prevdir,turntype)

case “T-turn”

newdir = Random(P,prevdir,turntype,orientation)

case “Intersection”

newdir = Random(P,prevdir,turntype)

Retreat()

switch(turntype)

case “corridor”

newdir = Random(P,prevdir,turntype,ghostpos)

case “L-turn”

newdir = Random(P,prevdir,turntype,orientation,ghostpos)

case “T-turn”

newdir = Random(P,prevdir,turntype,orientation,ghostpos)

case “Intersection”

newdir = Random(P,prevdir,turntype,ghostpos)

Figure 3: Pseudo-code for the Pac-Man agent strategy.

to Pac-Man. Note that the ﬁrst four classiﬁcations are de-

termined using only one coordinate dimension. For exam-

ple, if Pac-Man is currently facing north (up) on the screen,

the ghost would be classiﬁed as being “back” if its current

y-coordinate is less than Pac-Man’s current y-coordinate.

Adding the ﬁnal four classiﬁcations are one way of reﬁning

this information. For the same example (Pac-Man facing

north), the ghost’s location will be classiﬁed as follows:

back if

¡£¢¥¤§¦©¨

¥! "$#%'&

and

¡£¢¥¤§¦©¨)(10

'2 "$#

(

back-left/left-back if

¡3¢'¤§¦'¨)

4¥! "$#

and

¡£¢¥¤§¦©¨

(

56¥2 "¥#

(

back-right/right-back if

¡3¢¥¤¦'¨)

76¥2 "¥#

and

¡£¢¥¤§¦©¨

(98

6¥2 "¥#

(

forward if

¡3¢¥¤¦'¨))

'2 "$#§'&

and

¡3¢'¤§¦'¨))(@0

'2 "$#

(

forward-left/left-forward if

¡£¢¥¤§¦©¨

8

6¥2 "¥#

and

¡3¢'¤§¦'¨)

(

A6¥2 "¥#

(

forward-right/right-forward

¡£¢¥¤§¦©¨

¥! B¥#%'&

and

¡£¢¥¤§¦©¨)(

¥! "$#%(¥&

left if

¡£¢¥¤§¦©¨

'2 "$#§'&

and

¡£¢¥¤§¦©¨)(

'2 "$#

(

right if

¡3¢'¤§¦'¨)

0

'2 "$#

and

¡£¢¥¤§¦©¨

(

'2 "$#

(

Note also that, for example, back-left is equivalent to left-

back as indicated in the above. Unfortunately, having eight

classes for the position of the ghost for each turn-type means

that the ruleset (and the number of parameters) also gets

larger. Hence in our implementation, eight classes were

used to classify the ghost’s position for intersections, with

only the ﬁrst four classes used for other turn-types. Pseudo-

code for the agent’s strategy is shown in Figure 3.

Implementation of the agent based on the speciﬁcations

above leads to a total of 85 adjustable parameters, collected

into a parameter vector

0D¡

FE!G!GHG!E

6IQP

&

, that controls the

agent’s behaviour. A sub-component of

is used to decide

Pac-Man’s current (new) direction of movement, depending

on the previous direction (prevdir), turn type, orientation,

and (in retreat mode) the ghost’s location. Parameter

the distance value at which the agent shifts from Explore

to Retreat (and vice-versa), calculated as the direct (i.e., ig-

noring maze walls) Manhattan distance between Pac-Man

and the ghost. All other parameters of P are probability val-

ues,

RTS

SXW

0a`

E!G!GHG2Eb©c

. Table 1 gives a complete

speciﬁcation of these values.

Parameter

Description

Distance to ghost

Explore:

6d!eVf

Corridor: forward, backward

gHe

P

L-turn: forward, backward

6h!e

T-turn (a) approach centre

T-turn (b) approach left

d2e

T-turn (c) approach right

Intersection

Retreat:

epd)q

Corridor: ghost forward

epdd

Corridor: ghost behind

dQf2epd)g

L-turn: ghost forward

epdh

L-turn: ghost behind

dQr2epd

T-turn (a): ghost behind

fq!epfd

T-turn (b): ghost behind

fQf2epf

T-turn (c): ghost behind

fQh2epf

T-turn (a): ghost on left

i

eVg

T-turn (b): ghost on left

gsd2eVgg

T-turn (a): ghost on right

P

eVgQr

T-turn (b): ghost on right

T-turn (b): ghost forward

T-turn (c): ghost forward

g!e

Intersection : ghost forward

6PQI

eph

Intersection : ghost behind

hQd2eph

Intersection : ghost left

hQh2eph

Intersection : ghost right

rq!eprf

Intersection : ghost forward/left

rg!eprr

Intersection : ghost forward/right

Intersection : ghost behind/left

d2e

Intersection : ghost behind/right

Table 1: Description of parameter values used in the Pac-

Man agent and their usage.

4 Simulation Methodology

4.1 Fitness Function

For our simpliﬁed Pac-Man game, a ﬁtness function was

developed based primarily on the score obtained by Pac-

Man. The only way to score points in this simpliﬁed game

is to eat dots from the maze, and since there is a ﬁxed initial

number of dots in a maze, the maximum possible score for

each maze is known a priori. It was considered that the

time Pac-Man manages to survive should also be a factor in

the ﬁtness function. Note however that there also exists a

possibility that Pac-Man may successfully avoid the ghost

indeﬁnitely, but fail to clear the maze of dots. Because of

this a term was added to reward the time survived by Pac-

Man, but imposing a pre-chosen limit on this factor. The

ﬁtness function used is:

0¢¡

¤¦¥§¤¨£

¨Q2¦©

¤¦¥¤¨£

"§¨Q2¦©

¤¦¥§¤¨£

#"!

¤¦¥§¤¨£

W$#

(1)

The score and time factors in the ﬁtness function are nor-

malized by their maximum (respectively, known and cho-

sen) values for each level. Hence, the maximum ﬁtness for

each level is 2.

The movement of the ghost in the game has a small

amount of randomness, and the agent developed above

clearly produces stochastic movement for Pac-Man. As a

consequence, the ﬁtness value of each game given a ﬁxed

will also be stochastic. In such a situation it is typical

to conduct repeated evaluations of the ﬁtness function with

a given parameter set, and produce an average ﬁtness value

to be used by the evolutionary algorithm. In the simulations

below, 10 games were played per average ﬁtness evaluation.

Recall that Pac-Man has three lives in a game, meaning that

Pac-Man “runs” in the maze 30 times for each ﬁtness eval-

uation used in the EA.

4.2 Algorithm

The algorithm used to learn the parameters of an agent is

an implementation of population-based incremental learn-

ing (PBIL) for continuous search spaces [1, 10, 11]. PBIL

replaces the conventional genetic operators of mutation and

recombination with a probability vector, which is used to

generate each population and is updated via a learning rule

based on the single best individual in the current popula-

tion. Each component of the probability vector represents

the mean of a Gaussian distribution. In our experiments the

population size was set to 25 (computational time prevented

a larger value). For the standard deviations of the Gaussians

and the PBIL learning rate parameter,

, several different

values were tested. Note that for

, this algorithm is

equivalent to a simple

E§&

evolution strategy with a con-

stant standard deviation value for all variables [4].

Vector

Mean

Std. Dev.

Min.

Max.

C('

0.215

0.080

0.104

0.511

0.613

0.316

0.187

1.699

C('

1.372

0.522

0.276

2.062

Table 2: Results for hand-coded parameter vectors tested,

summarizing ﬁtness values from 50 games with each pa-

rameter vector.

The probability vector was initialized with

- this value

chosen such that the agent initially spent

roughly equal amounts of time in the Explore and Retreat

states.

was allowed to evolve as a continuous value, al-

though the Manhattan distance between Pac-Man and the

ghost is integer-valued. The remaining parameters repre-

sented probabilities for subsets of variables, so they were

initialized with uniform probability for each feasible output

direction of the agent (e.g.

Vd!eVf

0

R

These parameters were re-normalized after being updated at

each generation by the PBIL learning rule. Note that PBIL

evolves a probabilistic model of the search space which con-

verges towards a locally or globally optimal value. In this

paper we interpret the probability vector learnt by the al-

gorithm as the realization of our evolved “agent”, and a

population-based search is used to perform this task.

5 Results

5.1 Hand-coded Agent Parameters

A feature of the agent implementation described above is

that it is transparent: each parameter has a clear function in

the control of the output of the agent. It is therefore possible

to experiment with an agent of hand-coded parameters. The

rule sets allow for basic explore and retreat behaviour only,

so the hand-coded parameters were chosen to try and pro-

duce an agent with the ability to explore the maze widely in

the “Explore” state, and to retreat from the ghost to avoid

being eaten in the “Retreat” state. Results for the different

hand-coded parameter vectors tested are shown in Table 2.

Consider ﬁrstly an agent,

C)'

, with equal probabilities

assigned to each parameter in each relevant subset of val-

ues. The behaviour of this agent was observed to be highly

erratic, because equal probabilities mean that Pac-Man is

just as likely to reverse his direction at any point in time as

he is to choose any other feasible direction. As a result, the

movement of Pac-Man is dominated by oscillations (e.g.,

forwards-backwards in a corridor) around the current point.

This is reﬂected in the low ﬁtness values observed for agent

in Table 2.

For the next parameter vector tested (

'

d

), in the Ex-

plore state a lower probability was assigned to reversing the

current direction, with probabilities for all other feasible di-

rections assigned uniformly (e.g. for a corridor to move for-

wards and backwards respectively,

d

0

; for

an intersection to move forwards, backwards, left and right

respectively

). For the Retreat

state, zero probability was assigned to the direction that led

to the ghost, with all other feasible options given uniform

probability. This agent had reduced oscillations in its move-

ment and was observed to have some ability to retreat from

the ghost over “chase” sequences of turns (see Table 2 for

results). However, the agent only occasionally comes close

to clearing a maze (surviving long enough to do so) and still

contains a harmful degree of oscillation in its output.

Finally, a parameter vector (

C)'

) was tested that reﬁned

'

d

by having very low probability values for reversing the

current direction. For example, for a corridor to move for-

wards and backwards respectively,

¢£¢§E

R§W

;

for an intersection to move forwards, backwards, left and

right respectively

¡¤¡

R§W

). This agent

produced improved behaviour (Table 2), typically coming

close to clearing the maze and occasionally doing so (6 out

of the 50 trials produced ﬁtness values above 2.0). More

importantly (since the Explore state for the agent does not

have any knowledge of the dots in the maze), the agent is

able to evade the ghost for long periods of time, due to the

Retreat state. Nevertheless, the limitations of the agent be-

came clear from observing the performance of this agent.

The ruleset decides the move direction based on the cur-

rent location only. In retreat mode, the position of the ghost

is considered but only crudely measured. As an example,

consider a situation where Pac-Man is located at one of the

extreme corners of the maze (ref. Figure 1 or Figure 2). For

say, the top-right corner, the ghost will usually be moving

towards Pac-Man from a below-left position. Because of

the limited way that the agent considers the position of the

ghost, moving left is undesirable because the ghost is to the

left, but moving downward is also undesirable because the

ghost is also downward. Oscillation results, until the ghost

becomes very close to Pac-Man. Other situations allow the

agent to be captured by the ghost because equiprobable al-

ternatives for retreating can lead to oscillatory behaviour.

5.2 PBIL-evolved Agent

Figure 4 shows the result of the PBIL algorithm evolving a

Pac-Man agent parameter vector, in terms of the minimum,

mean, and maximum ﬁtness values in the population distri-

bution. Although the game code was modiﬁed to make play

as fast as possible, 250 games are played in each genera-

tion of the algorithm (population of 25, 10 games per ﬁt-

ness evaluation), meaning several days of simulation time.

For this experiment, a standard deviation of 0.01 was used

for all parameters except

, which used a value of 1.0, and

the learning rate was

. The mean of the population

100

200

300

400

500

600

700

0.2

0.4

0.6

0.8

1.2

1.4

1.6

Generations

Fitness

Figure 4: Evolution of the minimum (crosses), mean (line)

and maximum (dots) ﬁtness values in the population for

PBIL with

shows a smooth learning curve compared to other standard

deviation/learning rate values that we experimented with.

After 722 generations the population mean is around 0.8,

which is better than our hand-coded parameter sets

and

but below the performance of

. We were interested in

the inﬂuence of the PBIL learning rate on the results of the

algorithm. Figure 5 shows the result of a different experi-

ment, with

0

W

(note that less generations have been

performed). This is equivalent to a

-evolution strat-

egy. It is evident that this algorithm is able to initially pro-

vide more rapid improvement in ﬁtness, but after approxi-

mately 100 generations progress is much more noisy than

that of Figure 4. In preliminary experiments we observed

that a larger standard deviation value had a similar effect.

The results suggest that a kind of annealing scheme for ei-

ther of these parameters may allow the algorithm to con-

verge more reliably. Overall the performance of the algo-

rithms with different learning rates is similar in terms of the

kind of ﬁtness values obtained during learning. The prob-

ability vectors produced by the the two algorithms in the

experiments above were examined manually. Many of the

parameters appeared to be converging towards values simi-

lar to the hand-coded strategy

C)'

f

above. It is clear from

the results shown in Figures 4 and 5 that the probability

vector is still subject to signiﬁcant perturbation, making it

difﬁcult to interpret without extending the experiment over

more generations.

6 Discussion

This paper has developed an approach to constructing an

artiﬁcial agent that learns to play a simpliﬁed version of

100

150

200

250

300

350

0.2

0.4

0.6

0.8

1.2

1.4

1.6

Generations

Fitness

Figure 5: Evolution of the minimum (crosses), mean (line)

and maximum (dots) ﬁtness values in the population for

PBIL with learning rate = 1.0 (i.e. (1,25)-ES).

Pac-Man. The agent is speciﬁed as a simple state machine

and parameterized ruleset, with the PBIL algorithm used to

learn suitable values for these parameters. Hand-coded pa-

rameters were also tested.

The results highlight the limitations of the representa-

tion used as well as some of the complexities of the game,

even in the highly simpliﬁed form considered in these exper-

iments. The ruleset could certainly be improved given the

difﬁculties observed in our simulations. For example, con-

sidering 8 classiﬁcations for the ghost’s position for each

turn type (Section 3 should reduce the problems observed

in Section 5.1. This would however necessarily result in a

larger ruleset. The representation also has redundancies in

the way that probability values are represented (e.g. replace

. Nevertheless, the representation seems to

have serious limitations for scaling up the “intelligence” of

the agent. This may result in a very high dimensional op-

timization problem which would require a large amount of

computation time to allow enough generations for the algo-

rithm to produce a good result. Given the complexities of

the full game compared to the version considered here, it

seems that extending on a ruleset-based representation may

be difﬁcult and impractical.

The methodology used in this paper is a ﬁrst attempt at

this problem and there are several factors that might be im-

proved. Firstly, it is clear that the different strategies (hand-

coded and learned) used in this paper do not come close to

capturing the variety of possible effective strategies used by

humans in the full Pac-Man game. We hypothesize that the

basic explore/retreat approach is likely to have a role in an

effective strategy for the full Pac-Man game, but this has

not been veriﬁed. Secondly, our decision to use an agent

that can consider only “localized” information of the game

is a factor that deserves further consideration. Finally, the

ﬁtness function used is based on both score achieved and

time taken. The impact of the time factor on our results is

not clear and it may be possible to remove this from the

ﬁtness function with no adverse effects.

Nevertheless, we believe that the approach taken here

could be useful as a benchmark in considering different rep-

resentations and approaches to evolving a Pac-Man playing

agent. It is expected that future work would need to be

able to improve on the performance (ﬁtness) of the agent.

However it will also be interesting to compare the com-

plexity (dimensionality, interpretability, computational re-

quirements) of alternative approaches, to the rule-based ap-

proach developed above. Finally, we conjecture that general

aspects of the approach taken here to developing an adap-

tive agent for Pac-Man may eventually lead to techniques

that can be applied successfully to other real-time computer

games.

Bibliography

[1] S. Baluja. Population-Based Incremental Learning: A

method for integrating genetic search based function

optimization and competitive learning. Technical Re-

port CMU-CS-94-163, School of Computer Science,

Carnegie Mellon University, 1994.

[2] J. S. De Bonet and C. P. Stauffer. Learning to play

pacman

using

incremental reinforcement learning. Retrieved from

http://www.ai.mit.edu/people/stauffer/Projects/PacMan/

(19/06/03), 1999.

[3] P. Funes, E. Sklar, H. Juill´e, and J. Pollack. Animal-

animat coevolution: using the animal population as

ﬁtness function.

In R. Pfeifer et. al., editor, From

Animals to Animats 5: Proceedings of the Fifth In-

ternational Conference on Simulation of Adaptive Be-

haviour, pages 525–533. MIT Press, 1998.

[4] M. Gallagher. Multi-layer perceptron error surfaces:

visualization, structure and modelling. PhD thesis,

Dept. Computer Science and Electrical Engineering,

University of Queensland, 2000.

[5] S.

Gugler.

Pac-tape.

Retrieved

from

http://www.geocities.com/SiliconValley/Bay/4458/

(21/05/01), 1997.

[6] A. Kalyanpur and M. Simon.

Pacman using ge-

netic algorithms and neural networks.

Retrieved

from http://www.ece.umd.edu/

adityak/Pacman.pdf

(19/06/03), 2001.

[7] J. Koza. Genetic Programming: On the Programming

of Computer by Means of Natural Selection.

MIT

Press, 1992.

[8] S. Lawrence.

Pac-man autoplayer.

Retrieved

from http://www.cis.rit.edu/

jerry/Software/autopac/

(21/05/01), 1999.

[9] J. P. Rosca. Generality versus size in genetic program-

ming. In J. Koza et. al., editor, Genetic Programming

(GP96) Conference, pages 381–387. MIT Press, 1996.

[10] S. Rudlof and M. K¨oppen.

Stochastic hill climb-

ing with learning by vectors of normal distribu-

tions.

In 1st Online Workshop on Soft Comput-

ing, Retrieved from http://www.bioele.nuee.nagoya-

u.ac.jp/wsc1/ (8/12/99), 1996.

[11] M. Sebag and A. Ducoulombier.

Extending

population-based incremental learning to continuous

search spaces. In A. Eiben et. al., editor, Parallel Prob-

lem Solving from Nature - PPSN V, volume 1498 of

Lecture Notes in Computer Science, pages 418–427,

Berlin, New York, 1998. Springer.

Yüklə 146,26 Kb.

Dostları ilə paylaş: