Artiﬁcial Intelligence 134 (2002) 57-83

Yüklə 259,28 Kb.

Pdf görüntüsü

səhifə	10/10
tarix	16.11.2017
ölçüsü	259,28 Kb.
	#10667

1 2 3 4 5 6 7 8 9 10

9. Conclusion
Acknowledgements
Appendix A. Evaluation tables and registers
Appendix B. Summary of Deep Blue and predecessors

78

M. Campbell et al. / Artiﬁcial Intelligence 134 (2002) 57–83

The extended book was introduced into Deep Thought 2 in 1991, and was used with

good success through the matches with Kasparov. In [6], an example is given of how the

extended book worked in game 2 of the 1997 Kasparov–Deep Blue match.

8.3. Endgame databases

The endgame databases in Deep Blue includes all chess positions with ﬁve or fewer

pieces

20

on the board, as well as selected positions with six pieces that included a pair

of blocked pawns. The primary sources for these databases were the Ken Thompson CD-

ROMs [27] and additional databases supplied by Lewis Stiller.

Endgames databases were used both off-line and on-line. The off-line usage was during

the design of the chess chips. Each chip had a ROM which stored patterns to help evaluate

certain frequently occurring chess endgames. The databases were used to verify and

evaluate these patterns. See Section 5 for more details.

The software search used the databases in on-line mode. Each of the 30 general purpose

processors in the Deep Blue system replicated the 4-piece and important 5-piece databases

on their local disk. The remaining databases, including those with 6 pieces, were duplicated

on two 20-GB RAID disk arrays, and were available to all the general purpose processors

through the SP switch.

Endgames were stored in the databases with one bit per position (indicating lose or does-

not-lose). If a position is reached during the search that had a known value, it received

a score composed of two parts: a high-order, game theoretic value, and a low-order,

tie-breaker value. The tiebreaker value is simply the value produced by the evaluation

function on the position in question. If this score is sufﬁcient to cause a cutoff, the search

immediately backs up this score.

For example, suppose Deep Blue had to choose between various possible continuations

that resulted in it playing the weak side of a rook and pawn versus rook endgame. Deep

Blue would, of course, prefer drawn positions over lost ones. In addition, given the

choice between different drawn positions, it would choose the one with the best evaluation

function value.

The endgame databases did not play a critical role in the matches against Kasparov. In

the 1997 match, only game 4 approached an endgame that required access to the databases,

but the ROMs on the chess chips had sufﬁcient knowledge to recognize the rook and pawn

versus rook draws that could have arisen.

8.4. Time control

Chess games typically have a set of requirements on the speed of play, termed the “time

control”. For example, the Deep Blue–Kasparov games initially required 40 moves to be

played in two hours. Failure to make the speciﬁed number of moves leads to forfeiting the

game.

The time control mechanism in Deep Blue is relatively straightforward. Before each

search, two time targets are set. The ﬁrst is the normal time target, set to be approximately

We use the term “piece” to also include pawns.

M. Campbell et al. / Artiﬁcial Intelligence 134 (2002) 57–83

the time remaining to the next time control divided by the moves remaining. In practice,

a considerable time buffer is reserved, which allows for sufﬁcient time to handle technical

difﬁculties, as well as saving time for a possible “sudden-death” phase. The second time

target is the panic time target, which is roughly one third of the remaining time.

If, at the normal time target, the situation is “normal”, a time-out occurs and the current

best move is played. There are a few conditions under which Deep Blue will go beyond its

normal target into “panic time”.

• The current best move has dropped 15 points or more from the score of the previous

iteration. In this case, the search continues until either a new move is found within the

15 point margin, the iteration is completed, or the panic time target is reached.

• The best move from the previous iteration is in a potential “fail-low” state. Continue

until this state is resolved, or the panic time target is reached. If this move ends up

dropping 15 points or more from its prior value, continue as in the previous case.

• A new move is in a potential “fail-high” state. Continue until this state is resolved, or

the panic time target is reached.

These conditions are triggered frequently during a game, but it is quite rare to actually

go all the way to the panic time target. In the 1997 match with Kasparov, this happened

only once.

9. Conclusion

The success of Deep Blue in the 1997 match was not the result of any one factor. The

large searching capability, non-uniform search, and complex evaluation function were all

critical. However other factors also played a role, e.g., endgame databases, the extended

book, and evaluation function tuning.

It is clear, however, that there were many areas where improvements could have been

made. With additional effort, the parallel search efﬁciency could have been increased.

The hardware search and evaluation could have been made more efﬁcient and ﬂexible

with the addition of an external FPGA. Current research (e.g., [11]) suggests that the

addition of pruning mechanisms to Deep Blue might have signiﬁcantly improved the

search. Evaluation function tuning, both automatic and manual, was far from complete.

In the course of the development of Deep Blue, there were many design decisions that

had to be made. We made particular choices, but there were many alternatives that were

left unexplored. We hope this paper encourages further exploration of this space.

Acknowledgements

We would like to gratefully acknowledge the help of the following people: C.J. Tan, Jerry

Brody, Joel Benjamin, John Fedorowicz, Nick De Firmian, Miguel Illescas, Lewis Stiller,

Don Maddox, Gerald Tesauro, Joefon Jann, Thomas Anantharaman, Andreas Nowatzyk,

Peter Jansen, and Mike Browne. We would also like to thank the editor and the anonymous

referees for their helpful comments.

The following are trademarks of International Business Machines Corporation: IBM,

Deep Blue, RS/6000, SP, AIX, Micro Channel.

80

M. Campbell et al. / Artiﬁcial Intelligence 134 (2002) 57–83

Appendix A. Evaluation tables and registers

• Rooks on seventh: There are 12 8-bit registers, for the various combinations of

{White, Black}

× {single, doubled} × {regular, absolute-king-trapped, absolute-

king-not-trapped}.

• Bias: This register was designed for use with external (off-chip) hardware.

• Rook behind passed: A bonus is awarded for a rook behind a passed pawn of the same

color.

• Mpin and hung: A bonus is awarded when the opponent has a piece that is hung and

pinned against a piece equal in value to the pinner.

• Pinned and simple hung: A bonus is awarded when the opponent has a piece that is

pinned and hung.

• Hung: A bonus is awarded if the opponent has a piece that is hung.

• Xraying: A bonus is awarded for having an “xray attack”, i.e., an attack masked by

one’s own piece.

• Pinned and hung: A bonus is awarded when the opponent has a piece that is both

pinned and hung. “Hung” is distinguished from “simple hung” by a more detailed

analysis of the capture sequence.

• Permanent pin and simple hung. A bonus is awarded for a permanent pin of a piece.

• Knight trap: These 6 registers (3 for each side) provide penalties for some frequently

occurring situations where knights can get trapped.

• Rook trap: These 8 registers (4 for each side) provide penalties for some frequently

occurring situations where rooks can get trapped.

• Queen trap: These 2 registers (1 for each side) provide penalties for when there is no

safe queen mobility.

• Wasted pawns: A penalty is assessed for pawn groups that have “wasted” pawns.

A pawn group has a wasted pawn if it is unable to create a passed pawn against a

smaller group of enemy pawns. There are two values, one each for White and Black,

and the penalties are dynamically scaled by the amount of material on the board at the

time of evaluation.

• Bishop pair: A bonus may be awarded for having a pair of bishops. There are two

values, on each for White and Black.

• Separated passed: Widely separated passed pawns are advantageous if trying to win

in a bishops of opposite color endgame.

• Missing wing: Provides a penalty for the side that is ahead if it allows itself to be left

with pawns on only one side.

• BOC: A set of reductions on the evaluation for pure bishops of opposite color, and

various combinations of major pieces and bishops of opposite color.

• Side to move: A bonus may be given for being on move.

• Multiple pawns: There are two 80-entry tables, for the various combinations of

{White, Black}

× {backward, not backward} × {opposed, unopposed} × {doubled,

tripled+}

× {ah, bg, cf, de} × {isolated, pseudo-isolated, supported, duo}. Some

combinations are not legal. There is an 8-bit penalty value associated with each

realizable combination.

M. Campbell et al. / Artiﬁcial Intelligence 134 (2002) 57–83

• Minor on weak: This pair of tables gives bonuses for minor pieces on weak squares,

taking into account such factors as advancement, centrality, pawn support, minor

piece type, challengability, and screened status.

• Self block: This table assesses how each side’s bishops are restrained by their own

pawn.

• Opp block: This table assesses how each sides’s bishops are restrained by the

opponents’s pawns.

• Back block: This table assesses how doubled pawns can restrain a bishop’s mobility.

• Pinned: Pins are awarded bonuses depending on the strength of the pin and the amount

of pressure on the pinned piece.

• Mobility: The mobility to each square on the board is summed to produce on overall

mobility score. Each square has 16 possible levels of mobility, from maximum Black

control through maximum White control. The control level and the square location

determine each square’s contribution.

• Pawn structure: This table assesses various features of pawn structure not handled

elsewhere.

• Passed pawns: This table assesses the worth of passed pawns.

• Joint signature: This table allows adjustments to be made for particular piece

matchups.

• Rooks on ﬁles: There are two 192-entry tables, for the various combinations of

{White, Black}

× {opposed, unopposed} × {4 blockage types} × {semi-open, not

semi-open}

× {ah, bg, cf, de} × {0, 1, or 2 rooks}. There is an 6-bit value associated

with each combination, and two 2-bit ﬂags indicating how the value is to be used in

the king-safety calculation.

• Bishops: Bishops are awarded bonuses for the value of the diagonals they control. The

diagonals are individually assessed, and include factors such as transparency (ability

to open at will), king safety, and target of attack.

• Pawn storm: This table helps to assess pawns being used to attack the opponent’s

king.

• Pawn shelter: This table evaluates the pawn shelter that protects or may protect one’s

own king.

• Development: This table measures the differences in development between the two

sides, factors in the king situation, and gives a bonus to the side that is ahead in

development/king safety.

• Trapped bishop: This table is used to detect situations where bishops can become

trapped.

• Signature: These tables, one for each side, allow adjustments to be made for particular

piece combinations that work well or poorly together. For example, queen and knight

are thought to cooperate better than queen and bishop.

• Contempt: The table gives the adjustment to the draw score. It is used to either prefer

or avoid draws, depending on the opponent and the match situation. The adjustment

is material dependent.

• Piece placement: This table, in common use in most chess programs, has one entry

for each piece type on each square of the board.

82

M. Campbell et al. / Artiﬁcial Intelligence 134 (2002) 57–83

Table B.1

Summary of systems

System name

First played

Processors

Nodes/second

Feature groups

ChipTest

1986

50K

ChipTest-m

1987

1

400K

Deep Thought

1988

2–6

700K–2M

Deep Thought 2

1991

14–24

4M–7M

Deep Blue I

1996

216

50M–100M

Deep Blue II

1997

480

100M–200M

Deep Blue Jr.

1997

20M–30M

Deep Blue Jr. demo

1997

1

2M

38

Appendix B. Summary of Deep Blue and predecessors

Table B.1 gives some characteristics of the various systems in the Deep Blue lineage.

There are a number of qualiﬁcations on the values in this table.

• The ChipTest and Deep Thought systems used software adjustments to account for

features not recognized by the evaluation hardware.

• The Deep Blue systems allowed the evaluation features to interact in ways that were

not possible in earlier systems.

• The demo version of Deep Blue Jr. was restricted to one second of computation

time on one processor, did not detect repetition with game history, and had a ﬁxed

evaluation function that did not vary with game stage.

References

[1] T.S. Anantharaman, Extension heuristics, ICCA J. 14 (2) (1991) 135–143.

[2] T.S. Anantharman, M.S. Campbell, F.-h. Hsu, Singular extensions: Adding selectivity to brute-force

searching, Artiﬁcial Intelligence 43 (1) (1990) 99–110. Also published in: ICCA J. 11 (4) (1988) 135–143.

[3] D.F. Beal, Experiments with the null move, in: D.F. Beal (Ed.), Advances in Computer Chess 5, Elsevier

Science, Amsterdam, 1989, pp. 65–79.

[4] H. Berliner, Chess as problem solving: The development of a tactics analyzer, Ph.D. Thesis, Carnegie

Mellon University, Pittsburgh, PA, 1974.

[5] M.G. Brockington, A taxonomy of parallel game-tree search algorithms, ICCA J. 19 (3) (1996) 162–174.

[6] M. Campbell, Knowledge discovery in Deep Blue, Comm. ACM 42 (11) (November 1999) 65–67.

[7] J.H. Condon, K. Thompson, Belle chess hardware, in: Advances in Computer Chess 3, Pergamon Press,

Oxford, 1982, pp. 45–54.

[8] R. Feldmann, Spielbaumsuche mit massiv parallelen Systemen, Ph.D. Thesis, Universität-Gesamt-

hochschule Paderborn, Germany, 1993.

[9] G. Goetsch, M.S. Campbell, Experiments with the null-move heuristic, in: T.A. Marsland, J. Schaeffer

(Eds.), Computers, Chess, and Cognition, Springer, Berlin, 1990, pp. 159–168.

[10] W. Gropp, E. Lusk, A. Skjellum, Using MPI, 2nd Edition, MIT Press, Cambridge, MA, 1999.

[11] E.A. Heinz, Scalable Search in Computer Chess, Friedrick Vieweg & Son, 2000.

[12] F.-h. Hsu, A two-million moves/s CMOS single-chip chess move generator, IEEE J. Solid State

Circuits 22 (5) (1987) 841–846.

M. Campbell et al. / Artiﬁcial Intelligence 134 (2002) 57–83

[13] F.-h. Hsu, Large-scale parallelization of alpha-beta search: An algorithmic and architectural study, Ph.D.

Thesis, Carnegie Mellon, Pittsburgh, PA, 1990.

[14] F.-h. Hsu, IBM’s Deep Blue chess grandmaster chips, IEEE Micro (March–April 1999) 70–81.

[15] F.-h. Hsu, Behind Deep Blue, Princeton University Press, Princeton, NJ, 2002.

[16] F.-h. Hsu, T. Anantharman, M. Campbell, A. Nowatzyk, A Grandmaster chess machine, Scientiﬁc American

(October 1990) 44–50.

[17] F.-h. Hsu, T.S. Anantharman, M.S. Campbell, A. Nowatzyk, Deep Thought, in: T.A. Marsland, J. Schaeffer

(Eds.), Computers, Chess, and Cognition, Springer, Berlin, 1990, pp. 55–78.

[18] A. Junghanns, Are there practical alternatives to alpha-beta in computer chess?, ICCA J. 21 (1) (1998)

14–32.

[19] D.E. Knuth, R.W. Moore, An analysis of alpha-beta pruning, Artiﬁcial Intelligence 6 (4) (1975) 293–326.

[20] D.N.L. Levy, D. Broughton, M. Taylor, The SEX algorithm in computer chess, ICCA J. 12 (1) (1989) 10–21.

[21] D.A. McAllester, D. Yuret, Alpha-beta-conspiracy search, 1993. http://www.research.att.com/~dmac/abc.

ps.

[22] M. Newborn, Kasparov Versus Deep Blue: Computer Chess Comes of Age, Springer, Berlin, 1997.

[23] A. Reinefeld, An improvement to the scout tree-search algorithm, ICCA J. 6 (4) (1983) 4–14.

[24] D.J. Slate, L.R. Atkin, Chess 4.5—The Northwestern University chess program, in: P.W. Frey (Ed.), Chess

Skill in Man and Machine, Springer, New York, 1977, pp. 82–118.

[25] G. Tesauro, Connectionist learning of expert preferences by comparison training, in: D. Touretzky (Ed.),

Advances in Neural Information Processing Systems 1 (NIPS-88), Morgan Kaufmann, San Mateo, CA,

1989, pp. 99–106.

[26] G. Tesauro, Comparison training of chess evaluation functions, in: J. Furnkranz, M. Kumbat (Eds.),

Machines that Learn to Play Games, Nova Science Publishers, 2001, pp. 117–130.

[27] K. Thompson, Retrograde analysis of certain endgames, ICCA J. 9 (3) (1986) 594–597.

Yüklə 259,28 Kb.

Dostları ilə paylaş:

1 2 3 4 5 6 7 8 9 10