Artiﬁcial Intelligence 134 (2002) 57-83

Yüklə 259,28 Kb.

Pdf görüntüsü

səhifə	4/10
tarix	16.11.2017
ölçüsü	259,28 Kb.
	#10667

1 2 3 4 5 6 7 8 9 10

4. Software search

64

M. Campbell et al. / Artiﬁcial Intelligence 134 (2002) 57–83

appropriately. A displaced count of one also can trigger the “no progress” condition: see

Section 4.3.

3.4. Extendability

The chess chips optionally support the use of an external FPGA (Field Programmable

Gate Array) to provide access to an external transposition table, more complicated search

control, and additional terms for the evaluation function. In theory this mechanism would

have allowed the hardware search to approach the efﬁciency and complexity of the software

search. Null move search was also explicitly supported by this method. Due to time

constraints, this capability was never used in Deep Blue.

4. Software search

Based on the experiences with Deep Thought 1, a new selective search was built for

Deep Thought 2 (which would later form the basis for the Deep Blue selective search).

This search, which we call “dual credit with delayed extensions” was designed based on a

number of principles:

1. Extend forcing/forced pairs of moves. An important part of tactics in chess concerns

forcing/forced pairs (ffp’s) of moves.

These show up in various contexts, e.g.,

(a) White has an unstoppable winning threat. Black has numerous delaying moves

(e.g., checks, mate threats, attacks on high valued pieces, etc.) which demand

precise responses by White. Eventually the delaying moves run out and the White

win is discovered.

(b) White has a series of sacriﬁces and immediate threats, which eventually result in

checkmate or the win of material greater than that sacriﬁced.

Ideally one would extend the search two ply for each ffp. This almost always leads

to a “search explosion”, i.e., an effectively non-terminating search. The following

techniques are intended to address this problem.

2. Forced moves are expectation dependent. A move may be forced for one level of

expectation and not forced for another. A move that is “fail low”, i.e., below the

current level of expectation, is never considered forced in the Deep Blue search. This

restriction is also described in [2].

3. Fractional extensions [20]. It is not feasible to fully extend all the ffp’s without the

search exploding. First of all, there may be more than one reasonable response to a

forcing move, though by the deﬁnition of forcing there should only be a small number

of reasonable responses. Second, even forced moves usually have legal alternatives

which must be refuted. One method of addressing this problem is to allow fractional

extensions, where an ffp does not get a full 2-ply extension, but rather some smaller

amount, say 1.75 ply. The less forcing the ffp, the less the extension.

A move that is forced must have a backed up score signiﬁcantly better (by more than some threshold) than

the backed up score of all the available alternatives. This can be generalized to the case where there are a very

small number of moves better than all the alternatives.

M. Campbell et al. / Artiﬁcial Intelligence 134 (2002) 57–83

4. Delayed extensions. Often an isolated ffp is meaningless, and in any case it is not

that hard to “search through”. Things usually become interesting (and more difﬁcult)

when there is a series of ffp’s. One response to this observation is to allow ffp’s to

accumulate “credit”, and only when sufﬁcient credit is available can it be “cashed in”

for an extension. By setting this threshold appropriately, extensions are delayed until

multiple ffp’s occur in a given path.

5. Dual credit. An immediate and serious problem that arises in the above is on the

“principal variation” (PV). The PV represents current best play for both sides, i.e.,

both sides are at their level of expectation. In this case, both sides may be forced to

some degree, and both sides are not fail low, which allows credit to be accumulated.

Clearly it is not feasible to accumulate more than 1 ply of credit for several PV

moves in a row: the search will explode. One solution to this problem is to separate

and accumulate the credit for the two sides separately.

If either side accumulates

sufﬁcient credit to cash in for an extension, the other side must give up an equal

amount of credit.

6. Preserve the search envelope. As observed in [2], it is essential to preserve the search

envelope to avoid an oscillating search.

Fig. 1 illustrates some of the basic ideas in the dual credit with delayed extensions

algorithm. The pseudo-code is based on a depth-limited version of alpha-beta using the

negamax formulation [19]. Lines added to the basic alpha-beta code are marked with a

“ * ”.

The ﬁrst difference we note is in line 5, where the two credits are passed recursively as

parameters in the call to DC. The next signiﬁcant point is lines 14 through 19. Here is where

an extension may take place. In line 14, CREDIT_LIMIT is the “cash-in” threshold. Line

15 calculates the number of plies of extension to be performed, which is the integer number

of plies needed to bring hisCredit below CREDIT_LIMIT. Given a CREDIT_LIMIT of 2,

as is used in Deep Blue, a value of hisCredit of 2.5 gives a 1-ply extension, and a hisCredit

value of 3.25 gives a 2-ply extension. Note that as the extension is performed (line 18),

both hisCredit and myCredit are reduced by the corresponding amount (though not below

zero).

Until line 26, the code is similar to alpha-beta. Line 26 swaps hisCredit and myCredit as

is required by the negamax framework, and recursively calls the search on the current

successor position. Line 28 is reached if the current move has exceeded the previous

bestScore. This is the prerequisite for credit to be given. GenerateCredit() hides a wealth

of details, including full or reduced depth offset searches and null move threat tests. If

credit is generated for the current move,

the search on the successor position is reinvoked

(line 30) with the new credit added in. This time, if the score exceeds bestScore, it becomes

the new bestScore (line 32).

To simplify this description, we have skipped over issues related to preserving the search

envelope. In an actual implementation, it would be essential to know the amount of credit

McAllester and Yuret [21] suggested separating the depth computation for the two sides, though not in the

context of a credit system.

Note that the generated credit can be fractional. Deep Blue has a granularity of 1/4 ply.

Yüklə 259,28 Kb.

Dostları ilə paylaş:

1 2 3 4 5 6 7 8 9 10