64
M. Campbell et al. / Artificial Intelligence 134 (2002) 57–83
appropriately. A displaced count of one also can trigger the “no progress” condition: see
Section 4.3.
3.4. Extendability
The chess chips optionally support the use of an external FPGA (Field Programmable
Gate Array) to provide access to an external transposition table, more complicated search
control, and additional terms for the evaluation function. In theory this mechanism would
have allowed the hardware search to approach the efficiency and complexity of the software
search. Null move search was also explicitly supported by this method. Due to time
constraints, this capability was never used in Deep Blue.
4. Software search
Based on the experiences with Deep Thought 1, a new selective search was built for
Deep Thought 2 (which would later form the basis for the Deep Blue selective search).
This search, which we call “dual credit with delayed extensions” was designed based on a
number of principles:
1. Extend forcing/forced pairs of moves. An important part of tactics in chess concerns
forcing/forced pairs (ffp’s) of moves.
9
These show up in various contexts, e.g.,
(a) White has an unstoppable winning threat. Black has numerous delaying moves
(e.g., checks, mate threats, attacks on high valued pieces, etc.) which demand
precise responses by White. Eventually the delaying moves run out and the White
win is discovered.
(b) White has a series of sacrifices and immediate threats, which eventually result in
checkmate or the win of material greater than that sacrificed.
Ideally one would extend the search two ply for each ffp. This almost always leads
to a “search explosion”, i.e., an effectively non-terminating search. The following
techniques are intended to address this problem.
2. Forced moves are expectation dependent. A move may be forced for one level of
expectation and not forced for another. A move that is “fail low”, i.e., below the
current level of expectation, is never considered forced in the Deep Blue search. This
restriction is also described in [2].
3. Fractional extensions [20]. It is not feasible to fully extend all the ffp’s without the
search exploding. First of all, there may be more than one reasonable response to a
forcing move, though by the definition of forcing there should only be a small number
of reasonable responses. Second, even forced moves usually have legal alternatives
which must be refuted. One method of addressing this problem is to allow fractional
extensions, where an ffp does not get a full 2-ply extension, but rather some smaller
amount, say 1.75 ply. The less forcing the ffp, the less the extension.
9
A move that is forced must have a backed up score significantly better (by more than some threshold) than
the backed up score of all the available alternatives. This can be generalized to the case where there are a very
small number of moves better than all the alternatives.
M. Campbell et al. / Artificial Intelligence 134 (2002) 57–83
65
4. Delayed extensions. Often an isolated ffp is meaningless, and in any case it is not
that hard to “search through”. Things usually become interesting (and more difficult)
when there is a series of ffp’s. One response to this observation is to allow ffp’s to
accumulate “credit”, and only when sufficient credit is available can it be “cashed in”
for an extension. By setting this threshold appropriately, extensions are delayed until
multiple ffp’s occur in a given path.
5. Dual credit. An immediate and serious problem that arises in the above is on the
“principal variation” (PV). The PV represents current best play for both sides, i.e.,
both sides are at their level of expectation. In this case, both sides may be forced to
some degree, and both sides are not fail low, which allows credit to be accumulated.
Clearly it is not feasible to accumulate more than 1 ply of credit for several PV
moves in a row: the search will explode. One solution to this problem is to separate
and accumulate the credit for the two sides separately.
10
If either side accumulates
sufficient credit to cash in for an extension, the other side must give up an equal
amount of credit.
6. Preserve the search envelope. As observed in [2], it is essential to preserve the search
envelope to avoid an oscillating search.
Fig. 1 illustrates some of the basic ideas in the dual credit with delayed extensions
algorithm. The pseudo-code is based on a depth-limited version of alpha-beta using the
negamax formulation [19]. Lines added to the basic alpha-beta code are marked with a
“ * ”.
The first difference we note is in line 5, where the two credits are passed recursively as
parameters in the call to DC. The next significant point is lines 14 through 19. Here is where
an extension may take place. In line 14, CREDIT_LIMIT is the “cash-in” threshold. Line
15 calculates the number of plies of extension to be performed, which is the integer number
of plies needed to bring hisCredit below CREDIT_LIMIT. Given a CREDIT_LIMIT of 2,
as is used in Deep Blue, a value of hisCredit of 2.5 gives a 1-ply extension, and a hisCredit
value of 3.25 gives a 2-ply extension. Note that as the extension is performed (line 18),
both hisCredit and myCredit are reduced by the corresponding amount (though not below
zero).
Until line 26, the code is similar to alpha-beta. Line 26 swaps hisCredit and myCredit as
is required by the negamax framework, and recursively calls the search on the current
successor position. Line 28 is reached if the current move has exceeded the previous
bestScore. This is the prerequisite for credit to be given. GenerateCredit() hides a wealth
of details, including full or reduced depth offset searches and null move threat tests. If
credit is generated for the current move,
11
the search on the successor position is reinvoked
(line 30) with the new credit added in. This time, if the score exceeds bestScore, it becomes
the new bestScore (line 32).
To simplify this description, we have skipped over issues related to preserving the search
envelope. In an actual implementation, it would be essential to know the amount of credit
10
McAllester and Yuret [21] suggested separating the depth computation for the two sides, though not in the
context of a credit system.
11
Note that the generated credit can be fractional. Deep Blue has a granularity of 1/4 ply.