M. Campbell et al. / Artificial Intelligence 134 (2002) 57–83
chips communicate with their host node via a Micro Channel® bus. This heterogeneous
architecture has a strong influence on the parallel search algorithm used in Deep Blue, as
discussed below.
6.1. Parallel search algorithm
To characterize the parallel search algorithm used in Deep Blue, we will use the
taxonomy given in [5].
1. Processor hierarchy. Deep Blue uses a static processor tree, with one SP node
controlling the other 29 nodes, which in turn control 16 chess chips each. The static
nature of the hierarchy is in part determined by the fact that the chess chips are not
general purpose processors and can only act as slaves. In addition, the chess chips
can only communicate directly with their host node.
2. Control distribution. Deep Blue uses a centralized control of the parallel search.
The control is managed on the SP nodes, since the chess chips do not have this
functionality.
3. Parallelism possible. Deep Blue permits parallelism under the following conditions:^17
(a) Type 1 (PV) nodes. After the first move has been examined at a PV node, all
the alternatives may be examined in parallel (with an offset window to enable
the selective search algorithm). Null move searches, used to generate threat
information for the selective search, can also be carried out in parallel. Finally,
PV nodes at the hardware/software boundary can be searched in parallel. Because
the hardware search allows only null-window searches, a number of searches are
required to determine an exact score within a window.
(b) Good type 2 nodes (nodes where the first move “fails high”, or exceeds
expectations). Most parallel algorithms do not allow or need parallelism in this
case. Deep Blue executes reduced depth offset searches as well as the null move
searches in parallel after the first move fails high. As above, these searches
generate information for use by the selective search.
(c) Bad type 2 nodes (nodes where the fail high move is not searched first). The
moves after the first are searched in parallel. After a fail high move is found, all
the alternatives are searched in parallel with a reduced depth offset search. Null
move searches also can execute in parallel at this point.
(d) Type 3 nodes (nodes where all the moves fail low). These moves can all be
searched in parallel.
4. Synchronization. Type 1 and type 2 nodes are synchronization points for Deep Blue.
The first move must be evaluated before parallelism is allowed. There are also global
synchronization points at the end of iterations.
^17 The terminology of node types used in this section was originally defined in [19].
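The observation under type 1 nodes that the hardware allows only null-window searches can be made concrete: an exact score inside a window can be pinned down by a sequence of fail-high/fail-low probes. The sketch below is a hypothetical bisection over integer scores, not Deep Blue's actual procedure; the probe function `fails_high` is an assumed interface that reports whether the true score exceeds a given bound.

```python
def exact_score(fails_high, alpha, beta):
    """Resolve an exact integer score s with alpha < s <= beta using only
    null-window probes. fails_high(a) must return True iff s > a.
    Illustrative bisection sketch, not the Deep Blue implementation."""
    lo, hi = alpha, beta          # invariant: lo < s <= hi
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails_high(mid):       # probe with null window (mid, mid + 1)
            lo = mid              # score lies above mid
        else:
            hi = mid              # score lies at or below mid
    return hi
```

In the worst case this takes O(log(beta - alpha)) probes, which is why a "number of searches" is needed per exact score, as noted above.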
6.2. Parallel search implementation
The early iterations of the Deep Blue parallel search are carried out on the master node.
There is not much parallelism in the first few iterations, and the master is fast enough (it
has 16 chess chips) that there is little to be gained by attempting to further parallelize the
search.
As the search gets deeper, jobs get allocated throughout the system. There are three
major issues that need to be addressed:
• Load balancing. The search extensions algorithm used in Deep Blue leads to widely
varying tree sizes for a given search depth. This extends all the way to the hardware,
where the complex quiescence search can cause a search to “blow up”. This can
lead to severe load balancing problems. The solution used in Deep Blue was to
abort long-running hardware searches (more than 8000 nodes) and push more of the
search into software. This gives additional opportunities for parallelism. Similarly,
jobs on the worker nodes can abort and return their job to the master for further
splitting.
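The abort-and-split policy for long-running hardware searches can be sketched as follows. The node budget of 8000 is taken from the text; the traversal and the `expand` interface (a function mapping a position to its child positions) are illustrative assumptions, not the chess-chip protocol.

```python
class SearchAborted(Exception):
    """Raised when a search exceeds its node budget; the caller is then
    expected to split the work and push it back into software."""

def limited_search(expand, root, node_limit=8000):
    """Depth-first walk that counts visited nodes and gives up past
    node_limit, mirroring the abort policy described above.
    Illustrative sketch; `expand` is a hypothetical interface."""
    visited = 0
    stack = [root]
    while stack:
        pos = stack.pop()
        visited += 1
        if visited > node_limit:
            raise SearchAborted   # blow-up detected: return work to the master
        stack.extend(expand(pos))
    return visited
```

The same pattern applies one level up: a worker node that detects a blown-up job returns it to the master for further splitting rather than finishing it alone.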
• Master overload. The performance bottleneck of the Deep Blue system was the master processor. One method of alleviating this was to ensure that each worker always had a job “on deck”, ready to execute when the worker completes its active job. This reduces the effect of the communication latency between master and workers.
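The "on deck" arrangement amounts to a two-slot queue per worker: one active job plus one spare, so finishing a job never stalls on a master round-trip. This is a minimal sketch under that assumption; the class and method names are hypothetical.

```python
from collections import deque

class Worker:
    """Holds the active job plus one spare ('on deck') so the next job
    can start before the master responds. Illustrative sketch only."""
    def __init__(self):
        self.jobs = deque(maxlen=2)   # slot 0: active, slot 1: on deck

    def wants_work(self):
        """True while the master should send another job."""
        return len(self.jobs) < 2

    def assign(self, job):
        self.jobs.append(job)

    def finish_active(self):
        """Complete the active job; the on-deck job becomes active."""
        return self.jobs.popleft()
```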
• Sharing between nodes. Workers do not directly communicate with each other. This
decision was made in order to simplify the implementation. Workers will generally
pass their “expensive” transposition table results up to the master.
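Forwarding only “expensive” transposition-table results keeps the master from being flooded with cheap entries. One plausible filter, sketched below, is a depth cutoff: only results that would be costly to recompute are shipped upward. The entry format and the cutoff value are assumptions for illustration; the paper does not specify the filtering criterion.

```python
def expensive_entries(table, min_depth=6):
    """Select transposition-table entries worth forwarding to the master:
    only deep (expensive-to-recompute) results pass the filter.
    Entry layout and depth cutoff are illustrative assumptions."""
    return {key: entry for key, entry in table.items()
            if entry["depth"] >= min_depth}
```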
As a final point, it should be noted that the Deep Blue parallel search is non-
deterministic. Various factors can influence timing and processor job assignments.
Although this was not a major concern, it makes debugging the system much more difficult.
6.3. Parallel search performance
We have limited experimental results to assess the efficiency of the Deep Blue
parallel search. The most accurate numbers were derived on a single-node version of
Deep Blue with 24 chess chips. We compared the 24 chip system with a single chip
system on a variety of positions. The results varied widely depending on the tactical
complexity of the position searched. For positions with many deep forcing sequences
speedups averaged about 7, for an observed efficiency of about 30%. For quieter positions,
speedups averaged 18, for an observed efficiency of 75%. The non-deterministic nature
of the search, particularly in tactical positions, makes it more difficult to conduct these
measurements.
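The efficiency figures above follow directly from the standard definition, efficiency = speedup / number of processors, applied to the 24-chip single-node system:

```python
def observed_efficiency(speedup, n_chips):
    """Parallel efficiency: achieved speedup divided by processor count."""
    return speedup / n_chips

# Figures quoted above for the 24-chip single-node measurements:
tactical = observed_efficiency(7, 24)   # ~0.29, i.e. about 30%
quiet = observed_efficiency(18, 24)     # 0.75, i.e. 75%
```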
It is difficult to assess the efficiency of the full 30-node Deep Blue system. Indirect
evidence suggests an overall observed efficiency of about 8% in tactical positions and about
12% in quiet positions. It is clear that there is room for improvement here. However it was
a conscious design decision of the Deep Blue team to focus on improving the evaluation
function following the 1996 Kasparov match, and the parallel search code was largely
untouched between the 1996 and 1997 matches.