Game of perfect information

Date: 16.11.2017

Game of perfect information
Finite game

Finite action sets
Finite length

Chess has a solution: a win, draw, or loss for White under optimal play (a Nash equilibrium)
Subgame perfect Nash equilibrium (via backward induction)
REALITY: computational complexity bounds rationality
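The claim that chess "has a solution" rests on backward induction over a finite game tree. The following is a minimal sketch that assumes a toy tree written as nested Python lists. Leaves are payoffs for the first (maximizing) player: +1 win, 0 draw, -1 loss. The function name `backward_induction` is hypothetical.

```python
from typing import List, Union

# A toy game tree: a leaf is an int payoff for the first (maximizing) player,
# an internal node is a list of the positions reachable in one move.
Tree = Union[int, List["Tree"]]

def backward_induction(node: Tree, maximizing: bool = True) -> int:
    """Value of a finite two-player zero-sum game of perfect information."""
    if isinstance(node, int):          # terminal position: payoff is known
        return node
    # Solve every subgame first, then let the player to move pick the best one.
    values = [backward_induction(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

game = [[1, -1], [0, 0], [-1, [1, 1]]]
value = backward_induction(game)       # 0: the toy game is a draw
```

The same recursion is well defined for chess. However, the tree is far too large to enumerate, which is the point of the REALITY line.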

Beal (1980) and Nau (1982, 1983) analyzed whether values backed up by minimax search are more trustworthy than the heuristic values themselves. Their model-based analyses showed that backed-up values are somewhat less trustworthy (the so-called minimax pathology).
The anomaly goes away if sibling nodes’ values are highly correlated [Beal 1982, Bratko & Gams 1982, Nau 1982]
Pearl (1984) partly disagreed with this conclusion, and claimed that while strong dependencies between sibling nodes can eliminate the pathology, practical games like chess don’t possess dependencies of sufficient strength.

He pointed out that few chess positions are so strong that they cannot be spoiled abruptly if one really tries hard to do so.
He concluded that success of minimax is “based on the fact that common games do not possess a uniform structure but are riddled with early terminal positions, colloquially named blunders, pitfalls or traps. Close ancestors of such traps carry more reliable evaluations than the rest of the nodes, and when more of these ancestors are exposed by the search, the decisions become more valid.”

Still not fully understood. For newer results, see:

Sadikov, Bratko, Kononenko (2003). Search versus Knowledge: An Empirical Study of Minimax on KRK. In: van den Herik, Iida and Heinz (eds.), Advances in Computer Games: Many Games, Many Challenges, Kluwer Academic Publishers, pp. 33-44.
Ramanujan, Sabharwal, Selman (2010). Understanding Sampling Style Adversarial Search Methods. UAI-2010, pp. 474-483.
Ramanujan, Sabharwal, Selman (2010). On Adversarial Search Spaces and Sampling-Based Planning. ICAPS-2010, pp. 242-245.

Difference (between player and opponent) of

Material
Mobility
King position
Bishop pair
Rook pair
Open rook files
Control of center (piecewise)
Others
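The following is a minimal sketch of such an evaluation function, written as a weighted sum of feature differences. The feature names and weights are illustrative assumptions. They are not values from Deep Blue or any real program.

```python
from typing import Dict

# Hypothetical weights (in pawn units); not taken from any real engine.
WEIGHTS: Dict[str, float] = {
    "material": 1.0,
    "mobility": 0.1,
    "king_position": 0.3,
    "bishop_pair": 0.5,
    "rook_pair": 0.2,
    "open_rook_files": 0.25,
    "center_control": 0.2,
}

def evaluate(player: Dict[str, float], opponent: Dict[str, float]) -> float:
    """Linear evaluation: sum of weight * (player's feature - opponent's feature)."""
    return sum(w * (player.get(f, 0.0) - opponent.get(f, 0.0))
               for f, w in WEIGHTS.items())

white = {"material": 39, "mobility": 30, "bishop_pair": 1}
black = {"material": 38, "mobility": 25}
score = evaluate(white, black)   # 1 pawn + 0.5 mobility + 0.5 bishop pair
```

Because every term is a difference, the score for one side is the negation of the score for the other. This property is what minimax and negamax search rely on.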

Deep Blue used ~6,000 different features in its evaluation function (in hardware)
A different weighting of these features is downloaded to the chips after every real-world move (based on the current situation on the board)

Contributed to strong positional play

Acquiring the weights for Deep Blue

Weight learning based on a database of 900 grandmaster games (~120 features)

Alter the weight of one feature => 5-6 ply search => if the chosen moves match grandmaster play better, alter that parameter further in the same direction
Least-squares with no search
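The feature-by-feature procedure above is essentially coordinate-wise hill climbing on the rate of agreement with grandmaster moves. The following sketch makes several simplifying assumptions. Each hypothetical training example lists feature vectors for the candidate moves, plus the index of the move the grandmaster played. A 1-ply argmax stands in for the 5-6 ply search. The names `agreement` and `tune` are illustrative.

```python
from typing import List, Sequence, Tuple

# Hypothetical example format: (feature vectors of candidate moves, index of GM move).
Example = Tuple[List[Sequence[float]], int]

def chosen_move(weights: List[float], candidates: List[Sequence[float]]) -> int:
    """Index of the candidate with the highest linear score (stand-in for search)."""
    scores = [sum(w * x for w, x in zip(weights, feats)) for feats in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)

def agreement(weights: List[float], examples: List[Example]) -> float:
    """Fraction of positions where the program picks the grandmaster's move."""
    return sum(chosen_move(weights, c) == gm for c, gm in examples) / len(examples)

def tune(weights: List[float], examples: List[Example],
         step: float = 0.1, rounds: int = 20) -> Tuple[List[float], float]:
    weights = list(weights)
    best = agreement(weights, examples)
    for _ in range(rounds):
        improved = False
        for i in range(len(weights)):
            for direction in (+1, -1):
                # Keep pushing weight i in this direction while agreement improves.
                while True:
                    trial = weights.copy()
                    trial[i] += direction * step
                    score = agreement(trial, examples)
                    if score <= best:
                        break
                    weights, best, improved = trial, score, True
        if not improved:
            break
    return weights, best

examples = [([[1, 0], [0, 1]], 1), ([[2, 0], [0, 1]], 1), ([[0, 1], [1, 0]], 0)]
weights, acc = tune([1.0, 0.95], examples)
```

Agreement is a step function of the weights, so a fixed step can stall on a plateau. In the usage example, the search stops after matching two of the three grandmaster moves.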

Other learning is possible, e.g., Tesauro’s backgammon program (TD-Gammon)

Solves credit assignment problem
Was confined to linear combination of features

Manually: Grandmaster Joel Benjamin played take-back chess. At possible errors, the evaluation was broken down and visualized, and the weighting was possibly changed

Quiescence search

The evaluation function (domain-specific) returns a second number in addition to the evaluation: stability, based on

Threats
Other

Continue the search (beyond the normal horizon) if the position is unstable
Introduces variance in search time
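The following is a minimal sketch of this idea as a negamax search over a toy tree. The `Node` type, its `stable` flag (which stands in for the evaluator's second number), and the `max_extension` cap are assumptions made for illustration. Real programs usually restrict the extension to captures and checks.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    value: float                  # static evaluation, from the side to move
    stable: bool = True           # the evaluator's "stability" output
    children: List["Node"] = field(default_factory=list)

def search(node: Node, depth: int, max_extension: int = 4) -> float:
    """Negamax that keeps searching past the horizon while positions are unstable."""
    if not node.children:
        return node.value
    if depth <= 0:
        # At the normal horizon: trust the evaluation only if the position is quiet
        # (or the extension budget is used up, which bounds the extra search time).
        if node.stable or max_extension <= 0:
            return node.value
        return max(-search(c, 0, max_extension - 1) for c in node.children)
    return max(-search(c, depth - 1, max_extension) for c in node.children)

# A move that looks good statically but walks into a recapture, and a quiet move.
trap = Node(value=-3, stable=False, children=[Node(value=-4)])
quiet = Node(value=1)
root = Node(value=0, children=[trap, quiet])
```

With the extension, `search(root, 1)` prefers the quiet move. With `max_extension=0`, the horizon hides the recapture and the trap looks best. This is also why the search time varies.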

Singular extension

Domain independent
A node is searched deeper if its value is much better than its siblings’
Extended lines can reach even 30-40 ply
A variant is used by Deep Blue
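The following is a simplified sketch of the idea. It assumes a toy tree of static values (from the side to move) and a fixed margin. Deep Blue's variant differed in detail. The `margin` and `extensions_left` parameters are hypothetical. After a normal search, a move whose score beats every sibling by at least `margin` is re-searched one ply deeper.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    value: float                                   # static value, side to move
    children: List["Node"] = field(default_factory=list)

def search(node: Node, depth: int, margin: float = 2.0,
           extensions_left: int = 2) -> float:
    if depth <= 0 or not node.children:
        return node.value
    scores = [-search(c, depth - 1, margin, extensions_left) for c in node.children]
    best = max(range(len(scores)), key=scores.__getitem__)
    others = [s for i, s in enumerate(scores) if i != best]
    # Singular move: much better than all of its siblings, so search it deeper.
    if extensions_left > 0 and others and scores[best] - max(others) >= margin:
        scores[best] = -search(node.children[best], depth, margin, extensions_left - 1)
    return max(scores)

# The first move looks clearly best at depth 1, but one more ply reveals less.
root = Node(0, [Node(-5, [Node(3)]), Node(0)])
```

The extension is domain independent because it only compares sibling scores. The `extensions_left` budget keeps repeated extensions from running away.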

Store millions of positions in a hash table to avoid searching them again

Position
Hash code
Score
Exact / upper bound / lower bound
Depth of searched tree rooted at the position
Best move to make at the position

Algorithm

When a position P is arrived at, the hash table is probed
If there is a match, and

stored_depth(P) ≥ new_depth(P), i.e., the stored search was at least as deep as the one now needed at P, and
score in the table is exact, or the bound on the score is sufficient to cause the move leading to P to be inferior to some other choice

then P is assigned the attributes from the table
else the computer scores P (by direct evaluation, or by search with the old best move searched first) and stores the new attributes in the table
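The following is a minimal sketch of this probe-and-store logic inside an alpha-beta (negamax) search. It makes several assumptions. Positions are plain strings used directly as keys, whereas a real program would store a hash code (e.g., Zobrist hashing) and verify the position. The game graph is a toy dict. `evaluate` scores a position from the side to move.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

EXACT, LOWER, UPPER = "exact", "lower", "upper"

@dataclass
class Entry:
    score: float
    flag: str                   # exact score, lower bound or upper bound
    depth: int                  # depth of the searched tree rooted at the position
    best_move: Optional[str]    # child position to search first next time

def negamax(pos: str, depth: int, alpha: float, beta: float,
            children: Dict[str, List[str]],
            evaluate: Callable[[str], float],
            table: Dict[str, Entry]) -> float:
    alpha_orig = alpha
    entry = table.get(pos)
    # Reuse the entry only if it came from a search at least as deep as needed now.
    if entry is not None and entry.depth >= depth:
        if entry.flag == EXACT:
            return entry.score
        if entry.flag == LOWER:
            alpha = max(alpha, entry.score)
        else:
            beta = min(beta, entry.score)
        if alpha >= beta:        # the stored bound alone makes this line inferior
            return entry.score
    moves = children.get(pos, [])
    if depth == 0 or not moves:
        return evaluate(pos)
    if entry is not None and entry.best_move in moves:
        # Search the old best move first: it often causes an early cutoff.
        moves = [entry.best_move] + [m for m in moves if m != entry.best_move]
    best, best_move = -math.inf, None
    for child in moves:
        score = -negamax(child, depth - 1, -beta, -alpha, children, evaluate, table)
        if score > best:
            best, best_move = score, child
        alpha = max(alpha, score)
        if alpha >= beta:
            break
    if best <= alpha_orig:
        flag = UPPER
    elif best >= beta:
        flag = LOWER
    else:
        flag = EXACT
    table[pos] = Entry(best, flag, depth, best_move)
    return best

children = {"root": ["a", "b"], "a": ["c", "d"], "b": ["d", "e"]}  # "d" transposes
leaf_values = {"c": 3, "d": -2, "e": 5}
table: Dict[str, Entry] = {}
value = negamax("root", 2, -math.inf, math.inf, children, leaf_values.__getitem__, table)
```

A second call on the same table returns the stored exact score for `root` without searching.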

When the table fills up => replacement strategies

Keep positions with greater searched tree depth under them
Keep positions with more searched nodes under them
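The following is a sketch of a bounded table that applies one of the two replacement rules above. The capacity, the `TTEntry` fields, and the linear scan for a victim are simplifying assumptions. Real tables typically use small buckets per hash slot instead of scanning the whole table.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class TTEntry:
    score: float
    depth: int    # depth of the searched tree under the position
    nodes: int    # number of nodes searched under the position

class BoundedTable:
    def __init__(self, capacity: int, key: str = "depth") -> None:
        self.capacity = capacity
        self.key = key                    # "depth" or "nodes": what makes an entry valuable
        self.entries: Dict[int, TTEntry] = {}

    def _worth(self, e: TTEntry) -> int:
        return getattr(e, self.key)

    def store(self, h: int, e: TTEntry) -> bool:
        """Store e under hash h; return False if the table keeps older entries instead."""
        if h in self.entries or len(self.entries) < self.capacity:
            self.entries[h] = e           # free slot, or overwrite the same position
            return True
        victim = min(self.entries, key=lambda k: self._worth(self.entries[k]))
        if self._worth(e) >= self._worth(self.entries[victim]):
            del self.entries[victim]      # evict the least valuable entry
            self.entries[h] = e
            return True
        return False

by_depth = BoundedTable(2, key="depth")
by_nodes = BoundedTable(2, key="nodes")
for t in (by_depth, by_nodes):
    t.store(1, TTEntry(0, depth=3, nodes=100))
    t.store(2, TTEntry(0, depth=1, nodes=5000))
    t.store(3, TTEntry(0, depth=2, nodes=10))
```

The two rules can disagree. Here the depth rule evicts the shallow entry 2. The node-count rule keeps entry 2 because many nodes were searched under it, and it rejects entry 3.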

State space = {WTM, BTM} × {all possible configurations of the remaining pieces}, where WTM/BTM = White/Black to move
BTM table, WTM table, legal moves connect states between these
Start at terminal positions: mate, stalemate, immediate capture without compensation (=reduction). Mark white’s wins by won-in-0
Mark unclassified WTM positions that allow a move to a won-in-0 position as won-in-1 (store the associated move)
Mark unclassified BTM positions as won-in-2 if every legal move leads to a won-in-1 position
Repeat this until no more labellings occur
Do the same for black
Remaining positions are draws
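The following is a minimal sketch of the labelling loop on an abstract position graph. It makes these assumptions:
- Positions are `(side, name)` tuples, with side `"W"` or `"B"`.
- `succ` maps each position to the positions reachable by one legal move.
- `won_in_0` holds the terminal positions already won for White.

Positions left unlabelled are not wins for White. Running the same pass for Black and calling the rest draws completes the procedure.

```python
from typing import Dict, List, Set, Tuple

Position = Tuple[str, str]   # (side to move, hypothetical position name)

def retrograde(succ: Dict[Position, List[Position]],
               won_in_0: Set[Position]) -> Dict[Position, int]:
    """Return {position: n}: White wins in n plies with best play by both sides."""
    won = {p: 0 for p in won_in_0}
    n = 0
    while True:
        n += 1
        newly = {}
        for pos, nexts in succ.items():
            if pos in won or not nexts:
                continue                        # already labelled, or terminal
            if pos[0] == "W":
                # White needs just one move into an already-won position.
                ok = any(q in won for q in nexts)
            else:
                # Black is lost only if every legal move leads to a won position.
                ok = all(q in won for q in nexts)
            if ok:
                newly[pos] = n
        if not newly:                           # no more labellings: stop
            return won
        won.update(newly)

W, B = "W", "B"
succ = {
    (W, "a"): [(B, "mated"), (B, "x")],
    (W, "b"): [(B, "mated")],
    (B, "x"): [(W, "a"), (W, "b")],
    (B, "esc"): [(W, "a"), (W, "draw")],
    (W, "draw"): [(B, "stalemate")],
}
labels = retrograde(succ, {(B, "mated")})
```

Labels are assigned in rounds. So a WTM position gets one more than its fastest winning successor, and a BTM position gets one more than its slowest successor, since Black delays as long as possible.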

All 5-piece endgames solved (some have > 10^8 states), as are many 6-piece ones

KRBKNN (~10^11 states): longest path-to-reduction 223

Rule changes

Max number of moves allowed from the last capture or pawn move to completion (the 50-move rule)

Chess knowledge

Splitting rook from king in KRKQ
KRKN game was thought to be a draw, but

White wins in 51% of WTM positions
White wins in 87% of BTM positions

~200 million moves / second = 3.6 * 10^10 moves in 3 minutes
3 min corresponds to

~7 plies of uniform depth minimax search
10-14 plies of uniform depth alpha-beta search

1 sec corresponds to 380 years of human thinking time
Software searches the first plies

Selective and singular extensions

Specialized hardware searches the last 5 plies
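The figures on this slide can be checked with a few lines of arithmetic. Two inputs are assumptions not stated above: an average branching factor of 35 for chess, and a human rate of one position per minute. The second assumption reproduces the 380-year figure. Best-case alpha-beta examines roughly b^(d/2) leaf positions instead of b^d, so it can reach about twice the minimax depth.

```python
import math

NODES_PER_SECOND = 200e6     # ~200 million positions per second
SECONDS = 3 * 60             # 3 minutes per move
BRANCHING = 35               # assumed average branching factor for chess

budget = NODES_PER_SECOND * SECONDS                       # 3.6e10 positions
# Uniform minimax: b**d = budget  =>  d = log(budget) / log(b)
minimax_depth = math.log(budget) / math.log(BRANCHING)
# Best-case alpha-beta: b**(d/2) = budget  =>  twice the minimax depth
alphabeta_depth = 2 * minimax_depth
# Assumed human rate of one position per minute, expressed in years
human_years_per_second = NODES_PER_SECOND / (60 * 24 * 365)

print(f"{budget:.2e} positions, minimax ~{minimax_depth:.1f} ply, "
      f"alpha-beta up to ~{alphabeta_depth:.1f} ply, "
      f"~{human_years_per_second:.0f} human-years per second")
```

This gives about 6.8 plies for minimax and about 13.7 plies for best-case alpha-beta, which is consistent with the figures above. Move ordering is never perfect in practice, which accounts for the lower end of the 10-14 range.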

32-node RS/6000 SP multicomputer
Each node had

1 IBM Power2 Super Chip (P2SC)
16 chess chips

Move generation (often takes 40-50% of time)
Evaluation
Some endgame heuristics & small endgame databases

32 Gbyte opening & endgame database

Win-loss-draw-draw-draw-loss (Kasparov's results in the 1997 rematch)

(In even-numbered games, Deep Blue played white)

Engineering

Better evaluation functions for chess
Faster hardware
Empirically better search algorithms
Learning from examples and especially from self-play
There already are grandmaster-level programs that run on a regular PC, e.g., Fritz

Fun

Harder games, e.g. Go
Easier games, e.g., checkers (some openings solved [2005])

Science

Extending game theory with normative models of bounded rationality
Developing normative (e.g., decision-theoretic) search algorithms

MGSS* [Russell & Wefald 1991] is an example of a first step
Conspiracy numbers

Impacts are beyond just chess

Impacts of faster hardware
Impacts of game theory with bounded rationality, e.g. auctions, voting, electronic commerce, coalition formation
