The Anatomy of a Game Program
There are many Reversi (or Othello) games on the web. How do these applets work? I don't mean the graphics - every Java programmer knows how to handle that part, since most applets have a graphical interface. But how do you write a program that actually plays the game? On this page, we'll examine some of the general concepts of game programming as well as some of the difficulties peculiar to the game of Reversi.
To determine the best move, the applet has to look ahead. If you have ever played a strategy game, you know this principle very well: "If I make this move, and he makes that move, I can win by making this move. But what if he makes that move? ..."
A game program proceeds in exactly the same manner. But there are a few technical difficulties. How do you know that a certain position in the game is "good" for you? You have some feeling for the quality of a position, but a vague feeling is not good enough for a program: we need a clear-cut criterion for the quality of a position that we can express in the rigid framework of a programming language. Such evaluation functions depend on the nature of the game - one for chess differs, of course, from one for Reversi, which in turn differs from one for, say, Five-in-a-Row.
A common way to build an evaluation function is to encode our estimate of the quality of a position in a single number: the higher the number the evaluation returns, the better the position is for the program; the lower the number, the better the game is going for our opponent. Usually, one designs evaluation functions such that they return positive values when things are going well for the program and negative values when the opponent is winning; zero indicates a balance of power.
We'll discuss various evaluation functions for Reversi below.
Assume for the time being that we have an evaluation function at our disposal. We can then use a simple technique called "MiniMax search" to look ahead. MiniMax search gets its name from the fact that, when we try to determine the best move, we always consider the best replies of each player in turn. And since the evaluation function returns positive values for positions that are good for the program and negative values for positions that are good for the opponent, we have to take alternately the maximum and the minimum of the evaluation function's results.
At some point, we have to stop looking ahead - simply because even in a simple game like Reversi, there are far too many possibilities to consider. We just cannot look ahead from the first move until the end of the game. Therefore, we need to limit the search depth of our look-ahead. When we have reached the last level, which is called the horizon because we can't look beyond it, we use the evaluation function to find out whether the position we arrived at is good. Finally, we get a game tree like the one below:
                        Top
                         |
         +---------------+---------------+
         |                               |
         A                               B           (0) MAX
    +----+----+                     +----+----+
    |         |                     |         |
    a         b                     c         d      (1) MIN
 +--+--+    +-+-+                 +-+-+       |
 |  |  |    |   |                 |   |       |
 A  B  C    D   E                 F   G       H      (2) MAX
This figure shows the game tree resulting from a two-move look-ahead. On the lowest level (2), we just call the evaluation function to find out which of the moves A to C, D and E, F and G, and H is the best within each group. We take the maximum of these evaluations as the evaluation for the moves a, b, c, and d on level 1. On that level, we then take the minimum of these values as the evaluation for the moves A and B on level 0. On the top level, we again take the maximum and then execute the chosen move.
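To make the procedure concrete, here is a minimal minimax sketch in Java. The Position interface is a hypothetical stand-in for the applet's real board classes, which aren't shown on this page; evaluate() is assumed to score a position from the program's point of view.

```java
import java.util.List;

// Hypothetical game interface -- a stand-in, not the applet's real classes.
interface Position {
    List<Position> successors();   // all positions reachable with one move
    boolean isTerminal();          // true if the game is over in this position
    int evaluate();                // static evaluation from the program's point of view
}

final class Minimax {
    /** Plain minimax: maximize on the program's turn, minimize on the opponent's. */
    static int minimax(Position p, int depth, boolean programToMove) {
        if (depth == 0 || p.isTerminal()) {
            return p.evaluate();   // horizon reached: ask the evaluation function
        }
        int best = programToMove ? Integer.MIN_VALUE : Integer.MAX_VALUE;
        for (Position next : p.successors()) {
            int value = minimax(next, depth - 1, !programToMove);
            best = programToMove ? Math.max(best, value) : Math.min(best, value);
        }
        return best;
    }
}
```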
In the above Minimax procedure, we alternatingly have to take the minimum or the maximum of the values returned for successor positions. This makes programming a Minimax procedure a little cumbersome: we have to distinguish the two cases explicitly.
For programming a Minimax search, the Negamax approach is more comfortable. Instead of always evaluating the position from the computer's point of view (as we do in Minimax), we now evaluate it always from the point of view of the player who is to move next! The opponent's value for the same position then becomes simply the negative of that player's value.
The main advantage of this approach is that we can get rid of the two different cases. We don't have to distinguish between MIN and MAX levels anymore; instead, we always take the maximum, but we negate the position evaluations on each level. The two variants are equivalent because taking the minimum of a set of values selects the same position as taking the maximum of those values after negating them.
Consider the values 3, 5, 6, and 10: on a MIN-level, we choose 3. Now we always maximize, but we negate values, so we're in fact dealing with the values -3, -5, -6, and -10: the largest one is -3. In both cases, we've chosen the same position.
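As a sketch (reusing the hypothetical Position interface from the minimax example, but with evaluate() now assumed to score the position for the player to move), the whole case distinction collapses into a single negation:

```java
final class Negamax {
    static final int INF = 1_000_000;  // stand-in for "infinity", larger than any evaluation

    /** Negamax: always maximize, but negate the values coming up from the next level. */
    static int negamax(Position p, int depth) {
        if (depth == 0 || p.isTerminal()) {
            return p.evaluate();       // evaluated for the player to move in p
        }
        int best = -INF;
        for (Position next : p.successors()) {
            // the child's value is from the opponent's point of view, hence the minus sign
            best = Math.max(best, -negamax(next, depth - 1));
        }
        return best;
    }
}
```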
Alpha-Beta search is a refinement of MiniMax search that allows us to prune the game tree: if we find that it isn't worth examining any further moves in a certain position, we backtrack immediately.
Description of Alpha-Beta search still missing...
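In the meantime, here is a minimal sketch of the idea in negamax form, again on the hypothetical Position interface from above. The interval [alpha, beta] is the window of values that are still of interest; as soon as one move in a position reaches beta, the remaining moves there can be skipped.

```java
final class AlphaBeta {
    static final int INF = 1_000_000;

    /** Alpha-beta in negamax form. Initial call: alphaBeta(root, maxDepth, -INF, +INF). */
    static int alphaBeta(Position p, int depth, int alpha, int beta) {
        if (depth == 0 || p.isTerminal()) {
            return p.evaluate();
        }
        int best = -INF;
        for (Position next : p.successors()) {
            int value = -alphaBeta(next, depth - 1, -beta, -alpha);
            if (value > best) best = value;
            if (best > alpha) alpha = best;
            if (alpha >= beta) break;   // cut-off: the opponent will never allow this line
        }
        return best;
    }
}
```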
NegaScout (i.e. minimum window search), outcome search, MT-algorithms. Description still missing.
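As a placeholder for that description, here is a rough sketch of the NegaScout idea (same hypothetical Position interface as above): search the first move with the full window, probe every further move with a zero-width window, and re-search with the full window only when a probe indicates the move may really be better.

```java
final class NegaScout {
    static final int INF = 1_000_000;

    /** NegaScout / zero-window sketch. Works best when good moves are tried first. */
    static int negaScout(Position p, int depth, int alpha, int beta) {
        if (depth == 0 || p.isTerminal()) {
            return p.evaluate();
        }
        boolean first = true;
        for (Position next : p.successors()) {
            int value;
            if (first) {
                value = -negaScout(next, depth - 1, -beta, -alpha);       // full window
                first = false;
            } else {
                // zero-window probe: can this move beat alpha at all?
                value = -negaScout(next, depth - 1, -alpha - 1, -alpha);
                if (value > alpha && value < beta) {
                    // yes, and the probe only gave a bound: re-search with the real window
                    value = -negaScout(next, depth - 1, -beta, -alpha);
                }
            }
            if (value > alpha) alpha = value;
            if (alpha >= beta) break;   // beta cut-off
        }
        return alpha;
    }
}
```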
We already mentioned the "horizon" above: it's the last level in the search tree. Sometimes, a game program plays rather badly simply because it cannot look ahead far enough. A shallow look-ahead may be sufficient in peaceful positions, but there are others, in particular during attacks, where a correct evaluation of the situation is only possible by looking ahead until we again reach a quiet position.
If the program stops looking ahead at its normal lowest level in the game tree, it may get a very wrong evaluation of such positions, which may easily make it lose the whole game. This is the so-called "horizon effect". To avoid it, one has to dynamically increase the search depth in these situations.
Quiescence search does exactly that: if the program recognizes a move as an attack, it extends the game tree at that branch until it can evaluate its actions and reactions during the attack, i.e. until a situation is reached in which neither side has an imminent threat.
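A rough sketch of that idea, in the same negamax style as above; the helper that selects the "threatening" moves is hypothetical and game-dependent (the threats my applet actually looks at are described further below):

```java
import java.util.Collections;
import java.util.List;

final class Quiescence {
    /** Called when the normal search depth is exhausted: keep searching as long as
        the position is not quiet, i.e. as long as there is still an imminent threat. */
    static int quiesce(Position p, int alpha, int beta) {
        int standPat = p.evaluate();              // value if we stop extending right here
        if (standPat >= beta) return standPat;
        if (standPat > alpha) alpha = standPat;
        for (Position next : threateningSuccessors(p)) {
            int value = -quiesce(next, -beta, -alpha);
            if (value >= beta) return value;      // cut-off
            if (value > alpha) alpha = value;
        }
        return alpha;
    }

    static List<Position> threateningSuccessors(Position p) {
        // Hypothetical helper: only the moves that keep the position "unquiet".
        return Collections.emptyList();           // placeholder
    }
}
```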
The horizon effect can best be seen in my applet on level "Easy": the applet sometimes plays on a so-called X-square (a square diagonally adjacent to a corner) and then loses this corner after another three or four moves. On the higher levels, it doesn't do that. This is a result of my approximate implementation of quiescence search, which works well on the higher levels but is too simplistic on level "Easy". On this level, the applet only considers its own immediate possible moves; it doesn't look ahead to also consider the possible replies of its opponent. Therefore it cannot foresee the loss of the corner unless it is lost immediately.
Description still missing...
Explanation of iterative deepening still to be written...
Note that limiting the Java applet to at most n seconds of look-ahead would require that interrupting a running task actually worked in the JDK. As it doesn't, the best we can do is to check the elapsed time on each round of iterative deepening and return the best move found so far if our time is up.
This approach can be improved slightly by also checking the timeout on every level of the game tree (i.e. at each position) and stopping the evaluation if the time is up. That's the way I implemented it in the applet.
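A minimal sketch of that outer loop (the additional per-position timeout check is omitted here; chooseMove and bestSuccessorAtDepth are hypothetical names, not the applet's actual methods):

```java
final class TimedSearch {
    /** Iterative deepening: search depth 1, 2, 3, ... and check the clock after each round. */
    static Position chooseMove(Position root, long millisAllowed) {
        long deadline = System.currentTimeMillis() + millisAllowed;
        Position best = null;
        for (int depth = 1; ; depth++) {
            Position candidate = bestSuccessorAtDepth(root, depth);
            if (candidate != null) {
                best = candidate;                  // keep the deepest completed result
            }
            if (System.currentTimeMillis() >= deadline) {
                return best;                       // time's up: play what we have
            }
        }
    }

    static Position bestSuccessorAtDepth(Position root, int depth) {
        // Placeholder: run the alpha-beta search to the given depth and return the
        // successor position of the best root move.
        throw new UnsupportedOperationException("plug the search in here");
    }
}
```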
The endgame is played perfectly, so in the last 8 or 10 moves (depending on the "strength" setting, i.e. the depth of the look-ahead) the applet doesn't care about timeouts. Since endgame evaluation is much faster than evaluation earlier in the game (a simpler evaluation function, fewer possible moves at each additional level), this is barely noticeable.
My applet does not "learn".
Description still in the works. Keywords: retrograde analysis, genetic algorithms, game library
In this section, we'll discuss the choices made for my Java applet. As its game engine, it uses Alpha-Beta search with a zero-window technique (the so-called "NegaScout" algorithm) and a transposition table. Except in the endgame, it uses iterative deepening.
Hey, waiter...! No, it's not that.
In Alpha-Beta search, the order in which the moves are tried is crucial, especially when the zero-window technique is used. Trying the better moves first will create a lot more cut-offs than an arbitrary order. Therefore we not only need an evaluation function for positions, we also need to find some ordering criterion for possible moves.
In my reversi applet, I chose a mainly static order function, given by a table that contains high positive values for the corners, negative values for the adjacent fields, and other positive values for all other fields. If a move is to be played to a certain field, its value in the table gives the preference of that move - the higher the value, the better the move is (probably). The values for the corners and the adjacent fields are updated from the edge table, all other entries are static. The fields diagonally adjacent to a corner always get the negative value of the corresponding corner.
This table is used to sort the moves. Moves with a high value in the table are tried first. The table guarantees that corner moves come first, while moves to adjacent squares, especially those that are diagonally adjacent, come last.
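A minimal sketch of such a table-driven ordering; the concrete numbers below are illustrative only (the applet's real table values, and the updates it takes from the edge table, are not given on this page).

```java
import java.util.Comparator;
import java.util.List;

final class MoveOrdering {
    // 8x8 preference table: corners high, squares next to a corner negative
    // (the X-squares most of all), everything else mildly positive.
    static final int[][] PREFERENCE = new int[8][8];
    static {
        for (int r = 0; r < 8; r++)
            for (int c = 0; c < 8; c++)
                PREFERENCE[r][c] = 4;                              // ordinary square
        int[] corners = {0, 7};
        for (int r : corners)
            for (int c : corners) {
                PREFERENCE[r][c] = 100;                            // corners first
                PREFERENCE[r][c == 0 ? 1 : 6] = -30;               // C-squares late
                PREFERENCE[r == 0 ? 1 : 6][c] = -30;
                PREFERENCE[r == 0 ? 1 : 6][c == 0 ? 1 : 6] = -100; // X-squares last
            }
    }

    /** Sort candidate moves (given as row/column pairs) so the most promising come first. */
    static void order(List<int[]> moves) {
        moves.sort(Comparator.comparingInt((int[] m) -> PREFERENCE[m[0]][m[1]]).reversed());
    }
}
```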
In Reversi, there are quite a few threats. My applet considers only the most obvious ones: forced moves (i.e. positions where there is only one legal move), corner moves, and moves to a square adjacent to a corner. Due to computational restrictions, it doesn't look ahead until it finds a quiet position; instead, it just increases the tree depth by one level for each threatening move. While this isn't exactly a correct implementation of quiescence search, it already improves the program considerably.
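A sketch of that depth extension, reusing the hypothetical Position interface from the earlier sketches; isThreateningMove is a hypothetical helper standing in for the checks just described (corner move, or move to a square adjacent to a corner).

```java
import java.util.List;

final class ExtendedSearch {
    /** Alpha-beta with a one-ply extension for "threatening" moves. */
    static int search(Position p, int depth, int alpha, int beta) {
        if (depth <= 0 || p.isTerminal()) {
            return p.evaluate();
        }
        List<Position> moves = p.successors();
        for (Position next : moves) {
            int childDepth = depth - 1;
            if (moves.size() == 1 || isThreateningMove(p, next)) {
                childDepth++;   // forced move or corner-related move: look one level deeper
            }
            int value = -search(next, childDepth, -beta, -alpha);
            if (value > alpha) alpha = value;
            if (alpha >= beta) break;
        }
        return alpha;
    }

    static boolean isThreateningMove(Position from, Position to) {
        // Hypothetical: true if the move leading from 'from' to 'to' is a corner move
        // or a move to a square adjacent to a corner.
        return false;   // placeholder
    }
}
```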
Many Reversi programs don't play very well - in fact, most of the Reversi applets I found on the web were easy to beat. I think this is due to the lousy evaluation functions they use.
Simple Reversi programs just use the number of pieces of their own color on the board as an indicator of the quality of a position. This greedy evaluation function is derived from the final goal of the game: the player who has the most pieces wins. But it only makes sense if you really can look ahead until the end of the game - which is only feasible in the last 10 to 18 moves (depending on the performance of your computer). If you can't do that, this greedy function will mislead your program. Having a lot of pieces in any position except at the very end doesn't mean anything: a few moves by your opponent, and in the end the situation may well be totally reversed.
A better criterion for evaluating a position is to count the number of moves a player can make: mobility is decisive in Reversi! If you have only a few moves, it's likely that your opponent can play such that you are forced to make bad moves. Therefore, if your mobility is high, the position is better for you than if it is low.
Using simple mobility as the evaluation function still gives a disappointingly weak program. The program still plays too often "on the outside", when it would be better to keep its own pieces towards the center of the position. This can be expressed by also considering potential mobility, which can be estimated by simply counting all the opponent's pieces that are adjacent to an empty field.
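A sketch of how potential mobility can be counted on a raw board array. The representation here (0 for empty, +1 for the program, -1 for the opponent) is an assumption made for the example, not the applet's actual data structure; plain mobility would simply be the number of legal moves, and the applet actually uses the difference between the two players' values, as explained further below.

```java
final class Mobility {
    /** Potential mobility for 'player' (+1 or -1): opponent discs adjacent to an empty square. */
    static int potentialMobility(int[][] board, int player) {
        int count = 0;
        for (int r = 0; r < 8; r++) {
            for (int c = 0; c < 8; c++) {
                if (board[r][c] == -player && hasEmptyNeighbour(board, r, c)) {
                    count++;   // an opponent disc we might be able to flip later
                }
            }
        }
        return count;
    }

    static boolean hasEmptyNeighbour(int[][] board, int r, int c) {
        for (int dr = -1; dr <= 1; dr++) {
            for (int dc = -1; dc <= 1; dc++) {
                int rr = r + dr, cc = c + dc;
                if ((dr != 0 || dc != 0) && rr >= 0 && rr < 8 && cc >= 0 && cc < 8
                        && board[rr][cc] == 0) {
                    return true;
                }
            }
        }
        return false;
    }
}
```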
It turns out that the combination of these two values makes the program play much better, though it still plays very badly. We have to take a look at a peculiar phase of a Reversi game: the edge fight. Very often, the situation along an edge - even if the corners aren't taken yet - can be decisive for the outcome of a game. What to do? Including all edge moves in the set of "critical moves" leading to a quiescence search is not feasible - the program would run far too slowly, because we'd get ridiculously deep game trees.
I decided to precompute a table of evaluation values for all 6561 (3^8) possible situations along an edge. Actually, I should also have included the squares diagonally adjacent to the corners, but that would have given a table of 59049 (3^10) entries, which I deemed a bit large for a simple Java applet.
This table is precomputed by starting with all possible positions and then playing a one-dimensional Reversi game (allowing passes at any move) to the end. The resulting percentage of victories for a given starting position determines that edge configuration's value.
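One straightforward way to index such a table is to treat each edge square as a base-3 digit, which maps every edge configuration to a number between 0 and 6560. The encoding below is an assumption made for this sketch; the page doesn't say how the applet actually indexes its edge table.

```java
final class EdgeTable {
    // One value per possible edge configuration: 3^8 = 6561 entries.
    static final double[] EDGE_VALUE = new double[6561];

    /** edge[i] is 0 (empty), 1 (own disc) or 2 (opponent disc) for the 8 edge squares. */
    static int edgeIndex(int[] edge) {
        int index = 0;
        for (int square : edge) {
            index = index * 3 + square;   // interpret the edge as a base-3 number
        }
        return index;
    }
}
```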
Now with this improvement, the program already plays quite well. Only occasionally would it still play onto a square diagonally adjacent to a corner when this subsequently made it lose that corner. To cure that, I biased the priority table (see above) against such moves. In addition, I also incorporated the number of occupied corners in the evaluation function. Generally (though there are exceptions!!), it is good to move to a corner. The program therefore calculates part of the evaluation function from the ratio of the number of corners owned by itself to the number owned by its opponent.
With these four components:
- mobility (taken as the difference between the two players' numbers of legal moves),
- potential mobility,
- the precomputed edge values, and
- the occupied corners
as the evaluation function, the resulting program is quite hard to beat. (As you can witness by playing a game!) Actually, the applet doesn't use mobility and potential mobility directly, since these values are not symmetric: Black and White can have very different values for the same position, and one is usually not just the negative of the other. Therefore, I use the difference in mobility as the criterion - this value is stable for a given position.
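Just to illustrate how such components can be folded into a single number, here is a hedged sketch: the weights are invented for the example, and the page does not give the applet's actual formula (it also uses a corner ratio rather than the plain difference used here).

```java
final class Evaluation {
    /** Illustrative combination of the four components; all weights are made up. */
    static int evaluate(int myMobility, int oppMobility,
                        int myPotentialMobility, int oppPotentialMobility,
                        int edgeValue, int myCorners, int oppCorners) {
        int mobility    = myMobility - oppMobility;                    // symmetric, as noted above
        int potMobility = myPotentialMobility - oppPotentialMobility;
        int corners     = myCorners - oppCorners;
        return 8 * mobility + 4 * potMobility + edgeValue + 40 * corners;
    }
}
```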
There are a couple of other Reversi-concepts that I didn't implement.
Good Reversi players often will try not to play an edge move. Why? The reason is that edge moves interact with the pieces on the adjacent edges, turning more pieces and thus creating opportunities for the opponent. My Reversi applet has no notion of this.
Having access to a certain region on the board is important. If we cannot access a region, our opponent can. A player can for instance lose access because all the edge discs in that area are his own. I don't think it's necessary to deal with access explicitly: the potential mobility already should take care of that.
Parity is especially important in corner regions. A player can lose parity by losing access to a region with an odd number of fields. This may prove a disadvantage later on: since the players place their pieces alternately, the player who has lost parity might eventually be forced to make a bad move.
The applet also lacks an opening library, and its perfect endgame play is a bit shallow: due to the performance limitations of interpreted Java, only the last 8 to 10 moves are computed fully.
The absence of an opening library doesn't seem to do too much harm. Simple mobility already is sufficient to avoid the parallel opening: the applet only plays the perpendicular or the diagonal opening.
Strategy games at one time received a lot of attention from the AI community. (Recently, this interest seems to have decreased somewhat, maybe due to the appearance of brute-force game playing computers like Deep Blue for chess.) It is no surprise that there is a vast literature and theory on strategy (zero-sum) games. Still, most of these deal with chess or checkers. On Reversi (or Othello), I found only a few:
Kierulf, Anders: Smart Game Board: a Workbench for Game-Playing Programs, with Go and Othello as Case Studies, Ph.D. Thesis No. 9135, Dept. of Computer Science, ETH Zürich, Switzerland, 1990
- Has an extensive bibliography on other papers on Go or Othello; the author is the creator of the two Othello programs Brand and Peer Gynt.
Buro, M.: Techniken für die Bewertung von Spielsituationen anhand von Beispielen, Ph.D. Thesis, Fachbereich 17 - Mathematik/Informatik, Universität Paderborn, Germany, 1994
- Written in German, but very interesting: the author is the creator of Logistello, one of the best Othello programs in the world. Unfortunately, most of the advanced techniques for speeding up the game-tree search that he describes are only applicable at search depths well beyond what a Java applet can ever hope to achieve. It also seems that his mobility approximation only yields plausible results at higher search depths - with a shallow search, this approximation sometimes misjudges positions completely, making the applet play very poorly.
Rosenbloom, Paul S.: A World-Championship-Level Othello Program, in Levy, David (Ed.), Computer Games, Vol. II, Springer 1988, pp. 365 - 405.
- Describes the Othello program IAGO.
Lee, K.; Mahajan, S.: The Development of a World Class Othello Program, Artificial Intelligence 43 (1990), pp. 21 - 36.
- Describes the Othello program BILL, a "successor" of IAGO. (I consider BILL a direct descendant of IAGO since its evaluation function is based on very much the same concepts. The only significant addition was the use of Bayesian learning for the feature combination.)
A general reference on strategy games is
Marsland, T.A.: Computer Chess Methods, in Shapiro (Ed.), Encyclopedia of Artificial Intelligence, Wiley & Sons, 1987, pp. 159 - 171
Some basic references on Alpha-beta pruning are
Knuth, D.E.; Moore, R.W.: An Analysis of Alpha-Beta Pruning, Artificial Intelligence 6(4), North Holland Publishing Company, 1975, pp. 293 - 326
Reinefeld, A.; Schaeffer, J.; Marsland, T.A.: Information Acquisition in Minimal Window Search, Proceedings of the 9th International Joint Conference on Artificial Intelligence, Vol. 2, Los Angeles, CA, USA, 1985, pp. 1040 - 1043
Marsland, T.A.: Relative Efficiency of Alpha-Beta Implementations, Proceedings of the 8th International Joint Conference on Artificial Intelligence, Vol. 2, Karlsruhe, Germany, 1983, pp. 763 - 766
A recent, new approach is described in
Plaat, A.; Schaeffer, J.; Pijls, W.; de Bruin, A.: A New Paradigm for Minimax Search, Technical Report TR 94-18, Dept. of Computing Science, University of Alberta, Edmonton, Alberta, Canada, Dec. 1994
(Basically, they found out that doing repeated minimum-window searches instead of an alpha-beta search may yield slightly better search times in some cases. Their MTD algorithm comprises a whole class of other, well-known algorithms: SSS*, DUAL*, C*, ...)
More references on Othello have been collected by Robert Gatliff.
Copyright © 2000 by Thomas Wolf. All rights reserved.
TW, Aug 02, 2000