Training artificial neural networks to play
connecting games






Martin Johne
Neural Networks Term Project
5.10.2006










Contents

1. Problem statement
2. Reinforcement Learning
3. How to decide a move
4. Temporal Differences
5. Using Neural Networks
6. Conclusion
7. Classes
8. References


1. Problem statement

The goal of this project was to train an artificial neural network to play a connecting game on a board of size 4x4x4. How connecting games are played
is explained here. It turned out to be very difficult to teach a neural network such a large board, whereas training on smaller boards was relatively easy.
The network should learn the game without being shown how to play it. Only the information whether a game was finished and who won
was given. For this reason the temporal difference learning method was used.

2. Reinforcement Learning

A human can learn in two ways. One is to learn from existing knowledge, for example from a teacher; this is called supervised learning.
Another possibility is to learn from trial and error. If you don't know how a certain food tastes, you can try it, and if it tastes disgusting, you will not
try it again. Nobody told you to keep your hands off that food, but you won't touch it again.
The neural network doesn't know how to play a connecting game and nobody will teach it, so it must learn the game by reinforcement learning.

3. How to decide a move

It is necessary that the computer can decide which move gives it the highest probability of winning a game.


figure 1: tic tac toe
For this reason we can assign a probability value to each afterstate which gives the chance of winning the game from the current game situation.
An afterstate is the board representation at a given time during the game; figure 1 shows one afterstate. If the player making the blue
marks had not already made the winning move, that board representation would be another afterstate. Note that the board representation in
figure 1 could have been reached in different ways during the game. The first player, for example, could have started with any one of his marks in figure 1, leading
to the same final board representation in which the blue player won. The afterstate denotes only the resulting state, not the way the state was reached.
We can define a state-value function V(s) that returns the probability of winning the game from a given afterstate s. Supposing the computer player has
the next move, the program can evaluate every possible move and take the move whose afterstate has the highest probability of winning.
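As an illustration, the following minimal Python sketch shows this greedy selection over afterstates. The flat board tuple and the value table V with a default of 0.5 are assumptions made for the example, not the project's actual data structures; the field coding anticipates the one described in section 5.

# Greedy move selection over afterstates (illustrative sketch, not the project code).
# A board is a tuple with one entry per field: 0 = empty, -1 = player one, 1 = player two.
# V maps an afterstate (a board tuple) to its estimated win probability; unknown
# afterstates default to 0.5.

def choose_move(board, player, V):
    # Pick the empty field whose resulting afterstate has the highest estimated value.
    best_move, best_value = None, -1.0
    for field, mark in enumerate(board):
        if mark != 0:
            continue                                    # field already occupied
        afterstate = board[:field] + (player,) + board[field + 1:]
        value = V.get(afterstate, 0.5)                  # estimated win probability
        if value > best_value:
            best_move, best_value = field, value
    return best_move

# Example: player 1 moves on an empty 3x3 board (flattened to 9 fields).
empty_board = (0,) * 9
print(choose_move(empty_board, 1, {}))                  # all values equal -> first empty field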

4. Temporal Differences

A good computer player needs a good state-value function. One way to learn this function is the temporal differences algorithm.
This is the update formula:

V(s) = V(s) + alpha[V(s') - V(s)].
s'  = current afterstate
s = previous afterstate
alpha = step size parameter

Assume all afterstates are initialised with a value of 0.5 and a player has just won a game. We assign a state-value of 1.0 to the afterstate s' in which that player won.
The afterstate s in which the winning player made the move before the winning move is then changed as follows: 0.75 = 0.5 + 0.5 * [1.0 - 0.5], with alpha =
0.5 in this case. This formula works from step to step within a game. In a connecting game the game length is bounded: there are at most as many
moves as there are fields on the board. It is therefore possible to record a game and update the state-values backwards after the game is finished, using the
same formula. For small boards the state-values of all afterstates can be stored in a table, but for larger boards this isn't possible.
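A minimal Python sketch of this backward tabular update could look as follows; the dictionary table and the function name td_update_after_game are assumptions for illustration, not the project code.

ALPHA = 0.5  # step size parameter, 0.5 as in the worked example above

def td_update_after_game(game, V, final_value, alpha=ALPHA):
    # Propagate the final outcome backwards through the recorded afterstates.
    V[game[-1]] = final_value                   # e.g. 1.0 for the winning afterstate
    for s, s_next in zip(reversed(game[:-1]), reversed(game[1:])):
        v_s = V.get(s, 0.5)                     # unseen afterstates start at 0.5
        v_next = V.get(s_next, 0.5)
        V[s] = v_s + alpha * (v_next - v_s)     # V(s) = V(s) + alpha[V(s') - V(s)]

# Example: the afterstate before the winning one moves from 0.5 to 0.75.
V = {}
td_update_after_game(["s0", "s1"], V, final_value=1.0)
print(V)   # {'s1': 1.0, 's0': 0.75}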

5. Using Neural Networks

Instead of using a table to store all state-values we can use a neural network. A neural network is able to generalize when it doesn't know an exact answer.
In this project a standard feed-forward neural network is used, trained with the backpropagation algorithm. The network has three layers: an input
layer whose size depends on the board size, a hidden layer with a freely choosable number of neurons, and an output layer consisting of a single neuron representing
the state-value. The input layer has as many neurons as there are board fields. For every field the input to the corresponding neuron is 0.0 if the field is empty, -1.0 if
player one has a marker on this field and 1.0 if player two has a marker on this field. The activation function for the hidden neurons and the output neuron is the sigmoid
function. The input layer and the hidden layer each get an extra bias neuron with a constant output of -1.0. Figure 2 shows the neural network.

figure 2: neural network
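The following Python/numpy sketch illustrates the described architecture and input coding for a 4x4 board. The class name ValueNetwork, the layer sizes and the weight initialisation range are assumptions made for the example, and the backpropagation update itself is omitted; only the forward pass that produces the state-value is shown.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def encode_board(board):
    # Field coding as described above: 0.0 = empty, -1.0 = player one, 1.0 = player two.
    return np.asarray(board, dtype=float)

class ValueNetwork:
    def __init__(self, n_inputs=16, n_hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        # One extra weight per neuron for the constant bias input of -1.0.
        self.w_hidden = rng.uniform(-0.5, 0.5, (n_hidden, n_inputs + 1))
        self.w_out = rng.uniform(-0.5, 0.5, n_hidden + 1)

    def forward(self, inputs):
        # Returns the estimated state-value (win probability) for one afterstate.
        h_in = np.append(inputs, -1.0)       # bias input to the hidden layer
        hidden = sigmoid(self.w_hidden @ h_in)
        o_in = np.append(hidden, -1.0)       # bias input to the output neuron
        return sigmoid(self.w_out @ o_in)

# Example: value of an empty 4x4 board.
net = ValueNetwork()
print(net.forward(encode_board([0] * 16)))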

One neural network was used for each player, and the networks played against each other.
The following training algorithm was used (a code sketch is given after the list):

1. Play one game; the loser will be trained in step 2.
    If nobody won, both players will be trained in step 2.
2. The loser plays against the winner and occasionally makes random moves.
    The loser is trained with temporal differences after the game is finished.
    The winner stays untouched.
3. Repeat step 2 for a certain number of games.
4. Continue with step 1 until both players have learned the game.
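A hedged Python sketch of this training loop is shown below; play_game and td_train are hypothetical stand-ins for the project's game logic and the temporal-difference/backpropagation update described above, not the real classes.

# Self-play training scheme (illustrative sketch, assumed helper functions):
#   play_game(net_a, net_b, explorer=None) plays one game and returns (loser, records),
#     where loser is 0 (net_a lost), 1 (net_b lost) or None for a draw, records[i] is the
#     list of afterstates seen by player i, and explorer=0 makes net_a play occasional
#     random moves for exploration.
#   td_train(net, afterstates, outcome) applies the backward temporal-difference update
#     via backpropagation, with outcome 1.0 for a win, 0.0 for a loss and 0.5 for a draw.

def train(nets, play_game, td_train, outer_rounds=1000, inner_games=100):
    for _ in range(outer_rounds):
        # Step 1: play one game; the loser (or both players after a draw) will be trained.
        loser, _ = play_game(nets[0], nets[1])
        learners = [0, 1] if loser is None else [loser]
        for learner in learners:
            # Steps 2 and 3: the learner replays against the fixed winner for a number of
            # games, making occasional random moves; only the learner's network is updated.
            for _ in range(inner_games):
                result, records = play_game(nets[learner], nets[1 - learner], explorer=0)
                outcome = 0.5 if result is None else (0.0 if result == 0 else 1.0)
                td_train(nets[learner], records[0], outcome)
        # Step 4: the outer loop repeats until both players have learned the game.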

6. Conclusion

It was possible to train neural networks to play connecting games perfectly on small game boards. With increasing board size it becomes difficult to train a perfectly
playing network. Nevertheless, it was possible to obtain a network that plays well on a larger board. The following table shows the results.

board dimensions x/y/z / fields to win | neurons input/hidden | training cycles | training time | playing skill | possible afterstates
3/3/1/3 | 9/15  | 11,600    | 11 s          | perfect | 5,889
4/4/1/4 | 16/32 | 293,700   | 21 m 19 s     | perfect | 10,110,712
5/1/5/4 | 25/40 | 4,000,000 | 6 h 27 m 26 s | good    | unknown

One can see that the number of training cycles has to be increased enormously for larger board sizes. This can be explained by the huge number of
afterstates for larger boards. It may be possible to train a network to play the last board in the table perfectly by changing the training parameters, but it takes
a lot of time to find the best network settings. Nevertheless, it is impressive what a small neural network can learn.

7. Classes

An overview of the classes can be found in the doxygen documentation.

8. References


Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction,
MIT Press, Cambridge, 1998, online: http://www.cs.ualberta.ca/~sutton/book/the-book.html

Sebastian Siegel, Training an artificial neural network to play tic-tac-toe, Term Project,
2001, online: http://homepages.cae.wisc.edu/~ece539/project/f01/siegel.pdf

Leslie Pack Kaelbling, Michael L. Littman, Andrew W. Moore, Reinforcement Learning: A Survey,
1996, online: http://www.cs.cmu.edu/afs/cs/project/jair/pub/volume4/kaelbling96a-html/rl-survey.html

Prof. H. Iwe, Neural Networks lecture (Vorlesung Neuronale Netze), Hochschule für Technik und Wirtschaft Dresden, 2002