Training artificial neural networks to play connecting games
Martin Johne
Neural Networks Term Project
5.10.2006
1. Problem statement
The goal of this project was to train an artificial neural network to play a connecting game on a board of size 4x4x4. How connecting games are played is explained here. It turned out that teaching a neural network such a large board is very difficult, but training on smaller boards was relatively easy. The game had to be learned without showing the network how to play; only the information whether a game was finished and who won was given. For this reason the temporal-difference learning method was used.
2. Reinforcement Learning
A human can learn in two ways. One way is to learn from existing knowledge, for example from a teacher; this is also called supervised learning. Another possibility is to learn from trial and error. If you don't know how a certain food tastes, you can try it, and if it tastes disgusting, you will not try it again. Nobody told you to keep your hands off that food, but you won't touch it again. The neural network doesn't know how to play a connecting game and nobody will teach it, so it must learn the game by reinforcement learning.
3. How to decide a move
The computer must be able to decide which move gives the best chance of winning a game. For this purpose we can assign a probability value to each afterstate, giving the chance of winning the game from the current game situation.
An afterstate is the board representation at a given point during the game. Figure 1 shows one afterstate. If the winning player making the blue marks had not already made the winning move, that board representation would be another afterstate. Note that the board representation in figure 1 could have been reached in different ways during the game: the first player, for example, could have started with any one of his marks in figure 1, leading to the same final board representation in which the blue player won. The afterstate denotes only the final state, not the way the state was reached.
We can define a state-value function V(s) which returns the probability of winning a game from a given afterstate s onwards. Assuming the computer player has the next move, the program can test the probability for every possible move and take the move with the highest probability of winning.
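As a rough illustration of this selection step, the following Python sketch picks the move whose afterstate has the highest estimated winning probability. The helpers `legal_moves`, `apply_move` and `value_of` are hypothetical, not functions from the project code.

```python
def choose_move(board, legal_moves, apply_move, value_of):
    """Pick the move whose afterstate has the highest estimated win probability.

    apply_move(board, move) is assumed to return the afterstate reached by
    playing `move`; value_of(afterstate) is assumed to return the estimated
    probability of winning from that afterstate (a table lookup or a neural
    network forward pass).
    """
    best_move, best_value = None, float("-inf")
    for move in legal_moves:
        value = value_of(apply_move(board, move))
        if value > best_value:
            best_move, best_value = move, value
    return best_move
```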
4. Temporal Differences
A good computer player needs a good state-value function. One way to learn this function is the temporal-difference algorithm.
This is the update formula:
V(s) = V(s) + alpha[V(s') - V(s)].
s' = current afterstate
s = previous afterstate
alpha = step size parameter
Assume all afterstates are initialised with a value of 0.5 and a player has just won a game. We assign a state-value of 1.0 to the afterstate s' in which the player won. The afterstate s in which the winning player made the move before the winning move is then changed as follows: 0.75 = 0.5 + 0.5[1.0 - 0.5], with alpha = 0.5 in this case. This formula works only from step to step within a game.
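The single-step update and the numeric example above can be written as a tiny function; this is a sketch, and the name `td_update` is ours.

```python
def td_update(v_s, v_s_next, alpha=0.5):
    """One temporal-difference step: move V(s) towards V(s')."""
    return v_s + alpha * (v_s_next - v_s)

# The worked example from the text: V(s) = 0.5, V(s') = 1.0, alpha = 0.5.
assert td_update(0.5, 1.0, alpha=0.5) == 0.75
```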
In a connecting game the game length is bounded: there are at most as many moves as there are fields on the board. It is therefore possible to record a game and update the state-values backwards after the game has finished, using the same formula. On a small board the state-values of all afterstates can be stored in a table, but for larger boards this is not feasible.
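For a small board where a table is enough, the backward pass over one recorded game could look like the sketch below. The 0.5 initialisation and the 1.0 target for a winning terminal afterstate follow the text; the 0.0 target for a lost game and the representation of afterstates as hashable keys (e.g. tuples) are our assumptions.

```python
from collections import defaultdict

# Unknown afterstates start at a value of 0.5.
values = defaultdict(lambda: 0.5)

def train_on_game(values, afterstates, won, alpha=0.5):
    """Update a tabular state-value function backwards over one recorded game.

    `afterstates` are one player's recorded afterstates in playing order;
    `won` says whether that player won the game.
    """
    # Fixed target for the terminal afterstate: 1.0 for a win, 0.0 otherwise (assumed).
    values[afterstates[-1]] = 1.0 if won else 0.0
    # Walk backwards, pulling every value towards its successor's (updated) value.
    for i in range(len(afterstates) - 2, -1, -1):
        s, s_next = afterstates[i], afterstates[i + 1]
        values[s] += alpha * (values[s_next] - values[s])
```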
5. Using Neural Networks
Instead of using a table to store all state-values, we can use a neural network. A neural network is able to generalize when it doesn't know an exact answer. In this project a standard feed-forward neural network is used, trained with the backpropagation algorithm. The network has three layers: an input layer whose size depends on the board size, a hidden layer with a freely choosable number of neurons, and an output layer consisting of a single neuron representing the state-value. The input layer has as many neurons as the board has fields. For every field the input to the corresponding neuron is 0.0 if the field is empty, -1.0 if player one has a marker on this field, and 1.0 if player two has a marker on this field. The activation function for the hidden neurons and the output neuron is the sigmoid function. The input layer and the hidden layer each get an extra bias neuron with a constant output of -1.0. Figure 2 shows the neural network.
Figure 2: the neural network
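A minimal NumPy sketch of the board encoding and forward pass described above follows; the class layout and the weight initialisation are our own assumptions and do not reflect the project's actual classes.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def encode_board(board):
    """Board encoding: 0.0 = empty, -1.0 = player one, 1.0 = player two.

    `board` is assumed to be a flat sequence with entries 0, 1 or 2.
    """
    return np.array([{0: 0.0, 1: -1.0, 2: 1.0}[field] for field in board])

class ValueNetwork:
    """Feed-forward network: input layer, one hidden layer, one output neuron."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        # One extra weight per neuron for the bias input with constant output -1.0.
        self.w_hidden = rng.normal(0.0, 0.1, size=(n_hidden, n_inputs + 1))
        self.w_output = rng.normal(0.0, 0.1, size=(1, n_hidden + 1))

    def forward(self, x):
        x = np.append(x, -1.0)                 # bias neuron of the input layer
        hidden = sigmoid(self.w_hidden @ x)
        hidden = np.append(hidden, -1.0)       # bias neuron of the hidden layer
        out = sigmoid(self.w_output @ hidden)  # single output neuron
        return float(out[0])                   # estimated win probability

# Example: a 3x3 board needs 9 input neurons (15 hidden neurons as in the results below).
net = ValueNetwork(n_inputs=9, n_hidden=15)
print(net.forward(encode_board([0] * 9)))
```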
One neural network was used for each player, and the networks played against each other. The following training algorithm was used (a sketch of this loop is shown after the list):
1. Play one game; the loser will be trained in step 2. If nobody won, both players will be trained in step 2.
2. The loser plays against the winner and sometimes makes random moves. The loser is trained with temporal differences after the game is finished; the winner remains untouched.
3. Repeat step 2 for a certain number of games.
4. Continue with step 1 until both players have learned the game.
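A rough Python sketch of this loop is given below. `play_game` and `train_with_td` are placeholders, not the project's API: `play_game(a, b, exploring)` is assumed to play one game, let the players listed in `exploring` occasionally choose random moves, and return the winner (None for a draw) together with a mapping from player to its recorded afterstates; `train_with_td(net, afterstates, won)` is assumed to run the backward temporal-difference update through the network via backpropagation.

```python
def training_loop(net_one, net_two, play_game, train_with_td,
                  rounds=100, games_per_round=50):
    """Self-play training following steps 1-4 above (round counts are arbitrary)."""
    for _ in range(rounds):
        # Step 1: one plain game decides who needs training.
        winner, _ = play_game(net_one, net_two, exploring=[])
        if winner is None:
            losers = [net_one, net_two]  # draw: train both players
        else:
            losers = [net_two if winner is net_one else net_one]

        # Steps 2 and 3: only the loser(s) explore and get trained after each
        # game; the winner's weights stay untouched.
        for _ in range(games_per_round):
            winner, records = play_game(net_one, net_two, exploring=losers)
            for net in losers:
                train_with_td(net, records[net], won=(net is winner))
        # Step 4: the outer loop repeats; a fixed number of rounds stands in
        # for "until both players have learned the game".
```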
6. Conclusion
It was possible to train neural networks to play connecting games perfectly on small game boards. With increasing board size it becomes difficult to train a perfectly playing network. Nevertheless, it was possible to obtain a well-playing network for a larger board size. The following table shows the results.
Board dimensions (x/y/z/fields to win) | Neurons (input/hidden) | Training cycles | Training time | Playing skill | Possible afterstates
--- | --- | --- | --- | --- | ---
3/3/1/3 | 9/15 | 11,600 | 11 s | perfect | 5,889
4/4/1/4 | 16/32 | 293,700 | 21 m 19 s | perfect | 10,110,712
5/1/5/4 | 25/40 | 4,000,000 | 6 h 27 m 26 s | good | unknown
One can see that the number of training cycles has to be increased enormously for larger board sizes. This can be explained by the huge number of afterstates on larger boards. It may be possible to train a network to play the last board in the table perfectly by changing the training parameters, but finding the best network settings takes a lot of time. Nevertheless, it is impressive what a small neural network can learn.
7. Classes
An overview of the classes can be found in the Doxygen documentation.
8. References
Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction,
MIT Press, Cambridge, 1998, online: http://www.cs.ualberta.ca/~sutton/book/the-book.html
Sebastian Siegel, Training an artificial neural network to play tic-tac-toe, Term Project,
2001, online: http://homepages.cae.wisc.edu/~ece539/project/f01/siegel.pdf
Leslie Pack Kaelbling, Michael L. Littman, Andrew W. Moore, Reinforcement Learning: A Survey,
1996, online: http://www.cs.cmu.edu/afs/cs/project/jair/pub/volume4/kaelbling96a-html/rl-survey.html
Prof. H. Iwe, lecture "Neuronale Netze", Hochschule für Technik und Wirtschaft Dresden, 2002