Tic-Tac-Toe

30 June, 2020


It is one of the simplest games we've played since childhood. It's got different phases attached to it : 1. Why am I always losing? 2. HAH! I won and 3. Oh, that's another draw. I wanted to keep winning, so I tried to decipher this game. What is the best possible move at every stage?
I'll go over the minimax algorithm, maximizing player and minimizing player. I have used python and a library called curses for the UI.

Explanation

There are two players involved : computer and human. I've assumed the human to be the minimizing player and hence designed the utility function that way. At every stage, we assume that the other player makes the best possible move. And how we quantify this is using the utility function. Human, the minimizing player will choose the move that has the least utility value and the opposite is true for the computer. Pick the best possible move for both at each stage. Logical right?

So how do we decide the utility function? Let's say the if the human wins by playing a move, we reduce 20 points and if the computer wins we increase 20 points. Game ends at these points. But not all moves are an endpoint right? So, we use the minimax strategy for the next move. We make a move and pick the computer's next best move. Wait if the computer doesn't know the best move? Again the computer will make a move and pick the human's best move. So till the time we either reach a draw, win or a loss we continue the recursion. So we pick the best move of all possible moves for the human, by assuming even the computer does so. Keep in mind the best move for both are quite opposite, one wants to increase the utility function and other reduce it. So we go down this huge recursion tree and at each layer we pick the move with the minimum utility for the human and maximum utility for the computer.

But winning now and winning three steps later shouldn't be the same right? We also have to penalize depth then. So every move extra we take, the utility function increases by 1. Whereas the computer decreases by 1. Keep mind the order of values : 1, 20. You want to design your utility function in such way that it can differntiate between a draw, loss and win. The magnitude of depths (1 in this case) over time should not exceed the win value (20 in this case). If this doesn't happen, it can pick the one with higher depth and not one in which you win.

Algorithm


Voila! There you go, except for the computer part. It is quite similar except for the signs. For the UI, you can go through the read-me. Simple hard-coded stuff.

Observations

1. The game is futile. This means that is both players play the game optimally the game always ends in a draw. You can verify this by alternating between human's and the computer's moves.
2. You can make the first move anywhere . The utility function remains the same wherever you play. Try this out with the one you just wrote. But people still prefer the one in the corners because it allows the opponent to make a mistake and let you win.
3. While designing the algroithm, the computer prefers the move with higher depth. So, if the computer can win now and win after a move, it'll choose the latter. Nice, right? For a long time, I thought is was a bug. You can design it the other way around as well.

Improvements

I tried making one for NxN grid. It turns out the that number of computations go out of hand extremely quickly. Let's take a look. For N=3, the first move has 9 possibilities, the computer then has 8 and so on. So a total of 9! moves which is roughly 360,000. Let's say we N=4, number of moves are now 16! which is roughly 10^13, much much bigger. So, how do we optimise this?

Firstly, we are counting many things multiple times. Let's say the first move is at (0,1) or (1,0), it is the same thing. But we are counting this twice. Maybe we can store the value and check if it's a similar state. How do we quantify if the states are same? Strings maybe. And how many values will we end up storing? Too many probably.

Secondly, pruning algorithms. But how do we know by how much are we reducing? Reducing 10^13 to 0.5 * 10^13, isn't exactly a huge difference. Alpha-beta pruning is one of the possible ones.

If you have any queries or suggestions, feel free to hit me up at : dornumofficial@gmail.com. For the complete code , you can click here