04-29-2022, 11:52 AM
I've looked into the AI issue with the Cordon3.lud version of your game, where Alpha-Beta was incorrectly claiming to have proven various wins or losses (and probably playing poorly based on those incorrect proofs).
This seems to be an issue with its Transposition Table. In that version of your game, the last-to position (i.e., the "to" position of the previous move made) is a very important variable of the game state. The only real purpose of the Select actions used is to store an important value in that variable, which then plays a major role in which moves are legal afterwards. However, we currently do not include this variable in our Zobrist hashes. As a result, states that differ only in their last-to value get the same Zobrist hash key, the algorithm incorrectly treats them as identical, and that leads to the incorrect proofs of wins/losses.
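To make the collision concrete, here is a minimal sketch in Python. It is not Ludii's actual hashing code; the board size, the key tables and the function names (`board_hash`, `full_hash`) are all made up for illustration. It only shows that two states with the same pieces but different last-to values collide when last-to is left out of the hash, and are told apart when it is mixed in.

```python
import random

# Hypothetical toy setup: 9 sites, each empty (0) or owned by player 1 or 2.
NUM_SITES = 9
NUM_PIECES = 3

rng = random.Random(2022)
# One random 64-bit key per (site, piece) pair.
PIECE_KEYS = [[rng.getrandbits(64) for _ in range(NUM_PIECES)]
              for _ in range(NUM_SITES)]
# One random 64-bit key per possible last-to site,
# plus one extra key for "no previous move".
LAST_TO_KEYS = [rng.getrandbits(64) for _ in range(NUM_SITES + 1)]


def board_hash(board):
    """Zobrist hash over piece placement only (last-to is ignored)."""
    h = 0
    for site, piece in enumerate(board):
        h ^= PIECE_KEYS[site][piece]
    return h


def full_hash(board, last_to):
    """Zobrist hash over piece placement plus the last-to variable."""
    idx = NUM_SITES if last_to is None else last_to
    return board_hash(board) ^ LAST_TO_KEYS[idx]


# Same pieces on the board, but the previous move ended on a different site.
board = [1, 0, 2, 0, 0, 0, 0, 0, 0]
state_a = (board, 0)
state_b = (board, 2)

# Without last-to, both states map to the same transposition table key.
print(board_hash(state_a[0]) == board_hash(state_b[0]))
# With last-to mixed in, the keys differ.
print(full_hash(*state_a) == full_hash(*state_b))
```

In a game where last-to determines which moves are legal, a search that looks up `state_b` under the first kind of key can pick up a value that was actually proven for `state_a`.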
The correct solution would be for us to include the last-to (and also last-from) values in our Zobrist hashes. I'm just... not sure yet if we actually want to do that. Clearly it is the correct thing to do in theory, and it is necessary for some games. However, for the vast majority of games it is not necessary, and there it would in fact cause us to stop recognising a whole bunch of transpositions: we would start treating states as different just because different previous moves led up to them, which in most games does not matter. So, the question that arises is... do we want our AlphaBeta to be stronger in most games but plain incorrect in a few, or do we want it to be correct in theory but weaker in practice for many games? Or can we somehow automatically detect whether these variables are relevant to a particular game's hash codes, and include or exclude them accordingly?
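The trade-off can be sketched as a per-game switch on the hasher. Again this is only an illustration in Python with made-up names (`StateHasher`, `include_last_to`), not how Ludii is structured, and it deliberately says nothing about how the switch would be set automatically, since that is exactly the open question. Here it is simply passed in by hand.

```python
import random


class StateHasher:
    """Toy Zobrist hasher with an optional last-to component (hypothetical)."""

    def __init__(self, num_sites, num_piece_types, include_last_to, seed=0):
        rng = random.Random(seed)
        self.piece_keys = [[rng.getrandbits(64) for _ in range(num_piece_types)]
                           for _ in range(num_sites)]
        self.last_to_keys = [rng.getrandbits(64) for _ in range(num_sites)]
        self.include_last_to = include_last_to

    def hash(self, board, last_to):
        h = 0
        for site, piece in enumerate(board):
            h ^= self.piece_keys[site][piece]
        # Only mix in last-to for games where it matters for legality.
        if self.include_last_to and last_to is not None:
            h ^= self.last_to_keys[last_to]
        return h


# The same position reached via two different previous moves.
board = [1, 2, 0, 0, 1, 0, 0, 0, 2]
states = [(board, 0), (board, 4)]

for flag in (False, True):
    hasher = StateHasher(num_sites=9, num_piece_types=3, include_last_to=flag)
    keys = {hasher.hash(b, last_to) for b, last_to in states}
    # Number of distinct transposition table entries these states would occupy.
    print(flag, len(keys))
```

With the flag off, the two states share one table entry, which is what we want in most games and what goes wrong in Cordon3.lud. With the flag on, they get separate entries, which is correct everywhere but gives up the transposition in games where the previous move is irrelevant.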
I do not have the answers to those questions yet; we'll have to think about it. So I can't promise a fix right away. Thanks for pointing it out anyway!