09-06-2022, 09:52 AM
1. That will work fine; there's no need for the `(useFor ...)` wrapper in the metadata if you're playing with the default options for your game.
2. I'd recommend using `--no-value-learning`, i.e. not training a heuristic-based value function. Training one is only useful if you're going to use it afterwards, which our standard Biased MCTS doesn't. We do have some MCTS variants that can use value functions, and maybe one could work well for your game, but this varies greatly from game to game. In some games it is easy to express a useful simple heuristic, and this can help the MCTS. In other games, no useful simple heuristic is easy to express, or it adds too much computational overhead, and it does not help the MCTS.
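To make the "simple heuristic" point concrete, here's a minimal sketch in Python of the kind of cheap evaluation a heuristic value function might capture. The `State` class and material-balance idea are illustrative assumptions, not Ludii's actual heuristic terms:

```python
from dataclasses import dataclass

@dataclass
class State:
    # Toy stand-in for a game state: piece counts for the mover and opponent.
    my_pieces: int
    opp_pieces: int

def heuristic_value(state: State) -> float:
    """Toy material-balance heuristic, scaled to [-1, 1].

    Cheap to compute, so it adds little overhead, but it only helps
    an MCTS if material actually correlates with winning in the game.
    """
    total = state.my_pieces + state.opp_pieces
    if total == 0:
        return 0.0
    return (state.my_pieces - state.opp_pieces) / total
```

In games where no such signal exists (or computing it is expensive relative to just running more playouts), this kind of function buys you nothing.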
3. If you can, it would be best to use both the selection and the playout features. Actually, the features are the same; only the weights can differ. The `(featureSet ...)` metadata item does allow you to specify both, if you want. If you don't want to or can't, stick to just the selection weights. Selection weights work well in both the Selection and Playout phases of MCTS. Playout weights can be slightly better in the Playout phase, but carry a greater risk of being bad in the Selection phase. The distinction between the two is quite subtle and small, though, so it's really not bad to stick to only Selection weights. The distinction is also fairly new (I haven't gotten around to writing about the difference in any publications yet); all the publications you've seen use only the Selection weights.
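Concretely, both weight sets score the same feature vectors; only the weight values differ. Here's a rough sketch assuming the usual linear-logit/softmax form for such feature policies (the exact parameterization in Ludii may differ):

```python
import math

def move_distribution(feature_vectors, weights):
    """Softmax over linear feature scores: one probability per legal move."""
    logits = [sum(w * f for w, f in zip(weights, fv)) for fv in feature_vectors]
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

# The same active features per move, but a different weight vector can be
# plugged in for the Selection phase versus the Playout phase.
features = [[1.0, 0.0], [0.0, 1.0]]  # two legal moves, two features
selection_weights = [2.0, 0.0]
playout_weights = [0.0, 2.0]
```

Swapping in `playout_weights` during playouts changes which moves get favored, even though the feature set itself is shared.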
4. Biased MCTS uses features to guide both the Selection and the Playout phases of MCTS. Biased MCTS (Uniform Playouts) plays uniformly random playouts, using features only in the Selection phase. Which one is best can again vary greatly from game to game. Typically it depends on how "fast" or "slow" your game is (in terms of how quickly Ludii can run it, which is not necessarily the same as the expected number of moves per game, though there is often some correlation). If Ludii can run your game very quickly, features add relatively high computational overhead, and plain uniform playouts may be better. If the game is already very slow to run to begin with, the overhead of features is (relatively speaking) low, and they're more likely to help in playouts.
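The `--playout-features-epsilon` option mentioned further down interpolates between these two extremes on a per-action basis. A rough sketch of that idea, where the coin-flip semantics and `feature_policy` signature are my assumptions rather than Ludii's actual implementation:

```python
import random

def playout_move(legal_moves, feature_policy, epsilon, rng=random):
    """Pick one playout move.

    With probability epsilon, sample from the feature-based policy;
    otherwise pick uniformly at random. epsilon=1.0 behaves like
    Biased MCTS playouts, epsilon=0.0 like uniform playouts.
    """
    if rng.random() < epsilon:
        probs = feature_policy(legal_moves)
        return rng.choices(legal_moves, weights=probs, k=1)[0]
    return rng.choice(legal_moves)
```

The appeal of an intermediate epsilon is that you pay the feature-evaluation cost on only a fraction of playout steps while still injecting some bias.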
5. There is probably no point in using TSPG for you. Based on the results we got in the CoG 2019 paper, I never directly use those weights in a Biased MCTS (in playouts it doesn't help, and in selection it really hurts). I have been using them for other research goals, though (very fast standalone play based purely on features without any search at all, or identifying features that are interesting to explain to humans). I do recommend using WED in general, since it generally helps (maybe just a little bit).
6. No, MC-GRAVE cannot serve directly as the expert. If you have a game where MC-GRAVE is remarkably strong, it might help to try `--tournament-mode`, though. Test it with a short training run first so you don't waste too much time; I don't think I've used it in a long time myself, so it might also just crash. If it still works, it enables a tournament mode similar to the one used by Polygames, where it keeps a larger population of many agents and draws from them in self-play. In that mode I also add a plain UCT and a plain MC-GRAVE to this population, and if MC-GRAVE is indeed very good, it should get picked relatively often to generate experience.
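The population idea is simple enough to sketch. This is only the spirit of the Polygames-style mode described above, with made-up names, not Ludii's actual code:

```python
import random

def build_population(trained_checkpoints):
    """Self-play pool: trained checkpoints plus fixed baseline agents."""
    return list(trained_checkpoints) + ["UCT", "MC-GRAVE"]

def pick_players(population, num_players, rng=random):
    """Draw one agent per player seat (with replacement) for a self-play game."""
    return [rng.choice(population) for _ in range(num_players)]
```

If MC-GRAVE wins often, a rating-based or win-rate-based draw (rather than the uniform draw sketched here) would select it more frequently to generate experience.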
7. Yes, raising `--thinking-time` is a relatively good bet for increasing performance, at the cost of increased training time. Raising `--num-games`... *probably* also improves performance, but there is technically also a risk of decreasing it. This is because I keep adding new features throughout the entire process, so more games means strictly more features. More features can be better, but there is a risk of them being worse (due to increased computational overhead).
In general, I would recommend a command line like this (if I have no knowledge of which game I'm working with at all and just care about what is most likely to produce something decent):
Code:
```
java -jar Ludii-1.3.6.jar --expert-iteration --game "Game.lud" -n 300 \
  --out-dir "ExIt" --no-value-learning --game-length-cap 1000 \
  --thinking-time 1 --iteration-limit 12000 --wis \
  --playout-features-epsilon 0.5 --checkpoint-freq 5 \
  --num-agent-threads 3 --num-feature-discovery-threads 2 \
  --special-moves-expander-split --handle-aliasing \
  --is-episode-durations --prioritized-experience-replay
```
Things I added:
* --game-length-cap 1000: Slightly lowers Ludii's default maximum number of moves before a game is declared a draw.
* --iteration-limit 12000: In most games the MCTS won't be able to hit this many iterations before its thinking time per move (default 1 second) elapses anyway, so then this doesn't matter. But in extremely simple games, this lets us stop the MCTS after 12000 iterations. This can be a speedup, and it also stops the MCTS expert distributions from becoming excessively deterministic.
* --wis: Use weighted importance sampling instead of ordinary importance sampling for WED/PER (see CoG 2020 paper)
* --playout-features-epsilon 0.5: This actually interpolates between Biased MCTS and Biased MCTS (Uniform Playouts). Uses features with probability 0.5 for every action in a playout.
* --checkpoint-freq 5: Don't need to fill up my directory with too many files
* --num-agent-threads 3: Use 3 threads for the MCTS agents (running iterations in parallel) during self-play. Can of course adjust based on your hardware
* --num-feature-discovery-threads 2: Use 2 threads for computing which new feature to add after every game of self-play. Can again adjust based on hardware, but it's pointless to raise this beyond the number of players in a game (so keep it at 1 or 2 for a 2-player game).
* --special-moves-expander-split: New thing; I haven't gotten around to publishing it anywhere yet. On average I've found it slightly helpful, but not for all games. If all of your game's win conditions are extremely difficult to express as small local patterns (like Hex, where a very "global" win condition spans the entire board), you should probably leave this off.
* --handle-aliasing: Generally (slightly) helpful. Not really described in detail in any publications yet.
* --is-episode-durations: This is WED (but important to also add --wis as mentioned above, because otherwise it's using ordinary importance sampling instead of weighted).
* --prioritized-experience-replay: From same paper as WED.
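Since `--wis` versus ordinary importance sampling comes up twice above, here's the difference in miniature. These are the textbook estimators with toy numbers, not code or results from the paper:

```python
def ois(returns, weights):
    """Ordinary importance sampling: divide weighted returns by the sample count."""
    return sum(w * g for w, g in zip(weights, returns)) / len(returns)

def wis(returns, weights):
    """Weighted importance sampling: divide by the summed weights instead.

    The estimate is biased, but its variance is usually much lower,
    which is why --wis is preferred for WED/PER.
    """
    return sum(w * g for w, g in zip(weights, returns)) / sum(weights)
```

With returns `[1, 0, 1]` and importance weights `[2.0, 0.5, 1.0]`, OIS gives 3/3 = 1.0 while WIS gives 3/3.5 ≈ 0.857; WIS keeps the estimate inside the range of observed returns even when individual weights are large.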