Hi,
I have some questions about the expert-iteration CLI if you don't mind me asking.
I'm currently working with a game (deterministic, two-player, perfect information, as you might expect) where I've found MC-GRAVE to be the best currently available AI algorithm (I'd be willing to send the .lud via PM or e-mail if you're interested; I'm not confident enough about my design abilities to post it publicly). MC-GRAVE is already doing great; however, I wanted to see whether I could use the Polygames bridge or Biased MCTS to get even better results. I've fully read through (though not necessarily understood) your 2019 paper and skimmed your 2020 paper on this.
After playing around with the CLI, I've not been able to reach the results I was hoping for (so far maybe a 25-30% win rate against MC-GRAVE?), so I have a few questions:
1. If I generate training data and then play with a default game option, for example:
Code:
(item "5" <5> "Board size 5")*
Should this work, or do I still need the "(useFor)" ludeme? (See my sketch after this list.)
2. Should I be using a heuristic (value learning?) before I start training? In my particular case there are two ways the game can end (a score threshold and a second way).
3. Should I be using both "selectionFeatures" and "playoutFeatures"? If not, which one should I prefer?
4. Should I be using "Biased MCTS" or "Biased MCTS (Uniform Playouts)" to play with my data afterwards? I've read their explanations in the User Guide and some of the code comments, but I'm not sure I understand them. Am I correct in thinking that the Uniform Playouts variant doesn't use "playoutFeatures"? If I just went by which of the two appears more often in def_ai, I'd guess that Uniform Playouts generally works better?
5. Should I be using TSPG ("--train-tspg"?) or WED ("--is-episode-durations"?) as mentioned in the papers? (See the combined command sketch after this list.) For TSPG, what do the values after "Effective Params:" in the final weights file mean?
6. Can MC-GRAVE be used for "--expert-ai"? This line seems to imply that it cannot.
7. Finally, if none of these yields new insight, would simply cranking up "--num-games" and "--thinking-time" work? (See the last sketch below.)
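Regarding question 1, this is roughly how I imagine the "(useFor)" wrapping would look in the AI metadata; the option string "Board Size/5" and the exact nesting are guesses on my part:
Code:
(metadata
    (ai
        (useFor { "Board Size/5" }    // option string format is my guess
            (heuristics { (score) })
        )
    )
)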
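For question 5, this is how I'd naively combine the two flags from the papers in a single run; I'm not sure whether they're actually meant to be used together:
Code:
java -jar Ludii-1.3.6.jar --expert-iteration --game "Game.lud" -n 300 --out-dir "ExIt" --no-value-learning --train-tspg --is-episode-durations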
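And for question 7, this is the kind of scaled-up run I have in mind, assuming "--num-games" is the long form of "-n" and "--thinking-time" is given in seconds (both assumptions on my part):
Code:
java -jar Ludii-1.3.6.jar --expert-iteration --game "Game.lud" --num-games 1000 --thinking-time 2 --out-dir "ExIt"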
Here are the three ways of training I've tried so far (with some combinations of selectionFeatures and playoutFeatures in the final .lud):
Code:
java -jar Ludii-1.3.6.jar --expert-iteration --game "Game.lud" -n 300 --out-dir "ExIt"    (this run with (heuristics {(score)}) in the .lud)
java -jar Ludii-1.3.6.jar --expert-iteration --game "Game.lud" -n 300 --out-dir "ExIt" --no-value-learning
java -jar Ludii-1.3.6.jar --expert-iteration --game "Game.lud" -n 300 --out-dir "ExIt" --no-value-learning --train-tspg
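For the first command, the (heuristics {(score)}) part refers to the game's metadata; this is how I've embedded it in the .lud (assuming this is the right place for it):
Code:
(metadata
    (ai
        (heuristics { (score) })
    )
)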