12-01-2019, 08:40 PM
(12-01-2019, 07:04 PM)unhandyandy Wrote: Does "policy" refer to preferences among move choices, and "state evaluation" to evaluating a position?
Informally, yes. A policy would give you probabilities with which to select moves in a position by just looking at that position. A state evaluation function tells you how "good" a state is by looking at that state -- if you want to use that to inform move selection, you first have to simulate the application of every legal move, so that you get all possible next states, evaluate each of them, and then work out retroactively which move was the "best".
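To make the contrast concrete, here's a minimal sketch. The GameState interface (legal_moves(), apply()) and the policy/evaluate callables are hypothetical, purely for illustration:

```python
import random

def select_move_with_policy(state, policy):
    """A policy maps a state directly to move probabilities."""
    moves, probs = policy(state)          # e.g. output of a trained model
    return random.choices(moves, weights=probs, k=1)[0]

def select_move_with_evaluation(state, evaluate):
    """A state evaluation function only scores states, so we have to
    simulate every legal move and score the resulting states."""
    best_move, best_value = None, float("-inf")
    for move in state.legal_moves():
        next_state = state.apply(move)    # simulate applying the move
        value = evaluate(next_state)      # how good is the successor state?
        if value > best_value:
            best_move, best_value = move, value
    return best_move
```

Note the extra simulation step in the second function -- that's the "retroactive" part: the evaluation function never sees the moves themselves, only the states they lead to.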
MCTS is typically expected to outperform alpha-beta in general game playing settings where you have little domain knowledge, yes. In a game like Brandub, the default evaluation function that I described above appears to be relatively decent though -- it doesn't distinguish between piece types for material, but there aren't too many piece types to distinguish between in this game anyway! We can probably get MCTS to outperform it by enhancing MCTS with some learned features though. And similarly improve alpha-beta by learning a better evaluation function for it. Which one will come out on top then... no idea
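For reference, a material-count heuristic in the spirit of that default evaluation function could look something like the toy sketch below -- the state representation (state.pieces() with an owner attribute) is made up, and this is not Ludii's actual implementation:

```python
def material_eval(state, player):
    """Piece-count difference from `player`'s perspective,
    ignoring piece types entirely."""
    my_pieces = sum(1 for p in state.pieces() if p.owner == player)
    opp_pieces = sum(1 for p in state.pieces() if p.owner != player)
    return my_pieces - opp_pieces
```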
Yeah, sure, people have used ML for positional evaluation functions, though often for a single game at a time; in General Game Playing, MCTS has been much more dominant. One likely reason is that in older GGP systems, it took a significant amount of effort just to discover things that could act as features (like "material") in the logic-based game description languages. In Ludii, these sorts of higher-level concepts are explicitly available right away, which makes it easier to try using them as features.
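One simple way such ready-made concepts could feed a learned evaluation is a weighted linear combination of per-state features. The feature names and weights below are invented for illustration and don't reflect Ludii's actual concept API:

```python
def linear_eval(state, weights, features):
    """Weighted sum of feature values for a state."""
    return sum(w * f(state) for w, f in zip(weights, features))

# Hypothetical usage:
# features = [material_balance, king_mobility, board_control]
# weights  = [1.0, 0.5, 0.1]   # e.g. learned via regression or TD learning
# score = linear_eval(state, weights, features)
```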