Ludii Forum
alpha-beta - Printable Version


alpha-beta - unhandyandy - 12-01-2019

How does the A-B search engine evaluate positions at maximal depth? Oddly, the A-B engine seems to be the one that works best with Brandub. But it might be improved with a custom evaluator, which would be easier to implement than the selectAction method required for a custom engine.


RE: alpha-beta - DennisSoemers - 12-01-2019

In the case of Chess, we have added values for the different piece types to the metadata, and these are used for a simple material score in the heuristic evaluation function. So, for now, this is a small bit of domain knowledge that we have added specifically for Chess.

For all other games, currently, we automatically construct a heuristic evaluation function with a material component and a mobility component (a rough sketch is included below):
- The material component gets a weight of 1.0, and simply sums up the pieces owned by a player to determine the score for that player (ignoring differences in piece types).
- The mobility component gets a weight of 0.01 (so it typically only ends up functioning as a tie-breaker in games where material is also relevant). Its score is the number of legal moves for the current player to move in a state, and 0 for every other player.
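
To make that concrete, here's a rough sketch of such an evaluation function in Java. It uses a made-up GameState interface rather than Ludii's actual classes, but the weights and structure follow the description above, with an optional per-piece-type value map for the Chess case:

Code:
public final class DefaultHeuristicSketch
{
	private static final double MATERIAL_WEIGHT = 1.0;
	private static final double MOBILITY_WEIGHT = 0.01;

	/** Hypothetical minimal view of a game state (not Ludii's API). */
	public interface GameState
	{
		java.util.Map<String, Integer> pieceCounts(int player);  // piece type -> count
		int numLegalMoves();   // legal moves for the player to move
		int playerToMove();
		int numPlayers();
	}

	/**
	 * Heuristic value of a state from the given player's perspective.
	 * pieceValues holds per-type values (as added to Chess's metadata),
	 * or pass null to simply count every piece as 1.
	 */
	public static double evaluate
	(
		final GameState state, final int player,
		final java.util.Map<String, Double> pieceValues
	)
	{
		double score = 0.0;

		// Material component: own pieces count positively, opponent pieces negatively.
		for (int p = 1; p <= state.numPlayers(); ++p)
		{
			final double sign = (p == player) ? 1.0 : -1.0;
			for (final java.util.Map.Entry<String, Integer> e : state.pieceCounts(p).entrySet())
			{
				final double value = (pieceValues != null) ? pieceValues.getOrDefault(e.getKey(), 1.0) : 1.0;
				score += sign * MATERIAL_WEIGHT * value * e.getValue();
			}
		}

		// Mobility component: only the player to move gets a mobility score,
		// so with its small weight it mostly acts as a tie-breaker.
		final double mobilitySign = (state.playerToMove() == player) ? 1.0 : -1.0;
		score += mobilitySign * MOBILITY_WEIGHT * state.numLegalMoves();

		return score;
	}
}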

Because I'm personally not looking forward to manually adding domain knowledge in the form of heuristic evaluation functions to all the... 160-180ish games we currently have, we're instead looking to learn them automatically from self-play. Basic, learned heuristics for all games will hopefully be included in the first full v1.0.0 Ludii release in January.


RE: alpha-beta - unhandyandy - 12-01-2019

Quote:Basic, learned heuristics for all games will hopefully be included in the first full v1.0.0 Ludii release in January.
That's great! Will you be using an NN approach, or some other ML strategy?

Might I suggest making it possible to write custom evaluators as a simpler alternative to writing custom selectAction functions.


RE: alpha-beta - DennisSoemers - 12-01-2019

For now, we've been focusing on simpler linear models. Some of the papers in the list of our project's outputs (http://www.ludeme.eu/outputs/) describe some of the work we've been doing with self-play training. So far, those papers have all been about learning policies (to be incorporated in MCTS agents), not so much about learning state-value functions (for MCTS) or state evaluation functions for alpha-beta.
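
For a rough idea of what a "simpler linear model" for a policy can look like (purely illustrative, not our actual training code): each legal move is described by a set of active binary features, its score is the sum of the learned weights of those features, and a softmax turns the scores into selection probabilities.

Code:
public final class LinearPolicySketch
{
	private final double[] weights;   // learned from self-play

	public LinearPolicySketch(final double[] weights)
	{
		this.weights = weights;
	}

	/**
	 * featuresPerMove[i] lists the indices of the active (binary) features
	 * of the i-th legal move. Returns one selection probability per move.
	 */
	public double[] moveProbabilities(final int[][] featuresPerMove)
	{
		final double[] logits = new double[featuresPerMove.length];
		double max = Double.NEGATIVE_INFINITY;

		// Linear score per move: sum of the weights of its active features.
		for (int i = 0; i < featuresPerMove.length; ++i)
		{
			double score = 0.0;
			for (final int f : featuresPerMove[i])
				score += weights[f];
			logits[i] = score;
			max = Math.max(max, score);
		}

		// Softmax over the scores (shifted by the max for numerical stability).
		final double[] probs = new double[logits.length];
		double sum = 0.0;
		for (int i = 0; i < logits.length; ++i)
		{
			probs[i] = Math.exp(logits[i] - max);
			sum += probs[i];
		}
		for (int i = 0; i < probs.length; ++i)
			probs[i] /= sum;

		return probs;
	}
}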

NNs are still a little bit problematic for a few reasons. The primary one is that we're not working with Google levels of hardware, and we're trying to learn to play hundreds of games rather than ~3 games. For our project, we also don't really need superhuman AI though; we're mostly aiming for "average-game-enthusiast" levels of skill.

Quote:Might I suggest making it possible to write custom evaluators as a simpler alternative to writing custom selectAction functions.

Good idea, thanks, we'll keep it in mind!


RE: alpha-beta - unhandyandy - 12-01-2019

I'm not an expert in ML.  Does "policy" refer to preferences among move choices, and "state evaluation" to evaluating a position?

I know that all the action is currently in MCTS, but I'm struck by how much better AB plays Brandub. My impression is that MCTS works well at bootstrapping from zero knowledge of a game, but the Ludeme Project seems to be starting elsewhere.

Has any research been done on ML of positional evaluation in the context of AB search?  For example, if a human could write an evaluator that then has its parameters tweaked by ML over the course of many plays, that might be the best of both worlds.
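
Something like the following sketch is what I have in mind -- a hand-written linear evaluator whose weights start at human guesses and then get nudged towards actual game outcomes. The names and the update rule are just illustrative, not Ludii code:

Code:
public final class TunableEvaluatorSketch
{
	// Hand-chosen features with hand-picked initial weights.
	private double materialWeight = 1.0;
	private double mobilityWeight = 0.01;

	/** Linear evaluation from hand-crafted feature values. */
	public double evaluate(final double materialDiff, final double mobilityDiff)
	{
		return materialWeight * materialDiff + mobilityWeight * mobilityDiff;
	}

	/**
	 * After a finished game, for each state visited during play, pull the
	 * prediction towards the final outcome (e.g. +1 win, -1 loss) with a
	 * small learning rate (a plain least-mean-squares update).
	 */
	public void update
	(
		final double materialDiff, final double mobilityDiff,
		final double outcome, final double learningRate
	)
	{
		final double error = outcome - evaluate(materialDiff, mobilityDiff);
		materialWeight += learningRate * error * materialDiff;
		mobilityWeight += learningRate * error * mobilityDiff;
	}
}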


RE: alpha-beta - DennisSoemers - 12-01-2019

(12-01-2019, 07:04 PM)unhandyandy Wrote: Does "policy" refer to preferences among move choices, and "state evaluation" to evaluating a position?

Informally, yes. A policy would give you probabilities with which to select moves in a position by just looking at that position. A state evaluation function tells you how "good" a state is by looking at that state -- if you want to use that to inform move selection, you have to first simulate the application of all possible moves, such that you get all possible next states and can evaluate all of them to figure out retroactively which move was the "best".
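
Here's a little sketch (with made-up interfaces, not Ludii's actual classes) of how a state evaluation function ends up being used for move selection via that one-ply lookahead:

Code:
import java.util.List;

public final class EvaluatorMoveSelection
{
	interface Move { /* opaque move */ }

	interface State
	{
		List<Move> legalMoves();
		State apply(Move move);   // successor state after playing the move
		int playerToMove();
	}

	/** Scores a state from a given player's perspective. */
	interface StateEvaluator
	{
		double evaluate(State state, int player);
	}

	/**
	 * Simulate every legal move, evaluate each resulting state, and pick
	 * the move that leads to the best-evaluated successor.
	 */
	static Move selectMove(final State state, final StateEvaluator evaluator)
	{
		final int mover = state.playerToMove();
		Move bestMove = null;
		double bestValue = Double.NEGATIVE_INFINITY;

		for (final Move move : state.legalMoves())
		{
			final double value = evaluator.evaluate(state.apply(move), mover);
			if (value > bestValue)
			{
				bestValue = value;
				bestMove = move;
			}
		}
		return bestMove;   // null only if there are no legal moves
	}
}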

MCTS is generally expected to outperform alpha-beta in general game playing settings where you have little domain knowledge, yes. In a game like Brandub, the default evaluation function that I described above appears to be relatively decent though -- it doesn't distinguish between piece types for material, but there aren't too many piece types to distinguish between in this game anyway! We can probably get MCTS to outperform it by enhancing MCTS with some learned features though. And similarly also improve alpha-beta by learning an improved evaluation function for it. Which one will come out on top then... no idea :)

Yeah, sure, people have used ML for positional evaluation functions. Often for single games at a time though; in General Game Playing, MCTS has been much more dominant. One reason for that is probably that in older GGP systems, it required a significant amount of effort just to discover things that could act as features (like "material") from the logic-based game description languages. In Ludii, these sorts of higher-level concepts are already explicitly available right away, which makes it a bit easier to try using them as features.


RE: alpha-beta - unhandyandy - 12-01-2019

"Which one will come out on top then... no idea"

Well, AB seems to have a significant head start.

" In Ludii, these sorts of higher-level concepts are already explicitly available right away, which makes it a bit easier to try using them as features. "

Which makes me think AB could maintain its lead.