Studying Rémi Coulom's paper
Posted: Tue Sep 11, 2018 1:59 am
I'm studying Rémi Coulom's paper https://hal.inria.fr/inria-00116992/document
It may already have been discussed here or on Stack Overflow / Reddit ... but there's a bunch of smart people in here, maybe even Rémi himself. Although I'm interested in the AI (or should I say machine learning) aspect of it, I will naturally approach it from a go angle.
No questions for now.
Things I have already figured out for myself, I think:
- minimax algorithm: a positive score means player A wins, a negative score means player B wins. When it's A's move, the program chooses the node with the maximum value, when it's B's move, the minimum.
- alpha-beta pruning: alpha is the minimum (worst) score A is assured of, beta is the maximum (worst) score B is assured of. When alpha becomes greater than or equal to beta, the node is discarded from further search, because such a node cannot be reached under best play by both sides
- a "backup operator" is nothing else than the mechanism of computing the value (score) of a node from the values of its children nodes, starting from the leaves of the tree and going "back up". The name had me confused because I thought it was something that computed a "backup value" for the real value.
- "averaging" is a generalized term for evaluation. In traditional programs, an evaluation function computes a value for a leaf by using heuristics such as piece loss or central position in chess. I assume that with the introduction of randomization, which is core to Monte-Carlo, such evaluation generally involves averaging over the random playouts underneath the leaf
- the "count" of a node is the number of times the algorithm passed through the node
- an "internal node" is one for which the count is higher than a threshold value (Coulom sets this value = the number of points on the goban, which he claims to be empirically validated as a good compromise)
- an "external node" is one with a lower count than the threshold
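To make the first two bullets concrete, here is a minimal sketch of minimax with alpha-beta pruning on a toy game tree given as nested lists (leaves are scores from player A's point of view). This is my own illustration, not code from the paper; note that the "backup operator" here is simply max/min applied to the child values.

```python
def alphabeta(node, alpha, beta, maximizing):
    # Leaf: return its score directly.
    if isinstance(node, (int, float)):
        return node
    if maximizing:  # player A picks the child with the maximum value
        value = float('-inf')
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)  # best score A is now assured of
            if alpha >= beta:  # B already has a better option elsewhere
                break          # prune: unreachable under best play by both
        return value
    else:           # player B picks the child with the minimum value
        value = float('inf')
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)    # best score B is now assured of
            if alpha >= beta:
                break
        return value

tree = [[3, 5], [2, 9]]  # A to move; B moves at the inner nodes
print(alphabeta(tree, float('-inf'), float('inf'), True))  # → 3
```

In this tree the second B-node is pruned after its first leaf (2), because A is already assured of 3 from the first branch.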
The core idea is to no longer separate "backing up" and "averaging", and, instead of completely cutting off less promising nodes, to search them considerably less than the more promising ones. This introduces "the probability of being the best move" as the score.
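The "search less promising moves less, not never" idea can be sketched roughly like this. This is my own toy illustration, not the paper's actual formula: each move's value estimate gets Gaussian noise that shrinks with its visit count, and playouts are allocated to whichever move currently looks most likely to be best. The `playout` function and the winrates are hypothetical stand-ins for real random games.

```python
import random

def playout(true_winrate):
    # Stand-in for a full random game: 1 (win) with the given probability.
    return 1 if random.random() < true_winrate else 0

def allocate(true_winrates, n_playouts=2000):
    counts = [1] * len(true_winrates)           # visit count per move
    wins = [playout(p) for p in true_winrates]  # one playout each to start
    for _ in range(n_playouts):
        # Sample a plausible value for each move; the noise shrinks as the
        # count grows, so well-explored moves are sampled near their average.
        samples = [random.gauss(w / c, 1.0 / c ** 0.5)
                   for w, c in zip(wins, counts)]
        best = samples.index(max(samples))  # move most likely to be best now
        wins[best] += playout(true_winrates[best])
        counts[best] += 1
    return counts

random.seed(0)
print(allocate([0.4, 0.5, 0.6]))  # typically the 0.6 move gets most playouts
```

No move's count ever stops growing entirely, so a move that merely looked bad early on can still recover, which is exactly what hard alpha-beta pruning cannot do.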
(to be continued)