Find the best move (AI will be no help)

Bill Spight · Post by **Bill Spight** » Fri Nov 01, 2019 9:11 pm

Ah, thanks, lightvector.

As for terminology, I was a Bayesian before the Bayesian revival, and I was not involved in it. I am also used to talking to frequentists. And I am not a subjectivist. Hence my saying that probability is conditioned on our knowledge. I am closest to Jaynes and Pearl, I guess.

In this discussion, with a number of people, it seemed to me that people had trouble with the idea of estimating probability. So I had no idea that anyone was taking a Bayesian perspective.

TelegraphGo · Post by **TelegraphGo** » Sat Nov 02, 2019 2:16 am

Frankly, I'm entirely unconvinced that the AI "winrates" and probability have anything to do with one other whatsoever. We have a lot of fuzzy talk about it in this thread, and since I'm confused I'd like to descend to the nitty-gritty. If it's not too dry, I'd like to try to define rigorously and without probability, what Bill is talking about, so that he or anyone else can show me where precisely we differ in opinion (if we do). Feel free to skip this post if you hate math

I think it's fair to model every AI as an effectively reproducible function f: X -> Y, where X is the set of all legal board states [note that this is finite], and Y is vaguely similar to [0,1] (let's call 0 opponent win and 1 AI win for discussion). If f(x) > 0.5, then the AI believes that with perfect play it wins, if f(x) < 0.5 then the AI believes its opponent wins. Beyond that is some sort of metric of the AI's confidence, which seems to generally give more extreme values when we humans have confidence as well. We'd like to be able to take two different confidence evaluations and call the magnitude of the difference significant, so that choosing to play one variation is certainly a mistake over choosing the other.

Define g(x): X -> X, g(x) = x U k, where k is the move that has the best f(x U k) for any legal move k. This function just plays the game that the AI says is best.

Then define h(x): X -> X as perfect play - I'm going to assume perfect understanding of the AI, which could be possible if you had instant access to f(x) for all x. h(x) = x U j, where j is a legal move [not necessarily noticed by the AI] such that the worst boardstate x' that AI would play towards with its own assessment of worst f(x') is reachable via playing the move j. Since I trust the AI to always count points correctly after endgame, and any working trap must be sprung eventually, there is no other mysterious better sequence to play to further beat the AI. Therefore, for any position x, g(h(g(h(g(h.....(x))) would produce the game that leads the AI to, at some point, encounter the position it views most dimly, but would play into repeatedly from the position x. Obviously, if the AI is losing, then this state will be 0, but if the AI is winning, it may only ever drop to 0.6, 0.8, etc. before eventually climbing up to 1.

Consider the function w(x): X -> X to output the exact boardstate where f(w(x)) is the minimal assessment reachable by the compound g(h(g(h(...(x))). If we're only interested in the next "stage" of the game, then we could limit ourselves to some finite number of g(h(... and redefine. We could even limit ourselves to exactly the depth of the AI's search tree. I'm not sure exactly what Bill wants here.

Our goal is to find some number M>0 that we'll call a "margin of error" independent of boardstate, such that for any two positions x, y in X, f(x)>0.5, f(y)>0.5, [f(x)-f(y) > M] => [f(w(x)) >/= f(w(y))].

The intent of this statement is that the margin of error will tell apart the 'comfortability' of the winning moves. The restriction of viewing 'winning' positions is, I think, sensible, as knowing how to be less likely to lose is opponent-based, as opposed to knowing how to be more likely to win. A difference above the margin of error will tell us that against a 'perfect' opponent, the AI can't handle either position, the better position is either winning and the other position is not, or that the position with the higher value can go less wrong when well handled.

It's likely that this value M isn't constant, and actually depends on the value of f(x) and f(y). Bill appears to suggest that in the range [0.5, 0.75] a reasonable value for M should be around 0.04 for at least one particular AI that he studied in depth.

I'm not quite so sure about that. The very position this thread started with showed a position x for which f(x) and f(h(x)) differed by more than 0.04 already. It seems likely that a game of the AI against our elusive perfect player h(x) would result in many similar traps being laid for the AI, most outside of its own evaluation. Perhaps what we'd really like to do is consider a strategic game, where the AI isn't ever surprised by h(x), but perhaps a little outplayed.

To do this, we can consider an augmented f'(x) such that for any position where the AI misses a tesuji that h(x) finds, [that is: f(w(g(x)) < f(w(h(x)))] the AI is forced to analyze the tesuji until it reevaluates (so that f(h(x)) > f(g(x)), redefining g(x) for that x). Then, the normal search process is extrapolated for all possible preceding boardstates, so that the AI no longer misses any tesuji, but retains its signature fluid evaluation.

If we similarly define a margin of error off of this augmented AI, M'>0 s.t. for all x,y in X, f'(x)>0.5, f'(y)>0.5, [f'(x)-f'(y)>M'] => [f'(w(x)) >/= f'(w(y'))], then M' = 0.04 seems pretty reasonable to me, actually. The AI is bound to make some strategic errors, but not an overwhelming amount. This seems excessively difficult to actually evaluate though - perhaps a method such as my process finding the move the AI missed could assist you for any position x, but that seems really arduous.

Talking about move-by-move 'margin of error' [more precisely, max(f'(g(x)) - f'(x)) over x in X] seems quite difficult to pin down, as it's almost entirely dependent on the AI missing something. Perhaps you can do it for positions in some subset of x, lacking in critical sequences that the AI ignores? I got the impression that this is what you're interested in, but it seems very difficult to qualify.

Note that I still haven't talked about probability once, and at this point I don't think I should see a need for it. If you can show that my model is incomplete without probability, then please, show me. But, as far as I understand this talk of Bayesian logic and whatnot is confusing the concepts of AI 'winrate', which is mostly just an arbitrary number who's programmers at Google once gave bounds of 0 and 1, and actual player v. player winrates, which to me are entirely separate.

Bill Spight · Post by **Bill Spight** » Sat Nov 02, 2019 5:05 am

TelegraphGo wrote:Frankly, I'm entirely unconvinced that the AI "winrates" and probability have anything to do with one other whatsoever. We have a lot of fuzzy talk about it in this thread, and since I'm confused I'd like to descend to the nitty-gritty. If it's not too dry, I'd like to try to define rigorously and without probability, what Bill is talking about, so that he or anyone else can show me where precisely we differ in opinion (if we do). Feel free to skip this post if you hate math

Just to let you know, I am not defining winrate as probability. That has already been done by others, and you can find it defined elsewhere in these discussions, which have been going on for a while, or someone in the know can define it here.

OC, it is possible just to regard it as a number between 0 and 1 that indicates a bot's evaluation of a position with a certain player to play, from the perspective of one player, P. If P's winrate is greater than 0.5, then the evaluation favors P, and if it is less than 0.5 it favors P's opponent. There is still the question of how good that evaluation is. In the Elf commentaries the Elf team does not report a choice of play by Elf which has fewer than 1500 playouts. Furthermore, when a human play has fewer than 500 playouts, they have it inherit its winrate from Elf's top reply to it.

Bill Spight · Post by **Bill Spight** » Sat Nov 02, 2019 5:18 am

TelegraphGo wrote:Bill appears to suggest that in the range [0.5, 0.75] a reasonable value for M should be around 0.04 for at least one particular AI that he studied in depth.

I'm not quite so sure about that.

Taking the margin of error for Elf as around 4% is by guess and by golly. It may be educated guesswork, but it is guesswork, faute de mieux. Nobody has done the research on how good winrates are, while commentators utilize them uncritically.

Talking about move-by-move 'margin of error' [more precisely, max(f'(g(x)) - f'(x)) over x in X] seems quite difficult to pin down, as it's almost entirely dependent on the AI missing something. Perhaps you can do it for positions in some subset

The very idea of winrate as probability depends upon AI missing something. OC, we know that it does.

Tryss · Post by **Tryss** » Sat Nov 02, 2019 8:42 am

TelegraphGo wrote:Note that I still haven't talked about probability once, and at this point I don't think I should see a need for it. If you can show that my model is incomplete without probability, then please, show me. But, as far as I understand this talk of Bayesian logic and whatnot is confusing the concepts of AI 'winrate', which is mostly just an arbitrary number who's programmers at Google once gave bounds of 0 and 1, and actual player v. player winrates, which to me are entirely separate.

The key is that your function f is an interpolation of the training data.

TelegraphGo · Post by **TelegraphGo** » Sat Nov 02, 2019 1:43 pm

The key is that your function f is an interpolation of the training data.

True, f is technically nondeterministic, but I think it shouldn't severely weaken the AI to assign an arbitrary deterministic choice to positions where the AI might come up with mildly different results on different evaluations. The AI seems to me to be relatively consistent on repeated analysis (at least compared to the magnitude of the errors it typically makes). If I don't misunderstand, shouldn't this make the f completely well-defined so that this isn't really a problem?

The very idea of winrate as probability depends upon AI missing something. OC, we know that it does.

The AI certainly does miss things, and I view that there's two 'classes' of it missing something - 1. analyzing deeply but slightly inaccurately and 2. failing to analyze deeply. The problem with move-by-move margin of error is that you could take a position where the AI is about to miss something of the second class. When the AI makes a class 2 oops, it could lose almost any amount of %, which is hard to describe. On the other hand, class 1 misses are much more likely to have some useful upper bound. I tried to analyze so that I was only accounting class 1 misses with my definitions, because it's hard to pinpoint the boardstates where the AI is about to make a class 2 miss. Maybe with more thought I could rigorously define this as well, though.

Just to let you know, I am not defining winrate as probability.

I had a feeling you weren't, but I wasn't quite sure exactly what you were going for. I'm mostly just trying to get some kind of agreed groundwork, so we can get to the fun stuff.

Taking the margin of error for Elf as around 4% is by guess and by golly. It may be educated guesswork, but it is guesswork, faute de mieux. Nobody has done the research on how good winrates are, while commentators utilize them uncritically.

I wouldn't expect anything more than guesswork here, since it's probably almost as hard to figure an exact number as it is to completely solve the game of Go itself. But where particularly can we apply the number? Is it one of the M that I suggested, or perhaps a different definition of M? What do you think the most useful kind of M is with regard to effective human-AI analysis?

Bill Spight · Post by **Bill Spight** » Sat Nov 02, 2019 2:53 pm

TelegraphGo wrote:
Bill Spight wrote:Just to let you know, I am not defining winrate as probability.
I had a feeling you weren't but I wasn't quite sure exactly what you were going for. I'm mostly just trying to get some kind of agreed groundwork, so we can get to the fun stuff.

It's not that I don't regard it as a probability, but I rely upon the definition of others.

Taking the margin of error for Elf as around 4% is by guess and by golly. It may be educated guesswork, but it is guesswork, faute de mieux. Nobody has done the research on how good winrates are, while commentators utilize them uncritically.

I wouldn't expect anything more than guesswork here, since it's probably almost as hard to figure an exact number as it is to completely solve the game of Go itself. But where particularly can we apply the number? Is it one of the M that I suggested, or perhaps a different definition of M? What do you think the most useful kind of M is with regard to effective human-AI analysis?[/quote]

My main concern is with comparing candidate plays. Theoretically, I would like to see how the number of playouts affects the margins of error for winrate estimates.

Tryss · Post by **Tryss** » Sat Nov 02, 2019 2:57 pm

TelegraphGo wrote:
The key is that your function f is an interpolation of the training data.
True, f is technically nondeterministic, but I think it shouldn't severely weaken the AI to assign an arbitrary deterministic choice to positions where the AI might come up with mildly different results on different evaluations. The AI seems to me to be relatively consistent on repeated analysis (at least compared to the magnitude of the errors it typically makes). If I don't misunderstand, shouldn't this make the f completely well-defined so that this isn't really a problem?

f:X->[0,1] is totally deterministic. That's the "0 playout" evaluation.

What is non-deterministic is the winrate calculated from more than 1 playout : it's computed from values f(x'), where x' are positions further in the tree, the randomness come from what positions x' you use. But it's still based on the interpolation of your training data, but instead of just the current position, you use it on possible future positions.

Lightvector could tell us what Katago winrate are an interpolation of, and how they are computed (only the node at the end of the best branch? something else ?)

jann · Post by **jann** » Sun Nov 03, 2019 7:11 am

Tryss wrote:f:X->[0,1] is totally deterministic. That's the "0 playout" evaluation.

This depends on the random choice of "which rotation the position was fed with" (which is actually the most significant random factor for search too).

Life In 19x19

Find the best move (AI will be no help)

Re: Find the best move (AI will be no help)

Re: Find the best move (AI will be no help)

Re: Find the best move (AI will be no help)

Re: Find the best move (AI will be no help)

Re: Find the best move (AI will be no help)

Re: Find the best move (AI will be no help)

Re: Find the best move (AI will be no help)

Re: Find the best move (AI will be no help)

Re: Find the best move (AI will be no help)