Find the best move (AI will be no help)

For lessons, as well as threads about specific moves, and anything else worth studying.
Uberdude
Judan
Posts: 6727
Joined: Thu Nov 24, 2011 11:35 am
Rank: UK 4 dan
GD Posts: 0
KGS: Uberdude 4d
OGS: Uberdude 7d
Location: Cambridge, UK
Has thanked: 436 times
Been thanked: 3718 times

Re: Find the best move (AI will be no help)

Post by Uberdude »

Bill Spight wrote: Normally we expect that between evenly matched opponents the first player's advantage will increase as their level of play increases. The opposite is the case here.
Bill, you've said this before, but I still don't see why you say that. If the 7.5 komi is slightly too big, as seems to be the case, then as a bot gets stronger I would expect its winrate for White on the empty board to increase, as the stronger player is better able to carry that advantage through to the endgame.
Gomoto
Gosei
Posts: 1733
Joined: Sun Nov 06, 2016 6:56 am
GD Posts: 0
Location: Earth
Has thanked: 621 times
Been thanked: 310 times

Re: Find the best move (AI will be no help)

Post by Gomoto »

If the bot was previously stuck at a local maximum, it would also be possible for the stronger bot to show a lower winrate.

The bot realizes that go is more difficult than it anticipated ;-)
Last edited by Gomoto on Fri Nov 01, 2019 1:15 am, edited 2 times in total.
Gomoto
Gosei
Posts: 1733
Joined: Sun Nov 06, 2016 6:56 am
GD Posts: 0
Location: Earth
Has thanked: 621 times
Been thanked: 310 times

Re: Find the best move (AI will be no help)

Post by Gomoto »

The move in the blind spot of the analyzed joseki is about 4 points better than the normal AI move, according to KataGo, when you continue the variations.
That is quite a gap.
jlt
Gosei
Posts: 1786
Joined: Wed Dec 14, 2016 3:59 am
GD Posts: 0
Has thanked: 185 times
Been thanked: 495 times

Re: Find the best move (AI will be no help)

Post by jlt »

dfan wrote:I assume that the Leela Zero network that comes with Lizzie 0.7 is stronger than the Leela Zero network that came with Lizzie 0.5, and stronger networks tend to be more opinionated (the usual gedankenexperiment is to imagine an infinitely strong network, which would give White a winrate of either 0% or 100% at the start of the game).
On the other hand, AlphaGo says that on an empty board, White's winrate is 52.9%. More likely, Leela Zero is going through a training path in which its playing style favors White.
Gomoto
Gosei
Posts: 1733
Joined: Sun Nov 06, 2016 6:56 am
GD Posts: 0
Location: Earth
Has thanked: 621 times
Been thanked: 310 times

Re: Find the best move (AI will be no help)

Post by Gomoto »

KataGo gives White about +1 point with komi 7.5 and about −1 point with komi 6.5.
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: Find the best move (AI will be no help)

Post by Bill Spight »

Uberdude wrote:
Bill Spight wrote: Normally we expect that between evenly matched opponents the first player's advantage will increase as their level of play increases. The opposite is the case here.
Bill, you've said this before, but I still don't see why you say that. If the 7.5 komi is slightly too big, as seems to be the case, then as a bot gets stronger I would expect its winrate for White on the empty board to increase, as the stronger player is better able to carry that advantage through to the endgame.
Well, this is one reason why knowing the margin of error of winrate estimates is important. Back in the 1970s someone wrote an article in the AGA Journal claiming that the proper statistical komi was 7. (Actually, the author did not qualify as I did. He said that it was "as plain as a pikestaff" that komi was 7.) Ing's statistics indicated to him that for area scoring it was 8. In the early 1980s some people thought that for area scoring it was as high as 9.

As for today's bots, they seem to believe that 7.5 komi is a little too high, giving an advantage to White. However, as far as I can tell, the estimated advantage is within the margin of error. Winrate estimates always come with errors; we just don't know much about those errors.

We know that komi is roughly half of the temperature of the empty board, that is, half of how much the initial gote or reverse sente gains. That's why we guess that the first play gains around 14 pts. However, suppose that Black is not good enough to gain that much, but only gains 12 pts. Maybe Black is a kyu player. White is at the same level. Then we can estimate the statistical komi for them as only 6 pts. instead of 7. For random players statistical komi might be 3 pts. If both players are so weak that their statistical komi is 6 pts. and Black gives 7.5 pts. komi, then White will have an advantage. Now suppose that both players improve, so that their statistical komi is 7. Then White's advantage will be less.
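The arithmetic here can be sketched in a few lines. This is only an illustration of the relationship described above; the gain figures (14 pts. for strong players, 12 pts. for weaker ones) are the assumed values from the post, not measurements.

```python
# Sketch: statistical komi ~ half of what the first play gains.
# The gain values below are illustrative assumptions.
def statistical_komi(first_play_gain):
    """Komi is roughly half the value of the initial move."""
    return first_play_gain / 2

# Players strong enough to gain ~14 points with the first play:
print(statistical_komi(14))        # 7.0
# Weaker players who only extract ~12 points from the first move:
print(statistical_komi(12))        # 6.0
# With 7.5 komi, White's edge is komi minus the players' statistical komi:
print(7.5 - statistical_komi(12))  # 1.5
print(7.5 - statistical_komi(14))  # 0.5, i.e. the edge shrinks as both improve
```

As both players improve, the gap between the fixed 7.5 komi and their statistical komi narrows, which is the sense in which White's advantage decreases.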

We know that as bots get better, their statistical komi will approach correct komi, the board result with perfect play. If correct komi is 7, then as they get better Black will not gain an advantage from that. But if it is 8 or 9, then Black will. On an odd-parity board correct area komi is unlikely to be 8, but it is possible. I don't think that it is 9, but who knows?

Now, none of today's top bots has given an initial advantage to Black, but 3 or 4% is likely to be within their margin of error. If LZ now estimates White's winrate as 57% with 7½ komi, that estimate is likely to be greater than the margin of error. If so, that's news. :)
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.
Yakago
Dies in gote
Posts: 53
Joined: Tue Jan 16, 2018 10:39 am
GD Posts: 0
Has thanked: 2 times
Been thanked: 12 times

Re: Find the best move (AI will be no help)

Post by Yakago »

Would you care to elaborate what you mean by margin of error? (although you may have done so before)

Wouldn't the term 'margin of error' imply that there is a 'true' winrate? With respect to perfect play, that makes no sense.

So then it would have to be the winrate of what? Some match with fixed parameters of a certain bot? The parameters should greatly affect the outcome.

In any case, it's not too difficult to test.
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: Find the best move (AI will be no help)

Post by Bill Spight »

Yakago wrote:Would you care to elaborate what you mean by margin of error? (although you may have done so before)
It is true that I have used margin of error to refer to different things. Here I am taking a player's winrate to be an estimate of the probability of a win by that player, given perfect play from the given position, and given our state of knowledge at this time. That is not exactly how the winrate is defined. It is an estimate of the probability of a win by that player, given self play by the bot in question from the given position, given our current state of knowledge. OC, we do not assume that the bot plays perfectly, but it is taken as playing as close to perfect play as we can come at this time. (For weaker bots I would not use margin of error in this way.)
Wouldn't the term 'margin of error' imply that there is a 'true' winrate? With respect to perfect play, that makes no sense.
Actually, it does in Bayesian terms, since the probability is conditioned on our state of knowledge. You may not want to call that a true winrate.
So then it would have to be the winrate of what? Some match with fixed parameters of a certain bot? The parameters should greatly affect the outcome.

In any case, it's not too difficult to test.
I agree. It just takes time and careful research. By which time there may be a new bot on the block. Nobody seems interested in doing that research. A lot of people take winrates as gospel. I talk about margin of error to emphasize that they are not.
Uberdude
Judan
Posts: 6727
Joined: Thu Nov 24, 2011 11:35 am
Rank: UK 4 dan
GD Posts: 0
KGS: Uberdude 4d
OGS: Uberdude 7d
Location: Cambridge, UK
Has thanked: 436 times
Been thanked: 3718 times

Re: Find the best move (AI will be no help)

Post by Uberdude »

Bill Spight wrote:
Yakago wrote:Would you care to elaborate what you mean by margin of error? (although you may have done so before)
It is true that I have used margin of error to refer to different things. Here I am taking a player's winrate to be an estimate of the probability of a win by that player, given perfect play from the given position, and given our state of knowledge at this time. That is not exactly how the winrate is defined. It is an estimate of the probability of a win by that player, given self play by the bot in question from the given position, given our current state of knowledge. OC, we do not assume that the bot plays perfectly, but it is taken as playing as close to perfect play as we can come at this time
Isn't that definition of margin of error thus only 100%-x or x% because with perfect play it's either a win or a loss?

When speaking of margin of error I think of these more tractable ideas that one could actually measure:
1) sampling error, as in what is the variance of the measurement I am taking (which is e.g. what does LZ network 234 give as the winrate for this move after 20k playouts). I can repeat that measurement several times, maybe use a different environment like a different graphics card or different number of threads so there's a little variation from the randomness of multi-threading but in my experience this tends to be quite small, not more than a percentage point or two.
2) error in using fewer playouts as an estimate of what the bot would think with more playouts, e.g. how good is giving LZ 20k playouts at estimating what LZ would think after much more analysis. I'm not even sure if the AG-0 algorithm asymptotes to perfect play with infinite time like classic MCTS is supposed to, but let's say LZ can play at its best at 1 billion playouts. Then the margin of error is asking: how good is the winrate LZ gives after 20k playouts as an estimate of what it would think after 1 billion? This error can be larger than the above, because it can take LZ many, many playouts to overcome blindspots, find refutations, read ladders etc., and these can make drastic changes in its evaluation of a position.
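Point 1 needs nothing more than repeated runs and a spread calculation. A minimal sketch, with winrate readings invented for illustration (real numbers would come from re-running the engine on the same position):

```python
import statistics

# Hypothetical winrates (%) from re-running the same position with the
# same network and playout count, e.g. on different hardware or with a
# different number of threads. These readings are made up.
readings = [56.8, 57.3, 56.5, 57.1, 56.9, 57.4]

mean = statistics.mean(readings)
spread = statistics.stdev(readings)
print(f"mean {mean:.1f}%, sample stdev {spread:.2f} points")
# A stdev well under one percentage point matches the "quite small"
# claim for this kind of sampling error.
```

Point 2 is much harder to quantify this way, since it requires the expensive high-playout runs as the reference.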
dfan
Gosei
Posts: 1599
Joined: Wed Apr 21, 2010 8:49 am
Rank: AGA 2k Fox 3d
GD Posts: 61
KGS: dfan
Has thanked: 891 times
Been thanked: 534 times

Re: Find the best move (AI will be no help)

Post by dfan »

lightvector had a couple of good detailed comments a few months ago (1, 2) about all the difficulties (or at least a lot of them) involved in trying to define and/or calculate "margin of error" when it comes to win rates, which are good background for anyone interested in the topic.
Bki
Lives with ko
Posts: 218
Joined: Sun Apr 12, 2015 6:59 am
Rank: IGS 3k
GD Posts: 0
KGS: Bki
IGS: mlbki
Has thanked: 23 times
Been thanked: 14 times

Re: Find the best move (AI will be no help)

Post by Bki »

When I think about the margin of error, it's obviously relative to the real probability of the bot winning from this position (EDIT: though from what lightvector said, apparently I might be wrong about this). That is something that clearly exists (even if we don't know it exactly), and everything else really has no meaning, given that it is the only thing the bot estimates.
Uberdude wrote:
Bill Spight wrote:
Yakago wrote:Would you care to elaborate what you mean by margin of error? (although you may have done so before)
It is true that I have used margin of error to refer to different things. Here I am taking a player's winrate to be an estimate of the probability of a win by that player, given perfect play from the given position, and given our state of knowledge at this time. That is not exactly how the winrate is defined. It is an estimate of the probability of a win by that player, given self play by the bot in question from the given position, given our current state of knowledge. OC, we do not assume that the bot plays perfectly, but it is taken as playing as close to perfect play as we can come at this time
Isn't that definition of margin of error thus only 100%-x or x% because with perfect play it's either a win or a loss?

When speaking of margin of error I think of these more practical ideas:
1) sampling error, as in what is the variance of the measurement I am taking (which is e.g. what does LZ network 234 give as the winrate for this move after 20k playouts). I can repeat that measurement several times, maybe use a different environment like a different graphics card or different number of threads so there's a little variation from the randomness of multi-threading but in my experience this tends to be quite small, not more than a percentage point or two.
2) error in using fewer playouts as an estimate of what the bot would think with more playouts, e.g. how good is giving LZ 20k playouts at estimating what LZ would think after much more analysis. I'm not even sure if the AG-0 algorithm asymptotes to perfect play with infinite time like classic MCTS is supposed to, but let's say LZ can play at its best at 1 billion playouts. Then the margin of error is asking: how good is the winrate LZ gives after 20k playouts as an estimate of what it would think after 1 billion? This error can be larger than the above, because it can take LZ many, many playouts to overcome blindspots, find refutations, read ladders etc., and these can make drastic changes in its evaluation of a position.
On 1), you can actually get a theoretical upper bound on the variance (0.25). Then it's pretty easy to determine an upper bound on the confidence interval. Of course, you can do even more work to be more precise, but statistical inference isn't my domain, and really, it's already good enough to know that the radius of a 95% confidence interval is ~3% with 1000 playouts and ~1% with 10000.
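The bound follows from treating each playout outcome as a Bernoulli draw with worst-case variance p(1−p) ≤ 0.25. A quick sketch (note the caveat that MCTS playouts are not actually independent samples, so this is only an upper-bound heuristic for the sampling error in point 1, not the deeper errors in point 2):

```python
import math

def ci_radius_upper_bound(n_playouts, z=1.96):
    """95% confidence-interval radius for a win fraction, using the
    worst-case Bernoulli variance p*(1-p) <= 0.25."""
    return z * math.sqrt(0.25 / n_playouts)

print(f"{ci_radius_upper_bound(1000):.3f}")   # 0.031, i.e. ~3%
print(f"{ci_radius_upper_bound(10000):.3f}")  # 0.010, i.e. ~1%
```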
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: Find the best move (AI will be no help)

Post by Bill Spight »

Uberdude wrote:
Bill Spight wrote:
Yakago wrote:Would you care to elaborate what you mean by margin of error? (although you may have done so before)
It is true that I have used margin of error to refer to different things. Here I am taking a player's winrate to be an estimate of the probability of a win by that player, given perfect play from the given position, and given our state of knowledge at this time. That is not exactly how the winrate is defined. It is an estimate of the probability of a win by that player, given self play by the bot in question from the given position, given our current state of knowledge. OC, we do not assume that the bot plays perfectly, but it is taken as playing as close to perfect play as we can come at this time
Isn't that definition of margin of error thus only 100%-x or x% because with perfect play it's either a win or a loss?
That's the maximum error, OC. But with research we can come up with better margins of error.
When speaking of margin of error I think of these more tractable ideas that one could actually measure:
1) sampling error, as in what is the variance of the measurement I am taking (which is e.g. what does LZ network 234 give as the winrate for this move after 20k playouts). I can repeat that measurement several times, maybe use a different environment like a different graphics card or different number of threads so there's a little variation from the randomness of multi-threading but in my experience this tends to be quite small, not more than a percentage point or two.
2) error in using fewer playouts as an estimate of what the bot would think with more playouts, e.g. how good is giving LZ 20k playouts at estimating what LZ would think after much more analysis. I'm not even sure if the AG-0 algorithm asymptotes to perfect play with infinite time like classic MCTS is supposed to, but let's say LZ can play at its best at 1 billion playouts. Then the margin of error is asking: how good is the winrate LZ gives after 20k playouts as an estimate of what it would think after 1 billion? This error can be larger than the above, because it can take LZ many, many playouts to overcome blindspots, find refutations, read ladders etc., and these can make drastic changes in its evaluation of a position.
As for 2), that's the approach I took with Leela 11 a few years ago in regard to the cheating question. I did not just assume that more playouts gave a correct answer. What I did first was show that the differences between plays and winrates for several positions were probably not random. That was evidence that the winrate differences represented real differences in evaluation, the assumption being that with more playouts the evaluation was better. That then indicated that the winrate errors of the presumed mistakes were at least 3%. As a rule of thumb, then, if the winrate estimates of two plays by Leela 11 differ by less than 3% at a certain setting (100k), I do not think that we can assume that the play with the worse winrate is actually a worse play. Humans make use of winrate estimates, for better or worse. It would help if we had a good idea of how good they were.
lightvector
Lives in sente
Posts: 759
Joined: Sat Jun 19, 2010 10:11 pm
Rank: maybe 2d
GD Posts: 0
Has thanked: 114 times
Been thanked: 916 times

Re: Find the best move (AI will be no help)

Post by lightvector »

Bill Spight wrote:
Uberdude wrote:
Bill Spight wrote: It is true that I have used margin of error to refer to different things. Here I am taking a player's winrate to be an estimate of the probability of a win by that player, given perfect play from the given position, and given our state of knowledge at this time. That is not exactly how the winrate is defined. It is an estimate of the probability of a win by that player, given self play by the bot in question from the given position, given our current state of knowledge. OC, we do not assume that the bot plays perfectly, but it is taken as playing as close to perfect play as we can come at this time
Isn't that definition of margin of error thus only 100%-x or x% because with perfect play it's either a win or a loss?
That's the maximum error, OC. But with research we can come up with better margins of error.
Ummm, I must be misunderstanding something. Since Go has no inherent randomness, in a given position perfect play will always win or will always lose. So say a bot says 60%. Then, as Uberdude said, if comparing to the probability of winning under perfect play, either 40% or 60% is precisely the error. It is not merely the "maximum" error; one of those two *is* the error. Aside from somehow divining optimal play to know which of those two it is, there is nothing further to research and no better estimate is possible.

Did you mean something different by "perfect play" than what people usually mean by it? Or was "with research we can come up with better margins of error" already an acknowledgement the definition you gave wasn't actually the definition you wanted, and presumably the "research" would also involve finding what definition would be better? Or am I just misreading something?

Edit: As quoted, you did mention that winrates in practice are based on self-play, not perfect play, which is true. But then I don't understand what "taking a player's winrate to be an estimate of the probability of a win by that player, given perfect play from the given position" means, if not that "perfect play" is the proposed benchmark to compare error against, which leads to the highly-non-useful 40% or 60% above.

Side note: self-play in training is very, very far from perfect play, and it is even very far from "as close to perfect play as we can come at this time", because self-play in training is very deliberately made noisy and weaker than bots are actually capable of playing (due to performance reasons, and due to noise actually being desirable in the dynamics of training).
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: Find the best move (AI will be no help)

Post by Bill Spight »

lightvector wrote:Ummm, I must be misunderstanding something. Since Go has no inherent randomness, in a given position perfect play will always win or will always lose. So say a bot says 60%. Then, as Uberdude said, if comparing to the probability of winning under perfect play, either 40% or 60% is precisely the error. It is not merely the "maximum" error; one of those two *is* the error. Aside from somehow divining optimal play to know which of those two it is, there is nothing further to research and no better estimate is possible.
What I think you are missing is that the probabilities are conditioned on our state of knowledge. Perfect play might be deterministic, but our knowledge of it is uncertain.

Here is a simple example. Suppose that we have a randomly shuffled standard deck of 52 cards. What is the probability that the 9th card in the deck is the Jack of Diamonds? Let's say that it is 1/52. Suppose now that we check the 9th card and it is the Jack of Diamonds. Does that mean that our error is 51/52? No. Given our assumptions our error is 0. That reflects our knowledge about standard decks of cards.

Now suppose that we know nothing about standard decks of cards. What is the probability for us that the 9th card in the deck is the Jack of Diamonds? All we can say is that it lies between 0 and 1. Now let us check the 9th card in randomly shuffled decks 1,000,000 times. Doing so lets us estimate the physical probability that it is the Jack of Diamonds.
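That check is easy to simulate. A sketch (with 100,000 shuffles rather than 1,000,000, to keep it quick; the card encoding is arbitrary):

```python
import random

deck = list(range(52))     # card 11 here stands for the Jack of Diamonds
JACK_OF_DIAMONDS = 11
trials = 100_000
rng = random.Random(0)     # fixed seed for reproducibility

hits = 0
for _ in range(trials):
    rng.shuffle(deck)
    if deck[8] == JACK_OF_DIAMONDS:   # the 9th card, 0-indexed
        hits += 1

print(hits / trials)       # close to 1/52 ~ 0.0192
```

Someone who knew nothing about standard decks could recover the physical probability this way, which is the point of the example.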

IIUC, one definition of a winrate estimate is the probability that the player whose winrate it is will win with self play if the game is played out from that point. Let's say that 160 moves have been played and the winrate estimate for Black is 60% with 10k playouts. From our previous discussion I believe that you said that self play from that position is deterministic, that we cannot replay it and get a different result. But we can look at other positions after 160 moves with a winrate estimate for Black of 60% with 10k playouts. Doing so many times will allow us to estimate the error of that winrate estimate under those conditions.

We can also check the error of the difference in winrates between two plays under certain conditions in a similar manner. How often does the play with the higher winrate win while the other play loses, and vice versa? My hunch is that early in the game there are many plays that produce the same win/loss result with perfect play, and even with very good play, but if LZ says that the winrate difference between the two is 7% with a comparable number of playouts greater than 10k, I think that it is very likely that the play with the worse winrate estimate is a mistake. If LZ says that the winrate difference is only 2%, I am inclined to think that the difference could easily be noise.
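The calibration study described here can be sketched with synthetic data. The records below are invented for illustration; a real study would use self-play results from positions sharing the same winrate estimate:

```python
import random

rng = random.Random(42)

# Invented dataset: (predicted winrate for Black, did Black win) pairs,
# generated so that the true win frequency runs a bit below the estimate.
records = [(0.60, rng.random() < 0.55) for _ in range(5000)]

# Empirical win frequency among positions the bot scored at 60%:
wins = sum(won for est, won in records)
observed = wins / len(records)
bias = 0.60 - observed
print(f"observed {observed:.3f}, bias {bias:+.3f}")
# A consistent gap between the estimate and the outcome frequency is
# exactly the kind of winrate error being discussed.
```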

Edit: If you are interested in a Bayesian approach, I can recommend The Estimation of Probabilities by I. J. Good. :)
lightvector
Lives in sente
Posts: 759
Joined: Sat Jun 19, 2010 10:11 pm
Rank: maybe 2d
GD Posts: 0
Has thanked: 114 times
Been thanked: 916 times

Re: Find the best move (AI will be no help)

Post by lightvector »

Bill - ah, that makes sense, I see what you're getting at now. Thanks!

Yes, I'm already familiar with Bayesian reasoning, but your phrasing in your earlier posts did not suggest to me that this is what you meant. Partly since when I discuss things from a Bayesian perspective with others in actual work or in real life (which is actually not infrequent), I'm very used to everyone speaking linguistically in a way that reflects how Bayesian probabilities are model-dependent and observer-dependent - that they are foremost a quantification of beliefs and of uncertainty rather than of objective probabilities in the world.

E.g. a phrase like "the error in the winrate as an estimate of the probability of a win by a perfect player" linguistically suggests that there is a correct probability out there in reality, and that we are comparing the winrate against that. In objective reality, the only relevant probabilities for a perfect player are 0% or 100%, so the error (i.e. the difference) between a winrate of, say, 60% and this must be either a full 60% or 40%, though we do not know which. And of course, this is not useful. The object being reasoned about is not itself probabilistic; the only uncertainty is in our knowledge about it. And it is the latter uncertainty that we want to discuss and compare against, but this is not implied by the plain English meaning of that wording.

By contrast, a phrasing like "the error in the winrate relative to our best belief/credence that perfect play would win" or even just "relative to our probability for perfect play winning (given our uncertainty)" linguistically implies what we are comparing the winrate against is something to do with us and our knowledge rather than something objective.

Paying attention to this sort of word/syntax choice takes effort, but I find it helps a lot in clarity when I discuss things like this.