It is currently Fri Mar 29, 2024 3:38 am

All times are UTC - 8 hours [ DST ]




Post new topic Reply to topic  [ 80 posts ]  Go to page Previous  1, 2, 3, 4  Next
Author Message
Offline
 Post subject: Re: "Indefinite improvement" for AlphaZero-like engines
Post #21 Posted: Fri Apr 17, 2020 3:55 pm 
Lives in sente

Posts: 757
Liked others: 114
Was liked: 916
Rank: maybe 2d
I'm still doubtful.

moha - if I understand correctly, you argue that the number of Elo points to perfect play can be very approximately bounded by 400 Elo * the number of points difference to perfect play, coming from the fact that 1 point is the minimum discrete unit that can be lost/gained.

This would seem to imply that there should not be more than 800 Elo points range in Chess. since there are only two units difference total in Chess. Define the "score difference" of a chess game as 1, 0, or -1 points depending on white wins / draw / black wins. But in actuality, we know that there are more than 800 Elo points range in Chess, even if we exclude very bad and very pathological players. Start with some decent 1400 Elo casual amateur player and we see there are still something like 1400 more Elo points to the top humans, and many many more Elo beyond that as you push into levels only achieved by bots. So the argument can't be correct.

I think this is because as players get better, even though any mistake objectively can cost you only a whole discrete unit or not at all, there are worlds of complexity and gradations hidden within the fight for that discrete unit, such that advantages and disadvantages and gains and losses from mistakes can happen at levels much finer than one whole unit for practical players, even if an omniscient objective view would not care until they finally "accumulated" to actually a whole unit of effect.

So the big flaw in the model is to treat mistakes as also being discretized to units, such as having a probability of losing a point on each move or not. In practice there exist "mistakes" that even if objectively they lose nothing, each successive one makes it increasingly "easier" to actually then make a move that does objectively lose later, unless the opponent's own mistakes cancels them out, and depending on the distributions of these "sub-unit" mistakes, this means you can get much more than a 75% winrate difference within the fight for that unit.

Now of course, there does seem to be some suggestiveness that Chess Elo range is finite - top level games have increasingly high draw percentages, particularly in the TCEC matches. But 400 Elo per "unit" is clearly too tight of a bound - exactly what the bound is should also depend on how much "subunit" gradation there is for practical real players and the mechanics of how it change or build up or not, which seems very hard to objectively pin down.

Maybe if players got within a point or two of optimal in Go it would turn out that there's no "sub-point" complexity in Go, but even only in the endgame you have things like fractional miai values, all those messy CGT problems whose only result is tedomari to round the score in your favor, and stuff like that - where one can make mistakes that are *much less* than one point - which makes me doubtful that this is true. And that's only the endgame.

So I would strongly suspect that the Elo range in Go is still finite (and not in some silly way like being exponentially large, but actually something not too unreasonable), but also that the last few points could quite easily be much more than 400 Elo per point.


Last edited by lightvector on Fri Apr 17, 2020 4:09 pm, edited 1 time in total.

This post by lightvector was liked by: sorin
Top
 Profile  
 
Offline
 Post subject: Re: "Indefinite improvement" for AlphaZero-like engines
Post #22 Posted: Fri Apr 17, 2020 4:07 pm 
Lives in gote

Posts: 311
Liked others: 0
Was liked: 45
Rank: 2d
gennan wrote:
LeelaZero currently has a selfplay rating of 16,000 Elo (where 0 is random play). If we estimate that LZ is about 1 rank (14 points) below perfect play, the perfect player's rating would not exceed 21,600 Elo.
LZ's rating could keep climbing long after it stopped improving (it does not measure real, reproducible strength difference). Also one should be careful with half point komi - that is just a hack and not necessarily as consistent as integer komi.

Top
 Profile  
 
Offline
 Post subject: Re: "Indefinite improvement" for AlphaZero-like engines
Post #23 Posted: Fri Apr 17, 2020 4:33 pm 
Lives in gote

Posts: 388
Liked others: 416
Was liked: 198
Uberdude wrote:
As a player gets stronger they become more consistent so a particular winning percentage in games becomes less and less points / handicap stones difference.
So maybe AG0.1 beats AG0 100-0 and is 2 points stronger aka 1/7 of a handicap stone.
AG0.2 beats AG0.1 100-0 and is 1.5 points stronger.
AG0.3 beats AG0.2 100-0 and is 1.2 points stronger.
AG0.4 beats AG0.3 100-0 and is 1.0 points stronger.
AG0.5 beats AG0.4 100-0 and is 0.8 points stronger.

Depending how these point margins decrease they may have a finite or infinite sum, and even if finite might not reach 20 points or however far we currently are from perfect play in the required timeframe.

And it's all kind of hand-wavy with lots of dubious assumptions: if A is x points stronger than B and B y stronger than C it does not necessarily follow A is x+y stronger than C (I forget what this operator property is called, transitive?). And what does being 0.2 points stronger mean in a game which is ultimately scored in integers?


I see, I understand the infinite series summing to a finite value.
I think I was focusing too much on points/handicap difference, and forgot that the difference in strength is on a "continuous scale", we just measure it in points/handicap.
I am still bothered by the "100-0" part, maybe because I am thinking too much in terms of human players :-)
To me, that sounds more like a bug in the learning process (something like, say, an exploitable ladder blindness, or AlphaGo's bug that Lee Sedol found and punished in game 4, to materialize itself so decisively as zero chance to win).

_________________
Sorin - 361points.com


This post by sorin was liked by: Bill Spight
Top
 Profile  
 
Offline
 Post subject: Re: "Indefinite improvement" for AlphaZero-like engines
Post #24 Posted: Fri Apr 17, 2020 6:45 pm 
Lives in gote

Posts: 311
Liked others: 0
Was liked: 45
Rank: 2d
lightvector wrote:
the number of Elo points to perfect play can be very approximately bounded by 400 Elo * the number of points difference to perfect play, coming from the fact that 1 point is the minimum discrete unit that can be lost/gained.
I wouldn't make this generalization. For one thing, playing to win by the most points (against large komi or handi) is not completely the same skill as playing to win the most games (highest wr on normal komi). I also didn't want to emphasize the "unit" concept, only used the 1 point as an example player at class -1. But I'm not sure now.

Quote:
This would seem to imply that there should not be more than 800 Elo points range in Chess. since there are only two units difference total in Chess. Define the "score difference" of a chess game as 1, 0, or -1 points depending on white wins / draw / black wins.
In this case, assuming chess is draw by perfect play, achieving 25% against it means drawing one game and losing one on average. This drops 1 point per 2 games, so 0.5 per game - but receiving 0.5 komi (ouch, I don't take this seriously myself :)) .. Edit: see below.


Last edited by moha on Fri Apr 17, 2020 11:28 pm, edited 2 times in total.
Top
 Profile  
 
Offline
 Post subject: Re: "Indefinite improvement" for AlphaZero-like engines
Post #25 Posted: Fri Apr 17, 2020 9:10 pm 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
Uberdude wrote:
As a player gets stronger they become more consistent so a particular winning percentage in games becomes less and less points / handicap stones difference.
So maybe AG0.1 beats AG0 100-0 and is 2 points stronger aka 1/7 of a handicap stone.
AG0.2 beats AG0.1 100-0 and is 1.5 points stronger.
AG0.3 beats AG0.2 100-0 and is 1.2 points stronger.
AG0.4 beats AG0.3 100-0 and is 1.0 points stronger.
AG0.5 beats AG0.4 100-0 and is 0.8 points stronger.


I think that this scenario is impossible. Given the granularity of area scores, where the just noticeable difference is 2 pts., with some exceptions, any program that is less than 2 pts. stronger than its opponent will not be able to win all of the time. I have addressed this question with bots that play incomparable strategies. But, upon reflection, even if the strategy of one bot dominates the strategy of the other, if it does so by less than 2 pts., on average, then it must lose some games to the other bot, given a non-integer komi.

Edit: Conceivably, the best player might be able to force a seki with an odd number of dame when necessary to win by ½ pt. instead of losing by ½ pt., but I strongly doubt it.

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.

Top
 Profile  
 
Offline
 Post subject: Re: "Indefinite improvement" for AlphaZero-like engines
Post #26 Posted: Fri Apr 17, 2020 10:36 pm 
Lives in gote

Posts: 311
Liked others: 0
Was liked: 45
Rank: 2d
moha wrote:
lightvector wrote:
This would seem to imply that there should not be more than 800 Elo points range in Chess. since there are only two units difference total in Chess. Define the "score difference" of a chess game as 1, 0, or -1 points depending on white wins / draw / black wins.
In this case, assuming chess is draw by perfect play, achieving 25% against it means drawing one game and losing one on average. This drops 1 point per 2 games, so 0.5 per game - but receiving 0.5 komi (ouch
Hm. I think I see how to score that now. Also both in your distorted chess example and in go, being at class -1 is a very strong player, close to perfect play. And these example players DID use up half point komi range to cover their one class distance.

The problem with your example seems not with this, but with the other assumption - that the second class step (achieving 25% against the first one) necessarily uses up more pointwise distance than the first. This is clearly false in your distorted chess: getting 2*0.5=1 point komi covers much more classes - that is enough for the worst player to draw with perfect play. But this assumption may still be true in go, there are reasons to think so (it is true lower in the range, even though there may be exceptions near these class-1 levels, who knows).

Your example seem to have fallen into the trap I mentioned earlier about half point komis: even though you declare a linear relationship between scores -1 .. 0 .. +1, this is simply not true. Turning draws to wins (what half point does) is not the same or half amount of utility as turning ALL losses into draws (what a whole point does - but only in your example, much less so in go.)

Top
 Profile  
 
Offline
 Post subject: Re: "Indefinite improvement" for AlphaZero-like engines
Post #27 Posted: Sat Apr 18, 2020 3:11 am 
Lives in gote

Posts: 476
Location: Netherlands
Liked others: 270
Was liked: 147
Rank: EGF 3d
Universal go server handle: gennan
lightvector wrote:
I'm still doubtful.

moha - if I understand correctly, you argue that the number of Elo points to perfect play can be very approximately bounded by 400 Elo * the number of points difference to perfect play, coming from the fact that 1 point is the minimum discrete unit that can be lost/gained.

This would seem to imply that there should not be more than 800 Elo points range in Chess. since there are only two units difference total in Chess. Define the "score difference" of a chess game as 1, 0, or -1 points depending on white wins / draw / black wins. But in actuality, we know that there are more than 800 Elo points range in Chess, even if we exclude very bad and very pathological players. Start with some decent 1400 Elo casual amateur player and we see there are still something like 1400 more Elo points to the top humans, and many many more Elo beyond that as you push into levels only achieved by bots. So the argument can't be correct.
Winning/losing the game may not be the smallest discrete error that can be made in chess. Errors may be smaller, such as losing a pawn (often considered a basic unit in chess), or losing the right to castle or exchanging a bishop for a knight, wich may be a only a fraction of this pawn unit. So I think chess is not really suitable for this approach of considering a smallest discrete error size.
But you could still use an error approach in chess and in go by ignoring the size of the error and only counting the number of mistakes that potentially affect the game result. A near perfect player (1 "class" below perfect play) would make 1 such mistake in 2 games and draw in the other game against a perfect player, so he would score 25% against a perfect player and maintain a rating 200 Elo below the perfect player. The size of the game losing error doesn't matter, nor if the error has some discrete fundamental unit. For this near perfect player, only the frequency matters.
lightvector wrote:
I think this is because as players get better, even though any mistake objectively can cost you only a whole discrete unit or not at all, there are worlds of complexity and gradations hidden within the fight for that discrete unit, such that advantages and disadvantages and gains and losses from mistakes can happen at levels much finer than one whole unit for practical players, even if an omniscient objective view would not care until they finally "accumulated" to actually a whole unit of effect.

So the big flaw in the model is to treat mistakes as also being discretized to units, such as having a probability of losing a point on each move or not. In practice there exist "mistakes" that even if objectively they lose nothing, each successive one makes it increasingly "easier" to actually then make a move that does objectively lose later, unless the opponent's own mistakes cancels them out, and depending on the distributions of these "sub-unit" mistakes, this means you can get much more than a 75% winrate difference within the fight for that unit.

Now of course, there does seem to be some suggestiveness that Chess Elo range is finite - top level games have increasingly high draw percentages, particularly in the TCEC matches. But 400 Elo per "unit" is clearly too tight of a bound - exactly what the bound is should also depend on how much "subunit" gradation there is for practical real players and the mechanics of how it change or build up or not, which seems very hard to objectively pin down.
Using this error frequency approach, I think 400 Elo per unit is not too tight. Perhaps top chess engines are fairly close to making only 1 potentially game losing error per game (400 Elo below perfect play). Casual chess players probably make many potentially game losing errors per game. As a 1st order approximation, taking 4000 Elo as a top engine rating and 1200 Elo as a casual player rating, the casual player would be separated from the top engines by 14 "classes", so the frequency of potentially game losing errors might be 7 times higher than once per game (i.e. about 7 times per game).
lightvector wrote:
Maybe if players got within a point or two of optimal in Go it would turn out that there's no "sub-point" complexity in Go, but even only in the endgame you have things like fractional miai values, all those messy CGT problems whose only result is tedomari to round the score in your favor, and stuff like that - where one can make mistakes that are *much less* than one point - which makes me doubtful that this is true. And that's only the endgame.

So I would strongly suspect that the Elo range in Go is still finite (and not in some silly way like being exponentially large, but actually something not too unreasonable), but also that the last few points could quite easily be much more than 400 Elo per point.
Using the error frequency approach still leaves the question to determine what qualifies as a potentially game losing error. Perhaps in chess and go we could use this 1st approximation: count the number of times a player makes a mistake that potentially changes the game result (like failing to convert a decisive advantage to a win, failing to hold a draw or throwing away a game where you had a decisive advantage).

For go, there probably exists a correlation between the frequency of potentially game losing moves and average error size per move. This would allow an approximate conversion between the error size distribution (in units of komi points or 14 point ranks or handicap stones) and the error frequency (assuming 400 Elo points for each error frequency increment per game).

Such a conversion may also increase the accuracy of measuring the quality of play from a small number of games, because the signal to noise ratio of frequency would be small for low frequencies. Also, when the opponents are badly matched (even game with large skill gap), the frequency approach would be unreliable, because chances are that the stronger player quickly builds up such a big lead that none of the mistakes is large enough to potentially swing the result, resulting in underestimating the error frequency and thus overestimating the quality of play. Perhaps we should use a less conservative measure of what counts as a potentially game swinging error (e.g. also including errors with a size above some threshold, like a 14 point mistake in go or hanging a knight in chess, even when it doesn't affect the game result).

A correlation between error frequency (assuming this is represented by Elo rating) and error size (average centipawn loss per move) also seems to exist in chess. See average centipawn loss per move vs Elo rating: https://chess-db.com/public/research/qualityofplay.html.

Top
 Profile  
 
Offline
 Post subject: Re: "Indefinite improvement" for AlphaZero-like engines
Post #28 Posted: Sat Apr 18, 2020 6:46 am 
Lives in sente

Posts: 757
Liked others: 114
Was liked: 916
Rank: maybe 2d
It's a little confusing to see people try to argue that my translation of (a naive version of) moha's argument to Chess gives a flawed argument... when demonstrating that the argument is flawed the entire point of my post.

One cannot purely mechanically tell from any apparent discretization in the rules to what extent there exists a notion of "mistake" that is less than that discretization, and that can accumulate or not in practice. And neither can one reason only about mistakes only in the sense of mistakes that affect the idealized game-theoretic value of the position and the frequency with which they occur per move. Things like "losing a pawn" or even more minor positional mistakes can be almost invisible to the "did-the-game-theoretic-value-change" notion of mistake, yet they matter for practical players and can accumulate over time to add up to a game-theoretic mistake, and whether and to what degree in practice mistakes exist that don't change the game theoretic value of a game seems to matter immensely for how widely that game spans on the Elo scale.

So yes, my point is that a-priori one can't rule out the same thing happening to Go. It could for example be that there are 3000 Elo points (15 "classes") between essentially-perfect play and a player that almost never does more than 1 point worse than perfect, if it turns out that just like "pawn advantage" is an emergent notion of advantage in Chess on which one can accumulate or lose gradually before finally enough of it builds up to switch the game-theoretic outcome of the game, perhaps in Go for such near-perfect players could be very notions of advantage/disadvantage/mistakes that emerge in Go for such players far finer than granularity of 1 point that accumluate to determine who finally gets that point. Or maybe not - it's hard to say. Maybe 7x7 and 9x9 and 11x11 could be ways to study this.

But anyways, that means that it's way too early and very shaky to say things like this:
Quote:
As mentioned I think 1 stone (14 pts) cannot worth more than 28 classes, thus quite finite Elo-wise.

Top
 Profile  
 
Offline
 Post subject: Re: "Indefinite improvement" for AlphaZero-like engines
Post #29 Posted: Sat Apr 18, 2020 8:39 am 
Lives in gote

Posts: 311
Liked others: 0
Was liked: 45
Rank: 2d
lightvector wrote:
It's a little confusing to see people try to argue that my translation of (a naive version of) moha's argument to Chess gives a flawed argument... when demonstrating that the argument is flawed the entire point of my post.
I don't think your example demonstrates this as long as it has a different flaw. You cannot randomly assign numeric values to things (-1 = chess loss, 0 = chess draw, 1 = chess win) and expect that these will magically start to honor basic math about distances, addition and multiplication (0.5+0.5 = 1-0 = 0 - -1), such needs specific conditions met to work.

Also there is no real 0.5 pt komi in a game that is scored in integer units. You can treat draws as B wins or W wins, but this is not the same (not even in go but it's closer there). You may try 0.5 komi as alternating between 0 and 1 komi - and again your example doesn't seem work.

Quote:
So yes, my point is that a-priori one can't rule out the same thing happening to Go. It could for example be that there are 3000 Elo points (15 "classes") between essentially-perfect play and a player that almost never does more than 1 point worse than perfect, if it turns out that just like "pawn advantage" is an emergent notion of advantage in Chess on which one can accumulate or lose gradually before finally enough of it builds up to switch the game-theoretic outcome of the game,
Sure, mistakes can occur in subpoint level, but in go we DO have a numeric point scoring system (regardless its name) where basic math is supposed to work to an extent. Whichever arrangement of those wr-classes you hypothetise near/below perfect play, you will need to find a pattern of wins/losses - even in go's discrete endpoint-scoring terms! - that honors both the chain of 25% wins, the 50% wins with the correct integer komi over a chain or subchain, the near-linear additivity of komi/handicap for subchains (as experienced in practice in go) AND achieve this without any rock-paper-scissor like effect, ie. with players showing the same performance against any opponent (which OC is not completely true in reality, but the model needs to work under these assumptions as well).

In go there is also the observation/assumption that classes have monoton widening point range downwards, which we don't know yet if true over the whole range or just below 9d - but clearly not in your example.

Top
 Profile  
 
Offline
 Post subject: Re: "Indefinite improvement" for AlphaZero-like engines
Post #30 Posted: Sat Apr 18, 2020 9:23 am 
Lives in sente

Posts: 757
Liked others: 114
Was liked: 916
Rank: maybe 2d
Okay let's be more explicit. :)

We know from miai-counting theory that even without fully getting into CGT-levels of complexity, we can have locally fractional move values, and that these fractional values accumulate in a way that, along with tedomari for the last point, the final score "rounds" to an integer at the end. You can have moves that differ in value by 1/4 of a point, such that if you make 4 such errors cumulatively you lose precisely 1 point, and if you make fewer than 4 such errors you may or may not lose a whole point depending on how close the "current" value your position was to the edge and how tedomari plays out. Importantly, it *doesn't* behave like independent random noise - if you make 4 such mistakes, you end up precisely 1 point worse 100% of the time, rather than it behaving like 4 separate independent random losses of 1 point with 1/4 probability each. If you make fewer than 4, it can depend on where the position was initially balanced, perhaps you are close enough to the edge that just 1 such mistake loses you a whole point, unless your opponent cancels it out with a corresponding mistake.

In the midgame (and even the macroendgame) positions are too inter-correlated to decompose them into independent local parts, but just like in the endgame, it seems highly plausible to me that in the midgame and opening there is is also some emergent notion of moves being able to differ in value "fractionally" in points, or "on average", but with those fractional values being globally additive in the same way as above.

Suppose it happens to be that Go with with a certain ruleset and 6.5 komi happens to be balanced at almost knife-edge with respect to some emergent notion of "fractional" averaged points like this, and with the property that even a slight deviation from optimal by this notion of fractional points will cause you to fully round to winning by 0.5 versus losing by 0.5. Then you could plausibly observe that player A wins 90% of the time against B by 0.5 points if A on average makes 0.1 points fewer worth of mistakes with some slight variability, and the game "rounds" in their favor to a whole 0.5 points. And similarly B wins 90% of the time against C, and C against D, D against E. Now if you compare A versus E, A would win almost all the time, but still almost always by 0.5 points.

This is of course an overly simplistic model, but it's seems hard to be sure that something broadly like it won't happen. You can't rule it out by observing approximately linear additivity at the current levels of play on 19x19, because current play is too low-level and noisy - with current levels of play we'd expect approximate linear additivity even with the above model too, which would gradually break as you get to within a few points of optimal. If you think about it, linear additivity is precisely one of the weaker assumptions you'd expect to have some chance of not holding once the discretization of results becomes so large compared to the level of play.

Top
 Profile  
 
Offline
 Post subject: Re: "Indefinite improvement" for AlphaZero-like engines
Post #31 Posted: Sat Apr 18, 2020 9:36 am 
Lives in sente

Posts: 757
Liked others: 114
Was liked: 916
Rank: maybe 2d
And I guess I'm saying - if linear additivity does break (even in ways that aren't exactly like the above model), then you're potentially back to how Chess behaves where win/loss/draw doesn't obey arithmetic in the same way and you can't really bound the amount of Elo present by such arguments.

Top
 Profile  
 
Offline
 Post subject: Re: "Indefinite improvement" for AlphaZero-like engines
Post #32 Posted: Sun Apr 19, 2020 7:15 am 
Lives in gote

Posts: 311
Liked others: 0
Was liked: 45
Rank: 2d
I think even against perfect play, you cannot assume that the game is balanced on the razor's edge in a way that one 0.1 pt mistake would always turn the draw into a 1 pt loss. If this is the case from B's view it follows that there are room for 0.9 pts total mistakes when playing with W (0.1-0.9 balance), which is 50% of time. So if you make 0.1 pt mistake average, not only would you not lose 90% but not even 50% (drawing the rest), less than a whole class.

It's also probably more healthy to start with a 0.5-0.5 subpoint balance assumption against perfect play. In that case you need to make 0.5 pt worth of subpoint mistakes to lose a game with any color - so to get 25% you need to be over 0.5 in half of cases and under 0.5 in other half (0.5 avg, same as the integer view - coincidence?).

But I'm not ruling out the possibility of similar or different tricky phenomenons happening near perfect play. This does not need any subpoint logic though. It seems enough to say that as perfect play does not (may not) include randomness, it probably not explores the whole distribution from class-1. I think this is similar to what your example aimed at so let's borrow part of your idea: Suppose it is easier to go astray and enter a losing line with B than with W (with perfect komi). We have little randomness because we are close to perfect play. Following your logic, it could seem that class-1 .. class-2 .. class-3 enters into a pattern where they start to lose most of their games (against slightly stronger opponent who does not cancel out small mistakes?) as B (many losing lines, one drawing line) while still drawing with W. This could give them near 25% against each other despite being only fractionally weaker pointwise.

This also shows why this argument is invalid though. To be at class-1 it is not enough to get 25% against perfect play when played directly. For this strength (and Elo) you need to perform accordingly against a wide panel of (all possible) opponents and situations (even when one does not actually test all in practice). It is common that a player performs unexpectedly good or bad against a certain single opponent or situation, but this does not change his (theoretical) strength or class. The 1 point example was just an example, in reality "rounding" your mistakes to integers cannot affect (or affect differently) your class. If you only make average 0.1 pt worth of mistakes per game, you will be rated much closer than a class from perfect play once tested widely and seriously - exactly because 0.1 pt mistakes has, on average, much less effect on the game than one point mistakes.

Edit: I'm not sure if this can not be turned against the original hypothesis somehow, though. :scratch:

Top
 Profile  
 
Offline
 Post subject: Re: "Indefinite improvement" for AlphaZero-like engines
Post #33 Posted: Sun Apr 19, 2020 12:45 pm 
Tengen

Posts: 4380
Location: North Carolina
Liked others: 499
Was liked: 733
Rank: AGA 3k
GD Posts: 65
OGS: Hyperpape 4k
In the extreme case, consider deterministic strategies. There is (at least one) perfect strategy. Take one game featuring that strategy against itself. For each move in that game, create a new strategy that differs only in virtue of playing a losing move in that particular position. Then you've created a series of strategies that are each (at least) 400 points stronger than the previous strategy, and (at least) 400 points weaker than the next strategy, until you reach the perfect one.

That seems to show a lower bound of (something like) 200 * 400 ELO points. But we can do much more: for each of those imperfect players, we can do the same cloning for all points after the first deviation. I can't quite think through how far this point goes right now, but I think we're dealing with the "kind of" astronomical numbers you see when calculating the number of possible states of the go board, or number of games.

Our human brains are not deterministic, so human ELO scales are quite compressed. Bots are more predictable, which is one contributor to the 16000 ELO that you hear about with Leela, but they still aren't deterministic. So I expect the bot range is more like the human range than the astronomical range. But I don't know what it would be.

Related to this point: ELO scales aren't really a measure of strength. They're a measure of strength relative to a population and a pairing strategy for games. The ELO difference between players depends on the other players they're surrounded by. If that's true, then the original question isn't really about how long bots can improve, it's a question about how many different strategies those improving bots can implement.

_________________
Occupy Babel!

Top
 Profile  
 
Offline
 Post subject: Re: "Indefinite improvement" for AlphaZero-like engines
Post #34 Posted: Sun Apr 19, 2020 2:07 pm 
Lives in gote

Posts: 311
Liked others: 0
Was liked: 45
Rank: 2d
hyperpape wrote:
Then you've created a series of strategies that are each (at least) 400 points stronger than the previous strategy,
These would be almost identical in strength as strength or Elo is not measured against a single opponent but as average performance against all possible opponents.

Top
 Profile  
 
Offline
 Post subject: Re: "Indefinite improvement" for AlphaZero-like engines
Post #35 Posted: Sun Apr 19, 2020 2:47 pm 
Lives in sente

Posts: 757
Liked others: 114
Was liked: 916
Rank: maybe 2d
moha wrote:
I think even against perfect play, you cannot assume that the game is balanced on the razor's edge in a way that one 0.1 pt mistake would always turn the draw into a 1 pt loss. If this is the case from B's view it follows that there are room for 0.9 pts total mistakes when playing with W (0.1-0.9 balance), which is 50% of time. So if you make 0.1 pt mistake average, not only would you not lose 90% but not even 50% (drawing the rest), less than a whole class.


You might have missed that my example was with a non-integer komi, so no draws. And/or maybe you haven't quite understood the model?

To illustrate, suppose the model is that at the end of the game, whatever hidden fractional advantage is deterministically rounded to the nearest of -1.5, -0.5, 0.5, 1.5, etc for the final observable outcome in points. Suppose it luckily happens to be the case that naturally the game "starts at" 0.01 fractional advantage - extremely close to 0.

Say player A over the course of any game loses 0.05 fractional advantage with a standard deviation of 0.01, independently and roughly normally distributed.
Say player B over the course of any game loses 0.1 fractional advantage with a standard deviation of 0.02, independently and roughly normally distributed.
Say player C over the course of any game loses 0.2 fractional advantage with a standard deviation of 0.04, independently and roughly normally distributed.
Say player D over the course of any game loses 0.4 fractional advantage with a standard deviation of 0.08, independently and roughly normally distributed.

Then A beats B, B beats C, C beats D, each time with high probability, averaging more than 90%. Each one will reliably achieve a small net fractional advantage over the next player much more than how much the color affects it (only 0.01), and with a small enough variation that it will almost always round to a win by 0.5 points (by being closer to 0.5 than to -0.5).

And if A plays D, then A will achieve average net fractional advantage of +0.34 or +0.36 depending on color, and again with a small enough standard deviation that virtually always it will still round to A winning by 0.5, rather than either -0.5 or 1.5.

We can do the same with draws. Suppose we use an integer komi, so that now the model is that at the end of the game, whatever hidden fractional advantage is rounded to the nearest of -2, -1, 0, 1, 2, etc for the final observable outcome in points. Suppose it luckily happens to be the case that naturally the game "starts at" 0.51 fractional advantage - extremely close to 0.5, from white's perspective.

Let's use A,B,C,D as above exactly the same. Now what happens is each player will win win as white 90+% of the time, and draw as black 90+% of the time against the previous, for a net outcome of close to 75% wins if you count draws as half. So we've squeezed almost 3 "classes" into the span of less than half of a point, and it's clear we could have squeezed in more.

So this:
Quote:
0.1 pt mistakes has, on average, much less effect on the game than one point mistakes.


might simply not be true. The above model demonstrates one way in which it might not be true - namely if the game happens to be balanced near an edge and the other player is sufficiently closer to perfect than you so that even if they also make many mistakes, statistically the sum of your mistakes almost always exceed theirs and put you on the wrong side of the edge. (and we *know* already that miai counting theory shows us that endgames may sometimes behave this way, with this deterministic-like rounding of tiny theoretical differences to an integer or half-integer)

Maybe it requires some "luck" for this game balance, and also the above model makes lots of assumptions, so I won't say that it's even a likely or good model, but it seems very hard to be sure that something roughly like it can't happen.

And notice that I didn't need to suppose that any of the players is particularly specialized at exploiting any of the other players. Each one independently simply makes mistakes according to a certain distribution regardless of who it's playing. :)

Top
 Profile  
 
Offline
 Post subject: Re: "Indefinite improvement" for AlphaZero-like engines
Post #36 Posted: Sun Apr 19, 2020 4:02 pm 
Lives in gote

Posts: 311
Liked others: 0
Was liked: 45
Rank: 2d
I think how I first interpreted your example (0.1-0.9) may be closer to go reality (and without players who basically play with a constant per-game error). But even with these extreme conditions, I don't think the example works: these players are not classes apart. Direct results against each other is secondary (such upsets are not uncommon), but if you test and rate them against a wider set of opponents who make all kinds of errors (so that their whole distribution, real strength, all possible rounding ranges/cases gets explored), their performance will not be significantly different, so will get almost the same Elo rating. Or am I still misunderstanding you?

BTW both Elo and my error distribution model is flawed in the immediate vicinity of perfect play. Both assume a symmetrical performance distribution, that if average is X then rare cases of performing X-5 is equally likely than X+5. This is never completely correct but may became an intolerable flaw in the immediate vicinity of 0 error (no negative errors), so care must be taken here. (Cf your example: if your avg is 0.1 you pretty much have to play a perfect game quite often so sd 0.02 sounds weird.)

What seems possible though is taking a player (or various players) 1 stone or a few points below perfect play (who achieves 50% against it with the extra stone or komi), and testing how it performs against perfect play on perfect komi (how much rare draws it can catch). This could show their Elo distance (the equivalent of that point distance). And would also hint a bit about the role of whole point scoring, and the "hardness" difference between draw and the smallest loss.

Top
 Profile  
 
Offline
 Post subject: Re: "Indefinite improvement" for AlphaZero-like engines
Post #37 Posted: Sun Apr 19, 2020 7:42 pm 
Lives in sente

Posts: 757
Liked others: 114
Was liked: 916
Rank: maybe 2d
moha wrote:
I don't think the example works: these players are not classes apart. Direct results against each other is secondary (such upsets are not uncommon), but if you test and rate them against a wider set of opponents who make all kinds of errors (so that their whole distribution, real strength, all possible rounding ranges/cases gets explored), their performance will not be significantly different, so will get almost the same Elo rating. Or am I still misunderstanding you?


Sort of. I feel like the above model isn't sticking in your head because of some deep assumption you're making, or that I'm not getting across well. I don't think you have any guarantee that testing against a wider range of opponents (that are also similarly "close" to optimal so as to not just lose all the time) will get you other roundings and ranges and cases.

Please take the model I stated above literally at face value for a moment - consider if perhaps maybe between ten and thirty times per game a player faces a choice between very similarly good moves in which occasionally they could make a mistake that loses 0.01 "points", or 0.003 "points", or other tiny amounts "worse" by some metric. Suppose that the way you might lose 0.2 "points" on average turns out often not to be by a single random blunder of 1 point with 20% probability over the course of a game, but rather by statistically on average 10 "mistakes" of 0.02 "points" over the course of the game that give you a normalish distribution. And suppose that in the endgame it will deterministically turn out regardless of what variety of opponent you're playing against, whoever made the smaller sum of "mistakes" by this hidden emergent internal metric of "points" will turn out to get tedomari for the last real point and win by a whole 0.5 real points in terms of the actual game result. Again: no matter what variety of opponent, the game always roughly rounds in this way, because it is a function of how the game itself turns out to work at such levels of play, not of any particularly special class of opponents.

Adjust my numbers to reflect this if you like - you can widen all the standard deviations to be sqrt(10)x smaller than the mean instead of 5x smaller. You still get 3 or 4 "classes" within half a point, rather than just one or two.

A stupid model yes, hard to confidently rule out this behavior. For example, in Chess the emergent metric of "sum of material values of pieces plus various standard positional factors" matters for all practical players, even though nowhere in the rules is it written that "1 * pawn + 3.5 * bishops + 5 * rooks + ..." should have anything to do with whether your king will eventually be checkmated. It doesn't matter whether the opponent thinks in terms of material or some entirely different way - so long as you and they are of similar strengths, in practice, "4 pawn equivalents" of advantage will almost always round to "the will be checkmated many moves later". So there is a hidden emergent "metric" (indeed, many of them, all highly correlated) such that almost all practical matchups of players experience the game "rounding" to a discrete outcome the same way based on it, even players that don't explicitly "think" in terms of things like material count.

Why can't this happen in Go, for the fight over the last point? Perhaps once you get within a few points of optimal there ends up being some advanced and sophisticated emergent concepts in Go that apply to almost all non-pathological players for "how to globally coordinate positions to get the last point to round in your favor". Perhaps it turns out that fighting for it is less like "each turn independently, your imperfections give you a 10% chance to throw away the final point, and more like "gradually you accumulate a succession of advantages and optionality that give you increasing control over the parity for the final point" so that whichever player is slightly better at it can reliably convert that to winning by 0.5, regardless of the quirks of the opponent. Then there could be room for hundreds and hundreds of Elo in being incrementally and reliably better than the opponent at this. And again, if this is emergent to the game itself (like material is in chess), it could be shared across all natural players, and not go away simply when you average across a population of opponents.

One last way very different to visualize it maybe - imagine again we're at the fairest possible integer + 0.5 komi, and we had some oracle that magically would categorize all moves as "will win or lose by 0.5" or "losing by more than that" or "winning by more than that", and both we and the opponent could always use the oracle on every turn, and thereby neither would ever lose by more than 0.5. Consider pruning the game tree down to just the positions resulting from that. How much strategy would be left for fighting over which of "win by 0.5" and "lose by 0.5"? Suppose we're also talking about the bots XXX years in the future that are not far away from this level anyways.

* It could be highly chaotic with few learnable generalizations about which branches lead to what. Then Elo measures would be dominated by non-transitive pathological trees of players better than other players in particular ways that don't transfer.
* It could be that the +0.5 and the -0.5 positions were clustered in a learnable way but in a way that any +0.5 position wasn't much "farther" from -0.5 than any other. So any occasional mistake could always flip the result, unless the opponent made an occasional mistake back later. So it would all come down to the probability each turn that you flip the result, which averaged across many opponents and positions would get you your 1 or 2 classes fitting within the last point, but no more, exactly as you said.
* It could turn out they were clustered in a very-graduated way such that such that some +0.5 positions could be "distant" from any -0.5 positions, some would be "nearer", and so on, such that making a mistake would only incrementally cost you by putting you nearer, and a usually only a mistake near the boundary would actually flip you into the other side, a mistake while already at -0.5 would put you deeper, etc. Then the resulting strategy could be tug-of-war-like, such that the player making a lower rate of small incremental errors would reliably win, but a player making an even lower rate would reliably beat them and so on. Then you have the potential for many more than 2 "classes" of Elo to fit within the fight over this 1 point.

The last case could be less likely than the others, but it seems a-priori possible. And again it doesn't matter whether you average across many opponents or not.

Top
 Profile  
 
Offline
 Post subject: Re: "Indefinite improvement" for AlphaZero-like engines
Post #38 Posted: Sun Apr 19, 2020 11:18 pm 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
Despite knowing next to nothing about the Elo system, I am strangely drawn to this discussion, with the hope that I might have something meaningful to say. ;) I may have something to add about fractional errors.

moha wrote:
The 1 point example was just an example, in reality "rounding" your mistakes to integers cannot affect (or affect differently) your class. If you only make average 0.1 pt worth of mistakes per game, you will be rated much closer than a class from perfect play once tested widely and seriously - exactly because 0.1 pt mistakes has, on average, much less effect on the game than one point mistakes.


lightvector wrote:
{That} might simply not be true.


Note: Below I am going to use octal notation. I know that the discussants are familiar with it, but not all of the readers may be. In octal notation, 0.7 = ⅞, 0.6 = ¾, 0.4 = ½, etc. Outside of ko positions, fractional go values will be in terms of powers of ½. A mistake of 1/10 is very unlikely to occur.

There are at least two possible meanings to an error of some number of points in go. One sense is that the player makes a play that is dominated by a play that gains more points, on average. Another is that the player makes a play that, after correct play, gets a worse score. I don't know if the players are talking about either one of these, or if they have something else in mind.

Here is an example of domination. Suppose that there are two plays left on the board, one that gains 0.6 pts. and one that gains 0.4 pts. by territory scoring. Then the play that gains 0.6 dominates the play that gains 0.4. IOW, the play that gains 0.6 pts. will never score worse than the play that gains 0.4 pts., but the play that gains 0.4 pts. may score worse than the play that gains 0,6 pts., and we consider the play that gains only 0.4 pts. to be a mistake. Using | notation, let the plays be these: {1|0} and {2|1||0}. The local scores are pts. for Black. If you don't know this notation, I think it will be clear after I go through this example.

Suppose that Black plays first. Then if she makes the 0.4 pt. play she will move to the left in {1|0} to get a local score of 1 pt. and {2|1||0} will remain on the board. White will next take the remaining, 0.6 pt. play, moving in {2|1||0} to a local score of 0, for a combined local score of 1 + 0 = 1. Now suppose instead that Black makes the 0.6 pt. play first. She will move in {2|1||0} to {2|1}. That will leave two 0.4 pt. plays, {1|0} and {2|1}. They are miai, so the combined local score will be 2 + 0 when White moves to 0 in one and Black moves to 2 in the other, or 1 + 1 when White moves to 1 in one and Black moves to 1 in the other. The local scores add to 2 in either case. Black does 1 pt. better by making the 0.6 pt. play.

Now suppose that White plays first. If he makes the 0.4 pt. play there will be a local score of 0 and {2|1||0} will remain on the board. Black will next take the remaining, 0.6 pt. play, leaving the {2|1} play on the board, which White will take, for a combined local score of 0 + 1 = 1. Now suppose instead that White makes the 0.6 pt. play first, leaving a local score of 0 and the {1|0} play. Black will take that play, for a combined local score of 0 + 1 = 1. In this case it does not matter which play White makes first, the score is the same. Still, we may consider the 0.4 pt. play to be a mistake in the first sense.

OC, bots, as currently written, do not calculate the value of plays this way, so they don't make mistakes of this sort, except as viewed by humans. ;)

As I have been writing this, I have come up with a third possible meaning of a fractional mistake. It involves the concepts of an environment and its temperature. What the environment is for any board depends upon the player. The environment consists of background plays, and there at least one play in the foreground. The player knows or assumes the following. There is a play in the environment that gains a certain number of points, T, on average, along with other plays that gain T or less, such that we may estimate that playing in the environment will add T/2 pts. to the current value of the board. T is the temperature of the environment, or ambient temperature. T/2 plays a similar role in the concept of the environment as komi does at the start of the game.

Now suppose that we have one gote in the foreground that gains T + 0.1, octal, on average, for each player. We may then estimate the result of making that play and then letting our opponent play in the environment as T + 0.1 - T/2 = T/2 + 0.1. But if instead we play in the environment and let our opponent take that gote, we make estimate the result as T - T - 0.1 + T/2 = T/2 - 0.1. That is 0.2 pts. worse, on average, than if we take the gote. So we may think that not taking the gote is a 0.2 pt. mistake. OC, we are talking about estimates and average scores, so it may not be a mistake at all. Also, note that the error associated with the estimation of the final score is associated with the environment, not with the small difference. That error is on the order of T/2. Yes, if we have a lot of small errors, our estimation of their sum should be approximately normal and dependent on the size and number of the errors, but our uncertainty about the environment trumps that of the size of any particular small error. Besides, bots do not think this way, either. But humans might.

Another sense of the size of an error is the eventual difference in the score, given the following assumptions. Suppose that if you make play A. then Black can play to guarantee that the final score will be at least S for Black and White can play to guarantee that the final score will be at most S for Black, we may consider S to be the theoretical value of the game after you make play A. Following Berlekamp, I refer to such play as canonical. Similarly, if you make play B, the theoretical value is T. If S > T and you are Black, then play B is a mistake that costs at least S - T, and if you are White, then play A is mistake that costs at least S - T. OC, this is not what our discussants have in mind, because by both territory and area scoring S - T is an integer, not a fraction. But there is a form of go that has fractional scores, Chilled Go. (Environment, temperature, and chilled go all have explanations on SL, if not here. :))

In chilled go each move costs 1 pt. by comparison with territory scoring. In territory scoring each move costs 1 pt. by comparison with area scoring. So we may think of territory scoring as chilled go for area scoring. ;) If there are no complications by ko, correct play in chilled go is also correct play in territory go, which is also correct play in area go. It is quite possible, then, that the theoretical value of a go game is 7 pts. for Black in all three forms of go. For simplicity and convenience I will make that assumption. A perfect player playing Black can guarantee a final board score of at least 7 pts. for Black and a perfect White can guarantee a final board score of at most 7 pts. for Black.

Let me show how fractional scores can arise in chilled go. Here are the two positions we looked at before, {1|0} and {2|1||0}, using territory scores. In chilled go we subtract a point for each move, so {1|0} becomes {0|1} and {2|1||0} becomes {0|1||1}. {0|1} = 0.4, octal, and {0|1||1} = 0.6. Together they add up to 1.2. That's the same as their average value by territory scoring and by area scoring, as well. It's just that neither player wishes to lose a fraction of a point in chilled go by playing in either position. (This is also discussed on SL and here.)

Now let's suppose that 7 pts. for Black is the game theoretical value of chilled go and that we have a perfect player for it. Suppose that this player is Black, and White is almost as good. Then if White makes a 0.2 pt. mistake and otherwise each player plays canonically, the final chilled go score on the board will be 7.2. We do not have to rely upon estimates, we can take the final score to evaluate fractional errors. In this case, what would the final territory score be? That depends on who got the last play in chilled go. If it was Black, then White can round the 7.2 score down to 7, and if it was White, then Black can round the 7.2 score up to 8, which will usually round up to 9 by area scoring. That is true for any fractional value, with no kos. 7.2 is just as good or bad under territory scoring and area scoring as 7.8.

Now, I am not suggesting that bots switch to playing chilled go. If ko makes headaches for territory scoring, think of the headaches it will make for chilled go. ;) But if we are talking about near perfect play and fractional errors, chilled go offers a way of thinking about them.

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.

Top
 Profile  
 
Offline
 Post subject: Re: "Indefinite improvement" for AlphaZero-like engines
Post #39 Posted: Mon Apr 20, 2020 6:35 am 
Lives in sente

Posts: 757
Liked others: 114
Was liked: 916
Rank: maybe 2d
Thanks, Bill.

I was a little fuzzy on exactly how I was defining "fractional points" because for the Elo discussion it doesn't matter as much. If there exists any naturally learnable strategic concept or pattern or feature that lets players discriminate between "better" and "worse" positions within the same game-theoretic category - e.g. "winning 0.5 points easily, even allowing for tiny technical inaccuracies and variety in move choice" vs "winning by 0.5 points, but any inaccuracy will switch the result", and also such these "inaccuracies" by you or your opponent are commonplace and often switch you between these shades of "better or worse", and net out against each other...

Then you should have at least the potential to have more Elo "classes" than predicted by the simple model of players losing points according to a certain error distribution, with fractional points rounding sort of "randomly" based on the particular position and opponent. Because rather than mistakes "randomly" causing you to lose or not discretely, you now have room for more tiers of players with different "inaccuracy rates" relative to this natural strategic feature of the game, where a player with a lower rate may reliably beat a player with a modestly higher one.


This post by lightvector was liked by: Bill Spight
Top
 Profile  
 
Offline
 Post subject: Re: "Indefinite improvement" for AlphaZero-like engines
Post #40 Posted: Mon Apr 20, 2020 9:41 am 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
Thanks, lightvector. :)

lightvector wrote:
Thanks, Bill.

I was a little fuzzy on exactly how I was defining "fractional points" because for the Elo discussion it doesn't matter as much. If there exists any naturally learnable strategic concept or pattern or feature that lets players discriminate between "better" and "worse" positions within the same game-theoretic category - e.g. "winning 0.5 points easily, even allowing for tiny technical inaccuracies and variety in move choice" vs "winning by 0.5 points, but any inaccuracy will switch the result", and also such these "inaccuracies" by you or your opponent are commonplace and often switch you between these shades of "better or worse", and net out against each other...

Then you should have at least the potential to have more Elo "classes" than predicted by the simple model of players losing points according to a certain error distribution, with fractional points rounding sort of "randomly" based on the particular position and opponent. Because rather than mistakes "randomly" causing you to lose or not discretely, you now have room for more tiers of players with different "inaccuracy rates" relative to this natural strategic feature of the game, where a player with a lower rate may reliably beat a player with a modestly higher one.


Taking random in the sense of unknown, even if determined, it is plausible, at first glance, that a chilled go score of 7.2 should be more likely to round down to a territory score of 7 than a score of 7.8. As against that, since in either case it boils down to whose turn it is, which it is plausible is a 50-50 proposition, it is plausible that the closeness to one integer or the other does not matter. Furthermore, when it boils down to a chilled go position worth 7 pts. plus an infinitesimal, it still comes down to whose turn it is; my guess in all of these cases whether it rounds up or down is a 50-50 proposition. It's an empirical question, OC.

IIUC, if it is a 50-50 proposition in all of these cases, then, outside of chilled go the fractional scores come down to chance. So as we approach perfection, these small differences between the almost perfect players wash out. But if we are far enough from perfection, then small errors can add up to 1 pt. of territory. For instance, two 0.4 pt. errors (octal) by the same player in the opening add up to 1 pt., so there is no rounding to do as far as they are concerned. That should be enough to give the opponent a chance to win.

It is natural to assume that the territory scoring error of a multitude of fractional mistakes has a normal distribution. Actually, for go it has a tighter distribution than normal, because when some number of the fraction add to exactly 1, which is not unusual because, kos aside, the fractions are in powers of 2, they have an error of 0. For instance, suppose that the distribution of small errors extends to ±3. With a normal distribution, we would expect that when we estimate a win by one player to be 1.3, sometimes that player will lose. However, once the small errors add up to 1 or more, that point acts as a barrier to "rounding down". Since there are still estimation errors, it is possible that the player will lose, but much less likely than with a normal distribution. This may be behind my sense that Monte Carlo bots tend to underestimate the winning chances of the player who is ahead. {shrug}

So there may be a natural classification of go skill based upon multiples of 1 pt. score differences. Do we then have 360 classes? :mrgreen:

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.

Top
 Profile  
 
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 80 posts ]  Go to page Previous  1, 2, 3, 4  Next

All times are UTC - 8 hours [ DST ]


Who is online

Users browsing this forum: No registered users and 1 guest


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
Powered by phpBB © 2000, 2002, 2005, 2007 phpBB Group