Post subject: Re: "Indefinite improvement" for AlphaZero-like engines
Post #41 by moha (2d) - Posted: Mon Apr 20, 2020 9:45 am
I'm not sure if I'm still missing something about lightvector's idea, but on rereading it doesn't seem I was too far off. The important part seems to be:
moha wrote:
I think even against perfect play, you cannot assume that the game is balanced on the razor's edge in a way that one 0.1 pt mistake would always turn the draw into a 1 pt loss. If this is the case from B's view, it follows that there is room for 0.9 pts of total mistakes when playing with W (a 0.1-0.9 balance), which is 50% of the time. So if you make 0.1 pt of mistakes on average, not only would you not lose 90% of the time but not even 50% (drawing the rest) - less than a whole class.
Which seems to match his opinion:
lightvector wrote:
We can do the same with draws. Suppose we use an integer komi, so that now the model is that at the end of the game, whatever hidden fractional advantage is rounded to the nearest of -2, -1, 0, 1, 2, etc for the final observable outcome in points. Suppose it luckily happens to be the case that naturally the game "starts at" 0.51 fractional advantage - extremely close to 0.5, from white's perspective.

Let's use A,B,C,D as above exactly the same. Now what happens is each player will win as white 90+% of the time, and draw as black 90+% of the time against the previous, for a net outcome of close to 75% wins if you count draws as half.

The reason I felt this argument is unconvincing was that even if you assume the 0.1-0.9 balance from the empty board (which this example seems to do, i.e. it's easy to lose with one color but accordingly hard to lose with the other color), this balance changes as long as you play against weaker players who make different and larger errors. So if we consider performance in all situations and against all opponents, cases where we will be tested from a 0.5-0.5 balance (equally hard to lose a point with either color) or 0.4-0.6 etc. are overall more frequent than the 0.1-0.9 extreme. And given his [-0.1, 0.02] and similar distributions, in those cases [-0.1] will not be able to significantly outperform [-0.2]. So their overall performance differences and Elo differences will still be bounded by the width of the smallest scoring unit (the amount that small errors need to accumulate to in order to become meaningful).

Maybe he excludes performance in those situations, i.e. what would only matter with different komi/handicap, not in even games (where near-perfect players win anyway against 1-2 pt weaker opponents). But I'm not sure this is the case. The small winrate differences that a class-1 player MUST show over a class-2 one against weak opponents mean that the very rare cases where he fails to win should be rarer still. OC even in a 0.5-0.5 balance situation, making 0.1 or 0.2 avg mistakes matters a little, since later another swing may bring the fractional advantage close to the rounding boundary again. But I feel that the final rounding step will still eliminate much of the required differences (compared to what would happen WITHOUT the final rounding, which would be the REAL class performance advantage).

But maybe there is a factor of roughly 2 here, coming from the fact that cases closer to a 0.5-0.5 balance are about as frequent as cases close to 0.1-0.9 (or conversely, that rounding to nearest only introduces a half-point inaccuracy)? Now, after Bill's post (which I will still need to read again) it seems rounding to nearest may not be a good model to start with? Getting the last move is connected to dame parity and territory score parity, and thus, for perfect play and its vicinity, to komi parity, isn't it?

As I wrote, I'm not ruling out anything atm, just openly curious. I'm not sure if this strengthens or weakens the hypothesis, but instead of CGT-like subpoints we can think of something like path cardinality (which reminds me). Indeed, a weakness of the whole point-error model is that it assumes the same distribution in all cases, whereas in some cases reaching the minimax-optimal score can be harder (compared to losing 1 pt) depending on how many paths lead there (as lightvector also posted meanwhile). But I'm not sure this is bad enough, since as a distribution it includes all cases averaged (the best and the worst as well). Also, as I wrote in the other thread, I think the weakness of the assumption of independence of per-move errors may be much more significant in practice (and likely affects practical class counts more). The worse the model approximates reality, the worse the predictions drawn from it will be.

lightvector wrote:
have room for more tiers of players with different "inaccuracy rates" relative to this natural strategic feature of the game, where a player with a lower rate may reliably beat a player with a modestly higher one.
But can we say that he will also reliably outperform him against weaker players (the vast majority of test cases) by at least the amount that a class difference would demand of him?

Post subject: Re: "Indefinite improvement" for AlphaZero-like engines
Post #42 by Bill Spight - Posted: Mon Apr 20, 2020 10:00 am
moha wrote:
Now, after Bill's post (which I will still need to read again) it seems rounding to nearest may not be a good model to start with? Getting the last move is connected to dame parity and territory score parity, and thus, for perfect play and its vicinity, to komi parity, isn't it?


Parity does not affect the chilled go result, which is why I think it will support a potentially countable infinity of skill classes. But for territory and area scoring, rounding the chilled go score matters. :)

I have posted a speculative post in reply to lightvector, which may, OC, be way off base. ;)

Post subject: Re: "Indefinite improvement" for AlphaZero-like engines
Post #43 by lightvector (maybe 2d) - Posted: Mon Apr 20, 2020 11:06 am
moha wrote:
The reason I felt this argument is unconvincing was that even if you assume the 0.1-0.9 balance from the empty board (which this example seems to do, ie. it's easy to lose with one color but accordingly hard to lose with the other color), this balance changes as long as you play against weaker players who make different and larger errors.


No, 0.1 for the empty board in my example (the one without draws, that would round at the end to either -0.5 or 0.5) corresponds to it being easy to lose or win with *either* color, depending sensitively on the error rates of the players. 0.1 is NOT saying something like "white should win 90% of the time and black should win 10% of the time" - that would be giving one player a huge advantage and be the exact opposite of being "razor's edge". I don't know if you want to spend any more time on this, but if you do, can you try re-reading without this mis-assumption?

Bill Spight wrote:
Taking random in the sense of unknown, even if determined, it is plausible, at first glance, that a chilled go score of 7.2 should be more likely to round down to a territory score of 7 than a score of 7.8. As against that, since in either case it boils down to whose turn it is, which it is plausible is a 50-50 proposition, it is plausible that the closeness to one integer or the other does not matter. Furthermore, when it boils down to a chilled go position worth 7 pts. plus an infinitesimal, it still comes down to whose turn it is; my guess in all of these cases whether it rounds up or down is a 50-50 proposition. It's an empirical question, OC.

IIUC, if it is a 50-50 proposition in all of these cases, then, outside of chilled go the fractional scores come down to chance.

Yep, exactly. If it is sort of "random or unpredictable" and always 50-50, then there might not be much room for any Elo classes fitting smaller than 1 point, so long as that's the case. But alternatively suppose that players near this level would discover some way to judge far better than chance which way the final point goes and to play moves that affect it. Then there could be room for *many* Elo classes fitting smaller than 1 point. Indeed, true optimal players would know how to judge it better than 50-50 - they know the exact result - so the big open question is what natural player populations "in-between" look like (e.g. the players you'd get from AlphaZero iterated almost to infinity with a wide variety of different random seeds and hyperparameter settings), as you transition from players for whom it's 50-50 to optimal players for whom it's certain? They could behave like moha's model suggests, or like my model suggests, or there could just be too many pathologies/non-transitivities.

Or something else entirely may happen. I have little confidence in any of the above models saying anything sane about how you transition from the player for whom the last point is seemingly random to the optimal player, or that one can be confident enough, from any particular model or from trends observed only in current players, about what will happen at such an extreme. So any behavior seems hard to rule out - there could be only one "class" here, or many.

Post subject: Re: "Indefinite improvement" for AlphaZero-like engines
Post #44 by lightvector (maybe 2d) - Posted: Mon Apr 20, 2020 12:09 pm
Maybe one more attempt, in case my earlier attempts are too hard to make sense of. :)

Let's make up a new 2-player game. The game lasts 200 turns, 100 by each side. On each turn, a computer randomly generates 100 cards and shows them to the current player face down. That player must select a card and turn it over. Each card has a number on its face that is either zero or some possibly-fractional positive number (e.g. 7/4, 1/16, 3/256, etc.). At least one of the cards is always generated to be a zero.

The game has a single global score, which starts at exactly 0, and which player 1 wants to be positive, and player 2 wants to be negative. Every time player 1 turns over a card, the number is *subtracted* from the global score. Every time player 2 turns over a card, the number is *added* to the global score - so both players prefer to turn over zeros as much as they can.

The backs of the cards are all decorated with patterns in ways such that any pattern uniquely determines the number on the face, but there are exponentially many possible patterns and the relationship is extremely complex, although to get started there are at least some basic recognizable elements of the relationship that let one imperfectly predict the number. Skill in this game consists of learning the relationships over time and becoming increasingly able to identify cards with zeros or low numbers.

At the end of 200 turns, the score determines the winner, by determining the selection of a gridpoint from the grid ..., -2.5, -1.5, -0.5, 0.5, 1.5, 2.5, .... If the score falls exactly on a gridpoint, that gridpoint is chosen. Otherwise, if it falls between two gridpoints:

Variant A: Regardless of how close the score is to one gridpoint or the other, a fair coin is flipped and determines which of the two gridpoints is chosen.

Variant B: One of the two gridpoints is chosen with probability inversely proportional to the distance. For example, a score of 1.3 would cause 1.5 to be chosen 80% of the time, and 0.5 to be chosen 20% of the time.

Variant C: The closer of the two gridpoints is always chosen. If the score is exactly equidistant between two gridpoints, then a fair coin is flipped.

And Player 1 is declared the winner if a positive gridpoint is chosen, player 2 if a negative gridpoint is chosen.

Since this game is non-interactive (actually, it is very slightly interactive if players can see the score so far), any player can be mostly characterized by the final distribution of the sum of numbers they choose averaged over all the possible deals the computer could give them. Suppose that although some players are higher "variance" than others, among players with similar mean sum, the variance of their sum tends to only differ a little in practice.

Exercise: qualitatively determine how variants A, B, C may differ in how many Elo there can be "per gridpoint" as players approach levels where they are consistently able to select a set of cards summing to no more than a small number each game.

Bill's intuition is that chilled scores in Go may behave like A. Seems plausible to me. I think B is closest to Moha's model. But I think C-like behavior could also be possible, as well as a myriad of other possibilities, especially once you add in opponent interaction. Also, in Go, since there in fact is no randomness, as players approach perfection, the game cannot keep behaving like A forever, so something else has to happen.
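
For concreteness, here is a minimal Monte Carlo sketch of this game in Python. It is only a sketch: summarizing each player's 100 card picks as exponentially distributed per-turn errors with a fixed total budget is an illustrative assumption, not part of the rules above, and the specific error budgets are arbitrary.

Code:
import math
import random

def discretize(score, grid_offset, variant, rng):
    # Map the continuous score onto the grid {..., -1, 0, 1, ...} + grid_offset.
    # grid_offset=0.5 gives the half-integer grid above; grid_offset=0.0 gives
    # the integer grid of the later version with draws.
    lo = math.floor(score - grid_offset) + grid_offset
    hi = lo + 1.0
    if score == lo:
        return lo
    if variant == "A":  # fair coin, distance ignored
        return rng.choice([lo, hi])
    if variant == "B":  # inversely proportional to distance: 1.3 -> 1.5 with prob 0.8
        return hi if rng.random() < (score - lo) else lo
    if variant == "C":  # nearest gridpoint, coin only on an exact tie
        if score - lo < hi - score:
            return lo
        if hi - score < score - lo:
            return hi
        return rng.choice([lo, hi])
    raise ValueError(variant)

def play(err1, err2, start, grid_offset, variant, rng):
    # One game. errX is player X's average total error over their 100 picks;
    # modeling each pick as an exponential error is an assumption of this sketch.
    e1 = sum(rng.expovariate(100.0 / err1) for _ in range(100))
    e2 = sum(rng.expovariate(100.0 / err2) for _ in range(100))
    final = discretize(start - e1 + e2, grid_offset, variant, rng)
    return 1.0 if final > 0 else (0.0 if final < 0 else 0.5)

def expected_score(err1, err2, start=0.0, grid_offset=0.5,
                   variant="C", games=20000, seed=0):
    rng = random.Random(seed)
    return sum(play(err1, err2, start, grid_offset, variant, rng)
               for _ in range(games)) / games

if __name__ == "__main__":
    # Player 1 has half the error budget of player 2. Roughly: variant A stays
    # near 0.5 regardless, variant B only creeps up in proportion to the error
    # gap, and variant C lets the smaller-error player win almost every game.
    for big, small in [(0.4, 0.2), (0.2, 0.1), (0.1, 0.05)]:
        for v in "ABC":
            s = expected_score(small, big, variant=v)
            print(f"errors {small} vs {big}, variant {v}: expected score {s:.3f}")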


This post by lightvector was liked by 2 people: marvin, Waylon
Post subject: Re: "Indefinite improvement" for AlphaZero-like engines
Post #45 by hyperpape (AGA 3k) - Posted: Mon Apr 20, 2020 4:52 pm
moha wrote:
hyperpape wrote:
Then you've created a series of strategies that are each (at least) 400 points stronger than the previous strategy,
These would be almost identical in strength, as strength (or Elo) is not measured against a single opponent but as average performance against all possible opponents.
I think that's ok. These players are nearly perfect. They'll win against almost all other players. But since they reliably beat each other, each of them should end up >= 400 points better than the previous one. Otherwise, playing against weaker opponents whom you always beat would lower your Elo rating.

Post subject: Re: "Indefinite improvement" for AlphaZero-like engines
Post #46 by moha (2d) - Posted: Mon Apr 20, 2020 6:47 pm
I just cannot like examples without draws. I feel this is an unallowable inaccuracy when we talk about perfect go and perfect komi - why look for trouble? One limiting factor of classes and Elo performance may be exactly the fact that there is (or may be) a significant probability mass on neither winning nor losing, even vs (near-)perfect players. That's why I preferred (and referred to) this version:
lightvector wrote:
We can do the same with draws. ... hidden fractional advantage is rounded to the nearest of -2, -1, 0, 1, 2, etc for the final observable outcome in points. Suppose it luckily happens to be the case that naturally the game "starts at" 0.51 fractional advantage - extremely close to 0.5, from white's perspective.
My 0.1-0.9 was also the momentary balance of free room before the next worse rounding (not the nominal fraction) - and OC small mistakes by the opponent can move or even flip the balance suddenly (once they change the minimax solution they become a whole-point error, but now it is easy to lose the point back and hard to win more). Unless you round away from zero or similar, it is never easy for both sides to lose (a minimax point), I think. (I understand you focused on the flip-flopping above.)

One more point to make clearer or stress again (not that I think anybody misunderstood) is that with the example 0.5 player of 0/1 (rarely 2) pt errors I didn't mean a player who exhibits this particular pattern from all positions and viewpoints (even though this could be a frequently observed pattern), but meant a distributed collection of all kinds of players with averages, variances, weaknesses and strengths that (in a kind of normal-ish sense) can be approximated together as summing to this imaginary player.

The last example seems to have returned to a game without draws. It is almost impossible to draw before the final transform, but even a nonperfect player can beat perfect play after it. I find it hard to see this as a good model for go with perfect komi, given what we know from smaller boards, solvable positions etc. Could you change it to match a perfect-komi game with several potential subpoint (CGT or other) mistakes for both sides but with integer rounding, according to your interpretation?

I'm reluctant to use it as is, but some of my doubts would translate here as: suppose the opponent is weaker, and has already moved the count by a significant part of a point in their losing direction (the most common occurrence). Our variance is almost negligibly small (you assumed an almost constant 0.1 avg mistake - a bit doubtful imo). Can we be sure that we can perform whole classes better or worse from these winning positions than our (near or perfect) neighbours, even if our and their mistakes will almost never amount to anything near a single point, and the final granularity is whole points, so some of our tiny advantage over neighbours will surely get canceled?

(Btw to me A doesn't seem to match Bill's chilled go which I interpreted as allowing fractional final scores without any rounding - thus naturally several subpoint classes. But I guess you meant chilled-then-rounded to territory or area.)


Last edited by moha on Mon Apr 20, 2020 10:35 pm, edited 7 times in total.
Post subject: Re: "Indefinite improvement" for AlphaZero-like engines
Post #47 by moha (2d) - Posted: Mon Apr 20, 2020 6:50 pm
hyperpape wrote:
I think that's ok. These players are nearly perfect. They'll win against almost all other players. But since they reliably beat each other, each of them should end up >= 400 points better than the previous one. Otherwise, playing against weaker opponents who you always beat would lower your ELO rating.
I don't think so. Any player can win against any other player except a perfect one, but no player can always win (assuming a normal distribution and perfect komi). The difference in that tiny fraction of upset losses matters IMO, exactly because strength and performance are to be considered across all opponents and situations. Simple chain-beating is not hard to achieve with some tricks for reducing randomness, for example.

I also think that while practical Elo and similar systems often round the rating points that can be gained/lost in a match to integers (so for large differences the stronger side cannot gain for a win but can lose for an upset), this is not necessary. So in theory you could measure the Elo distance between very distant opponents by (extremely long) direct matches, based on comparing exactly that tiny upset frequency to its expectation, via the proportion of fractional rating points gained vs lost per match (not that this would be a good or reliable thing to try in practice).

Post subject: Re: "Indefinite improvement" for AlphaZero-like engines
Post #48 by lightvector (maybe 2d) - Posted: Mon Apr 20, 2020 10:42 pm
moha wrote:
The last example seem to have returned to a game without draws. Almost impossible to draw before the final transform, but even a nonperfect player can beat perfect play after it. I find it hard to see this as a good model for go with perfect komi, from what we know from smaller boards, solvable positions etc. Could you change it to match a perfect komi game with several potential subpoint (CGT or other) mistakes for both sides but with integer rounding, according to your interpretation?


Yep! The game is exactly the same except now the grid is on all the integers (...-2,-1,0,1,2,...) and the game starts at 0.499999 (so with perfect play with both players always flipping zeros on cards, the game is a draw).

The flip-flopping that you get is now between wins and draws, rather than wins and losses. But that's fine; in variant C it's still pretty easy to fit multiple "75%" classes within the span of the last point, many more than 2 classes.
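
For concreteness, a quick check of this, reusing the expected_score() sketch posted after the card game above (same caveats: the exponential error model and the specific error budgets there are illustrative assumptions, not part of the game spec):

Code:
# Integer grid (draws possible), start at 0.499999, variant C. Each halving of
# the total error budget is one candidate "class": the smaller-error player wins
# nearly every game from the favourable seat and draws nearly every game from
# the other, so the score (draws counted as 1/2) sits near 0.75 per step, even
# though all of these players are within a small fraction of a point of perfect.
def both_seats(err_strong, err_weak):
    as_p1 = expected_score(err_strong, err_weak, start=0.499999,
                           grid_offset=0.0, variant="C")
    as_p2 = 1.0 - expected_score(err_weak, err_strong, start=0.499999,
                                 grid_offset=0.0, variant="C")
    return 0.5 * (as_p1 + as_p2)

for strong, weak in [(0.1, 0.2), (0.05, 0.1), (0.025, 0.05)]:
    print(f"errors {strong} vs {weak}: expected score {both_seats(strong, weak):.3f}")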

Let's ignore whether that many consecutive 9s is realistic as a model for Go - I'll settle for simply having exhibited a game where:
* Mistakes behave cumulatively/additively.
* The final result is a discrete integer or half-integer score in a way that seems on the larger scale to be stochastic and roughly linear in cumulative mistakes.
* The discretization of the score in this way does NOT tightly bound the number of possible "classes", in variant C (not even in the version with draws).
* Variants B and C are very hard to distinguish from only observing data from players whose error distributions are many integers wide, as they are right now even for current superhuman Go bots.
* None of it depends on any special pathologies between particular players or sets of players when matched versus each other.

Even if one doubts this is a great match for Go, because of more Go-specific details besides the above properties, isn't it fascinating that this is all explicitly possible at the same time? :)

As I said before many times, I don't want to argue this is a great match, because I don't think it is myself, at least not to this extreme. So I agree the specific numbers are doubtful; they're chosen just to make the point that this is possible in theory... and in a way still consistent with current observational evidence, which is too coarse to have actually ruled it out!

moha wrote:
I'm reluctant to use it as is, but some of my doubts would translate here as: suppose the opponent is a bit weaker, and already moved the count to a significant bit in their losing way (the most common occurrence). Our variance is almost negligibly small (you assumed almost constant 0.1 avg mistake - a bit doubtful imo). Can we be sure that we can perform whole classes better or worse from these winning positions than our (near or perfect) neighbours, even if our and their mistakes will almost never amount to anything near a single point, and the final granulation is whole points so some of our tiny advantage over neighbours will surely get canceled?


Yep! If you take the model at face value. Still yes, but only to a smaller degree (e.g. maybe you only squeeze in one extra class), if you think the behavior is close enough to B but maybe still has a small component of C-like behavior, in addition to "noise" of other sorts.

moha wrote:
hyperpape wrote:
I think that's ok. These players are nearly perfect. They'll win against almost all other players. But since they reliably beat each other, each of them should end up >= 400 points better than the previous one. Otherwise, playing against weaker opponents who you always beat would lower your ELO rating.
I don't think so. Any player can win against any other player except perfect one, but no player can win always (assuming a normal distribution and perfect komi). The difference in that tiny fraction of upset losses matters IMO, exactly because strength and performance is to be considered across all opponents and situations.


Oh, I didn't realize you also wanted to consider vastly differently-skilled players when measuring how a model corresponds to Elo. I guess I had implicitly assumed that you meant a variety of not-too-distant players (eliminating all rock-paper-scissors effects, but still giving sane differences).

We already know that real life often increasingly deviates from Elo in the tails. And real ranking systems sometimes do use different tail models, and therefore do make orders-of-magnitude different predictions from each other in the extremes! Yet mostly it doesn't matter - there isn't a presumption that the model should be used for, or should precisely match reality in, estimating such tiny probabilities.

For what it's worth, the earlier example I had with a normally distributed sum of errors will err on the side of the tails being *too thin*. Strong players will win *too often* against weaker players, so the rating difference as measured by a strong vs a vastly weaker player will actually be *even larger* than with chains of closer players. The opposite of the worry, hopefully making it more obvious that no matter how you slice it, you get more than 2 classes per point, not fewer? But if you wanted precisely Elo-like tails, you could posit a logistic distribution, and if you thought that large upsets were actually more common in real life than merely logistic (plausibly true in some activities, deviating from Elo the other way), you could try a t-distribution.
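
To illustrate how far apart such tail models can get, here is a small sketch (the 200-point calibration and the 4 degrees of freedom for the t-distribution are arbitrary illustrative choices, and it assumes scipy is available):

Code:
from scipy.stats import norm, t

# Calibrate all three models to agree at a modest 200-point rating gap,
# then compare the predicted upset probabilities at larger gaps.
p200 = 1.0 / (1.0 + 10 ** (-200 / 400))   # Elo logistic win prob at a 200 gap
scale_norm = 200 / norm.ppf(p200)         # normal model: thinner tails
scale_t = 200 / t.ppf(p200, df=4)         # t(4) model: fatter tails

for gap in (200, 400, 800, 1200):
    upset_logistic = 1.0 - 1.0 / (1.0 + 10 ** (-gap / 400))
    upset_normal = 1.0 - norm.cdf(gap / scale_norm)
    upset_t = 1.0 - t.cdf(gap / scale_t, df=4)
    print(f"gap {gap:4d}: logistic {upset_logistic:.2e}  "
          f"normal {upset_normal:.2e}  t(4) {upset_t:.2e}")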

Post subject: Re: "Indefinite improvement" for AlphaZero-like engines
Post #49 by moha (2d) - Posted: Tue Apr 21, 2020 11:10 am
lightvector wrote:
The game is exactly the same except now the grid is on all the integers (...-2,-1,0,1,2,...) and the game starts at 0.499999 (so with perfect play with both players always flipping zeros on cards, the game is a draw).
Thanks! I'm not sure how you meant your last comment? It seems variants A and B collapse immediately, as it is now possible to beat perfect play, even for nonperfect players (and the game is NOT always a draw even between perfect players). Variant C remains.

OC we can still look at A and B in a game theoretical sense, but note that losing the mental crutch of a deterministic minimax solution / score not only cuts ties to any board game but makes it harder to verify the model.

Anyway, let's take a reference player (i.e. the whole subset of the playerbase that normally sums to him) who is at the smallest pointwise distance from perfect play, 1 pt weaker. To demonstrate the possibility of extra subpoint classes we need to find a player between him and perfect play who is either more than one class away performance-wise from both sides, or at least such that the sum of these two distances is more than 2 classes. Is this possible?

OC we may find cases of chain-beating players. For example, P1-P2-P3 may each be stronger than the previous by less than half a class, but have styles that play into the weaknesses of the previous one when matched directly, while P3 may still be strong enough to beat P1 (maybe even by a wider-than-expected margin). But again, strength difference does not equal direct results between (or within a small subset of) players.

Your almost-constant, almost-perfect players did a bit more than this, even allowing us to consider performance vs a few more players in the immediate vicinity. But can they demonstrate their class advantages/differences over each other when playing against the -1 pt player?

Post subject: Re: "Indefinite improvement" for AlphaZero-like engines
Post #50 by Bill Spight - Posted: Tue Apr 21, 2020 1:05 pm
FWIW, here is an easy game. Not completely irrelevant, I think. :)

There are some stacks of chips on the table; each stack has some number of Red chips or Blue chips. Black plays first. On her turn Black can take one stack of Blue chips off the table, or take the top chip from one stack of Red chips, and White on his turn can take one stack of Red chips or the top chip from a Blue stack. OC, play stops when there are no more chips on the table. Each player's score is the number of chips she has collected from the table. The game can be played with komi or reverse komi.
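
To make the rules concrete, here is a small brute-force sketch of the game (illustrative only: the example stacks are arbitrary, and it simply reports the optimal chip difference for Black, to which any komi could then be compared):

Code:
from functools import lru_cache

# State: a sorted tuple of (color, height) stacks, color "R" or "B".
# Black may take a whole Blue stack or the top chip of a Red stack;
# White may take a whole Red stack or the top chip of a Blue stack.
# value() = (Black's chips - White's chips) under best play by both.

@lru_cache(maxsize=None)
def value(stacks, black_to_move):
    if not stacks:
        return 0
    whole = "B" if black_to_move else "R"  # color this player removes as a whole stack
    best = None
    for i, (color, n) in enumerate(stacks):
        rest = stacks[:i] + stacks[i + 1:]
        if color == whole:      # take the whole stack
            gain, new = n, rest
        else:                   # take one chip off the top
            gain, new = 1, rest + (((color, n - 1),) if n > 1 else ())
        signed = gain if black_to_move else -gain
        cand = signed + value(tuple(sorted(new)), not black_to_move)
        if best is None:
            best = cand
        else:
            best = max(best, cand) if black_to_move else min(best, cand)
    return best

start = tuple(sorted([("B", 3), ("B", 2), ("R", 3), ("R", 1)]))
print("Black minus White, Black to move first:", value(start, True))
print("Black minus White, White to move first:", value(start, False))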

Post subject: Re: "Indefinite improvement" for AlphaZero-like engines
Post #51 by Mike Novack - Posted: Tue Apr 21, 2020 4:52 pm
The number of chips in these stacks might be relevant?

In the degenerate case where the stack size is 1, it does not matter which action the player takes, as the top chip and the entire stack are the same. And when the game ends, each player will have taken exactly the same number.

Post subject: Re: "Indefinite improvement" for AlphaZero-like engines
Post #52 by lightvector (maybe 2d) - Posted: Tue Apr 21, 2020 4:57 pm
moha wrote:
lightvector wrote:
The game is exactly the same except now the grid is on all the integers (...-2,-1,0,1,2,...) and the game starts at 0.499999 (so with perfect play with both players always flipping zeros on cards, the game is a draw).
Thanks! I'm not sure how you meant your last comment? It seems variants A and B collapse immediately as it is now possible to beat perfect play, even for nonperfect players (and the game is NOT always draw even between perfect players). Variant C remains.

Sure, just focus on C then, ignore A and B.

moha wrote:
But can they demonstrate their class advantage/differences over each other against the -1 pt player?


Oh, I think I see what you're getting at, and why you've been insistent on adding draws. Yes, you're right about this objection. I hadn't considered enough the difference between draws and drawless games, thanks for pushing on this detail. :tmbup:

The issue is that there is no *single* intermediate player who wins strictly more than 75% against both endpoints if you count draws as 1/2, even though there are potentially a very large number of "tiers" of 75%s in between. In that example, if you randomly pick almost *any* sequence of two or more intermediate players between the perfect player and the 1-point-weaker player, then provided the standard deviation of performance is not too large for "natural" players, they will very likely form a chain that adds up to >= 2 classes of difference end to end. It's not really that the players have particular styles that exploit each other in ways that fail to generalize to most other players, because it would happen for almost any random sequence of natural players. But if you only look at any one player alone, then that player is not separated by more than one class from each end.

So I guess what you have here is that, by the nature of the game itself, the Elo model isn't very good here. Depending on exactly what population of players you select, even very generic and varied populations, I now agree the Elo range from fitting a model will be constrained by the fact that players near the ends of the range will still only score 75% against each other. Exactly how constrained, and what ratings you get from attempting to fit a model, could vary based on many details, simply because the model as a whole is a poor fit - there isn't a consistent way to assign ratings here. :)

In the real world, if losses almost never happened beyond a certain level of skill, and wins and draws were about equally split depending on color like this, then in terms of evidence about which player was "stronger", strangely draws (as, say, white) would be almost like losses - drawing as white would be a clear bit of evidence suggesting black was stronger rather than equal, and vice versa. Also, amusingly, adjusting the komi off of the game theoretic optimum by half a point would increase the Elo range greatly (the example still should work fine for non-integer komi), despite none of the game, the strategy, or the players being otherwise any different than before.

So I guess with draws you can try to squeeze in more than 2 classes here only in a "nontransitive" way, if everything else behaves linearly enough. And importantly, because it doesn't depend on players' special styles or rock-paper-scissors effects, but rather works for all players, it causes the naive Elo model itself to break down and become a not-so-great model for all players.

Thoughts?

Post subject: Re: "Indefinite improvement" for AlphaZero-like engines
Post #53 by Bill Spight - Posted: Tue Apr 21, 2020 6:54 pm
Mike Novack wrote:
The number of chips in these stacks might be relevant?


Indeed it might. :D

Quote:
In the degenerate case where the stack size is 1, it does not matter which action the player takes as top chip and entire stack the same. And when the game ends, each will have taken exactly the same number.


Yup. :)

BTW, you can chill this game by making the player pay one chip for each move, from what he has taken. In that case the game might end with an odd number of 1 chip stacks on the table. ;) As usual, play in the chilled game is also correct in the unchilled game, with the proviso that you should take that last odd chip. :)

Post subject: Re: "Indefinite improvement" for AlphaZero-like engines
Post #54 by moha (2d) - Posted: Tue Apr 21, 2020 7:51 pm
About the drawless version: don't variants A and B collapse there as well? Or are you ok with nonperfect players beating perfect play with BOTH colors? And - I didn't realize this at first :) - version C is pure illusion, a swindle. It rounds away from zero, exactly as I mentioned earlier, which is both a logical no-no usually and, in this case, a no-op: you are not rounding at all. The game result remains the high-res result - and naturally you don't get any class-bounding effect from nonexistent rounding.

Back to the "complete" version with draws: what separates a rock-paper-scissors or chain-beating phenomenon (where the best still beats the worst), whether it arises naturally or artificially, from real (in any sense) subpoint classes? My first hunch would be to look at the AVERAGE performance vs the COMPLETE playerbase. This is what strength (and Elo) is about. You might want to exclude players too far away to be meaningful - that's ok. You might also want to exclude [-1] as too far - this would raise an eyebrow. But maybe you also want to exclude players around [-0.1] who have unusually high sd? And how would you relate such almost-constant players to a sequence of 100-0 AlphaZeros with NN symmetry and other randomization turned off - deterministic players always choosing the same move in a given position? They could still have decent sd (and correct and close Elo if tested vs a rich playerbase), yet may form a natural chain-beating sequence among themselves, from not exploring each other's complete distributions (cf. behavior in all rounding cases).

lightvector wrote:
In the real world, if losses almost never happened beyond a certain level of skill, and wins and draws were about equally split depending on color like this, then in terms of evidence about which player was "stronger", strangely draws (as, say, white) would be almost like losses - drawing as white would be a clear bit of evidence suggesting black was stronger rather than equal, and vice versa.
Yes, this is just how I expect the perfect-komi game to turn out - some tenth or two of a point of advantage for W or B, making it easier/harder to draw, like in chess (and maybe differently for area and territory scoring).

Post subject: Re: "Indefinite improvement" for AlphaZero-like engines
Post #55 by lightvector (maybe 2d) - Posted: Tue Apr 21, 2020 11:18 pm
moha, I'm actually not sure what you're arguing about any more? I did indeed overlook the effect that draws would have on direct comparisons between more distant players, so basically I agree with you that draws do seem to cap the Elo range (cool!) - or rather, you can still violate that a little, but seemingly only in a way where the Elo model heavily breaks down and simply fails to be a good model at all any more. So in practice there is a cap, if you're talking about Elo still being practical and meaningful.

And also I think it's pretty clear now that the drawless version should theoretically have no limit to the Elo depth for practical player populations, despite having linear and discrete scores, so long as the game mechanically behaves in a particular way - one that is slightly quirky, but would also appear similarly linear for players with error distributions many points wide. Which makes a lot of sense too - without draws taking a big chunk of the possibility space, the line between win/loss can be arbitrarily thin with the right mechanics.

moha wrote:
And - I didn't realize this at first :) - version C is pure illusion, a swindle. It rounds away from zero, exactly as I mentioned earlier, which is both a logical no-no usually, and in this case a no-op: you are not rounding at all. Game result remains the high res result - and naturally you don't get any class-bounding effect of nonexistent rounding.


Not sure what you mean. "Choose the nearest discrete increment to a given continuous value" is definitely a well-defined operation and is certainly not a no-op. And usually people use the word "rounding" to describe this operation?

Quote:
My first hunch would be to look at the AVERAGE performance vs the COMPLETE playerbase. This is what strength (and Elo) is about.

I would normally consider Elo as a useful approximation to the real world. The most basic model is that players have ratings such that the expected win chance between any two players is a logistic function of the difference in their ratings. Surprisingly often, given a population of players, this will produce reasonable rankings and predictions of future results between pairs of players. This is because usually the population of human players, or the population of humans and bots that arises "naturally" via attempts to learn and play the game well, is transitive enough and doesn't contain "weird" behaviors.

Once you have too many pathological players in the population (the player that plays like AlphaZero except it resigns if you open on the 1-1 point; the player that plays uniformly at random in 50% of games and like Lee Sedol in the other 50%; the player who is pro level, but who, on seeing a distribution of moves in the first 100 moves that is high-probability for an average human DDK, will let them win and otherwise will play their best; etc.), it can easily stop making good predictions, since the model simply is not a reasonable fit for the data any more. And I guess, as we found, the structure of a game itself may force the model to be poor, even without any specific attempt to create pathological players. You could obviously speculate about what an Elo model would produce if fed games from "all possible probabilistic finite state machines" or some other crazy "complete" population, but I think nobody has any idea whether any particular variant of the Elo model would produce anything sane with such data.

Post subject: Re: "Indefinite improvement" for AlphaZero-like engines
Post #56 by lightvector (maybe 2d) - Posted: Tue Apr 21, 2020 11:44 pm
@moha - also, by the way, thanks for the discussion. This was actually pretty fun to think about and explore. And from playing around with the numbers I also have a better quantitative feel for things now - e.g. the logarithmic way that things need to scale to fit Elo ranges within spans different numbers of points wide (not just sub-points), some of Bill's thoughts on how chilled scores relate to outcomes in Go, under what conditions to expect more or fewer "levels of skill" in a game, the way that the sigmoid naively imposed by Elo actually deviates from linear when you try to stack classes, etc. I think I'm pretty satisfied with things at this point.

It might be interesting to try to mathematically work out how a "complete" set of players would behave if one tried to fit an Elo model anyways (e.g. via maximum likelihood), but that at first glance seems really hard to gain any traction on, so if you actually figure out how this could work, I guess it would be interesting. :)

Post subject: Re: "Indefinite improvement" for AlphaZero-like engines
Post #57 by moha (2d) - Posted: Wed Apr 22, 2020 12:27 am
Thanks from me too, and sorry, I didn't want to keep arguing (not about the draw version at least), I just wrote my thoughts on how I interpret your phenomenon from the viewpoint of Elo and other strength metrics. I also learned new things (also about Bill's chilled scores indeed), and even got a fun idea from your seemingly-narrow variances - I will try to post it later.

lightvector wrote:
moha wrote:
[About the drawless version:] .. And - I didn't realize this at first :) - version C is pure illusion, a swindle. It rounds away from zero, exactly as I mentioned earlier, which is both a logical no-no usually, and in this case a no-op: you are not rounding at all. Game result remains the high res result - and naturally you don't get any class-bounding effect of nonexistent rounding.
Not sure what you mean. "Choose the nearest discrete increment to a given continuous value" is definitely a well-defined operation and is certainly not a no-op. And usually people use the word "rounding" to describe this operation?
I think it is implied that we take the sign of the final score (komi included) as THE result [-1, 0, +1]. For score rounding to become anything other than a no-op, it must differ in some way from not rounding the score at all. This means it has to change the sign for at least SOME fractional values. Rounding 0.11 to 0, for example, is an operation with consequences. Rounding 0.22 to 0.5 is not; it is a no-op (in this context).

Games that decide by the sign of a fractional, unrounded score won't have limits on the number of classes, exactly because of the lack of a "smallest scoring unit" from which those bounds could arise. With normal distributions on a continuous score scale, any performance can be exceeded to an arbitrary extent (except 100%, which is unreachable). We know games like this, and drawless/C, which simply passes the sign of the fractional score through untouched, seems to be one of them.
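
A tiny check of this "no-op" point (illustrative only): rounding to the nearest half-integer gridpoint never changes the sign of the score, so the "rounded" result and the raw result always pick the same winner.

Code:
import random

rng = random.Random(0)
for _ in range(100000):
    s = rng.uniform(-3.0, 3.0)            # a raw fractional final score
    nearest_half = round(s - 0.5) + 0.5   # nearest point of ..., -0.5, 0.5, 1.5, ...
    assert (nearest_half > 0) == (s > 0)  # the winner is never changed by the "rounding"
print("half-integer rounding never flipped the sign")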

Post subject: Re: "Indefinite improvement" for AlphaZero-like engines
Post #58 by lightvector (maybe 2d) - Posted: Wed Apr 22, 2020 7:15 am
moha wrote:
With normal distributions on a continuous score scale, any performance can be exceeded by arbitrary extent (except 100% which however is unreachable). We know games like this, and drawless/C, which simply passes through the sign of the fractional untouched seems one of them.


Yep. The clever "swindle" is to show that in theory one can have observed outcomes matching the outcomes Go has with half-integer komi, which also behave roughly additively and linearly, and where, so long as the best players so far still cumulatively make large and numerous errors, they can't tell the difference from other mechanisms that don't allow as much Elo depth. You do this by having the discretization be a "no-op" on the sign. And maybe by some chance Go with half-integer komi could have some partial element of this - or not; it's hard to tell right now since the best players aren't good enough. So I think everything is resolved on the object level, and I'm happy if you prefer to use different words. ;-)

moha wrote:
even got a fun idea from your seemingly-narrow variances - will try to post later

:tmbup:

Post subject: Re: "Indefinite improvement" for AlphaZero-like engines
Post #59 by moha (2d) - Posted: Wed Apr 22, 2020 10:21 am
lightvector wrote:
maybe by some chance Go with half-integer komi could have some partial element of this - or not, it's hard to tell
But there IS actual, meaningful rounding in go. As I wrote earlier, there is no real half-point komi in integer games. Chinese with 7.5 komi is actually komi 7 with W winning ties. What happens here is that we play the game, THEN the score gets rounded (with ties retaining their prob mass), THEN we add the komi (which is an integer), THEN we decide to treat final draws as W wins. Order matters.
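
A tiny check of this point (illustrative only): for integer board margins, "komi 7.5" and "komi 7 with White winning ties" always pick the same winner, so the half point is only a tie-break, not an extra rounding step.

Code:
for margin in range(0, 15):                      # Black minus White on the board, before komi
    black_wins_7p5 = (margin - 7.5) > 0          # nominal komi 7.5
    black_wins_7_ties_w = (margin - 7) > 0       # integer komi 7, draws awarded to White
    assert black_wins_7p5 == black_wins_7_ties_w
print("komi 7.5 is komi 7 with ties to White, for every integer margin")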

If we decided to round board scores to the nearest 100 or 1000 points (even using a half-point komi, which is a hack), the game and the class count would drastically shrink - unlike when we try this in drawless/C, which in reality does no rounding at all.


Last edited by moha on Wed Apr 22, 2020 10:54 am, edited 2 times in total.
Post subject: Re: "Indefinite improvement" for AlphaZero-like engines
Post #60 by moha (2d) - Posted: Wed Apr 22, 2020 10:31 am
Btw, doesn't CGT fail here? If I understood Bill correctly, a chilled score of 6.8 could be seen as better than 6.7 (and it actually is if we stop at the chilled game), but two chilled scores of 6.6 are the same. But since the rounding direction will matter for territory (enormously so at these levels), chilled 6.6 with W to move is different from chilled 6.6 with B to move - exactly what CGT wanted to avoid (given the principle that correct chilled play = correct territory play).
