AlesCieply wrote:
I have already tried to analyze two games with AQ but found Leela's winrate estimates much more reliable/stable.
bugsti wrote:
Leela is trained to fit human data, so it is not surprising that you found more similarity between Leela and human middle-dan games than with other software. Leela's first purpose is to predict human middle-dan moves, not to find superhuman moves.
As bugsti indicates, stability does not mean reliability. Remember the old joke about a stopped clock being right twice a day? Very stable, very unreliable.

In the games you analyzed you found several human moves that were better (according to Leela) than Leela's first choices. One was even estimated as more than 10% better.

OC, Leela does provide an estimate of the "winrate" of a play, but an imperfect estimate. Since we have tools such as AQ, Leela Zero and Leela Zero Elf that provide better estimates, there is no reason not to use one of them.

Even at chess, humans sometimes find better moves than the superhuman chess engines. How do we know? The evaluation after the human move is better than the evaluation of the engine's top choice.

At this point I think that it is worth talking about precision and accuracy. As the stopped clock illustrates, the two are not the same. But our estimates cannot be more accurate than their precision. As you know, we can increase both the accuracy and precision of our evaluations by increasing the number of playouts. Accuracy is hard to determine, particularly as we do not know exactly what "winrate" means. But we do have a pretty good idea of the precision of our evaluations.
Bojanic's RSGF file for the Metta-Ben David game provides a good example. For move 30, Leela 11 looks at only two options. Its top choice yields a winrate for White of 49.80% with 9818 playouts; its second choice yields a winrate of 42.96% with 1318 playouts. Considering their precision, those estimates would better be presented as 50% ± 1% and 43% ± 3%. That degree of precision is enough for us to be confident that Leela prefers the first play to the second, given the 7% difference. However, it is not enough for us to be confident that the first play is 7% better than the second (according to Leela). To get precise differences in winrate estimates we require more playouts than we need just to compare two plays. In general we want the error of our estimates to be 1/10 of the precision that we report. To reach that precision for these estimates would probably require around 1,000,000 playouts each. (Precision probably increases as the square root of the playouts.) For our purposes we probably do not need that degree of precision; maybe around 100K playouts for each choice that we compare is good enough.
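To show where those ± figures come from, here is a rough back-of-the-envelope check in Python. It treats each playout as an independent win/loss sample and takes roughly two standard errors of a proportion as the margin; Leela's tree search does not really satisfy that independence assumption, so this is only an order-of-magnitude sketch, not how Leela itself computes anything.

```python
from math import sqrt

def margin(winrate, playouts, z=2.0):
    """Approximate margin of error: about two standard errors of a proportion."""
    return z * sqrt(winrate * (1.0 - winrate) / playouts)

# Move 30 figures from Bojanic's file:
print(round(100 * margin(0.4980, 9818), 1))   # -> 1.0, i.e. 50% +/- 1%
print(round(100 * margin(0.4296, 1318), 1))   # -> 2.7, i.e. 43% +/- 3%

# The error shrinks as 1/sqrt(playouts), so cutting a +/-1% margin to +/-0.1%
# takes about 100 times the playouts, on the order of 1,000,000:
print(round(100 * margin(0.50, 1_000_000), 2))  # -> 0.1
```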
IIRC, for the top choice to be stable you used 200,000 playouts. That number addresses accuracy as well as precision, since more playouts mean that a larger search tree is being built, IIUC.
For move 31, Bojanic's Leela looked at several plays. Its top choice had a winrate of 51% ± 1% with 8221 playouts. Metta played its 7th choice, which had a winrate of 48% ± 3% with 1054 playouts. Since the difference between winrates is only 3%, that isn't really enough to say that the top choice is better than the 7th choice, is it?
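The same rough model gives a sanity check on that last point: the question is whether the 3% gap is large compared with the combined uncertainty of the two estimates. Again, the independence assumption is only an approximation, and the 51%/48% figures are rounded.

```python
from math import sqrt

def stderr(winrate, playouts):
    # One standard error, treating playouts as independent win/loss samples.
    return sqrt(winrate * (1.0 - winrate) / playouts)

top_choice   = (0.51, 8221)   # Leela's first choice for move 31
metta_choice = (0.48, 1054)   # Metta's actual play, Leela's 7th choice

gap = top_choice[0] - metta_choice[0]
combined = sqrt(stderr(*top_choice) ** 2 + stderr(*metta_choice) ** 2)
print(round(gap / combined, 1))  # -> 1.8, i.e. under two standard errors,
                                 # so the 3% gap is not clearly significant
```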
Now, when we look at the winrate after move 31 was played, we get 51.22%. That is clearly wrong. All that happened is that the winrate for the top choice was copied into the winrate for the actual play. Bug in the program. (Not Leela itself, surely.)
But when we look at the estimate for the top choice for move 32, we get a winrate of 48.5% ± 0.5% with 35153 playouts. That's consistent with the earlier estimate, but more precise. The total number of playouts was 50,908. I do not know how Leela estimates winrates for a position and player to play, if it does at all, but if we take the estimate for the top choice, then we would want around 3 times as many playouts to reach our desired precision. Your choice of a total of 200K playouts seems reasonable.
Ben David played Leela's 4th choice, with a winrate (for Metta, not Ben David) of 55% ± 9% with 126 playouts. It seems to have been a blunder, with a loss of 6.5% versus best play, but with an error range of 9%, who knows? As I said, Leela (and other bots) are not optimized for evaluation. The winrate for Leela's top choice for move 33 is 57% ± 1% with 7004 playouts. So Ben David's play does seem to have been a blunder.
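The two deeper evaluations can be compared the same way. Assuming both figures are from Metta's side, as the numbers above suggest, the gap between 57% after Ben David's actual move and 48.5% for the best play dwarfs the combined uncertainty, unlike the noisy 126-playout estimate. Same rough independence assumption as before.

```python
from math import sqrt

def stderr(winrate, playouts):
    # One standard error, treating playouts as independent win/loss samples.
    return sqrt(winrate * (1.0 - winrate) / playouts)

best_play    = (0.485, 35153)  # deep estimate for Leela's top choice (winrate for Metta)
after_actual = (0.57, 7004)    # deep estimate after Ben David's actual move

gap = after_actual[0] - best_play[0]
combined = sqrt(stderr(*after_actual) ** 2 + stderr(*best_play) ** 2)
print(round(100 * gap, 1), round(gap / combined, 1))
# -> 8.5 13.1 : an 8.5% loss at about 13 standard errors, far more
#    convincing than the 55% +/- 9% figure from only 126 playouts.
```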
Anyway, one lesson is that we cannot just take the difference between the estimate for Leela's top choice and the estimate for the human's play if we want to reach the precision we are after. We have to make the human's play and evaluate the resulting position. Edit: In fact, for a fair comparison we should probably also make Leela's top choice and recalculate.
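A sketch of that procedure in Python, for what it's worth. The evaluate callable stands in for whatever engine front end you use to score a position at a fixed playout budget; it and the other names here are hypothetical, not a real Leela interface.

```python
from typing import Callable

def winrate_loss(position,
                 engine_move,
                 human_move,
                 evaluate: Callable[..., float],
                 playouts: int = 200_000) -> float:
    """Return how much winrate the human's play gave up versus the engine's.

    Both moves are played and the *resulting* positions are evaluated with
    the same playout budget, so the two estimates have the same precision,
    rather than taking the engine's in-search numbers, where the human's
    move may have received only a few hundred playouts.
    evaluate(position, move, playouts) is assumed to return the winrate of
    the position reached by playing move, from a fixed side's point of view.
    """
    after_engine = evaluate(position, engine_move, playouts)
    after_human = evaluate(position, human_move, playouts)
    return after_engine - after_human  # positive means the human move lost winrate
```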
