“Decision: case of using computer assistance in League A”

Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: “Decision: case of using computer assistance in League A”

Post by Bill Spight »

BTW, last night I found an interesting approach to detecting online cheaters at chess, by Brendan Norman, at https://www.youtube.com/watch?v=RTfH5gntsug . He notes that run-of-the-mill online chess cheaters mainly do it to boost their egos and to put down their opponents. He uses the lichess computer analysis tool on suspect games, and typically finds that the suspected cheaters make 0 blunders, 0 mistakes, and few or no inaccuracies. (These categories do not appear to be based upon matches to any single chess engine, nor upon matches per se. Norman does not go into the workings of the analysis tool.) Grandmasters do not play that well.
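Norman's screen can be sketched in code. The centipawn thresholds below are illustrative assumptions (lichess's real categories are derived differently, from win-percentage swings), so this is a toy classifier rather than the actual tool:

```python
# Toy move classifier in the spirit of server-side game analysis.
# Thresholds are assumed for illustration; they are NOT lichess's real rules.

def classify_move(cp_loss):
    """Label a move by centipawns lost versus the engine's preferred move."""
    if cp_loss >= 300:
        return "blunder"
    if cp_loss >= 100:
        return "mistake"
    if cp_loss >= 50:
        return "inaccuracy"
    return "ok"

def summarize(cp_losses):
    """Tally move labels over one game."""
    counts = {"blunder": 0, "mistake": 0, "inaccuracy": 0, "ok": 0}
    for loss in cp_losses:
        counts[classify_move(loss)] += 1
    return counts

# A suspiciously clean game: every move within 40 cp of the engine's choice.
clean = summarize([0, 10, 25, 0, 40, 5, 0, 30])
# A normal amateur game with a couple of large errors.
human = summarize([0, 60, 10, 150, 35, 320, 0, 80])
```

A profile like `clean`, repeated over many games with no blunders or mistakes at all, is the pattern Norman treats as a red flag.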

One thing that I find interesting is that Norman will play games against his suspected cheaters and play poorly on purpose, so that his opponent does not have to cheat to win the game. ;) Then his opponent plays picture perfect chess, anyway. :lol: Another fish caught in the net. :D
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.
sybob
Lives in gote
Posts: 422
Joined: Thu Oct 02, 2014 1:56 pm
GD Posts: 0
KGS: captslow
Online playing schedule: irregular and by appointment
Has thanked: 269 times
Been thanked: 129 times

Re: “Decision: case of using computer assistance in League A”

Post by sybob »

jeromie wrote: That’s only true if you consider the likelihood of cheating in one game to be independent of cheating in other games AND you think there is nothing to learn from a player’s performance in other games. But that’s probably not true.

At the very least, a person’s general level of play adds some important data. If I were to suddenly start beating dan level players on KGS after a long period of stable play as a 3 kyu, you’d have good grounds to be suspicious of my improvement.
That's the difficulty others have already pointed out: the difference between the legal view (what is the accusation, what is the evidence, how strong is the evidence?) and the probability/statistical view (what are the chances?).
These two views are very hard to reconcile.

Others are better placed than I am to express views on the statistical side. But statistics have great difficulty providing convincing evidence in individual cases. And yes, from a statistical point of view, you want more data and comparisons. But even so, that may not amount to conclusive evidence from a legal point of view.
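The value of pooling games is easy to see with a little arithmetic; the per-game upset probability of 0.1 below is an invented figure for illustration:

```python
# If one upset has probability p, a run of n independent upsets has
# probability p**n: a single surprise proves little, a streak proves a lot.

def prob_all_wins(p_win, n_games):
    """Probability of winning every one of n independent games."""
    return p_win ** n_games

one_game = prob_all_wins(0.1, 1)    # a single upset: unremarkable
five_games = prob_all_wins(0.1, 5)  # five straight upsets: 1 in 100,000
```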
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: “Decision: case of using computer assistance in League A”

Post by Bill Spight »

sybob wrote:
jeromie wrote: That’s only true if you consider the likelihood of cheating in one game to be independent of cheating in other games AND you think there is nothing to learn from a player’s performance in other games. But that’s probably not true.

At the very least, a person’s general level of play adds some important data. If I were to suddenly start beating dan level players on KGS after a long period of stable play as a 3 kyu, you’d have good grounds to be suspicious of my improvement.
That's the difficulty others have already pointed out: the difference between the legal view (what is the accusation, what is the evidence, how strong is the evidence?) and the probability/statistical view (what are the chances?).
These two views are very hard to reconcile.
That's not the only difference. The "legal" question is did he cheat? The statistical question that was posed is did he play like Leela? Common sense tells us that if he cheated he probably did so using a bot, so the questions are related. But they are still different questions.
BlindGroup
Lives in gote
Posts: 388
Joined: Mon Nov 14, 2016 5:27 pm
GD Posts: 0
IGS: 4k
Universal go server handle: BlindGroup
Has thanked: 295 times
Been thanked: 64 times

Re: “Decision: case of using computer assistance in League A”

Post by BlindGroup »

Bill Spight wrote:That's not the only difference. The "legal" question is did he cheat? The statistical question that was posed is did he play like Leela? Common sense tells us that if he cheated he probably did so using a bot, so the questions are related. But they are still different questions.
I disagree, and it depends a bit on what you mean by "did he cheat?". If by that you mean "Can we know with certainty through some sort of fact finding process that he used Leela?", I argue that this is an unanswerable question and not a useful way to frame the question. To wit, under any fact pattern in a legal setting, there will always be some grounds for doubting he cheated. They may not be "reasonable" doubts, but they will exist. It's impossible to answer this question with certainty.

I think the answerable question is "Under what circumstances (i.e. under what evidence, patterns of play, outcomes of statistical inferences) are we comfortable concluding that he cheated and operating under that assumption to deliver a punishment." From this perspective the statistical and legal questions are logically isomorphic -- the structure of the decision problem is the same.
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: “Decision: case of using computer assistance in League A”

Post by Bill Spight »

BlindGroup wrote:
Bill Spight wrote:That's not the only difference. The "legal" question is did he cheat? The statistical question that was posed is did he play like Leela? Common sense tells us that if he cheated he probably did so using a bot, so the questions are related. But they are still different questions.
I disagree, and it depends a bit on what you mean by "did he cheat?". If by that you mean "Can we know with certainty through some sort of fact finding process that he used Leela?", I argue that this is an unanswerable question and not a useful way to frame the question. To wit, under any fact pattern in a legal setting, there will always be some grounds for doubting he cheated. They may not be "reasonable" doubts, but they will exist. It's impossible to answer this question with certainty.
Agreed. :) Edit: To be clear, I stick by what I said, but I agree that we cannot answer that question with certainty.
I think the answerable question is "Under what circumstances (i.e. under what evidence, patterns of play, outcomes of statistical inferences) are we comfortable concluding that he cheated and operating under that assumption to deliver a punishment." From this perspective the statistical and legal questions are logically isomorphic -- the structure of the decision problem is the same.
Depends upon what you mean by statistical.

Take the question, will the sun rise tomorrow? If we do not know about celestial mechanics, that is iffy. Will Phoebus Apollo have a hangover tomorrow and sleep in?

A statistical answer was attempted, based upon historical (i.e., biblical) evidence that the sun had risen every day for 6,000 years or so. Using a Laplacian prior, the probability is near certainty. But based upon knowledge that the earth rotates on its axis, the probability is even closer to 1. Keynes and Good would have been happy to combine astronomical knowledge with statistical knowledge in terms of Bayesian probability. (Maybe not in this particular instance, but generally, utilizing non-statistical knowledge in the prior. Keynes's priors were not necessarily numerical.) Moi, I distinguish between the types of evidence. (As I did in these sentences.) :) For cheating, Regan's physical and behavioral evidence I do not consider to be statistical.
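The Laplacian calculation referred to is the rule of succession; a minimal sketch, with the day count only approximate:

```python
# Rule of succession: after s successes in n trials, a uniform prior gives
# posterior probability (s + 1) / (n + 2) for one more success.

def rule_of_succession(successes, trials):
    return (successes + 1) / (trials + 2)

days = 6000 * 365  # roughly 6,000 years of daily sunrises
p_sunrise = rule_of_succession(days, days)  # very near, but not equal to, 1
```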

Edit: Also, it is important to distinguish between the question of cheating and the question of whether Carlo played like Leela. As Regan points out, the key statistical question of cheating is whether Carlo played better than his non-cheating self. In this tournament Carlo played less like Leela when he beat stronger players than the one he beat in the game in question. That certainly raises questions about the particular statistical question asked, a certain way of matching Leela's choices, and the question of cheating. If you just say that the statistical and legal questions are isomorphic, you can't ask those questions.

I meant to add that examination of the game record is also important to the question of cheating. In the CIT case I did so and came to the opposite conclusion from the one indicated by the statistical evidence alone. In an example of suspected cheating at chess (sorry, I don't have a link right now) examination of a lost game offered a clue. This was not online cheating but face-to-face play. It was suspected that the player was somehow being signaled to make the moves recommended by a chess engine. Then there was this loss after a stupid blunder. Statistically, we count that as negative evidence of cheating. However, if the chess piece had been placed on a square adjacent to the one played, he could have won brilliantly. If we assume that he either was sent the wrong signal or misinterpreted it, the move is forensic evidence of cheating. (Also, how could a player that good make a blunder that bad?) This is reminiscent of the 90 ft. tall man paradox. Good has an example with some biological range and a border. (Are there butterflies on the other side of a political border, as well as on this side? Something like that.)
Uberdude
Judan
Posts: 6727
Joined: Thu Nov 24, 2011 11:35 am
Rank: UK 4 dan
GD Posts: 0
KGS: Uberdude 4d
OGS: Uberdude 7d
Location: Cambridge, UK
Has thanked: 436 times
Been thanked: 3718 times

Re: “Decision: case of using computer assistance in League A”

Post by Uberdude »

BlindGroup wrote: Uberdude, your taking the time to go through even these 10 games seems to be more than we've seen anyone else doing to systematically assess these decisions. ...
1. As you note, a sample size of 10 data points is VERY small. I think even "inept statisticians" would be uncomfortable moving forward with only these data.
My worry is that the 10 data points (from 5 games*) are more than the referees looked at, and that they are being inept / lacking rigour (Hanlon's razor applies)! I hope I am wrong, as there are plenty of mathsy analytical types in the Go population who could presumably have got involved in the investigation and prevented that, but I fear I may be right. This fear is not assuaged by the fact that my question as to whether there was a control group was ignored (rather than cheerily answered, "Of course! With 100 games. We've done stats 101."), as were other concerned comments on the facebook thread. We also now learn the report of the investigation "won't be published as long as not all parties agree on it". I see a few plausible explanations:
1) They (as in league organisers/EGF officials) don't read facebook/L19/reddit, so are unaware of the large amount of discussion/opinion/concern. Or they are off on Easter holidays.
2) They read it but don't care, so they ignore us (the chattering classes, not directly involved; though since many of us are league participants, I think we are).
3) There was no control group (or an absurdly small one), so they stay silent to avoid admitting they messed up.
4) There was a good-sized control group, but it showed the 98% was not significant. They stay silent as above, or for some other reason, e.g. a policy of not communicating on unofficial platforms, or wanting to discuss amongst themselves first, thinking silent justice is better than engaging with a raucous community.
5) There was a good-sized control group, and it showed the 98% was a significant outlier. But they stay silent for the reasons above, even though releasing the information would placate the community. This would mean my results with high %s are a fluke (which I could believe if a large study were released and could be verified, but I'm doubtful).

The Machiavellian streak in me thinks I should accuse my opponent with the 88% match (or whoever I next find with a higher %) of Leela cheating. Even better if I find a 98% from an old season before Leela existed! ;-) The problem of cheating using bots, either real cases or spurious accusations, is unfortunately here to stay and we need to form robust processes for dealing with it. Schayan Hamrah (an Austrian 5d who plays in the league) pointed out that the existing rules of the tournament have too little detail on dealing with bot cheating and this needs to be rectified and agreed by EGF members. I believe this is best done with an open and frank approach, not hiding from scrutiny.

* 20 data points from 10 games now! Just did my game vs Victor Chow. And Cornel vs breakfast. And Daniel vs crazy Jonas. And my first pro game, Lee vs Park.
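For what a control-group check might look like: a sketch of an empirical p-value for the 98% game against a set of reference games. The reference values below are illustrative numbers in the range reported in this thread, not an official control group:

```python
# Empirical p-value: how often do reference (presumed honest) games match
# Leela at least as well as the suspect game? Reference values are invented
# for illustration.

def empirical_p_value(observed, control):
    at_least = sum(1 for x in control if x >= observed)
    # +1 in numerator and denominator: count the observed game itself,
    # so the p-value can never be exactly zero.
    return (at_least + 1) / (len(control) + 1)

control_matches = [80, 74, 78, 80, 88, 74, 66, 56, 76, 84, 74, 64, 66, 58, 82, 70]
p = empirical_p_value(98, control_matches)  # about 0.06 with this tiny sample
```

With only 16 reference games the p-value cannot go below 1/17, which is exactly the point: a serious control group needs to be much larger.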
BlindGroup
Lives in gote
Posts: 388
Joined: Mon Nov 14, 2016 5:27 pm
GD Posts: 0
IGS: 4k
Universal go server handle: BlindGroup
Has thanked: 295 times
Been thanked: 64 times

Re: “Decision: case of using computer assistance in League A”

Post by BlindGroup »

Bill Spight wrote:Depends upon what you mean by statistical.

Take the question, will the sun rise tomorrow? If we do not know about celestial mechanics, that is iffy. Will Phoebus Apollo have a hangover tomorrow and sleep in?

A statistical answer was attempted, based upon historical (i.e., biblical) evidence that the sun had risen every day for 6,000 years or so. Using a Laplacian prior, the probability is near certainty. But based upon knowledge that the earth rotates on its axis, the probability is even closer to 1. Keynes and Good would have been happy to combine astronomical knowledge with statistical knowledge in terms of Bayesian probability. (Maybe not in this particular instance, but generally, utilizing non-statistical knowledge in the prior. Keynes's priors were not necessarily numerical.) Moi, I distinguish between the types of evidence. (As I did in these sentences.) :) For cheating, Regan's physical and behavioral evidence I do not consider to be statistical.
I think we may be trying to make slightly different points. If I understand you correctly, what you are saying is that you prefer to distinguish between two types of evidence: evidence that is easily quantifiable and evidence that, while relevant, does not lend itself to mathematical treatment. In our current context, the former would be the kind of analysis that Uberdude is pushing, and the latter would be something like finding out that a player had a network connection in their private lavatory or had recently visited sites entitled "How to Cheat at Go". I agree with that. I do not believe in forcing things into mathematical frameworks when it seems unnatural.

My point though is a bit different. Acknowledging that there are both types of evidence, there is a tendency to say, because we can't quantify everything, let's ignore statistics. I'm arguing that is a mistake. Statistics has more to offer than just a quantification tool. Even if it is not possible to calculate actual probabilities for things using statistical formulas, the mathematical properties can still guide us in how to evaluate evidence and set up decision rules even when considering non-statistical evidence. These are things like the inherent trade-off between false convictions and failing to convict the guilty, Uberdude's point that unlikely events do happen, and the fact that we have to consider whether the observed evidence is really rare. (For the latter, if the webpage "How to Cheat at Go" caused a stir, it's possible that many people in the profession visited the site just to see it. It is then harder to argue that a visit suggests cheating.) This is what I meant by the processes being isomorphic: the relationships from the hypothesis testing framework can provide a useful guide in reasoning through these issues even if one cannot quantify the data to do formal statistical analysis.
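The trade-off between false convictions and failing to convict can be made concrete even with toy numbers; both score lists below are invented for illustration:

```python
# Moving a conviction threshold on a match statistic trades false positives
# (honest players convicted) against false negatives (cheaters missed).
# Both distributions are invented for illustration.

def error_rates(threshold, honest_scores, cheater_scores):
    """Return (false-positive rate, false-negative rate) for a threshold."""
    fp = sum(1 for s in honest_scores if s >= threshold) / len(honest_scores)
    fn = sum(1 for s in cheater_scores if s < threshold) / len(cheater_scores)
    return fp, fn

honest = [54, 58, 64, 66, 70, 74, 76, 80, 84, 88]
cheats = [86, 90, 92, 94, 96, 98]

fp_strict, fn_strict = error_rates(95, honest, cheats)  # no false convictions, many missed cheats
fp_loose, fn_loose = error_rates(80, honest, cheats)    # catches every cheat, convicts 30% of the honest
```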
Uberdude
Judan
Posts: 6727
Joined: Thu Nov 24, 2011 11:35 am
Rank: UK 4 dan
GD Posts: 0
KGS: Uberdude 4d
OGS: Uberdude 7d
Location: Cambridge, UK
Has thanked: 436 times
Been thanked: 3718 times

Re: “Decision: case of using computer assistance in League A”

Post by Uberdude »

BlindGroup wrote: [evidence that is easily quantifiable] ... would be the kind of analysis that Uberdude is pushing.
Just to be clear, I agree with Regan/Bill that this kind of statistical evidence (particularly the rather broad and dumb "count how many moves were in Leela's top 3", rather than e.g. looking for moves he played which have low policy network probability but which Leela likes after search, or other ideas similar to Regan's for chess) is unsatisfactory if it is used to convict on its own (though as this was an online tournament, physical evidence is harder to come by). It could be useful as an automated screening process for all games in an event/server to flag suspicious games for further investigation. The 98% was the only piece of evidence publicly released with the announcement of his conviction/punishment, so I want to know how significant an outlier it is. Even if it is significant (at whatever level we choose), I think the suspicious game should be examined further, as Stanislaw did: how plausible does a skilled human think the play is, were there moves Leela liked that the human didn't play, did he make big mistakes according to Leela, etc. Comparison to other games played by the accused player is also useful: "this player has been consistently performing above his expected level, so we think he is cheating in many of his games, and will find suspicious behaviour in many of them" is an easier proposition to prove beyond reasonable doubt than "this player has been consistently performing above his expected level, but we think he was cheating in only one of them" [and that the result wasn't a fluke of the comparison statistic: 1 game with a 1 in 100 chance is not surprising in a tournament with over 100 games, but 4 games each with a 1 in 100 chance by the same player is much harder to explain innocently].
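The arithmetic in the bracketed remark works out as follows, treating games as independent:

```python
# One 1-in-100 game somewhere in a 100-game tournament is expected;
# four 1-in-100 games by the same player is not.

def prob_at_least_one(p, n):
    """Chance that at least one of n independent games crosses a p-probability bar."""
    return 1 - (1 - p) ** n

tournament = prob_at_least_one(0.01, 100)  # roughly 0.63: unremarkable
same_player = 0.01 ** 4                    # 1e-8: very hard to explain innocently
```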
Uberdude
Judan
Posts: 6727
Joined: Thu Nov 24, 2011 11:35 am
Rank: UK 4 dan
GD Posts: 0
KGS: Uberdude 4d
OGS: Uberdude 7d
Location: Cambridge, UK
Has thanked: 436 times
Been thanked: 3718 times

Re: “Decision: case of using computer assistance in League A”

Post by Uberdude »

And here's a histogram of the top 3 similarity metric with 24 data points.
Leela similarity histogram.png
Interestingly, the top 1 match % has a flatter distribution, here they are together.
Leela similarity histogram top 1 and 3.png
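The binning behind such a histogram can be reproduced roughly as follows; the percentages are the top-3 match figures quoted in this thread, so treat the list as illustrative rather than a verified copy of the spreadsheet:

```python
# Bin match percentages into 10-wide buckets, as a chart tool would.
from collections import Counter

def bin_counts(values, width=10):
    """Count values into [50, 60), [60, 70), ... style bins keyed by lower edge."""
    return Counter((v // width) * width for v in values)

# Top-3 match percentages (both colours) from the 12 analysed games.
top3 = [98, 80, 80, 86, 74, 78, 80, 88, 74, 66, 56, 76,
        84, 76, 74, 66, 54, 64, 74, 64, 66, 58, 82, 70]
hist = bin_counts(top3)  # the 98 sits alone in the top bin
```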
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: “Decision: case of using computer assistance in League A”

Post by Bill Spight »

Bill Spight wrote:Depends upon what you mean by statistical.

Take the question, will the sun rise tomorrow? If we do not know about celestial mechanics, that is iffy. Will Phoebus Apollo have a hangover tomorrow and sleep in?

A statistical answer was attempted, based upon historical (i.e., biblical) evidence that the sun had risen every day for 6,000 years or so. Using a Laplacian prior, the probability is near certainty. But based upon knowledge that the earth rotates on its axis, the probability is even closer to 1. Keynes and Good would have been happy to combine astronomical knowledge with statistical knowledge in terms of Bayesian probability. (Maybe not in this particular instance, but generally, utilizing non-statistical knowledge in the prior. Keynes's priors were not necessarily numerical.) Moi, I distinguish between the types of evidence. (As I did in these sentences.) :) For cheating, Regan's physical and behavioral evidence I do not consider to be statistical.
BlindGroup wrote:I think we may be trying to make slightly different points. If I understand you correctly, what you are saying is that you prefer to distinguish between two types of evidence: evidence that is easily quantifiable and evidence that while relevant does not lend itself to mathematical treatment.
The social sciences distinguish between quantitative and qualitative evidence, and today a good bit of research involves "triangulation", i.e., a combination of both. The current replication crisis in the social sciences comes in part from a realization that in the past too much weight was given to statistical evidence alone. Rejecting a null hypothesis is disconfirmatory, but it does not by itself confirm any particular alternative hypothesis. It is hardly surprising that results based upon weak evidence are not replicated.

A good example of that -- though not, I repeat, not, an example of social science research -- comes from a Science and Consciousness talk I went to back in the 1990s at the University of California in San Francisco. A mathematician had made a study of a psychokinesis experiment at Princeton ( :shock: ) and found that the data were very close to a normal distribution (p << 0.001), among other findings, which he took to be indicative of ESP. One physicist stood up and roundly criticized the mathematician's conclusions on the basis of physical theory. As a Bayesian, I was not terribly concerned about the fact that the guy had obviously gone looking for a low p value which had not been specified beforehand. He had found a good one. :mrgreen: However, I did not take it as evidence for ESP, but as evidence that the data had been faked. :D
My point though is a bit different. Acknowledging that there are both types of evidence, there is a tendency to say, because we can't quantify everything, let's ignore statistics.
My experience is the opposite, at least among those trying to do science. Maybe we run with different crowds. :)
I'm arguing that is a mistake.
I agree. :D
Statistics has more to offer than just a quantification tool. Even if it is not possible to calculate actual probabilities for things using statistical formulas, the mathematical properties can still guide us in how to evaluate evidence and set up decision rules even when considering non-statistical evidence.
I agree, as well. :)

But a confirmatory statistic based upon 50 possible matches in one game is not good statistical evidence. It may be good enough to raise suspicions and invite the collection of further evidence, but that's all.

Uberdude did go looking for further evidence, including the matches to Leela's choices in other games that Carlo won in the same tournament. Those games were against stronger players than Carlo's opponent in the game in question and had lower numbers of matches than that game. To me, those results cast further doubt upon the assertion that Carlo had been cheating.

Let me go back to the ESP research. The mathematician had no theory as to why a close fit of the data to a normal distribution would indicate ESP. It just did. I, OTOH, had a good theory as to why that close fit would indicate faking the data. It is well known that a large amount of data usually conform to a normal distribution, so if you are faking it, you want the fake data to conform, as well. The question of too good a fit was not a concern to the faker or fakers, because who -- except maybe a crank mathematician -- would test that goodness of fit? :lol:

Based upon online cheating at chess (outside of tournaments), it seems that a lot of it involves using a chess engine to pick the plays. Because the top plays fluctuate as the engine does its calculations, and because different engines might differ slightly in their choices, a cheater's move will, as long as the analysis engine is not too bad, nearly always land among its 3 top choices, producing nearly a 100% match. Perhaps that is where the idea of using a match to the 3 top choices comes from.

Suppose we accept that theory. Then Carlo's moves in the games against the stronger players should also show a nearly 100% match. They don't. So what do we say about that? Carlo chose to cheat against a 4 dan, but not to cheat against 6 dans?
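A rough binomial sketch makes the point quantitative. The per-move match probability of 0.95 for a top-3 cheater and the 50-move sample are assumptions for illustration, not measured figures:

```python
# If a cheater matches Leela's top 3 with probability 0.95 per move, how
# surprising are the observed match counts? (Assumed numbers, for illustration.)
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Over 50 analysed moves: 86% = 43 matches, 98% = 49 matches.
p_43_or_fewer = binom_cdf(43, 50, 0.95)     # ~0.01: unlikely for a cheater
p_49_or_more = 1 - binom_cdf(48, 50, 0.95)  # ~0.28: routine for a cheater
```

On these assumptions, the games against the 6 dans look like evidence against the cheating hypothesis, not merely an absence of evidence for it.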

There is an analogy to Rasch testing here. In Rasch testing, if a test taker does better on harder questions than on easier questions, it may be that the meaning of some of those questions is different for that person than for others. Games against 6 dans are like harder questions; a game against a 4 dan is like an easier question. If any theory explains the matching results, it can hardly be that he cheated by playing Leela's choices against the 4 dan but not against the 6 dans. OC, an explanation may be possible, but one has not been given.
Last edited by Bill Spight on Sun Apr 08, 2018 4:01 pm, edited 2 times in total.
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: “Decision: case of using computer assistance in League A”

Post by Bill Spight »

Uberdude wrote:And here's a histogram of the top 3 similarity metric with 24 data points.
What are the data points? Thanks.

Edit: You also show a histogram of matching Leela's top choice.

Now, with 24 data points, we can, even without knowing the underlying distribution, take the game and player with the highest number of matches and say: in this game the player played most like Leela on moves 50 - 149. Both the top choice and top 3 choices point to Metta vs. Reem. If we do this for every tournament, we can take a look at the top matching 5% of games or so. That may be a reasonable thing to do, but concluding without further evidence that the player cheated is not reasonable.

Edited for accuracy. :)
Last edited by Bill Spight on Sun Apr 08, 2018 10:40 pm, edited 1 time in total.
Uberdude
Judan
Posts: 6727
Joined: Thu Nov 24, 2011 11:35 am
Rank: UK 4 dan
GD Posts: 0
KGS: Uberdude 4d
OGS: Uberdude 7d
Location: Cambridge, UK
Has thanked: 436 times
Been thanked: 3718 times

Re: “Decision: case of using computer assistance in League A”

Post by Uberdude »

Bill Spight wrote: What are the data points? Thanks.

Code:

+-----------------+------+----------------+------+---------+---------+---------+---------+
|      Black      | Rank |     White      | Rank | B top 3 | W top 3 | B top 1 | W top 1 |
+-----------------+------+----------------+------+---------+---------+---------+---------+
| [Carlo Metta]   |  4d  | Reem Ben David |  4d  |    * 98 |      80 |    * 72 |      54 |   http://pandanet-igs.com/system/sgfs/6374/original/WWIWTFDSGS.sgf
| Andrey Kulkov   |  6d  | [Carlo Metta]  |  4d  |      80 |    * 86 |      68 |    * 62 |   http://pandanet-igs.com/system/sgfs/6314/original/AMTRMFSDAB.sgf
| Dragos Bajenaru |  6d  | [Carlo Metta]  |  4d  |      74 |    * 78 |      50 |    * 60 |   http://pandanet-igs.com/system/sgfs/6354/original/JRZPCWSANY.sgf
| [Andrew Simons] |  4d  | Jostein Flood  |  3d  |      80 |      88 |      54 |      62 |   http://pandanet-igs.com/system/sgfs/6612/original/XSJUGZZTOX.sgf
| Geert Groenen   |  5d  | [Daniel Hu]    |  4d  |      74 |      66 |      40 |      46 |   http://britgo.org/files/pandanet2016/mathmo-GGroenen-2017-01-10.sgf
| [Ilya Shikshin] |  1p  | Artem Kachan.  |  1p  |      56 |      76 |      38 |      60 |   http://pandanet-igs.com/system/sgfs/6384/original/RYSGTEGMXT.sgf
| [Andrew Simons] |  4d  | Victor Chow    |  7d  |      84 |      76 |      44 |      44 |   http://britgo.org/files/pandanet2014/RoseDuke-Egmump-2015-01-13.sgf
| Cornel Burzo    |  6d  | [A. Dinerstein]|  3p  |      74 |      66 |      40 |      48 |   http://pandanet-igs.com/system/sgfs/6349/original/SCNSFSJXTI.sgf
| Jonas Welticke  |  6d  | [Daniel Hu]    |  4d  |      54 |      64 |      34 |      42 |   http://britgo.org/files/pandanet2017/mathmo-iryumika-2017-12-12.sgf
| [Park Junghwan] |  9p  | Lee Sedol      |  9p  |      74 |      64 |      64 |      38 |   http://www.go4go.net/go/games/sgfview/68053
| Lothar Spiegel  |  5d  | [Daniel Hu]    |  4d  |      66 |      58 |      48 |      42 |   http://britgo.org/files/pandanet2016/mathmo-Mekanik-2017-04-25.sgf
| Gilles v.Eeden  |  6d  | [Viktor Lin]   |  6d  |      82 |      70 |      56 |      46 |   http://pandanet-igs.com/system/sgfs/6616/original/FMKVQBHBBV.sgf
+-----------------+------+----------------+------+---------+---------+---------+---------+
Some notes on recent games.
- Ilya Shikshin 1p vs Artem Kachanovskyi 1p. These players are quite possibly stronger than Leela 0.11 on 50k nodes, so not matching could mean they are playing better rather than worse moves than Leela. As expected, the more territorial and orthodox Artem was more similar than the creative fighter Ilya. This was also, I think, the first game I analysed to feature a ko (which makes a lot of obvious matches for taking the ko, though ko threats can differ).
- My game vs Victor Chow 7d from a few years ago, as another example of a weaker player scoring an upset against a stronger one with a solid style. I played well in the opening and middlegame and got a good lead (but only won by half a point when he turned on super endgame and I was under time pressure after move 150). For over 50 moves of the game Leela really wanted me to invade the left side at c7, which I was aware of, but as I was leading against a 7d whom I knew to be a strong fighter, I didn't invade there, to avoid complications I might well mess up. This was responsible for a lot of my failed matches with Leela's top 1 (often still top 3, but a few times not), plus of course some straight-out mistakes from both of us.
- Cornel Burzo 6d vs Alexander Dinerstein 3p. Cornel has an elegant honte style, whilst Dinerstein is territorial and led the whole time, with a territory lead and ways into Cornel's flaky centre. As with the Kulkov and Groenen games, the player with the highest top 3 match wasn't the same as the one with the highest top 1 match.
- Daniel vs Jonas Welticke. Jonas is known for crazy openings and a weird style, which he showed here by opening on the sides, reaching only a 25% win rate after 50 moves. As expected, his wacky moves didn't match much. Daniel played solidly and matched a lot, except that Leela got confused by a simple semeai and wanted to play stupidly. Also, despite having won the semeai already, in calm positions Leela wanted to keep playing the semeai rather than some profitable move elsewhere (but Daniel was winning by so much that maybe he could essentially pass and still win).
- First (non-EGF) pro game. My expectation was that pros might score lower matching against Leela than we mid-to-high amateur dans do, as they are much stronger and could be playing unexpected better moves. I chose Park Junghwan and Lee Sedol's last game, at some festival. Park is a fairly conventional player, whilst Lee is more creative, so I expected Park to match more. Park did match more, but they were both similar to us amateurs. Maybe Leela is stronger than I realised. Leela did not expect the moves which made me feel "Wow, cool pro moves" (often tenuki), but she did better than I did (with brief thinking) in predicting the contact fighting.
- Another of Daniel's from last year, vs Lothar Spiegel 5d from Austria, who is a fairly sensible player. Lots of matching during long but joseki-ish middlegame invasions, but also misses from mistakes, and both players overlooked an important sente exchange for a while (f11/g10).
- Gilles van Eeden 6d (a classic good-shape Dutch 6d) vs Viktor Lin 6d. Most mismatches were due to a ko fight, and a few disagreements in the early yose. Going into yose Leela gave Gilles a 77% win, but this looks like a misunderstanding of his dead group at top left: if I played out a few more moves to make it clearly dead, the win% collapsed to 57%. In the end he lost by 2.5.
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: “Decision: case of using computer assistance in League A”

Post by Bill Spight »

Thanks, Uberdude. So can we say that these are informally chosen recent games?
Uberdude
Judan
Posts: 6727
Joined: Thu Nov 24, 2011 11:35 am
Rank: UK 4 dan
GD Posts: 0
KGS: Uberdude 4d
OGS: Uberdude 7d
Location: Cambridge, UK
Has thanked: 436 times
Been thanked: 3718 times

Re: “Decision: case of using computer assistance in League A”

Post by Uberdude »

Bill Spight wrote:Thanks, Uberdude. So can we say that these are informally chosen recent games?
Yes, not randomly, but with criteria such as
- Daniel's 2 games (Groenen, Spiegel) from last season during his Leela period (thought they might have high match, but didn't)
- mine vs Victor for a lower-ranked upset (and I wanted to see Leela's opinion); it's from a few years ago, while the remainder of the games are from this season of the league
- vs Jonas because crazy style
- Ilya vs Artem for classic match up of top Europeans with contrasting styles
- Cornel vs Dinerstein another top 2 Europeans
- van Eeden vs Lin, a recent 6d game in league B
- Park vs Lee for top pros, chose their most recent game

I checked that the games didn't end in an early resignation before analysing (discarded le Calve vs Bajenaru for this reason). I think I should do some 5d games next.

Edit: I attached the spreadsheet so Javaness can make the chart hot pink or whatever is his favourite colour.
Attachments
Leela similarity analysis.xlsx (120 KiB)
Javaness2
Gosei
Posts: 1545
Joined: Tue Jul 19, 2011 10:48 am
GD Posts: 0
Has thanked: 111 times
Been thanked: 322 times

Re: “Decision: case of using computer assistance in League A”

Post by Javaness2 »

Might not something such as "average distance from Leela's Goodness Value for its first choice move" be a more interesting metric?
Also, can the chart please use a colour other than blue?
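A sketch of how such a metric might be computed. The per-move winrates below are invented; a real run would query Leela for its evaluation of the first-choice move and of the move actually played:

```python
# Average value loss: mean gap between Leela's evaluation of its first choice
# and of the move actually played. Numbers here are invented for illustration.

def average_value_loss(pairs):
    """pairs: (best_move_winrate, played_move_winrate) for each analysed move."""
    losses = [best - played for best, played in pairs]
    return sum(losses) / len(losses)

# A near-engine game barely leaks value; a human game leaks more.
engine_like = average_value_loss([(0.55, 0.55), (0.56, 0.55), (0.54, 0.54)])
human_like = average_value_loss([(0.55, 0.50), (0.56, 0.48), (0.54, 0.54)])
```

Unlike a top-3 count, this would distinguish a near-miss second choice from a genuinely bad move.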