NordicGoDojo wrote:
Bill Spight wrote:
Now, we have a lot of data of strong human players not using AI to cheat, going back before AlphaGo. In addition, thanks to the Elf team and GoGoD, we have a lot of data of differences between human play back then and modern bot play. Yes, humans are learning a lot from the bots, and will continue to do so until the law of diminishing returns kicks in. Which might take a decade or two. That amplifies the problems involved. And, OC, you don't have to stick to the Elf data, you can use KataGo and other bots for analysis, as well. It is just that there is a lot of data already available off the shelf. How much and in what ways human play has changed in the AI era is an important question, but it is important to establish an empirical baseline against which to measure changes.
This is part of the research plan we have established with Mr Egri-Nagy, but for now I personally don't know how useful the results will be for cheat detection.
Oh, in itself I doubt if it is worth much, either. But it is a start. And it has the advantage of examining the play of humans who we are quite sure are not using AI to cheat.

In addition, there is a lot of data against which to test hypotheses. Back in the days of rats in Skinner boxes, our first lab assignment in a course on learning was to put a rat in a Skinner box and observe its behavior without any reinforcement. By itself that showed next to nothing of interest, but it was an important first step to establish the rat's baseline behavior. To quote Rudyard Kipling, "Softly, softly, catchee monkey."

NordicGoDojo wrote:
In my experience, it is not difficult to tell humans apart from AIs (with the exception of AI-savvy top humans, as shown by the false positive I got for a Ke Jie game – but luckily such players are usually not the target group), but the difficulties begin when a player starts deliberately mixing human and AI play.
Yes, clever mice are a problem in the cat and mouse game.

There is the saying about not trying to run before you have learned how to walk, but in today's cat and mouse game the cat has little choice in the matter.
And humans are becoming AI-savvy rather quickly, since everybody has access to strong programs. It seems to me that most pros nowadays play nearly perfect openings, from the point of view of today's bots, because they try their ideas out on the bots before trying them out in real life; or, if not, they copy popular plays and sequences (AI fuseki and joseki).
NordicGoDojo wrote:
Bill Spight wrote:
In the case in question, the suspected cheater had, according to Leela, taken a 70-30 lead by move 50, up from perhaps 50-50 or 45-55 or so. By move 180 or so, when the game ended, Leela gave his lead as 85-15. Even if you started off looking at move 50 and later, in percentage terms most of the player's advantage had already accrued, and in half as many plays or less. Say what you will, in that game his best play was already behind him. Wouldn't that be a good place to look for cheating?
It seems to me that you are assuming that percentage differences are linear, rather than for example logarithmic.
Actually, no. My preference is to use logits. (But in terms of testing evaluations, I have found that there are problems with them, as well.) In this case I was attempting to take the point of view of the naive investigators and to show that they had evidence that it would be a good idea to look at the early play.
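To make the linear-versus-logarithmic point concrete, here is a minimal sketch (Python, purely illustrative, using the rough winrates from the game above) of what the same evaluations look like as logits:

[code]
import math

def logit(p):
    """Log-odds of a winrate p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))

# Rough winrates from the game discussed above.
for label, p in [("start of game", 0.50), ("around move 50", 0.70), ("around move 180", 0.85)]:
    print(f"{label}: winrate {p:.0%}, logit {logit(p):+.2f}")

# Output (roughly):
#   start of game:    winrate 50%, logit +0.00
#   around move 50:   winrate 70%, logit +0.85
#   around move 180:  winrate 85%, logit +1.73
[/code]

On the percentage scale most of the advantage has accrued by move 50; on the logit scale the two gains (+0.85 and +0.89) are about the same size. So which half of the game held his "best play" depends on which scale you trust.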
NordicGoDojo wrote:
Comparing them is made even more difficult by the fact that different AIs' percentages seem to mean different things; Elf OpenGo might give a position 90% while Leela Zero might only give 75%. For a quick sample with KataGo, in one game I got an early 70% matching a scoremean lead of roughly 2 points, while an 85% matched a scoremean lead of roughly 6 points.
The fact is that presumably objective measures, such as the probability of winning a game in self play from a specific position, have never been empirically validated. Never. As a result, like utils, the measure of human utility in economics, they are subjective. You can't compare evaluation measures across bots. They may be of use internally to the programs, but for that all they have to do is to order alternatives well enough. Validating them empirically is a waste of time if you are writing a go playing program. OTOH, if you are writing a program to analyze and evaluate positions, it is a necessity. However, at this point in time, people are happy to use strong players as analysts.
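As a toy illustration of why calibration matters, here is a back-of-envelope calculation (Python; the only data are the two rough KataGo samples quoted above, 70% at about a 2 point lead and 85% at about a 6 point lead, and the linear-in-logits form is just an assumption for the sketch):

[code]
import math

def logit(p):
    """Log-odds of a winrate p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))

def sigmoid(x):
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-x))

# The two (scoremean lead, winrate) samples quoted above -- rough values only.
(x1, p1), (x2, p2) = (2.0, 0.70), (6.0, 0.85)

# Fit logit(winrate) = a * lead + b exactly through the two samples.
a = (logit(p2) - logit(p1)) / (x2 - x1)
b = logit(p1) - a * x1
print(f"a = {a:.2f} logits per point, b = {b:.2f}")            # about 0.22 and 0.40
print(f"implied winrate at a 0-point lead: {sigmoid(b):.0%}")  # about 60%
[/code]

Pushed back to a dead-even score, the same fit implies a winrate of about 60%, which is one small hint that these numbers are not calibrated in any way that carries over from position to position, let alone from one bot to another.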
NordicGoDojo wrote:
Another issue to me seems to be that 'best play', or 'better play', needs defining.
Well, as I indicated, don't look to the bots for that. All a strong AI player needs to find is good enough plays. And humans fairly often come up with plays that the bot never considered, but which it evaluates as almost as good as its top choice, or even better, sometimes much better. And, OC, we have to take the bots' evaluations with a grain of salt, because they have not been empirically validated. Better to think of them like human feelings based on experience. The lack of empirical validation means that we do not know the significance of the difference between, say, a play that gets a 60% winrate estimate and one that gets a 55% estimate. We may think that the one with the higher estimate is probably better, but who knows? And, OC, the number of visits matters, but the number of visits is dictated by evaluations, so there is a circular logic there. We humans may care about the evaluation of specific plays and positions, but those evaluations are only part of what goes into making a strong bot. If accurate evaluations were necessary to make a strong bot, the programmers would have aimed to make accurate evaluations. They are not necessary, and they didn't.
In chess, IIUC, Ken Regan has come up with Elo evaluations of specific plays or positions, which is a remarkable achievement.

In go, I think that we are at least a decade away from anything comparable. Quien sabe?
NordicGoDojo wrote:
In my experience, comparing scoremean values leads to a much more consistent model, and it also seems intuitively preferable because human players can better understand 'leading by six points' than '80% chance to win'.
I agree that evaluations in terms of territory or area are an important step forward. Many thanks to lightvector.

But they have not been empirically validated, either.
NordicGoDojo wrote:
Bill Spight wrote:
Yes, there is a cat and mouse game. As John Fairbairn has pointed out, you need to establish a penumbra around cheating so that certain things that non-cheaters do may be disallowed, and certain things that non-cheaters do not currently do must be required. Honest players need to bend over backwards to avoid the appearance of cheating. Such is life.
If it finally comes to this, I think I would rather let some smart cheaters run loose than create a culture where players have to eschew certain moves just to avoid suspicion.
That's not what I had in mind. I was thinking more of things like webcams and screensharing for online tournaments.
But you make a good point. I used to be a tournament bridge player. Because it is a partnership game with hidden information, cheating is a threat to tournament bridge. Every strong player I know has a high ethical standard and bends over backwards to avoid taking advantage of possibly illicit information and to avoid the appearance of cheating. OTOH, innovation has been stifled by the fact that the innovators know more about the implications of their methods than their opponents. That has led to suspicions and allegations of cheating, and there are those who believe that that knowledge, which cannot be conveyed to the opponents in a few minutes, is private and in itself illicit. As a result, many new methods have been outlawed or severely restricted.
A similar atmosphere in go where certain plays result in suspicion would, IMO, be deadly. OTOH, since cheating has reared its ugly head, it is high time for strong players to adopt high ethical standards.

----
Edit: Let me give an example of something that players might adopt as part of a high ethical standard. In chess, a teenager was recently found guilty of cheating and suspended for a couple of years. In her defense she claimed that she came up with one amazing move while she was in the bathroom. (She also went to the bathroom surprisingly frequently, even for an elderly gentleman with prostate problems.) Now, a behavior which ethical players might adopt would be, except in emergencies, to go to the bathroom only on your opponent's move. The danger, OC, is that your opponent might make a play and punch the clock while you are away. But that is the price an ethical player might be willing to pay to avoid the appearance of cheating.
