NordicGoDojo wrote:
Bill Spight wrote:
Now, we have a lot of data of strong human players not using AI to cheat, going back before AlphaGo. In addition, thanks to the Elf team and GoGoD, we have a lot of data of differences between human play back then and modern bot play. Yes, humans are learning a lot from the bots, and will continue to do so until the law of diminishing returns kicks in. Which might take a decade or two. That amplifies the problems involved. And, OC, you don't have to stick to the Elf data, you can use KataGo and other bots for analysis, as well. It is just that there is a lot of data already available off the shelf. How much and in what ways human play has changed in the AI era is an important question, but it is important to establish an empirical baseline against which to measure changes.
This is part of the research plan we have established with Mr Egri-Nagy, but for now I personally don't know how useful the results will be for cheat detection.
Oh, in itself I doubt if it is worth much, either. But it is a start. And it has the advantage of examining the play of humans who we are quite sure are not using AI to cheat.

In addition, there is a lot of data against which to test hypotheses. Back in the days of rats in Skinner boxes, our first lab assignment in a course on learning was to put a rat in a Skinner box and observe its behavior without any reinforcement. By itself that showed next to nothing of interest, but it was an important first step to establish the rat's baseline behavior. To quote Rudyard Kipling, "Softly, softly, catchee monkey."

NordicGoDojo wrote:
In my experience, it is not difficult to tell humans apart from AIs (with the exception of AI-savvy top humans, as shown by the false positive I got for a Ke Jie game – but luckily such players are usually not the target group), but the difficulties begin when a player starts deliberately mixing human and AI play.
Yes, clever mice are a problem in the cat and mouse game.

There is the saying about not trying to run before you have learned how to walk, but in today's cat and mouse game the cat has little choice in the matter.
And humans are becoming AI-savvy rather quickly, since everybody has access to strong programs. It seems to me that most pros nowadays play nearly perfect openings, from the point of view of today's bots, because they try their ideas out on the bots before trying them out in real life; or, if not, they copy popular plays and sequences (AI fuseki and joseki).
NordicGoDojo wrote:
Bill Spight wrote:
In the case in question, the suspected cheater had, according to Leela, taken a 70-30 lead by move 50, up from perhaps 50-50 or 45-55 or so. By move 180 or so, when the game ended, Leela gave his lead as 85-15. Even if you started off looking at move 50 and later, in percentage terms most of the player's advantage had already accrued, and in half as many plays or less. Say what you will, in that game his best play was already behind him. Wouldn't that be a good place to look for cheating?
It seems to me that you are assuming that percentage differences are linear, rather than for example logarithmic.
Actually, no. My preference is to use logits. (But in terms of testing evaluations, I have found that there are problems with them, as well.) In this case I was attempting to take the point of view of the naive investigators and to show that they had evidence that it would be a good idea to look at the early play.
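To make the linear-versus-logarithmic point concrete, here is a minimal sketch (Python, purely illustrative, using the rough winrates from the game above) of what the same evaluations look like as logits:

[code]
import math

def logit(p):
    """Log-odds of a winrate p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))

# Rough winrates from the game discussed above.
for label, p in [("start of game", 0.50), ("around move 50", 0.70), ("around move 180", 0.85)]:
    print(f"{label}: winrate {p:.0%}, logit {logit(p):+.2f}")

# Output (roughly):
#   start of game:    winrate 50%, logit +0.00
#   around move 50:   winrate 70%, logit +0.85
#   around move 180:  winrate 85%, logit +1.73
[/code]

On the percentage scale most of the advantage has accrued by move 50; on the logit scale the two gains (+0.85 and +0.89) are about the same size. So which half of the game held his "best play" depends on which scale you trust.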
NordicGoDojo wrote:
Comparing them is made even more difficult by the fact that different AIs' percentages seem to mean different things; Elf OpenGo might give a position 90% while Leela Zero might only give 75%. For a quick sample with KataGo, in one game I got an early 70% matching a scoremean lead of roughly 2 points, while an 85% matched a scoremean lead of roughly 6 points.
The fact is that presumably objective measures, such as the probability of winning a game in self play from a specific position, have never been empirically validated. Never. As a result, like utils, the measure of human utility in economics, they are subjective. You can't compare evaluation measures across bots. They may be of use internally to the programs, but for that all they have to do is to order alternatives well enough. Validating them empirically is a waste of time if you are writing a go playing program. OTOH, if you are writing a program to analyze and evaluate positions, it is a necessity. However, at this point in time, people are happy to use strong players as analysts.
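As a toy illustration of why calibration matters, here is a back-of-envelope calculation (Python; the only data are the two rough KataGo samples quoted above, 70% at about a 2 point lead and 85% at about a 6 point lead, and the linear-in-logits form is just an assumption for the sketch):

[code]
import math

def logit(p):
    """Log-odds of a winrate p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))

def sigmoid(x):
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-x))

# The two (scoremean lead, winrate) samples quoted above -- rough values only.
(x1, p1), (x2, p2) = (2.0, 0.70), (6.0, 0.85)

# Fit logit(winrate) = a * lead + b exactly through the two samples.
a = (logit(p2) - logit(p1)) / (x2 - x1)
b = logit(p1) - a * x1
print(f"a = {a:.2f} logits per point, b = {b:.2f}")            # about 0.22 and 0.40
print(f"implied winrate at a 0-point lead: {sigmoid(b):.0%}")  # about 60%
[/code]

Pushed back to a dead-even score, the same fit implies a winrate of about 60%, which is one small hint that these numbers are not calibrated in any way that carries over from position to position, let alone from one bot to another.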
NordicGoDojo wrote:
Another issue to me seems to be that 'best play', or 'better play', needs defining.
Well, as I indicated, don't look to the bots for that. All a strong AI player needs to find is good enough plays. And humans fairly often come up with plays that the bot never considered, but which it evaluates as almost as good as its top choice, or even better, sometimes much better. And, OC, we have to take the bots' evaluations with a grain of salt, because they have not been empirically validated. Better to think of them like human feelings based on experience. The lack of empirical validation means that we do not know the significance of the difference between, say, a play that gets a 60% winrate estimate and one that gets a 55% estimate. We may think that the one with the higher estimate is probably better, but who knows? And, OC, the number of visits matters, but the number of visits is dictated by evaluations, so there is a circular logic there. We humans may care about the evaluation of specific plays and positions, but those evaluations are only part of what goes into making a strong bot. If accurate evaluations were necessary to make a strong bot, the programmers would have aimed to make accurate evaluations. They are not necessary, and they didn't.
In chess, IIUC, Ken Regan has come up with Elo evaluations of specific plays or positions, which is a remarkable achievement.

In go, I think that we are at least a decade away from anything comparable. Quien sabe?
NordicGoDojo wrote:
In my experience, comparing scoremean values leads to a much more consistent model, and it also seems intuitively preferable because human players can better understand 'leading by six points' than '80% chance to win'.
I agree that evaluations in terms of territory or area are an important step forward. Many thanks to lightvector.

But they have not been empirically validated, either.
NordicGoDojo wrote:
Bill Spight wrote:
Yes, there is a cat and mouse game. As John Fairbairn has pointed out, you need to establish a penumbra around cheating so that certain things that non-cheaters do may be disallowed, and certain things that non-cheaters do not currently do must be required. Honest players need to bend over backwards to avoid the appearance of cheating. Such is life.
If it finally comes to this, I think I would rather let some smart cheaters run loose than create a culture where players have to eschew certain moves just to avoid suspicion.
That's not what I had in mind. I was thinking more of things like webcams and screensharing for online tournaments.
But you make a good point. I used to be a tournament bridge player. Because it is a partnership game with hidden information, cheating is a threat to tournament bridge. Every strong player I know has a high ethical standard and bends over backwards to avoid taking advantage of possibly illicit information and to avoid the appearance of cheating. OTOH, innovation has been stifled by the fact that the innovators know more about the implications of their methods than their opponents. That has led to suspicions and allegations of cheating, and there are those who believe that that knowledge, which cannot be conveyed to the opponents in a few minutes, is private and in itself illicit. As a result, many new methods have been outlawed or severely restricted.
A similar atmosphere in go where certain plays result in suspicion would, IMO, be deadly. OTOH, since cheating has reared its ugly head, it is high time for strong players to adopt high ethical standards.

----
Edit: Let me give an example of something that players might adopt as part of a high ethical standard. In chess, a teenager was recently found guilty of cheating and suspended for a couple of years. In her defense she claimed that she came up with one amazing move while she was in the bathroom. (She also went to the bathroom surprisingly frequently, even for an elderly gentleman with prostate problems.) Now, a behavior which ethical players might adopt would be, except in emergencies, to go to the bathroom only on your opponent's move. The danger, OC, is that your opponent might make a play and punch the clock while you are away. But that is the price an ethical player might be willing to pay to avoid the appearance of cheating.
