FinrodFelagund wrote:
A first solution might be to simply use the AI to characterize the skill of players and the skill level of specific games.
1.) Find the average winrate change per move for players at a specific rank by analyzing a large number of games. Each move deviates from the AI's top choice by some winrate percentage; call this the move's delta. (A sketch of steps 1 and 2 follows this list.)
2.) Create a histogram of these average deltas across all ranks, from 25k to pro level play. This might show, for example, that the average pro-level move is only -5% and the average 1d move is -15% (made-up numbers, I have no idea).
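Here is a minimal sketch of what steps 1 and 2 could look like in Python. It assumes some upstream analysis pipeline has already produced the per-move winrate deltas; every rank, name, and number below is made up for illustration:

```python
from statistics import mean

# rank -> list of games; each game is a list of per-move deltas, where
# delta = winrate(move played) - winrate(engine's top choice), so <= 0.
# Toy values only; a real profile would come from many thousands of
# games per rank.
games_by_rank: dict[str, list[list[float]]] = {
    "9p":  [[-0.04, -0.02, -0.06], [-0.03, -0.05]],
    "1d":  [[-0.12, -0.20, -0.09], [-0.18, -0.11]],
    "25k": [[-0.35, -0.28, -0.40]],
}

def average_delta(games: list[list[float]]) -> float:
    """Step 1: mean winrate delta over every move in every game."""
    return mean(d for game in games for d in game)

# Step 2: one average per rank -- the per-rank 'histogram'.
rank_profile = {rank: average_delta(gs) for rank, gs in games_by_rank.items()}
for rank, avg in rank_profile.items():
    print(f"{rank:>4}: average move delta {avg:+.1%}")
```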
You can use this data to identify suspicious players. The statistics of any player could be displayed publicly on the server, so that instead of forcing the admins to decide whether a player is cheating, the decision would be up to the suspicious player's opponents. This would work even for high dan players, where it would be obvious if their average deviation put them on par with or above top pros. Matchmaking systems and challenges could also be customized, so that you can set your own limits on the suspiciousness of your opponents; a toy version of such a flag follows.
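The flagging could be as simple as comparing a player's measured average delta to the per-rank averages from step 2. The margin here is entirely invented; a real server would presumably calibrate it from the spread of deltas within each rank:

```python
# Per-rank average deltas as produced by step 2 (toy numbers again).
RANK_PROFILE = {"9p": -0.05, "1d": -0.15, "25k": -0.35}

def is_suspicious(player_avg_delta: float, rank: str,
                  margin: float = 0.05) -> bool:
    """Flag a player whose average delta is more than `margin` better
    (closer to zero) than the average for their rank."""
    return player_avg_delta > RANK_PROFILE[rank] + margin

print(is_suspicious(-0.05, "1d"))  # a '1d' with pro-level deltas -> True
print(is_suspicious(-0.14, "1d"))  # ordinary 1d deltas -> False
```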
Eventually, you could use this kind of data to replace traditional ranking systems. Instead of 1 dan being some arbitrary Elo rating, you could tie the rank to an average move delta.
If a cheater were clever and didn't always pick the best move, they would end up playing at some reasonable level. Such cheaters would be harder to detect, but they would also do less harm, since they wouldn't be playing above the account's strength and so wouldn't, on average, be frustrating for honest players.
The only really difficult part of this is choosing the number of playouts and the specific engine. My guess is that even a relatively modest number of playouts with any strong engine would work, because the most important thing is that you use the same engine with the same settings for everyone, like a measuring stick.
I have looked into this, and unfortunately it does not look feasible.
I have analysed a considerable number of games with KataGo, looking at the size of a player's 'average mistake' – not in terms of winrate-%, but in points, which KataGo can estimate directly. I think this method is more robust than looking at the winrate, because even a very small mistake can cause a big winrate swing when the game is close. A rough sketch of the measurement follows.
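For anyone who wants to try reproducing this, here is a sketch using KataGo's JSON analysis engine. The binary, config, and model paths are placeholders for your own install, and it assumes the config sets reportAnalysisWinratesAs = BLACK so that rootInfo's scoreLead is always reported from Black's perspective:

```python
import json
import subprocess

# Placeholder paths: point these at your own KataGo binary, config, model.
KATAGO_CMD = ["katago", "analysis",
              "-config", "analysis.cfg",
              "-model", "model.bin.gz"]

def average_mistake(moves, komi=6.5, visits=200):
    """moves: [("B", "Q16"), ("W", "D4"), ...] in GTP coordinates.
    Returns {"B": ..., "W": ...}: average points per move, negative
    meaning points lost (the sign convention used in this post)."""
    query = {
        "id": "game1",
        "moves": [list(m) for m in moves],
        "rules": "japanese",
        "komi": komi,
        "boardXSize": 19,
        "boardYSize": 19,
        # every position, from before move 1 up to after the last move
        "analyzeTurns": list(range(len(moves) + 1)),
        "maxVisits": visits,
    }
    proc = subprocess.Popen(KATAGO_CMD, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, text=True)
    proc.stdin.write(json.dumps(query) + "\n")
    proc.stdin.flush()

    # One JSON response per analyzed turn, possibly out of order;
    # turnNumber t is the position after t moves have been played.
    lead = {}  # turnNumber -> Black's expected score margin
    while len(lead) < len(moves) + 1:
        resp = json.loads(proc.stdout.readline())
        if "error" in resp:
            raise RuntimeError(resp["error"])
        lead[resp["turnNumber"]] = resp["rootInfo"]["scoreLead"]
    proc.terminate()

    # A move's cost is how far Black's lead moved against the player
    # who made it, between positions t and t+1.
    losses = {"B": [], "W": []}
    for t, (player, _) in enumerate(moves):
        change = lead[t + 1] - lead[t]
        losses[player].append(-change if player == "B" else change)
    return {p: -sum(v) / len(v) for p, v in losses.items() if v}
```

Feeding in a full game's move list and averaging the result over many games per player gives numbers comparable to the ones quoted below.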
After analysing a few players, I found Ke Jie's average mistake to be in the range of -0.5 points per move. For my own games, I got around -0.8 points; then for Shūsaku I got around -1.2 points, at which point I started to get suspicious, since Shūsaku scoring worse than me says more about the measure than about Shūsaku. I then checked a few European 6d players, who came in at around -1.5 points; and finally I came upon a game between Lukas Podpera 7d and Tanguy Le Calvé, which had an average mistake of only -0.3 points per move for both players.
Investigating further, I realised that the size of the average mistake depends on the 'nature' of the game: fighting-oriented games inevitably lead to higher average mistakes, and peaceful games to lower ones. This is why Ke Jie's -0.5 points per move is impressive even though it is numerically worse than the -0.3 game above. Even if we analysed winrate-% rather than KataGo points, I believe we would reach the same conclusion.
So to rank players by the size of their average mistake, it seems we first need a way to quantify how 'complex', or 'error-prone', a particular game is. So far I have not thought of a way to accomplish this.