It is not only important to detect cheating successfully (true positives); it is also important, perhaps even more so, that the detection method produces very few false positives, which lead to unjust disqualifications and the public discrediting of innocent players. You want to avoid things like the Prosecutor's Fallacy and Confirmation Bias.
I think what's actually needed for proper automatic cheating detection is the following procedure, which is quite common in machine learning and classifier competitions:
[1] prepare a large (the larger, the better) collection of games where you know for certain whether there was cheating or not, because it includes games from various volunteers who cheated on purpose, in whatever ways they saw fit. This collection needs to contain games from all levels of play and many different playing styles.
[2] with this annotated collection of games, developers can create and test classifiers (by machine learning or some other method).
[3] you can objectively compare the quality of various classifiers by (for example) their Matthews correlation coefficient (a sketch of the computation follows this list).
[4] you could even create a competition between classifiers from different developers, using a (perhaps undisclosed) representative subcollection of annotated games that were not used in the creation and testing of those classifiers (see the ranking sketch at the end of this answer).
I think that step [1] will be a lot of work, requiring a coordinated effort to create a high-quality dataset for the subsequent steps. It is also typically a task for an organisation rather than for the developers of the classifiers.
But once you have this dataset, a public competition with prize money could be a relatively cheap way to obtain a very good classifier (see, for example, the CASP14 protein-folding competition that was won by DeepMind's AlphaFold in 2020).