 Post subject: Re: Derived Metrics for the Game of Go
Post #11 Posted: Tue Nov 10, 2020 10:28 pm 
Judan

Posts: 6162
Liked others: 0
Was liked: 789
Now I discuss the rest of the paper.

For cheat detection, the paper considers a winrate graph over a game's moves according to the AI's stated probabilities. One player cheating is described as the graph climbing steadily towards 99%; both players cheating, as a roughly constant graph.

Regardless of the indirect calculation of winrates, I agree that such graphs identify suspect players because they show those players making essentially no significant mistakes. They are an indication of possible cheating but not a proof of it.
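(As a rough illustration of the two graph shapes described above, here is a minimal sketch in Python; the winrate lists and thresholds are entirely invented and are not taken from the paper.)

Code:
def classify_winrate_curve(winrates, rise_threshold=0.95, flat_band=0.10):
    """Crudely label a per-move winrate curve (values in [0, 1], seen from the
    suspected player's side). Thresholds are arbitrary illustrations."""
    rises = sum(1 for a, b in zip(winrates, winrates[1:]) if b >= a)
    mostly_rising = rises >= 0.8 * (len(winrates) - 1)
    if mostly_rising and winrates[-1] >= rise_threshold:
        return "one-sided climb"   # pattern described for one player cheating
    if max(winrates) - min(winrates) <= flat_band:
        return "roughly constant"  # pattern described for both players cheating
    return "other"

# Hypothetical curves:
print(classify_winrate_curve([0.50, 0.55, 0.62, 0.70, 0.81, 0.90, 0.97, 0.99]))  # one-sided climb
print(classify_winrate_curve([0.50, 0.52, 0.49, 0.51, 0.50, 0.48, 0.52, 0.51]))  # roughly constant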

The paper claims that a player's average 'effect' and the consistent development of his moves' effects indicate his skill. Since the effect is calculated from scoremeans, again, this is wrong. Effects only indicate a model of skill. See my earlier remarks.

The paper repeats its earlier mistakes, which I have mentioned for earlier sections.

It compares a player's moves to KataGo's first move suggestions; close agreement is said to make a player suspicious. Suspicious, why not; every player can be suspected of cheating. However, there is another explanation than cheating: a player can have trained a lot with KataGo or have a similar playing style. Besides, there is the problem that a player might be cheating using a different AI program or a different KataGo network; then KataGo's moves might not be particularly suitable for comparison to the player's moves.

The paper studies four example games, discussing indications of cheating. Again, it frequently repeats its earlier mistakes. Per player and game, a few indicators are considered to judge whether cheating occurred. Most alleged indicators are interpreted as indicating cheating, although they can also be interpreted as the opposite. The paper's systematic repetition of its earlier mistakes, combined with indicators interpreted with a prejudice towards cheating, produces alleged detection of cheaters on the implied assumption that several such aspects combined would be sufficiently convincing evidence. A cautionary note that false allegations can occur only serves as an alibi. Such an approach to cheating detection is bound to detect cheaters regardless of what percentage of players is judged wrongly.

Another part of the problem is that the paper has introduced values (called metrics) and applies them while not providing theory for distinguishing when values, or combinations of values, do versus do not indicate cheating. This is like statistics without confidence thresholds for individual values, let alone for the combined consideration of several values of the same kind and then of different kinds of values.

Value graphs are supposed to be interpreted subconsciously by human arbiters. Instead, there ought to be theory for interpreting the data represented in the graphs, such as analysing the differences between two value curves in a graph.

Such theory requires agreement with very large samples of games and their values and graphs. This is so also because different board positions and different games can show different behaviours of the values. E.g., imagine a semeai with two local maxima, one correct and one wrong: when calculating an average over a roughly balanced tree search, the average itself will be a wrong indication. Currently, such a value is interpreted as being as good as all other values for ordinary positions.

The paper's value analysis applied to players creates an unfair prejudice: some players with specific playing styles, studying with specific AI programs or having studied much with AI are in much greater danger of being wrongly indicated as cheaters.

In conclusion, although the paper suggests some values potentially useful for some studies or models, the theory is very far from safe application for distinguishing cheating from non-cheating, except for the mentioned success cases of tool usage followed by a player admitting cheating. Currently, the theory is very incomplete, is over-interpreted, and is frequently advertised by the paper's authors within the paper as being more (an alleged description of reality, such as "a player's skill") than it is (only a model, such as "a model of a player's skill"; furthermore a model lacking quality evaluation, which - for the promoted application of cheating detection - is essential).

 Post subject: Re: Derived Metrics for the Game of Go
Post #12 Posted: Wed Nov 11, 2020 3:00 am 
Beginner

Posts: 19
Liked others: 2
Was liked: 8
Rank: 1p
RobertJasiek wrote:
Apparently, the scoremean is defined as a mean over all visited scores (at the leaves, I suppose) during a Monte-Carlo search. When correct subsequent play approaches a leaf, the scoremean can converge to strong human score prediction. So far so good. However, in the general case, which includes many positions far earlier than a leaf, there may be some stability in the values and game-tree-local convergence for strong AI play, but we do not know by how much the scoremean and a strong human's score prediction differ in any specific position. The scoremean does not equal strong human score prediction.

Sure enough, human counting and AI counting are different.

An interesting feature of the scoremean is that KataGo is reliably able to produce ties against itself (with an integer komi), showing that its calculations are at least consistent to a degree. (For example Leela Zero is unable to do the same.) KataGo also reliably beats Leela Zero, indicating that its understanding of the game should be the better one. While the scoremean values are 'impure' and 'imprecise', unlike human counting, I still think we should give them value.

RobertJasiek wrote:
The paper says: "Every move played in a game reduces the number of its future possibilites." Unless a superko or similar rule applies, this is just a conjecture and disproven by this counter-example: White's two-eye-formation fills the board, White fills an eye, Black passes, White fills an eye committing suicide (assuming it is legal according to the rules). The resulting position has a greater number of future possibilities than the initial position. To get a theorem instead of a conjecture, some presuppositions need to be stated and a proof is required.

I'm not sure I get this. For every move, you could have played the move or passed. After playing a move (or passing), you clearly have a smaller number of possible futures remaining.

RobertJasiek wrote:
The effect of a move is defined as the difference of scoremeans after and before it. The paper says that statistical information on the effects describes the playing skill of a player. No. It only describes a model of the playing skill of a player, because the scoremean is only a model of correct positional judgement.

For sure, a player's average effect in a single game does not make it possible to accurately estimate their playing skill – we never claimed such a thing. There is a fairly strong correlation, however.
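(For concreteness, the 'effect' arithmetic quoted above can be sketched in a few lines of Python; the scoremean numbers below are hypothetical and only illustrate the definition, not any game from the paper.)

Code:
def move_effects(scoremeans):
    """Effect of move i = scoremean after the move minus scoremean before it.
    'scoremeans' is a hypothetical list of engine estimates, all from the same
    player's perspective, index 0 being the position before move 1."""
    return [after - before for before, after in zip(scoremeans, scoremeans[1:])]

# Hypothetical game fragment in which the player slowly loses about 1.5 points:
scoremeans = [0.0, -0.2, -0.1, -0.8, -0.9, -1.5]
effects = move_effects(scoremeans)
print(effects)                        # approximately [-0.2, 0.1, -0.7, -0.1, -0.6]
print(sum(effects) / len(effects))    # approximately -0.3, the per-game average effect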

RobertJasiek wrote:
The paper claims that AI does not follow strategic plans which can be expressed in human terms. Wrong. I already described such plans during the early AlphaGo days, when AI performed exactly according to a generally applicable kind of strategic planning I had previously described (best reducing a large sphere of influence).

Of course you can fit a humanly describable strategic plan to a particular move by an AI.

The point is that the AI's move-choosing procedure itself is not, e.g., 'right now there is nothing urgent going on, so the next most valuable thing is to reduce that sphere of influence, and for that particular task I should use the technique I read about in a theory book.' The AI chooses its moves purely from the information available on the game board, through algorithms not accessible to humans, without trying to infer the opponent's verbalisable strategy.

RobertJasiek wrote:
Regardless of the indirect calculation of winrates, I agree that such graphs identify suspect players because they show those players making essentially no significant mistakes. They are an indication of possible cheating but not a proof of it.

I think we should note that the only 'proof' of cheating is the cheater's confession or getting caught in the act, for example by a video recording or a trusted proctor. All other anti-cheating solutions are finally based on probabilities, which I think should be called 'evidence' rather than 'proof'.

RobertJasiek wrote:
It compares a player's moves to KataGo's first move suggestions; close agreement is said to make a player suspicious. Suspicious, why not; every player can be suspected of cheating. However, there is another explanation than cheating: a player can have trained a lot with KataGo or have a similar playing style. Besides, there is the problem that a player might be cheating using a different AI program or a different KataGo network; then KataGo's moves might not be particularly suitable for comparison to the player's moves.

I have tested the model on a wide variety of AIs, even AlphaGo Master, which is trained on human games and plays considerably differently from modern AIs such as KataGo. Superhuman performance is still visible in the graphs, even if KataGo's own value function is slightly different.

Furthermore, being able, for example, to play out KataGo's favoured sequences from time to time will not mark a player as suspicious, but consistently playing in roughly the right part of the board will. I have seen no evidence of a player's 'familiarity with AI moves' making them stand out in my analysis.

RobertJasiek wrote:
Another part of the problem is that the paper has introduced values (called metrics) and applies them while not providing theory for distinguishing when values, or combinations of values, do versus do not indicate cheating. This is like statistics without confidence thresholds for individual values, let alone for the combined consideration of several values of the same kind and then of different kinds of values.

I think you may have misunderstood the purpose of the paper. As we are basically just starting the research, we can hardly have a finished product for cheat detection work – if we did, you could already download the software somewhere. The paper describes how different metrics derivable from AI analysis can be used in cheat detection. We cannot possibly do statistics with confidence thresholds if we don't even know what we should measure. This is what the paper starts to tackle.

RobertJasiek wrote:
E.g., imagine a semeai with two local maxima, one correct and one wrong: when calculating an average over a roughly balanced tree search, the average itself will be a wrong indication. Currently, such a value is interpreted as being as good as all other values for ordinary positions.

This will, however, not happen, because the AI will not use the same number of playouts for both possibilities. You can easily test this with KataGo and see that it gives the (roughly) right scoremean once it figures out the status of the position. Depending on the complexity of a position, this may of course require a larger number of playouts.
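(A toy numerical sketch of the playouts argument, not KataGo's actual search: in an MCTS-style backup the root scoremean is effectively a visit-weighted average of the candidate continuations, so once the search resolves the semeai and concentrates its playouts, the wrong branch barely contributes. All numbers are invented.)

Code:
def root_scoremean(children):
    """children: (visits, scoremean) pairs for the root's candidate moves.
    Toy model of an MCTS backup: the root value is the visit-weighted average."""
    total = sum(v for v, _ in children)
    return sum(v * s for v, s in children) / total

# Semeai with two candidate lines: winning it is worth +10 points, losing it -10.
undecided_search = [(500, +10.0), (500, -10.0)]  # hypothetical early, balanced search
resolved_search = [(980, +10.0), (20, -10.0)]    # hypothetical search after reading out the status

print(root_scoremean(undecided_search))  # 0.0 -> the misleading 'average' value
print(root_scoremean(resolved_search))   # 9.6 -> close to the correct +10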

RobertJasiek wrote:
The paper's value analysis applied to players creates an unfair prejudice: some players with specific playing styles, studying with specific AI programs or having studied much with AI are in much greater danger of being wrongly indicated as cheaters.

As I said above, this claim is unsubstantiated.

RobertJasiek wrote:
In conclusion, although the paper suggests some values potentially useful for some studies or models, the theory is very far from safe application for distinguishing cheating from non-cheating, except for the mentioned success cases of tool usage followed by a player admitting cheating. Currently, the theory is very incomplete, is over-interpreted, and is frequently advertised by the paper's authors within the paper as being more (an alleged description of reality, such as "a player's skill") than it is (only a model, such as "a model of a player's skill"; furthermore a model lacking quality evaluation, which - for the promoted application of cheating detection - is essential).

As I also wrote above, I think you have misunderstood the point of the paper. It certainly was not presenting a robust system that can be applied in cheat detection – if that was the case, then the research would be nearing completion, rather than having just started, and we would already have a product to offer. The four cases presented in the paper are examples of how interpretation of the generated graphs can possibly speed up the otherwise slow, purely human analysis of alleged cheating cases. None of the presented metrics or procedures is given as a 'general solution', but rather as a check that can be done to get a better idea of what happened. When applying a series of 'cheat filters' such as these, if all (or most) of them come off as 'suspicious', then we have probabilistic data that I believe is more trustworthy than a 'mere' human interpretation made from reviewing a game. When there is no way to get actual 'proof' of cheating, this, I think, can very well be the next best thing.

 Post subject: Re: Derived Metrics for the Game of Go
Post #13 Posted: Wed Nov 11, 2020 4:03 am 
Oza

Posts: 3657
Liked others: 20
Was liked: 4631
What I see here is an argument between mathematics + mathematics and mathematics + common sense. You can easily guess which side I'm on.

It seems from the reports here on the aptly named Corona Cup that online go already has its own CV problem - the Cheating Virus. A vaccine is needed urgently. The world and (more significantly perhaps) its stock markets have already greeted with jubilation the news that there may be a vaccine soon for the real CV that gives "only" 90% protection. An approach that gives 90% protection NOW or SOON against cheating in go is surely to be similarly welcomed.

 Post subject: Re: Derived Metrics for the Game of Go
Post #14 Posted: Wed Nov 11, 2020 4:24 am 
Gosei
User avatar

Posts: 1754
Liked others: 177
Was liked: 492
Some people worry more about side-effects of the vaccine on healthy people than the protective effect on vulnerable people. The discussion can go on forever. I am confident that the anti-CV committee will make reasonable use of the vaccine.

 Post subject: Re: Derived Metrics for the Game of Go
Post #15 Posted: Wed Nov 11, 2020 5:26 am 
Judan

Posts: 6162
Liked others: 0
Was liked: 789
NordicGoDojo wrote:
An interesting feature of the scoremean is that KataGo is reliably able to produce ties against itself (with an integer komi), showing that its calculations are at least consistent to a degree. [...] KataGo also reliably beats Leela Zero, indicating that its understanding of the game should be the better one. While the scoremean values are 'impure' and 'imprecise', unlike human counting, I still think we should give them value.


Sure.

Where I object is over-interpretation of KataGo's skill, e.g., when the paper refers to "a player's skill".


Quote:
RobertJasiek wrote:
number of its future possibilities

I'm not sure I get this.


I lack the time to work this out. The paper might as well omit the related statement without significant impact, so who cares? :)

Quote:
a player's average effect in a single game does not make it possible to accurately estimate their playing skill


At the same time, the paper describes the intention of analysing a player's skill (though so far it should only speak of a "model" of it) from just one game. His demonstrated skill, as the paper calls it, shall then be used as a basis for possibly detecting his cheating in this game.

If you maintain your statement, you must at the same time hold that cheating detection by the paper's means from only a single game of the player is impossible.

Quote:
There is a fairly strong correlation, however.


I do not have a problem with seeing a fairly strong correlation, as long as it is roughly described as "for an average game of a particular, arbitrary player, the paper's tools can indicate a 'cheating' suspicion under the assumption that the model of the player's performance is his performance".

Quote:
Of course you can fit a humanly describable strategic plan to a particular move by an AI.


Not just to one move but to particular kinds of move sequences.

Quote:
The point is that the AI's move-choosing procedure itself is [...]


...not described as a human-readable strategic plan indeed, right. It is well hidden in the network values, pure tree searches and code.

Quote:
I think we should note that the only 'proof' of cheating is the cheater's confession or getting caught in the act, for example by a video recording or a trusted proctor. All other anti-cheating solutions are finally based on probabilities, which I think should be called 'evidence' rather than 'proof'.


Right.

Therefore, if "statistical" probabilities shall serve as evidence, they require theory for thresholds, levels of confidence and agreement to large samples.

Quote:
I have tested the model on a wide variety of AIs, even AlphaGo Master, which is trained on human games and plays considerably differently from modern AIs such as KataGo.


Good.

Quote:
Furthermore, being able to for example play KataGo's favoured sequences out from time to time will not mark a player as suspicious, but consistently playing in the roughly right part of the board will.


(My reply refers to phases before the endgame phase.)

I disagree. A player can have the skill to always play in roughly the right part of the board, as indicated by AI analysis, in some of his games. Such a player need not have a superhuman level.

A player is suspicious if he also consistently plays locally close to optimal. If we know he is a strong (or very strong) player, we must be extra cautious and tolerant towards interpreting his skill.

Quote:
I have seen no evidence of a player's 'familiarity with AI moves' making them stand out in my analysis.


I expect what you describe. Nevertheless and regardless, could you describe your observations so far in more detail, please? We might learn from them.

Quote:
I think you may have misunderstood the purpose of the paper.


I get it that the paper is an early step in metrics analysis - in that respect, I do not think I have misunderstood its purpose.

At the same time, at various places, the paper makes detailed statements that go far beyond the aforementioned purpose. I criticise the paper for such over-interpreting statements.

Furthermore, the paper goes far beyond the aforementioned purpose when suggesting and describing application to cheating detection. I also criticise that the paper rushes ahead too fast while it even serves as part of the justification for already applying such tools in tournaments.

IOW, there is not just one purpose of the paper - not just the modest purpose of an early step in metrics analysis. This paper does not give the impression of a pure maths paper, such as one about KL-divergence would give. Quite the contrary: the paper is implicitly referred to as strong justification when a tournament announcement refers to "state-of-the-art" anti-cheating tools.

Quote:
Depending on the complexity of a position, this may of course require a larger number of playouts.


Yes, but cheating detection is supposed to be applicable even when, in quite a few positions, there are not enough playouts.

Quote:
RobertJasiek wrote:
The paper's value analysis applied to players creates an unfair prejudice: some players with specific playing styles, studying with specific AI programs or having studied much with AI are in much greater danger of being wrongly indicated as cheaters.

As I said above, this claim is unsubstantiated.


I am not convinced because I do not buy that the paper's only purpose is early research. (You might rewrite the paper to convince me by removing all details hinting at advanced application / interpretation, but please do not waste your time on doing so. :) As a suggestion for future papers: clearly distinguish the current level of understanding from possible future research, and perhaps describe applications that go beyond a paper outside the paper itself.)

Quote:
It certainly was not presenting a robust system that can be applied in cheat detection – if that was the case, then the research would be nearing completion, rather than having just started, and we would already have a product to offer.


Thank you for the clarification!

Quote:
When applying a series of 'cheat filters'


Good in theory, but only good in practice if each filter itself is convincing - and is not a roughly 50% interpretation chance "cheated or not cheated".

Quote:
I believe is more trustworthy than a 'mere' human interpretation made from reviewing a game.


I think what might some day become a useful filter is objective analysis of the data currently presented as graphs, indicating similar progress (such as "winning chances") during a game for different AIs' moves versus a player's moves. Such characteristics are very hard to fake if they occur absolutely consistently before the stage at which a game has already been won strategically. Of course, that presumes extensive studies showing that coincidences do not occur merely because of the specific nature of a game's development.

 Post subject: Re: Derived Metrics for the Game of Go
Post #16 Posted: Wed Nov 11, 2020 6:11 am 
Judan

Posts: 6162
Liked others: 0
Was liked: 789
"an argument between mathematics + mathematics and mathematics + common sense"

A maths paper deserves, first of all, a maths assessment.

A paper containing maths and informal text deserves a reply that is partly both.

Beyond the paper itself, there can also be informal discussion. Such as how cheating should be treated in a particular tournament.

All of this is not an argument between maths and informal discussion. Both have their place.

 Post subject: Re: Derived Metrics for the Game of Go
Post #17 Posted: Wed Nov 11, 2020 6:17 am 
Judan

Posts: 6162
Liked others: 0
Was liked: 789
jlt, whenever arbitration, anti-cheating detection or judgements about crimes are concerned, decisions must by all means minimise unjust judgements as far as at all possible.

 Post subject: Re: Derived Metrics for the Game of Go
Post #18 Posted: Wed Nov 11, 2020 7:34 am 
Beginner

Posts: 19
Liked others: 2
Was liked: 8
Rank: 1p
Quote:
a player's average effect in a single game does not make it possible to accurately estimate their playing skill
RobertJasiek wrote:
At the same time, the paper describes the intention of analysing a player's skill (though so far it should only speak of a "model" of it) from just one game. His demonstrated skill, as the paper calls it, shall then be used as a basis for possibly detecting his cheating in this game.

If you maintain your statement, you must at the same time hold that cheating detection by the paper's means from only a single game of the player is impossible.

My apologies, I should have written 'a player's average effect alone' to emphasise the point. This is why all the other metrics are there to help. Although there is a strong general correlation between the average effect and a player's skill, it is possible to find a game by top European amateurs that ends with an average effect of around -0.3, while a random Ke Jie game might end with average effects of around -0.7. This obviously doesn't imply that the European amateurs are stronger, but that the 'type' of the game is different, with the latter game involving more of what we might call 'complexity'. If we could somehow quantify this complexity, we might be able to move forward towards a more general solution. This might be less difficult than it sounds; or, at least, I'm currently working on a promising solution idea.

Quote:
I think we should note that the only 'proof' of cheating is the cheater's confession or getting caught in the act, for example by a video recording or a trusted proctor. All other anti-cheating solutions are finally based on probabilities, which I think should be called 'evidence' rather than 'proof'.
RobertJasiek wrote:
Right.

Therefore, if "statistical" probabilities shall serve as evidence, they require theory for thresholds, levels of confidence and agreement to large samples.

I agree completely, and hope that the research will reach such a point one day.

Quote:
Furthermore, being able, for example, to play out KataGo's favoured sequences from time to time will not mark a player as suspicious, but consistently playing in roughly the right part of the board will.
RobertJasiek wrote:
(My reply refers to phases before the endgame phase.)

I disagree. A player can have the skill to always play in roughly the right part of the board, as indicated by AI analysis, in some of his games. Such a player need not have a superhuman level.

A player is suspicious if he also consistently plays locally close to optimal. If we know he is a strong (or very strong) player, we must be extra cautious and tolerant towards interpreting his skill.

For sure, the criteria, especially in the current model, have to adapt to the analysed player's level. As the 'human' part of the current model, I will be a lot more surprised if a supposed 7k plays so well that I couldn't imagine playing any better, than if a supposed 7d does the same. Note that this may not 'prove' that the 7k is consulting an AI – they might simply be sandbagging.

Quote:
I have seen no evidence of a player's 'familiarity with AI moves' making them stand out in my analysis.
RobertJasiek wrote:
I expect what you describe. Nevertheless and regardless, could you describe your observations so far in more detail, please? We might learn from them.

This sounds like a useful undertaking – I already have a number of 'cheater profiles' in my mind, but not in written form, and I should get around to classifying and describing them better. I think I should omit writing anything hastily here for now, but I expect to get back on the subject later (at the latest in a future paper).

Anecdotally, I can say that 'I have been studying AI a lot these past few months and that is why my play looks similar' is the most common counterargument I hear when I accuse somebody of cheating – it comes up almost invariably. Usually, after that the suspect goes on to explain why they were able to play some AI early or middle game joseki, when my reason for suspicion will have been something very different (such as a surprisingly small average effect or extremely sharp reading during one or multiple key fights). I tend to completely ignore AI joseki in my analysis, not counting them as evidence.

RobertJasiek wrote:
IOW, there is not just one purpose of the paper - not just the modest purpose of an early step in metrics analysis. This paper does not give the impression of a pure maths paper, such as one about KL-divergence would give. Quite the contrary: the paper is implicitly referred to as strong justification when a tournament announcement refers to "state-of-the-art" anti-cheating tools.

It seems we entered the semantics part anyway. Let me quote a dictionary definition which I think is in line with CC2's usage of the term: 'very modern and using the most recent ideas and methods.' If you can tell me of go-specific anti-cheating tools so much more modern that my model pales in comparison, I will be happy to hear of them.

Quote:
Depending on the complexity of a position, this may of course require a larger number of playouts.
RobertJasiek wrote:
Yes, but cheating detection is supposed to be applicable even when, in quite a few positions, there are not enough playouts.

I think this will be extremely difficult to achieve in practice. How can you know that a player played like an AI if you don't consult an AI to a reasonable depth (where 'reasonable' must be context-specific) to see what its output is? Of course, a strong human player might get a fairly good idea without consulting one, but trying to bypass that requirement is one of the main points of the model.

Quote:
When applying a series of 'cheat filters'
RobertJasiek wrote:
Good in theory, but only good in practice if each filter itself is convincing - and is not a roughly 50% interpretation chance "cheated or not cheated".

Surely this is an unnecessary requirement. If we have five purely independent filters that all have a 50% chance, we already have 97% accuracy when applying them all.
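(My reconstruction of the arithmetic behind the 97% figure, not a claim from the paper: if each filter independently flags an innocent player with probability 0.5, the chance that all five flag the same innocent player is 0.5 to the 5th power, so requiring all five to agree leaves roughly a 3% false-positive rate - valid only under the independence assumption.)

Code:
def combined_false_positive_rate(per_filter_rates):
    """Chance that ALL filters wrongly flag the same innocent player, assuming
    the filters are mutually independent (a strong assumption). Each rate is
    that filter's probability of flagging an innocent player."""
    p = 1.0
    for rate in per_filter_rates:
        p *= rate
    return p

fp = combined_false_positive_rate([0.5] * 5)
print(fp)       # 0.03125
print(1 - fp)   # 0.96875 -> the '97%' figure, valid only if the filters are independent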

 Post subject: Re: Derived Metrics for the Game of Go
Post #19 Posted: Wed Nov 11, 2020 8:30 am 
Judan

Posts: 6162
Liked others: 0
Was liked: 789
0.5 to the 5th power, and hence the 97% figure, does not apply when some (or even most) of the filters are applied with prejudice towards the cheating allegation. You need 5 mutually independent, objective filters.

When you confront a player with metrics, he cannot defend himself on the same terms. He must defend himself by his own skills, which might include having studied with AI. You have to be fair to him instead of quickly disbelieving his reasons just because others could abuse them as fake evidence.

Good luck / skill with your continued research!


This post by RobertJasiek was liked by: NordicGoDojo
 Post subject: Re: Derived Metrics for the Game of Go
Post #20 Posted: Wed Nov 11, 2020 10:40 am 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
NordicGoDojo wrote:
As we are basically just starting the research, we can hardly have a finished product for cheat detection work


May I strongly suggest that you add an empirical researcher to your team? Without empirical validation it is easy to go astray. :)

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.

 Post subject: Re: Derived Metrics for the Game of Go
Post #21 Posted: Wed Nov 11, 2020 11:59 am 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
NordicGoDojo wrote:
I already have a number of 'cheater profiles' in my mind, but not in written form, and I should get around to classifying and describing them better. I think I should omit writing anything hastily here for now, but I expect to get back on the subject later (at the latest in a future paper).

Anecdotally, I can say that 'I have been studying AI a lot these past few months and that is why my play looks similar' is the most common counterargument I hear when I accuse somebody of cheating – it comes up almost invariably.


I remember the flap some time ago over an allegation that a player had used Leela11 to cheat. There were real problems with the evidence of cheating. For one thing the accusers reported only plays between move 50 and move 150, for no apparent reason. That raises the question of cherry-picking their data. For another, their main evidence was plays matching one of the top three of Leela11's choices. One thing that lay people do not understand, without having been taught, is that confirmatory evidence is weak. In fact, it is very weak. So much so that Popper discounts it entirely.

Obviously, one way of cheating is to copy a bot's play, and matching several of its plays can raise suspicions of cheating. But suspicion is only suspicion. OC, it could justify more investigation. In the case I am talking about, the accusers broadened their search for evidence of cheating to include matching Leela11's second choices. That raises suspicions about the accusers. They have switched their theory of cheating. (OC, they may, for reasons unknown, have started by looking for more matches than matches to Leela's top choices. But bear with me, please.) Their theory now is that the cheater sometimes played the top choice, sometimes the second choice. Well, that is a possible way to cheat, OC. But two things happen by broadening what is considered to be confirmatory evidence. First, the weak evidence is made even weaker. Second, by being able to match more moves, the evidence of cheating is made to sound more convincing. Instead of reporting, say, 60% matches, you can report, say, 90% matches. And by including matches to Leela's third choices, as they also did, you weaken the evidence further, but make it sound more convincing by reporting, say, 95% matches. (I don't remember the exact number of matches reported in the accusation, but you get the general picture. Weaker evidence that sounds better.) Now, I am not saying that the accusers purposely presented misleading evidence, but I do think that their investigation was biased by looking for evidence of cheating. The right approach is, once your suspicions have been aroused, to look for evidence that the player is not cheating. You try to disprove your theory. Such an investigation may provide further evidence of cheating, OC. :)

This does not mean that matching a bot's top choices may not be good evidence of cheating. It simply must be disconfirmatory evidence. One avenue that may be helpful came to light in a discussion here some time back. It turns out, at least preliminarily, that any two of today's top bots match each other's top choices around 80% of the time — IIRC, this was fairly early in the game. The term for that is a concordance rate of 80%. Research with several bots over many, many games could produce a good estimate, along with probability distributions and error estimates. Now suppose that a suspected cheater's plays matched a particular bot 95% of the time, while several other top bots each matched that bot's plays around 80% of the time. That would be strong evidence that the player was matching that bot's plays. Not because of the matching, per se, but because of the difference between the player's percentage of matching and that of the other bots. Confirmatory evidence is weak. Disconfirmatory evidence is strong. :)

Edit: This illustrates why loose matching to any of a bot's three top choices is a bad idea. For such matching, the concordance rate between any two of today's top bots would approach 100%. Even if a player's plays matched one of a bot's three top choices at a very high rate, the difference between the player's matching rate and that of another top bot would be too small to find a significant difference.
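(To make the baseline idea concrete, a minimal sketch with invented numbers: a hypothetical 200-move sample, a hypothetical 80% bot-to-bot concordance, and a plain two-proportion z-test, which is my own choice of test rather than anything proposed in this thread.)

Code:
import math

def two_proportion_z(hits1, n1, hits2, n2):
    """Pooled two-proportion z-statistic: is rate 1 significantly above rate 2?
    Returns (z, one_sided_p). Illustrative choice of test, nothing more."""
    p1, p2 = hits1 / n1, hits2 / n2
    pooled = (hits1 + hits2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, 0.5 * math.erfc(z / math.sqrt(2))  # one-sided p from the normal CDF

# Hypothetical data: the suspect matches bot X's top choice on 190 of 200 moves (95%),
# while another top bot matches bot X on 160 of the same 200 moves (80%).
print(two_proportion_z(190, 200, 160, 200))   # z ~ 4.5, p well below 0.001

# Loose top-3 matching: both rates drift towards 100% and the gap shrinks.
print(two_proportion_z(199, 200, 196, 200))   # z ~ 1.4, p ~ 0.09 -> no significant difference

The evidential weight comes from the gap between the suspect's rate and the bot-to-bot baseline, and loose top-3 matching erases that gap.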

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.


This post by Bill Spight was liked by: Harleqin
 Post subject: Re: Derived Metrics for the Game of Go
Post #22 Posted: Wed Nov 11, 2020 7:24 pm 
Beginner

Posts: 19
Liked others: 2
Was liked: 8
Rank: 1p
Bill Spight wrote:
May I strongly suggest that you add an empirical researcher to your team? Without empirical validation it is easy to go astray. :)

I've been wanting to blind-test my model for a long time, and that is why I recently made the initial test with 5 AI v. AI games, 5 human v. human games, and 10 AlphaGo v. human games (reaching a hit rate of 87.5%, with one false positive and four false negatives). The problem is that proper material for blind testing is extremely difficult (or expensive) to gather: we need to have cheaters who confess their cheating, we need to be able to trust their confession (or else have proof of their cheating), and we also need proof that the non-cheating players really did not cheat.
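(My reconstruction of the arithmetic, under the assumption that each of the 40 player-sides in the 20 games was classified separately; the post does not spell this out.)

Code:
# Reconstruction of the blind test's confusion matrix (my assumptions, not the post's):
# 20 games -> 40 player-sides; 1 false positive, 4 false negatives, the rest correct.
total_sides = 40
false_positives = 1
false_negatives = 4
correct = total_sides - false_positives - false_negatives
print(correct, correct / total_sides)   # 35 0.875 -> the reported 87.5% hit rate

# With 5 AI v. AI, 10 AlphaGo v. human and 5 human v. human games,
# 20 sides are engine-played and 20 are human-played (again, my reading):
ai_sides, human_sides = 20, 20
recall = (ai_sides - false_negatives) / ai_sides                       # share of AI sides caught
precision = (ai_sides - false_negatives) / (ai_sides - false_negatives + false_positives)
print(round(recall, 3), round(precision, 3))                           # 0.8 0.941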

Bill Spight wrote:
For one thing the accusers reported only plays between move 50 and move 150, for no apparent reason.

The Hawkeye program on the Yike server does something similar. I understand the rationale is that (stronger) humans can memorise opening moves and count accurately in the endgame to avoid larger mistakes, meaning that middle game content should be the most informative for identifying cheaters. My own experience generally suggests the same, although from time to time there can be useful material in the opening and the endgame as well.

Bill Spight wrote:
In the case I am talking about, the accusers broadened their search for evidence of cheating to include matching Leela11's second choices. That raises suspicions about the accusers. They have switched their theory of cheating. (OC, they may, for reasons unknown, have started by looking for more matches than matches to Leela's top choices. But bear with me, please.) Their theory now is that the cheater sometimes played the top choice, sometimes the second choice. Well, that is a possible way to cheat, OC.

One major challenge at the time was that online cheating as a phenomenon was new, and there was nothing nearing a 'best established practice' for cheat detection. Checking how consistently a player's moves rate within an AI's top candidates is an easy idea to come up with – besides this case, it is also used in the Yike Hawkeye program.

As per my understanding of what happened, the accused actually changed their play after the accusation: where before they had tended to pick the AI's top suggestion, once that was pointed out they started opting for lower candidates instead, which the accuser then pointed out in turn. If this description is accurate, surely the change increases, rather than decreases, the likelihood of the accused cheating. However, my understanding is based on hearsay from people not directly involved in the case, so I will restrict myself to the hypothetical.

Bill Spight wrote:
But two things happen by broadening what is considered to be confirmatory evidence. First, the weak evidence is made even weaker. Second, by being able to match more moves, the evidence of cheating is made to sound more convincing. Instead of reporting, say, 60% matches, you can report, say, 90% matches. And by including matches to Leela's third choices, as they also did, you weaken the evidence further, but make it sound more convincing by reporting, say, 95% matches. (I don't remember the exact number of matches reported in the accusation, but you get the general picture. Weaker evidence that sounds better.) Now, I am not saying that the accusers purposely presented misleading evidence, but I do think that their investigation was biased by looking for evidence of cheating. The right approach is, once your suspicions have been aroused, to look for evidence that the player is not cheating. You try to disprove your theory. Such an investigation may provide further evidence of cheating, OC. :)

A big challenge in the whole field of cheat detection is that it is essentially an endless cat-and-mouse game: when you come up with a model that catches a high ratio of cheaters with a small ratio of false positives and make it publicly known, the smart cheaters will adjust their play so that the model doesn't find them. Presumably for this observer effect, chess servers such as chess.org don't publicly state how exactly their cheat detection mechanism works.

If you monitor concordance with the AI's top moves, smart cheaters start avoiding them. If you monitor the development of the winrate, smart cheaters start controlling the 'story' of the game that it is not straightforward, but that it seems they are losing at one or several parts of the game. If you monitor the scoremean, smart cheaters start playing suboptimally, so that they perform just a little bit better than the opponent. And so on, and so on.

I am aware that the above can be interpreted as 'all players being guilty until disproven', so I would like to stress that this is completely not how I operate. I always assume that an alleged cheater is innocent until shown otherwise, and for that a good number of 'cheat filters' have to come off as positive. (E.g., straightforward win – check; unexpectedly high quality of the game – check; concordance with AI's recommendations at key points – check; etc.) Unfortunately, at this point the model relies on my human 'gut feeling' in the final call, rather than, say, an accurate probability distribution with confidence thresholds.

Bill Spight wrote:
Research with several bots over many, many games could produce a good estimate, along with probability distributions and error estimates. Now suppose that a suspected cheater's plays matched a particular bot 95% of the time, while several other top bots each matched that bot's plays around 80% of the time. That would be strong evidence that the player was matching that bot's plays. Not because of the matching, per se, but because of the difference between the player's percentage of matching and that of the other bots. Confirmatory evidence is weak. Disconfirmatory evidence is strong. :)

Edit: This illustrates why loose matching to any of a bot's three top choices is a bad idea. For such matching, the concordance rate between any two of today's top bots would approach 100%. Even if a player's plays matched one of a bot's three top choices at a very high rate, the difference between the player's matching rate and that of another top bot would be too small to find a significant difference.

As far as I know, top pros already have concordance rates of above 80% in Hawkeye, and a European amateur player was recently suspected for reaching 76%. Personally I see too many potential problems in this model to rely on it in my own analysis – it makes for a singular, weak 'cheat filter' at best.

 Post subject: Re: Derived Metrics for the Game of Go
Post #23 Posted: Wed Nov 11, 2020 9:15 pm 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
NordicGoDojo wrote:
Bill Spight wrote:
May I strongly suggest that you add an empirical researcher to your team? Without empirical validation it is easy to go astray. :)

I've been wanting to blind-test my model for a long time, and that is why I recently made the initial test with 5 AI v. AI games, 5 human v. human games, and 10 AlphaGo v. human games (reaching a hit rate of 87.5%, with one false positive and four false negatives). The problem is that proper material for blind testing is extremely difficult (or expensive) to gather: we need to have cheaters who confess their cheating, we need to be able to trust their confession (or else have proof of their cheating), and we also need proof that the non-cheating players really did not cheat.


I understand the difficulties. The questions you raise are ones that an experienced empirical researcher can address. :)

Now, we have a lot of data of strong human players not using AI to cheat, going back before AlphaGo. In addition, thanks to the Elf team and GoGoD, we have a lot of data of differences between human play back then and modern bot play. Yes, humans are learning a lot from the bots, and will continue to do so until the law of diminishing returns kicks in. Which might take a decade or two. That amplifies the problems involved. And, OC, you don't have to stick to the Elf data, you can use KataGo and other bots for analysis, as well. It is just that there is a lot of data already available off the shelf. How much and in what ways human play has changed in the AI era is an important question, but it is important to establish an empirical baseline against which to measure changes.

NordicGoDojo wrote:
Bill Spight wrote:
For one thing the accusers reported only plays between move 50 and move 150, for no apparent reason.

The Hawkeye program on the Yike server does something similar. I understand the rationale is that (stronger) humans can memorise opening moves and count accurately in the endgame to avoid larger mistakes, meaning that middle game content should be the most informative for identifying cheaters. My own experience generally suggests the same, although from time to time there can be useful material in the opening and the endgame as well.


Maybe so, but that is an untested hypothesis. Now, there may be good evidence for that in chess, for human play versus that of pre-neural net chess engines. But it is not a good methodology. As one of my profs stressed, don't throw away data. (Yes, you may have to deal with outliers, but we are not talking about that now. Besides, you deal with them, you don't just throw them out.) In the case in question, the suspected cheater had, according to Leela, taken a 70-30 lead by move 50, up from perhaps 50-50 or 45-55 or so. By move 180 or so, when the game ended, Leela gave his lead as 85-15. Even if you started off looking at move 50 and later, in percentage terms most of the player's advantage had already accrued, and in half as many plays or less. Say what you will, in that game his best play was already behind him. Wouldn't that be a good place to look for cheating?

NordicGoDojo wrote:
Bill Spight wrote:
In the case I am talking about, the accusers broadened their search for evidence of cheating to include matching Leela11's second choices. That raises suspicions about the accusers. They have switched their theory of cheating. (OC, they may, for reasons unknown, have started by looking for more matches than matches to Leela's top choices. But bear with me, please.) Their theory now is that the cheater sometimes played the top choice, sometimes the second choice. Well, that is a possible way to cheat, OC.

One major challenge at the time was that online cheating as a phenomenon was new, and there was nothing nearing a 'best established practice' for cheat detection.


Well, yes, the accusers were feeling their way around. But that didn't stop a lot of people from drawing very strong conclusions. :( Let me say that Ales Cieply did some very careful and meticulous analysis. :)

NordicGoDojo wrote:
Bill Spight wrote:
But two things happen by broadening what is considered to be confirmatory evidence. First, the weak evidence is made even weaker. Second, by being able to match more moves, the evidence of cheating is made to sound more convincing. Instead of reporting, say, 60% matches, you can report, say, 90% matches. And by including matches to Leela's third choices, as they also did, you weaken the evidence further, but make it sound more convincing by reporting, say, 95% matches. (I don't remember the exact number of matches reported in the accusation, but you get the general picture. Weaker evidence that sounds better.) Now, I am not saying that the accusers purposely presented misleading evidence, but I do think that their investigation was biased by looking for evidence of cheating. The right approach is, once your suspicions have been aroused, to look for evidence that the player is not cheating. You try to disprove your theory. Such an investigation may provide further evidence of cheating, OC. :)

A big challenge in the whole field of cheat detection is that it is essentially an endless cat-and-mouse game: when you come up with a model that catches a high ratio of cheaters with a small ratio of false positives and make it publicly known, the smart cheaters will adjust their play so that the model doesn't find them. Presumably for this observer effect, chess servers such as chess.org don't publicly state how exactly their cheat detection mechanism works.


Yes, there is a cat and mouse game. As John Fairbairn has pointed out, you need to establish a penumbra around cheating so that certain things that non-cheaters do may be disallowed, and certain things that non-cheaters do not currently do must be required. Honest players need to bend over backwards to avoid the appearance of cheating. Such is life.

But if cheat detection is essentially the cat and mouse game, the atmosphere is already poisoned. You see this in the world of espionage, where there is very little in the way of proof, and you never know whether you are being paranoid enough.

Quote:
I am aware that the above can be interpreted as 'all players being guilty until disproven', so I would like to stress that this is completely not how I operate.


I didn't think so. :) We need people who are immersed in the cat and mouse game. But that immersion perforce tends to produce a suspicious mind set, which may be necessary to play that game well. This is why we have different people assess the evidence in the end. That's also why we develop empirical methods of testing the evidence.

NordicGoDojo wrote:
Bill Spight wrote:
Research with several bots over many, many games could produce a good estimate, along with probability distributions and error estimates. Now suppose that a suspected cheater's plays matched a particular bot 95% of the time, while several other top bots each matched that bot's plays around 80% of the time. That would be strong evidence that the player was matching that bot's plays. Not because of the matching, per se, but because of the difference between the player's percentage of matching and that of the other bots. Confirmatory evidence is weak. Disconfirmatory evidence is strong. :)

As far as I know, top pros already have concordance rates of above 80% in Hawkeye, and a European amateur player was recently suspected for reaching 76%. Personally I see too many potential problems in this model to rely on it in my own analysis – it makes for a singular, weak 'cheat filter' at best.


The point is that by establishing an empirical baseline we can turn weak evidence into strong evidence. Not because the evidence matches our suspicions, but because it differs from the baseline. Confirmatory evidence is weak, disconfirmatory evidence is strong. :)

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.

 Post subject: Re: Derived Metrics for the Game of Go
Post #24 Posted: Wed Nov 11, 2020 9:39 pm 
Judan

Posts: 6162
Liked others: 0
Was liked: 789
If you lack (non-)cheating games, how about emulating them?

Anti-cheating means are weak if they have to be kept secret to work well. They must be open and must survive cheaters' attempts to adapt their styles.

 Post subject: Re: Derived Metrics for the Game of Go
Post #25 Posted: Wed Nov 11, 2020 9:50 pm 
Judan

Posts: 6162
Liked others: 0
Was liked: 789
Bill Spight wrote:
certain things that non-cheaters do may be disallowed


Such as entering a tournament hall without passing a metal detector?

Surely we must not prohibit any legal move.

Quote:
Confirmatory evidence is weak, disconfirmatory evidence is strong.


You keep repeating this gospel, but please explain the theory of why, IYO, this necessarily must be so. Must it?

 Post subject: Re: Derived Metrics for the Game of Go
Post #26 Posted: Thu Nov 12, 2020 12:32 am 
Beginner

Posts: 19
Liked others: 2
Was liked: 8
Rank: 1p
Bill Spight wrote:
Now, we have a lot of data of strong human players not using AI to cheat, going back before AlphaGo. In addition, thanks to the Elf team and GoGoD, we have a lot of data of differences between human play back then and modern bot play. Yes, humans are learning a lot from the bots, and will continue to do so until the law of diminishing returns kicks in. Which might take a decade or two. That amplifies the problems involved. And, OC, you don't have to stick to the Elf data, you can use KataGo and other bots for analysis, as well. It is just that there is a lot of data already available off the shelf. How much and in what ways human play has changed in the AI era is an important question, but it is important to establish an empirical baseline against which to measure changes.

This is part of the research plan we have established with Mr Egri-Nagy, but for now I personally don't know how useful for cheat detection the results can be. In my experience, it is not difficult to tell apart humans from AIs (with the exception of AI-savvy top humans, as shown by the false positive I got for a Ke Jie game – but luckily such players are usually not the target group), but the difficulties start when a player starts deliberately mixing human and AI play.

Bill Spight wrote:
Maybe so, but that is an untested hypothesis. Now, there may be good evidence for that in chess, for human play versus that of pre-neural net chess engines. But it is not a good methodology. As one of my profs stressed, don't throw away data. (Yes, you may have to deal with outliers, but we are not talking about that now. Besides, you deal with them, you don't just throw them out.)

I can completely agree with this.

Bill Spight wrote:
In the case in question, the suspected cheater had, according to Leela, taken a 70-30 lead by move 50, up from perhaps 50-50 or 45-55 or so. By move 180 or so, when the game ended, Leela gave his lead as 85-15. Even if you started off looking at move 50 and later, in percentage terms most of the player's advantage had already accrued, and in half as many plays or less. Say what you will, in that game his best play was already behind him. Wouldn't that be a good place to look for cheating?

It seems to me that you are assuming that percentage differences are linear, rather than, for example, logarithmic. Comparing them is further made difficult by the fact that different AIs' percentages seem to mean different things; Elf OpenGo might give a position 90% while Leela Zero might only give 75%. For a quick sample with KataGo, in one game I got an early 70% matching a scoremean lead of roughly 2 points, while an 85% matched a scoremean lead of roughly 6 points.

Another issue, it seems to me, is that 'best play', or 'better play', needs defining. Sure, if we stipulate that 'better play' means 'causing a bigger effect in winrate', then your logic is sound, but I would personally attach much less value to the percentages – among other reasons, because of the above interpretation problem. In my experience, comparing scoremean values leads to a much more consistent model, and it also seems intuitively preferable because human players can better understand 'leading by six points' than '80% chance to win'.
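(A toy illustration of why winrate differences are hard to compare across positions: the sketch below maps a score lead to a winrate through a logistic curve with an invented steepness constant; real engines derive winrates from their value networks, not from such a formula.)

Code:
import math

def winrate_from_lead(lead_points, steepness=0.4):
    """Toy logistic mapping from a score lead (in points) to a winrate in [0, 1].
    The steepness constant is invented; real engines get winrates from their
    value networks, not from a closed formula like this."""
    return 1.0 / (1.0 + math.exp(-steepness * lead_points))

for lead in (0, 2, 4, 6, 8, 10):
    print(lead, round(winrate_from_lead(lead), 2))
# 0 0.5
# 2 0.69
# 4 0.83
# 6 0.92
# 8 0.96
# 10 0.98
# The same 2-point gain is worth about 0.14 of winrate between leads of 2 and 4
# points, but only about 0.02 between leads of 8 and 10.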

Bill Spight wrote:
Yes, there is a cat and mouse game. As John Fairbairn has pointed out, you need to establish a penumbra around cheating so that certain things that non-cheaters do may be disallowed, and certain things that non-cheaters do not currently do must be required. Honest players need to bend over backwards to avoid the appearance of cheating. Such is life.

If it finally comes to this, I think I would rather let some smart cheaters run loose than create a culture where players have to eschew certain moves just to avoid suspicion. As we have, for now, little idea of just how good a cheat-detection model is possible, however, I personally have little interest in contemplating the options. Compared to chess, I think go has more hope because of the longer games and larger number of relevant options.

 Post subject: Re: Derived Metrics for the Game of Go
Post #27 Posted: Thu Nov 12, 2020 2:38 am 
Oza

Posts: 3657
Liked others: 20
Was liked: 4631
Quote:
If it finally comes to this, I think I would rather let some smart cheaters run loose than create a culture where players have to eschew certain moves just to avoid suspicion.


If you alter this sentence to read something more general like "I would rather let some clever manipulators have the chance to avoid detection" you will find it is accepted (grudgingly) around the world in domains outside go by administrators - people on the server side. For example, governments offer financial benefits to people in need and almost invariably take the view that it is better to ensure that 100 legitimate people get fed while one person may cheat and get away with it.

However, on the client side, people regularly show that they deeply resent the cheaters and demand more action against them. The administrators therefore try to strike a balance. Often at great expense they will set up investigation bodies, which ordinary people end up paying for via taxes. People may have to do much extra and onerous work, providing bank statements or references and the like. More insidiously, governments tend to try to suppress the incentive to cheat by keeping the benefits much lower than they could be. A further measure governments use is the law, with severe penalties and disgrace for convicted cheaters. We end up with something that works - after a fashion. Cheating still happens, ordinary folk still bear the cost and feel disgruntled.

If we try to superimpose this model back onto online go, we see that administrators are offering the service of running tournaments and are putting in some measures to detect cheating, while ordinary folk are bitterly complaining about losing games to suspected cheats. It may not cost them a fortune - there is the lost opportunity cost, and the cost and hassle of buying webcams or the like - but on the whole it's the sour taste and extra hassle that bothers them most. The "law" - of a kind - operates: cheaters can be disqualified, i.e. there is a penalty.

The one thing that seems to me to be lacking in this superimposition of the model onto go is "disgrace". From what I can see, games are annulled because of suspected cheating but the perpetrators are not named. The argument seems to be that cheating can only be suspected, not proven. Even a fantastic anti-cheating programme that actually works is never going to be accepted as proof enough to out someone publicly.

In the end, therefore, if that remains the case, we can't really apply this real-world model to online go to make it work in the unsatisfactory but bearable way we accept for real-world issues.

But is there a solution? Yes. I think there are two. Imperfect, but good enough to make playing organised go bearable.

One is to eschew online go altogether. Play live. Apply a similar anti-cheating ethos, but the focus will be on smartphones, toilet visits and the like, where detection is of a level that can act as proof. That adds the element of disgrace. Chess has used this model. It seems to work - after a fashion :) This solution can be reinforced by organisations such as the EGF strenuously resisting the offer of large cash prizes for online events, and even refusing to count results in online events for rating points and promotions. That may be seen as a drawback, but the compensation in sociability, travel and public exposure (i.e. PR) that live go offers seems to outweigh it heavily.

The other solution I offer is to create a form of implied disgrace. Professionals are already playing online games in which the screen shows, in real time I believe, a graph of how well a player's moves are matching an AI bot. Both players can be allowed to see this simultaneously. If the match is suspiciously close, a player can cry foul, of course. But even without that, he can take action by himself. He can refuse to play that player in future, for example. He can show his friends, team members, go associations or whatever an independent graph. This requires players to use their real names online, of course, but that should be a sine qua non anyway. This is all a little extreme, and should be combined with a downplaying of online go anyway, but this solution can (I imagine) be made available quickly and is less extreme than the problem that has called out for solutions.
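
Such a match graph is easy enough to prototype once one has, for each position, the bot's first suggestion. A rough sketch - the move lists are toy data in GTP-style coordinates, purely for illustration:

Code:
def rolling_match_rate(played, engine_top, window=20):
    # Fraction of the player's moves that match the engine's first choice,
    # over a sliding window of the last `window` moves.
    matches = [p == t for p, t in zip(played, engine_top)]
    rates = []
    for i in range(len(matches)):
        recent = matches[max(0, i - window + 1): i + 1]
        rates.append(sum(recent) / len(recent))
    return rates

# Toy data, invented for illustration.
played     = ["Q16", "D4", "C16", "R4", "C3", "Q10"]
engine_top = ["Q16", "D4", "D17", "R4", "C3", "R10"]
print(rolling_match_rate(played, engine_top, window=3))

Plotting such a rolling rate for both players over a game gives exactly the kind of graph that the players, or an audience, could watch in real time.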

Offline
 Post subject: Re: Derived Metrics for the Game of Go
Post #28 Posted: Thu Nov 12, 2020 5:04 am 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
NordicGoDojo wrote:
Bill Spight wrote:
Now, we have a lot of data of strong human players not using AI to cheat, going back before AlphaGo. In addition, thanks to the Elf team and GoGoD, we have a lot of data of differences between human play back then and modern bot play. Yes, humans are learning a lot from the bots, and will continue to do so until the law of diminishing returns kicks in. Which might take a decade or two. That amplifies the problems involved. And, OC, you don't have to stick to the Elf data, you can use KataGo and other bots for analysis, as well. It is just that there is a lot of data already available off the shelf. How much and in what ways human play has changed in the AI era is an important question, but it is important to establish an empirical baseline against which to measure changes.

This is part of the research plan we have established with Mr Egri-Nagy, but for now I personally don't know how useful for cheat detection the results can be.


Oh, in itself I doubt if it is worth much, either. But it is a start. And it has the advantage of examining the play of humans who we are quite sure are not using AI to cheat. :) In addition, there is a lot of data against which to test hypotheses. Back in the days of rats in Skinner boxes, our first lab assignment in a course on learning was to put a rat in a Skinner box and observe its behavior without any reinforcement. By itself that showed next to nothing of interest, but it was an important first step to establish the rat's baseline behavior. To quote Rudyard Kipling, "Softly, softly, catchee monkey." :)

NordicGoDojo wrote:
In my experience, it is not difficult to tell apart humans from AIs (with the exception of AI-savvy top humans, as shown by the false positive I got for a Ke Jie game – but luckily such players are usually not the target group), but the difficulties start when a player starts deliberately mixing human and AI play.


Yes, clever mice are a problem in the cat and mouse game. :) There is the saying about not trying to run before you have learned how to walk, but in today's cat and mouse game the cat has little choice in the matter.

And humans are becoming AI savvy rather quickly, since everybody has access to strong programs. It seems to me that most pros nowadays play nearly perfect openings, from the point of view of today's bots, because they try their ideas out on the bots before trying them out in real life, or if not, they copy popular plays and sequences (AI fuseki and joseki).

NordicGoDojo wrote:
Bill Spight wrote:
In the case in question, the suspected cheater had, according to Leela, taken a 70-30 lead by move 50, up from perhaps 50-50 or 45-55 or so. By move 180 or so, when the game ended, Leela gave his lead as 85-15. Even if you started off looking at move 50 and later, in percentage terms most of the player's advantage had already accrued, and in half as many plays or less. Say what you will, in that game his best play was already behind him. Wouldn't that be a good place to look for cheating?

It seems to me that you are assuming that percentage differences are linear, rather than, for example, logarithmic.


Actually, no. My preference is to use logits. (But in terms of testing evaluations, I have found that there are problems with them, as well.) In this case I was attempting to take the point of view of the naive investigators and to show that they had evidence that it would be a good idea to look at the early play.

NordicGoDojo wrote:
Comparing them is made harder still by the fact that different AIs' percentages seem to mean different things; Elf OpenGo might give a position 90% while Leela Zero might give only 75%. For a quick sample with KataGo, in one game I got an early 70% matching a scoremean lead of roughly 2 points, while an 85% matched a scoremean lead of roughly 6 points.


The fact is that presumably objective measures, such as the probability of winning a game in self play from a specific position, have never been empirically validated. Never. As a result, like utils, the measure of human utility in economics, they are subjective. You can't compare evaluation measures across bots. They may be of use internally to the programs, but for that all they have to do is to order alternatives well enough. Validating them empirically is a waste of time if you are writing a go playing program. OTOH, if you are writing a program to analyze and evaluate positions, it is a necessity. However, at this point in time, people are happy to use strong players as analysts.
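
To be concrete about what 'empirically validated' could mean: one simple check is calibration, i.e. whether positions the bot calls X% are in fact won about X% of the time over a large set of finished games. A minimal sketch of such a check, assuming a hypothetical list of (stated winrate, eventual result) pairs extracted from game records:

Code:
from collections import defaultdict

def calibration_table(samples, bin_width=0.1):
    # Group (predicted winrate, won) samples into winrate bins and compare
    # each bin's average prediction with the observed win frequency.
    bins = defaultdict(list)
    for p, won in samples:
        bins[int(p / bin_width)].append((p, won))
    rows = []
    for b in sorted(bins):
        group = bins[b]
        mean_pred = sum(p for p, _ in group) / len(group)
        observed = sum(w for _, w in group) / len(group)
        rows.append((mean_pred, observed, len(group)))
    return rows

# Hypothetical samples: (bot's stated winrate for the side to move,
# 1 if that side eventually won, else 0).
samples = [(0.55, 1), (0.58, 0), (0.72, 1), (0.74, 1), (0.91, 1), (0.93, 1)]
for mean_pred, observed, n in calibration_table(samples):
    print(f"predicted ~{mean_pred:.2f}, observed {observed:.2f} (n={n})")

Even that simple check has pitfalls - the same bot plays both sides in self-play, evaluations are correlated within a game, and so on - but as far as I know nothing of the sort has been published.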

NordicGoDojo wrote:
Another issue to me seems to be that 'best play', or 'better play', needs defining.


Well, as I indicated, don't look to the bots for that. All a strong AI player needs to find is good enough plays. And humans fairly often come up with plays that the bot never considered, but which it evaluates as almost as good as its top choice, or even better, sometimes much better. And, OC, we have to take the bots' evaluations with a grain of salt, because they have not been empirically validated. Better to think of them like human feelings based on experience. The lack of empirical validation means that we do not know the significance of, say, a play that gets a 60% winrate estimate and one that gets a 55% estimate. We may think that the one with the higher estimate is probably better, but who knows? And, OC, the number of visits matters, but the number of visits is dictated by evaluations, so there is a circular logic there. We humans may care about the evaluation of specific plays and positions, but those evaluations are only part of what goes into making a strong bot. If accurate evaluations were necessary to make a strong bot, the programmers would have aimed to make accurate evaluations. They are not necessary, and they didn't.

In chess, IIUC, Ken Regan has come up with Elo evaluations of specific plays or positions, which is a remarkable achievement. :bow: In go, I think that we are at least a decade away from anything comparable. Who knows?

NordicGoDojo wrote:
In my experience, comparing scoremean values leads to a much more consistent model, and it also seems intuitively preferable because human players can better understand 'leading by six points' than '80% chance to win'.


I agree that evaluations in terms of territory or area are an important step forward. Many thanks to lightvector. :salute: :bow: But they have not been empirically validated, either.

NordicGoDojo wrote:
Bill Spight wrote:
Yes, there is a cat and mouse game. As John Fairbairn has pointed out, you need to establish a penumbra around cheating so that certain things that non-cheaters do may be disallowed, and certain things that non-cheaters do not currently do must be required. Honest players need to bend over backwards to avoid the appearance of cheating. Such is life.

If it finally comes to this, I think I would rather let some smart cheaters run loose than create a culture where players have to eschew certain moves just to avoid suspicion.


That's not what I had in mind. I was thinking more of things like webcams and screensharing for online tournaments.

But you make a good point. I used to be a tournament bridge player. Because it is a partnership game with hidden information, cheating is a threat to tournament bridge. Every strong player I know has a high ethical standard and bends over backwards to avoid taking advantage of possibly illicit information and to avoid the appearance of cheating. OTOH, innovation has been stifled by the fact that the innovators know more about the implications of their methods than their opponents. That has led to suspicions and allegations of cheating, and there are those who believe that that knowledge, which cannot be conveyed to the opponents in a few minutes, is private and in itself illicit. As a result, many new methods have been outlawed or severely restricted.

A similar atmosphere in go where certain plays result in suspicion would, IMO, be deadly. OTOH, since cheating has reared its ugly head, it is high time for strong players to adopt high ethical standards. :)

----

Edit: Let me give an example of something that players might adopt as part of a high ethical standard. In chess, recently a teenager was found guilty of cheating and suspended for a couple of years. In her defense she claimed that she came up with one amazing move while she was in the bathroom. (She also went to the bathroom surprisingly frequently, even for an elderly gentleman with prostate problems.) Now, a behavior which ethical players might adopt would be, except in emergencies, to go to the bathroom on your opponent's move. The danger, OC, is that your opponent might make a play and punch the clock while you were away. But that is the price an ethical player might be willing to pay to avoid the appearance of cheating. :)

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.

Offline
 Post subject: Re: Derived Metrics for the Game of Go
Post #29 Posted: Thu Nov 19, 2020 1:25 pm 
Judan

Posts: 6725
Location: Cambridge, UK
Liked others: 436
Was liked: 3719
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
Just linking here a reddit thread (scarce on details, with links to Chinese news articles) about a new cheating case in Korea, involving Kim Eunji 1p, Korea's great young hope (TM) and its answer to Sumire. https://www.reddit.com/r/baduk/comments ... _cheating/

Offline
 Post subject: Re: Derived Metrics for the Game of Go
Post #30 Posted: Thu Nov 19, 2020 2:14 pm 
Judan

Posts: 6162
Liked others: 0
Was liked: 789
In that thread, the only hint that the child cheated is the rumour that she "kind of admitted to it". Without evidence, why is there endless prejudgement?
