Just to be clear on terminology: if the same position is reached by multiple move sequences, but that position has not occurred earlier in the game history, then this is normally called a "transposition" in the modern tree search and algorithms literature, not a "reversion". The word "reversion" shares its stem with "reverse", i.e. to go backwards, and would be a better fit if you were talking about a long cycle (e.g. superko) in which the game returns to a *prior* state. But it sounds like you are talking about transposition, in which no long cycles are involved; rather, two or more parallel paths of moves meet back together or cross over one another in the positions they lead to, hence the stem "trans", suggesting travel or crossing.
RobertJasiek wrote:
Suppose we are in position P with two move candidates PA and PB both leading to the same follow-up situation Q (same follow-up position with the same follow-up player having the next turn) via different intermediate reversal sequences. Call QA the situation Q reached from PA and call QB the situation Q reached from PB.
No. Unless there is some ko or superko-related reason why they cannot be combined (which is usually not the case), QA and QB will already be the same node Q in the search tree. This is *why* you see a higher visits/s. If the transposition were not detected, then PA and PB would be much slower to increase in visits because they would each be repeating the work instead of sharing it. Sharing the work is what causes the faster search.
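To make the sharing concrete, here is a minimal sketch of how a search can detect transpositions by keying nodes on a hash of the position (plus the side to move). This is not KataGo's actual code; the names (Node, nodes_by_hash, get_or_create_node) are invented purely for illustration:

```python
# Minimal sketch of transposition handling, NOT KataGo's actual code.
class Node:
    def __init__(self):
        self.visits = 0
        self.value_sum = 0.0
        self.children = {}  # move -> Node

nodes_by_hash = {}  # key of (position, side to move) -> shared Node

def get_or_create_node(position_key):
    # Two different move orders that reach the same position get the same
    # Node object back, so all their playout statistics are shared.
    if position_key not in nodes_by_hash:
        nodes_by_hash[position_key] = Node()
    return nodes_by_hash[position_key]

# P -> A -> ... and P -> B -> ... both reach the same position:
q_via_A = get_or_create_node(("some position", "black to move"))
q_via_B = get_or_create_node(("some position", "black to move"))
assert q_via_A is q_via_B  # "QA" and "QB" are literally the same node Q
```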
RobertJasiek wrote:
During early search, PA and PB values differ somewhat and search is normal, low. After some time, PA and PB values are close. Thereafter, they permanently fight for being blue with either being blue changing frequently and indefinitely, and search fluctuates between high and low rather often and indefinitely. KataGo tries hard to find the better of PA and PB.
Flickering is not evidence that KataGo "tries hard". Flickering is simply a reflection of the fact that two moves are almost equal, so either one could be chosen. It implies nothing about the "effort" spent computing those moves; it merely shows that two floating point values in the move selection algorithm are almost identical, so that either one may be epsilon larger at a given moment. The individual flickers can be due to tiny value differences. PA and PB will each individually be exploring some other moves that do not transpose, and that tiny fraction of playouts acts as slight noise on top of the dominant shared value from Q.
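As a toy illustration with invented numbers, this is all the flickering amounts to: a comparison between two nearly identical floating point values that tiny noise can flip either way.

```python
# Toy illustration, numbers invented: two nearly equal values swap ranks
# under tiny perturbations, which a GUI displays as "flickering".
import random

value_PA = 0.5231
value_PB = 0.5230

for step in range(10):
    # Each move also gets a handful of its own non-shared playouts,
    # which perturb its value by a tiny amount.
    noisy_PA = value_PA + random.uniform(-0.0005, 0.0005)
    noisy_PB = value_PB + random.uniform(-0.0005, 0.0005)
    print(step, "PA" if noisy_PA >= noisy_PB else "PB")
```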
RobertJasiek wrote:
KataGo does so by blowing up the subtrees QA and QB (and maybe the branches of the paths from PA to QA and PB to QB).
No, the single subtree Q is being blown up, and PA's and PB's total visit counts are collectively increasing more rapidly because incrementing a visit to Q usually allows each of PA and PB to also increment a visit the next time each is updated.
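Here is a hypothetical bookkeeping sketch (not KataGo's implementation) of why the displayed visit counts on PA and PB can climb faster than the number of GPU evaluations once they share Q:

```python
# Hypothetical bookkeeping, not KataGo's implementation: one GPU evaluation
# grows the shared node Q by one visit, and each parent edge then records one
# more visit the next time it is traversed, because it sees Q's larger total.
q = {"visits": 0}
edge_PA = {"visits": 0}
edge_PB = {"visits": 0}

for gpu_eval in range(10):
    q["visits"] += 1        # real work: one neural-net evaluation in Q's subtree
    edge_PA["visits"] += 1  # bookkeeping only: PA observes Q's new total
    edge_PB["visits"] += 1  # bookkeeping only: PB observes Q's new total

# 10 GPU evaluations, but PA + PB together show 20 new visits,
# with no duplicated work on Q itself.
print(q["visits"], edge_PA["visits"] + edge_PB["visits"])
```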
RobertJasiek wrote:
Everything indicates that KataGo continues this fight for blue without ever understanding that reversion occurs.
Again, it sounds like you may be misreading fluctuations as evidence of a "fight" or of something inefficient or effortful happening in the algorithm. Instead, two moves are sharing almost all of their work and therefore have almost the same value. So at any given moment, if the GUI asks KataGo which one would be chosen, the answer can differ arbitrarily from one time to the next based on insignificant differences.
RobertJasiek wrote:
Even if the reversal sequences are short and, for a human, simple, KataGo also uses high analysis indefinitely.
What is "high" analysis? Do you mean high CPU usage? High CPU usage often indicates that the search is proceeding much more quickly, because GPU is normally the bottleneck. For example, if we magically were able to double the speed of the GPU code by finding some new optimizations, then KataGo would be much more efficient than before, but CPU usage would increase, because now more CPU would be required to feed positions faster to the GPU to keep the GPU at full or near-full usage. As long as GPU is the bottleneck, the CPU use only indicates the *relative* compute from CPU vs GPU, it does not definitely indicate the absolute efficiency or inefficiency of either one.
RobertJasiek wrote:
If KataGo understood reversion, it should detect reversion and prune the PB - QB subtree having 1 million visits by pointing to the PA - QA subtree having 5 million visits within a fraction of a second, but KataGo does not do it.
RobertJasiek wrote:
Do you think that KataGo implements reversion? If so, why does it take (even much!) more than a fraction of a second to increase the PB - QB subtree from, say, 1 million visits to 5 million visits? Without implemented reversion, KataGo can be improved by it. With allegedly implemented reversion but more than a fraction of a second to (almost-)equalise numbers of visits, KataGo must have a related bug.
We don't increase PB from 1 million visits to 5 million visits instantly because there is no urgency or importance in doing so. Q is still shared and both PA and PB are reading its latest value. When work is done on Q, both PA and PB benefit from the updated value. In such a situation, why does the numerical visit count of PA vs PB matter? It doesn't. Q is searched the same either way.
Maybe PB does catch up, but only gradually. Perhaps the way the PUCT exploration formula happens to behave in a particular situation is that every time we increment PB by 2-3 visits (which happens very fast because PB sees that Q is already at 5 million visits and doesn't need any GPU evaluations), we also visit PA again, and this takes some time because to increase PA further we need to increase Q further, which costs a GPU evaluation. Assuming there are no transpositions elsewhere and no other relevant effects, this would manifest to you as the total visits per second increasing by a factor of 3-4 while PB very gradually catches up to PA in visits, because per GPU evaluation we get 1 new visit on PA and 2-3 GPU-less catch-up visits on PB. If the overall search is still bottlenecked by GPU evaluations on Q, there is no inefficiency.
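As a rough sketch using the generic AlphaZero-style PUCT score (KataGo's actual formula differs in its details) with invented numbers matching the example: PA and PB report essentially the same value because they share Q, so the slightly larger exploration bonus on the less-visited PB keeps getting it selected until its visit count roughly catches up, and those selections are cheap because they mostly re-read the already-searched Q.

```python
# Generic AlphaZero-style PUCT score; KataGo's exact formula differs.
# All numbers below are invented to match the example in the text.
import math

c_puct = 1.0
parent_visits = 6_000_000
children = {
    "PA": {"visits": 5_000_000, "value": 0.523, "prior": 0.45},
    "PB": {"visits": 1_000_000, "value": 0.523, "prior": 0.40},
}

def puct_score(child):
    exploit = child["value"]  # both read the shared Q, so these are (nearly) equal
    explore = c_puct * child["prior"] * math.sqrt(parent_visits) / (1 + child["visits"])
    return exploit + explore

for name, child in children.items():
    print(name, puct_score(child))
# PB's exploration bonus is a few times larger than PA's, so PB wins the
# comparison until its visit count has largely caught up.
```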
Depending on the position, PB probably does also need to send some small fraction of a percent of its visits into other moves in order to catch up from 1 million to 5 million (again, soundness requires that in general, as visits -> infinity, every move gets explored infinitely often, just in case some deep hard-to-discover variation actually turns out to be better). So we can't simply do "PB.visits += 4000000" without spending a little bit of GPU time on these alternative moves. But again, there may be no urgency to do that even if it would be cheap. If most of the compute is focused on expanding the shared Q (via PA-Q) rather than on updating PB's visits, and therefore PB's visits only catch up gradually even though the work to catch PB up by searching its alternative moves would be small, that's fine. Spending more compute on Q is more important.
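As a rough illustration with an invented fraction: suppose around 0.5% of PB's catch-up visits had to go into non-transposing alternative moves to keep the search sound. That is cheap relative to millions of visits, but not literally free, which is another reason there is no point rushing it.

```python
# Invented fraction, for scale only.
catch_up_visits = 5_000_000 - 1_000_000
fraction_to_alternative_moves = 0.005
extra_gpu_evals = catch_up_visits * fraction_to_alternative_moves
print(extra_gpu_evals)  # 20000.0 -- small next to millions, but not zero
```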