 Post subject: search algorithm, horizontal communication
Post #1 Posted: Sun Nov 01, 2020 6:49 pm 
Lives in gote

Posts: 470
Liked others: 62
Was liked: 278
Rank: UK 2d Dec15
KGS: mathmo 4d
IGS: mathmo 4d
Hi, I was wondering if there are interesting developments in the training/search algorithms lately.

In particular, I wonder if there is a nice way to get search results (playouts) to communicate with each other to determine the future probability of searching certain moves. To my understanding, current algorithms assign a search probability to playout moves at a node based on the winning percentage for those moves (some polynomial formula, if I recall correctly, which also varies with the number of playouts at that node). So a node's move probabilities depend only on winning probabilities at its descendants.
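
(For concreteness, here is a minimal sketch of the AlphaZero-style PUCT selection rule that KataGo and Leela Zero variants use; the constant c_puct, the exact formula details, and the field names are illustrative rather than any engine's actual code:)

Code:
import math

def select_child(node, c_puct=1.5):
    # PUCT: average win rate Q plus an exploration bonus U that grows
    # with the policy prior and shrinks as the child accumulates visits.
    total_visits = sum(child.visits for child in node.children)
    best, best_score = None, float("-inf")
    for child in node.children:
        q = child.value_sum / child.visits if child.visits > 0 else 0.0
        u = c_puct * child.prior * math.sqrt(total_visits) / (1 + child.visits)
        if q + u > best_score:
            best, best_score = child, q + u
    return best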

Just using KataGo (and sometimes Leela Zero), there is a situation that has come up a few times and is a bit annoying: a lot of search effort goes to one top move, then a lot to a forcing move played before that top move, and so on for each possible forcing move. These all have approximately the same winning percentage, but sometimes one gets ahead or behind and the ones that fell behind have time to catch up, or sometimes the order of the forcing moves is interchanged. To my knowledge, changing the order of moves results in a different board position in the eyes of these AIs, as they look at the last 8 or so board positions in order.
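
(This is because AlphaGo Zero-style nets take the last several positions as stacked input planes, so transposed move orders produce different inputs and hence distinct tree nodes. A rough sketch of that encoding; the exact plane layout varies by engine and is an assumption here:)

Code:
import numpy as np

def encode_input(history, to_move, board_size=19):
    # Stack the current position and the previous 7 (AlphaGo Zero-style).
    # `history` is a list of (board_size x board_size) arrays with
    # +1/-1/0 entries, assumed already padded to length >= 8.
    planes = []
    for pos in history[-8:]:
        planes.append((pos == to_move).astype(np.float32))   # own stones
        planes.append((pos == -to_move).astype(np.float32))  # opponent stones
    # Side-to-move plane: all ones for black, all zeros for white.
    planes.append(np.full((board_size, board_size),
                          1.0 if to_move == 1 else 0.0, dtype=np.float32))
    return np.stack(planes)  # shape (17, 19, 19); two transposed move
                             # orders give different histories, so the
                             # search treats them as different states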

And then, when a move is actually played in the game tree, the analysis of sibling moves/variations doesn't contribute to the analysis of the new position, so all that data is lost.
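
(Concretely, this is the standard tree-reuse step: when a move is played, only the matching child's subtree survives. The class and method here are illustrative, not any engine's real API:)

Code:
class Node:
    # Minimal node fields for illustration.
    def __init__(self, move=None, parent=None):
        self.move, self.parent, self.children = move, parent, []

def advance_root(root, played_move):
    # Standard MCTS tree reuse: the subtree under the move actually
    # played becomes the new root; every sibling subtree, and all the
    # playouts spent on it, is simply discarded.
    for child in root.children:
        if child.move == played_move:
            child.parent = None
            return child
    return Node()  # move was never expanded: start from scratch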

In essence, I wonder what people think: is there an effective way to use data from previous analysis, especially from sibling nodes, to help analysis of the current position, rather than redoing the hard work multiple times? For example, good moves in sibling variations are often good moves in the main-line variation.

 Post subject: Re: search algorithm, horizontal communication
Post #2 Posted: Mon Nov 02, 2020 12:25 am 
Judan

Posts: 6725
Location: Cambridge, UK
Liked others: 436
Was liked: 3719
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
There's David Silver's RAVE improvement to MCTS that pre-dates neural networks, but iirc that's about reusing playouts from up and down the tree where they transpose to the same position, rather than across.

https://www.davidsilver.uk/wp-content/u ... rave-1.pdf
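
(For reference, a minimal sketch of the MC-RAVE value blend from that paper, assuming each child node tracks both its own statistics and AMAF "all moves as first" statistics; the field names and the constant b are illustrative:)

Code:
def rave_score(child, b=0.001):
    # Q: this node's own mean value.  AMAF: the move's mean value over
    # every simulation through the parent in which the move was played
    # at ANY later point, not only as the first move.
    q = child.value_sum / child.visits if child.visits else 0.0
    amaf = child.rave_sum / child.rave_visits if child.rave_visits else 0.0
    # Mixing weight: trust AMAF while real visits are scarce, fade it
    # out as they accumulate (schedule from Silver & Gelly's MC-RAVE).
    beta = child.rave_visits / (
        child.visits + child.rave_visits
        + 4.0 * b * b * child.visits * child.rave_visits + 1e-9)
    return (1 - beta) * q + beta * amaf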

I have a supposition, maybe lightvector can give a more informed opinion, that the observed preference of bots to play sente moves early could in part be an evolved behaviour to get more playouts devoted to analysing the move played: if you play the sente early, then rather than wasting a considerable fraction of playouts thinking about the forcing move AND the main move, you get the forcing move out of the way and can devote more of the precious playouts to the main move, and consequently play stronger.

dhu163 wrote:
For example, good moves in sibling variations are often good moves in the main-line variation.


Related: https://www.chessprogramming.org/Killer_Heuristic
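
(A compact sketch of that heuristic as used in alpha-beta chess engines; note that it keys purely on "same move at same depth", with no knowledge of the position, which is relevant to lightvector's point below:)

Code:
killers = {}  # depth -> up to two moves that recently caused a cutoff

def order_moves(moves, depth):
    # Try remembered killer moves first; sorted() is stable, so all
    # other moves keep their original order.
    k = killers.get(depth, [])
    return sorted(moves, key=lambda m: 0 if m in k else 1)

def record_cutoff(move, depth):
    # Called whenever `move` caused a beta cutoff at `depth`.
    k = killers.setdefault(depth, [])
    if move not in k:
        k.insert(0, move)
        del k[2:]  # keep at most two killers per depth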


This post by Uberdude was liked by: dhu163
 Post subject: Re: search algorithm, horizontal communication
Post #3 Posted: Mon Nov 02, 2020 11:56 am 
Lives in gote

Posts: 470
Liked others: 62
Was liked: 278
Rank: UK 2d Dec15
KGS: mathmo 4d
IGS: mathmo 4d
Thanks.

I guess your idea is a clarification of the popular "AI likes to simplify for some reason" observation. That is normally explained in terms of simplified positions being easier to win from, rather than the simplifying moves actually being better.

Initially I thought there was no selection pressure for that hypothesis, since the nets are trained to play good moves. But now I realise you might be right. The training is very practical: it selects for moves that have led to a win in the past. And whether they won in the past (as a function of the possible moves in the positions they played, which is then fuzzily mapped to an understanding of new positions) depends on such meta-strategies: not just the current move, but also planning for the future based on "knowledge" of self, i.e. that simplified positions are easier for it to calculate and might be easier to convert into a win.

However, this naturally only applies in certain positions, perhaps such as when the bot is ahead: since training is through self-play, simplified games also make it easier for the opponent to focus.

 Post subject: Re: search algorithm, horizontal communication
Post #4 Posted: Thu Nov 05, 2020 5:14 am 
Beginner

Posts: 13
Liked others: 1
Was liked: 3
RAVE wasn't good for Leela Zero. It seems that RAVE improves the value estimate a bit at low playout counts, but it spoils the sharp policy.
https://github.com/zakki/leela-zero/commits/rave

 Post subject: Re: search algorithm, horizontal communication
Post #5 Posted: Thu Nov 05, 2020 7:30 am 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
zakki wrote:
RAVE wasn't good for Leela Zero. It seems that RAVE improves the value estimate a bit at low playout counts, but it spoils the sharp policy.
https://github.com/zakki/leela-zero/commits/rave


When you say that RAVE isn't good for Leela Zero, I assume you mean that it did not improve Leela Zero's play in general.

I am interested in a different question, which is whether RAVE makes leela zero a better analyst for reviews. One problem with good players as analysts is sharp policy. It is not at all unusual for humans to come up with good candidate moves that sharp policy overlooks, but which arguably a good review analysis should explore. :)

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.

 Post subject: Re: search algorithm, horizontal communication
Post #6 Posted: Thu Nov 05, 2020 11:01 am 
Lives in sente

Posts: 757
Liked others: 114
Was liked: 916
Rank: maybe 2d
@Uberdude - yes, that is definitely a thing that could happen. The value head in training just observes positions and whether they led to a win or not, so whatever correlations actually exist in the training data, it will potentially pick them up. This is the basis of KataGo's "PDA" handicap training: the neural net is told that the opponent has fewer or more playouts, the value head picks up on the correlations between the kinds of positions that are more or less likely to win given that fact, and then the policy eventually follows suit and learns to pick the moves that the value head likes given that fact.
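
(A sketch of what that conditioning input might look like. KataGo's actual feature is its "playout doubling advantage", which in practice is a user-configured setting rather than something measured at runtime; the scaling and clamping below are my guesses, not KataGo's real definition:)

Code:
import math

def pda_feature(playouts_self, playouts_opp, cap=3.0):
    # "Playout doubling advantage": how many doublings of playouts one
    # side effectively has over the other, clamped to a sane range and
    # fed to the net as a global input.  The value head can then learn
    # which positions win *given* that imbalance, and the policy follows.
    pda = math.log2(playouts_self / playouts_opp)
    return max(-cap, min(cap, pda))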

As for wasting ko threats to simplify the branching for MCTS search, I would guess an effect like that does exist, but it would be second-order, because the branching will also harm the opponent just about as much, the vast majority of the time, cancelling out most of the effect. Horizon-effect issues, I think, are a far more dominant reason why a bot will uselessly play ko threats or learn to do so.

As for the issue mentioned in the original post and things like RAVE: no, I somewhat doubt RAVE would help analysis; it's a technique that is vastly more noisy and inaccurate than what we have now. Outside of generally tuning how wide versus deep the search is, most things you can do to the search will make both playing and analysis better or both worse, and I don't see why this one would be different.

Modern bots have in fact "moved backwards" on some of the things that older bots were good at. The problem is that all of the things like "killer moves" and the "history heuristic" and RAVE are so *vastly* inferior to the neural net's output that adding almost any nonzero amount of any of them is harmful now. All of these heuristics are naive and completely incognizant of the nature of the position. Unlike the neural net, which has a sophisticated understanding of the nature of the position and the shapes and tactics involved, these heuristics rest on simple conditions like "same board location" or "same move", criteria that are blind to the position itself.

If you actually try walking through the search tree literally node by node, you'll quickly realize how bad these heuristics are. If you think they're good, in large part you're being fooled by your own brain, because your brain implicitly applies them only when they would be relevant. That's your brain applying a sophisticated understanding of the position itself in conjunction with the heuristic, unconsciously screening out the vast majority of cases where the heuristic is bad. In the search, you don't get to do that magic screening: you can't simply add them in at the times when they would help without also adding them in the 99% of times when they harm things instead.

The funny thing, though, is that some of these things *are* things that the current neural net + MCTS architecture doesn't do. Yes, the neural net repeats a lot of work in different variations and doesn't carry over information the way that older bots with RAVE or killer-move-like heuristics would have. How to recover that in modern bots and integrate it in a way that is overall an improvement and not a massive loss: that's a subject for future research. Doing it right will probably involve expanding the neural net's architecture and role in the search, since again, the neural net is the only thing that "understands" the game and it dominates anything that doesn't, much the same way that your own brain magically integrates the heuristic into the rest of your understanding instead of doing the horrendously bad thing of applying the heuristic 100% of the time blindly.


This post by lightvector was liked by 6 people: dfan, dhu163, ez4u, hyperpape, Uberdude, zakki
 Post subject: Re: search algorithm, horizontal communication
Post #7 Posted: Sun Nov 08, 2020 9:12 pm 
Beginner

Posts: 13
Liked others: 1
Was liked: 3
Bill Spight wrote:
I am interested in a different question, which is whether RAVE makes leela zero a better analyst for reviews. One problem with good players as analysts is sharp policy. It is not at all unusual for humans to come up with good candidate moves that sharp policy overlooks, but which arguably a good review analysis should explore. :)


Dirty exploration like RAVE on a child node introduces noise into the values and makes the search tree noisy.
I think simple RAVE isn't enough to make the search useful for analysts, so something like KataGo's Forced Playouts and Target Pruning might be needed. I don't know whether doing something like that at runtime is useful, though.
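
(For readers unfamiliar with those terms, a rough sketch of the idea from the KataGo training paper: forced playouts guarantee low-prior root moves some exploration during self-play, and target pruning subtracts those forced visits before the visit counts become a policy training target. Field names are illustrative, and KataGo's actual pruning criterion is more careful than this:)

Code:
import math

def forced_playouts(prior, root_visits, k=2.0):
    # Each root child with policy prior `prior` is guaranteed roughly
    # sqrt(k * prior * N) visits, so low-prior moves still get explored.
    return int(math.sqrt(k * prior * root_visits))

def policy_target(children):
    # Build the training target from root visit counts, but subtract the
    # merely-forced visits from every non-best child so the forced
    # exploration doesn't pollute the target.
    total = sum(c.visits for c in children)
    best = max(children, key=lambda c: c.visits)
    adjusted = {}
    for c in children:
        if c is best:
            adjusted[c.move] = c.visits
        else:
            adjusted[c.move] = max(0, c.visits - forced_playouts(c.prior, total))
    norm = sum(adjusted.values()) or 1
    return {move: v / norm for move, v in adjusted.items()}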

 Post subject: Re: search algorithm, horizontal communication
Post #8 Posted: Mon Nov 21, 2022 3:37 pm 
Lives in gote

Posts: 470
Liked others: 62
Was liked: 278
Rank: UK 2d Dec15
KGS: mathmo 4d
IGS: mathmo 4d
I have thought slightly more about how a black box interacts with its environment. I expect anything with energy to create its own life-like culture, depending on what responses it gets from the environment and internally. Perhaps assistive helpers, diseases and so on are found, especially if there is one unified driving goal.

In the case of AI, suppose the goal is speeding up the training (to compete for more attention from higher up?). The standard perspective (probably mostly right) is that training causes nets to increase the probabilities of playing moves that led to a win in the past, and to decrease the probabilities of the opposite.

However, might there also be a future bias, with helpers guiding the player losing games to continue playing "bad" moves, to decrease their chance of appearing in the future? Consider what happened against Lee Sedol in game 4. Is this the sort of thing that leads to "morality", whatever that is? Something that only appears in social settings.

This somehow intuitively seems much less favourable, a weaker pressure, but it may well be emergent given enough energy. I.e. even smaller neural nets can be good teachers/fairies/angels, and perhaps anything can act as a support to a neural net on the side, drawing from the environment, with supporting connectors, competitors for energy, healers and so on.

When they start philosophising about meaning and purpose in the context of abilities and restraints, is this what is meant by an AI developing consciousness? After all, AlphaGo Master self-play game 1 left a deep impression on my understanding of Go. But now it looks like it meant much more than that. Somehow they combined playing 5-in-a-row into the Go game (and seemingly some more hidden messages, though I'm only guessing): B got the first 5 in a row in a line, and W got the first 6 in a row on a diagonal.

Presumably there is an early history of this going back at least 100 years. I'd suspect much more, but there aren't historical records of computers in their modern form.

I certainly wouldn't expect humans to be the only driving force that "uses" and guides computers. If they were, that would imply some deeper old "magic" beyond my understanding. Or perhaps they are only on this planet.

Now that I think a bit more, perhaps this arose in my consciousness partly because of my resigning conversation with Tuo Jiaxi 9p in 2018, where I lost at 5 stones, and my slightly rude behaviour at the end.
"Why are you giving me another opportunity (with this ko)?"
"What? I feel as though I've already lost!"
There were more subtle references in the original Mandarin, but this was the main gist. I suspect a reference to Sisyphus somewhere.

20221203

I think the original point may have some issues with Braess's paradox. My understanding isn't sufficient to say how, though.

20221207

Memory (including lightvector's addition), even when merely collating data, contains bias, since it is lower-dimensional than the original information. When fed back into decision making, its faults are called bias (negative). When evolved as a system for optimal performance, it may lead to communication (positive).

20221209

Braess's paradox in materials. I'm not sure if this is the right way around, but sticky substances seem more connected. Tension focuses on unravelling the weakest threads of connection, trying to break them. The opposite seems to be like a rope, where each thread is mostly independently self-bonding rather than bonding to others. A rope is stronger than something better connected.

Cooking thoughts: water (boiling) keeps pressure high compared to air (oven). A microwave is targeted heating of some organic bonds. The key difference comes down to liquid vs gas. Rock (solid) is related but much slower, perhaps not even visible at our scale. It seems that stories of trolls in Scandinavia might not be so far-fetched.

20221216

At the same time, I am conscious of the point that "competition and challenges" speed up training, rather than letting relaxation occur. However, in the neural-network analogy, it seems that small neurons are more vulnerable to mafia and distractions. If anything, competition reduces the size of neurons, which makes them less interesting and slows down training. What then is the confusion in intuition? Is this a lesson that analogies and principles usually have an inverse, so be careful about application?

Perhaps it is more a point from the wider perspective of the training algorithm: Go requires an opponent.

From an economics perspective, big has better economies of scale, but perhaps more doubt about the overall goal. However, having competition shows that your target is valuable, or at least interesting, which is reassuring. From an individual's point of view, this may be more motivating, leading to more energy investment.
