Yakago wrote:
So you simply made it avoid the 'dagger'/'flying knife' pattern? With the option on, of course.
What led to this change?
Well, it felt a bit odd to have some people in chat lamenting this issue while at the same time the majority of them voted to keep the training "zero"(*), rather than doing the things that would let the self-play learning correct the issue on its own. So I did this to try to make both sides happy.
Anyways, if "zero"-like training remains unlucky and never picks it up, I'm pretty confident it will be "solved" once we add simple ways of leveraging outside data or human games and break fully from zero. Still, it would be nice to have a way for self-play to improve its own exploration by itself. Such issues aren't Go-specific as far as I can tell; from a very broad perspective, you see the same kinds of exploration problems in AlphaStar / OpenAI Five / other research, and they remain an unsolved problem for general machine learning.
(*) KataGo is still half-"zero" in the sense that it uses some Go-specific input features and training targets, as well as some optimizations to speed up self-play (e.g. Benson's algorithm, minor details around passing), but 100% of the data comes from self-play with no use of outside data, and without leveraging any other bots either (e.g. ELF or other external weights).