Bill Spight wrote:
the significance of the kakari having a 2% higher evaluation than the pincer. Is the kakari objectively better in some sense, while still open to error, or does Master simply like kakaris better than pincers? It is obvious that Master plays pincers less often than humans. There are not enough published Zero games to say much, but it appears that Zero pincers when Master does not. And Zero is the better player. Maybe Zero would prefer the pincer by 2%. Quien sabe?
Maybe, but I think Zero is better mostly because of its huge speedup and the resulting deeper search. My impression, which seems to match Redmond's and other reviewers' so far, is that Zero shows incredible tactical accuracy that even Master is sometimes unable to keep up with.
Anyway, is there really a practical difference between A being better than B in some sense and a strong teacher liking A better (as long as no perfect solution or stronger teacher is available)? Those 2% are partly backed by Master's value net, which you seem to respect, and its compressed experience of self-play games from similar positions. Actually, since these are mostly opening positions with nearly empty boards, there should have been several self-play lines starting from the exact positions in question. So the 2% applies to the case of Master vs. Master, which may differ from Pro vs. Pro or Zero vs. Zero. If Zero likes to pincer, that only means it has a higher (actual) winrate in the latter case.
Quote:
If the evaluations were actually probability estimates, then it would be possible to provide error estimates, as well. But since they are not actually probability estimates, there is nothing to compare them with to tell us what the errors may be.

I'm not sure if this is what you mean, but since these are binary estimates, I don't think a separate error estimate is necessary: the variance or (lack of) confidence can be derived from the estimate itself, as p*(1-p). There is no real difference between saying that the winrate is 51% and saying that it is 99% with high error or deviation (it can only really be 99% if the errors are expected to be low).
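
To make the p*(1-p) point concrete, here is a minimal sketch that treats a winrate p as the probability of a single win/loss outcome and computes the variance implied by it. The function names are just mine for illustration, and the two winrates are the 51% and 99% figures from above.

```python
import math


def bernoulli_variance(p: float) -> float:
    """Variance of a single win/loss outcome with win probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    return p * (1.0 - p)


def bernoulli_std(p: float) -> float:
    """Standard deviation of the same win/loss outcome."""
    return math.sqrt(bernoulli_variance(p))


if __name__ == "__main__":
    # Compare a near coin-flip evaluation with a near-certain one.
    for p in (0.51, 0.99):
        print(f"p={p:.2f}  variance={bernoulli_variance(p):.4f}  "
              f"std={bernoulli_std(p):.4f}")
```

The variance is about 0.25 at 51% and about 0.01 at 99%, so the estimate already carries its own spread: a value near 50% is as uncertain as a binary outcome can be, while a value near 99% only makes sense if little deviation is expected.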
