And you are right again:
hoa803 wrote:It's still fun to do the matches, though.
hoa803 wrote:The thing nobody seems to be talking about in this thread is the confidence interval.
All these 20, 30 ... game matches aren't gospel, obviously. Match parameters vary wildly (different GPUs, different time per game, different number of visits, usage of -m, -r, etc.)
nbc44 wrote:All these tests are just rubbish
Yeah, I definitely agree with that. I ran something a few days ago, I believe it was LZ216 vs Elfv2, using Leela release 0.17 on my GTX 1060 6GB at CGOS rules, 15 minutes a side absolute. After 52 games the match was tied 26 apiece.
Vargo wrote:But all these matches are really fun to run, and I think, taken as a whole, they can give an idea of the strength of the different networks.
They're not gospel, but they're not rubbish either, even with so few as 20 games.
For example:
20-game match XXX v. YYY, result 15-5
If XXX and YYY were the same strength, there would be only about a 2% chance for XXX to get at least 15 wins. It's not unreasonable to think that XXX is stronger (particularly if there are several such matches going the same way).
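To make the 2% figure checkable, here is a minimal sketch of the calculation, assuming each game is an independent coin flip between two equally strong engines (no draws, no colour advantage). The function name tail_probability is just mine for illustration.

```python
from math import comb

def tail_probability(games: int, wins: int, p: float = 0.5) -> float:
    """Probability of at least `wins` wins in `games` games,
    assuming independent games with a fixed win probability `p`."""
    return sum(
        comb(games, k) * p**k * (1 - p) ** (games - k)
        for k in range(wins, games + 1)
    )

# 20-game match, 15 or more wins for one side, equal strength assumed
print(f"{tail_probability(20, 15):.4f}")  # prints 0.0207
```

That is 21700 out of 2^20 equally likely outcomes, so a bit over 2%. The independence assumption is a simplification: real matches share openings, hardware and settings, so treat the number as a rough guide.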
I'm just curious. In those 52 games, was ELFv2 running on LZ 0.16 or 0.17?
hoa803 wrote:Yeah, I definitely agree with that. I ran something a few days ago, I believe it was LZ216 vs Elfv2, using Leela release 0.17 on my GTX 1060 6GB at CGOS rules, 15 minutes a side absolute. After 52 games the match was tied 26 apiece.
And wrote:CS shows area + handicap, and Sabaki and Zen show territory
Thx
jlt wrote:LeelaZero counted the score as territory + prisoners (+komi for White)
Crazystone counted 182 points for Black, which corresponds to (black living stones)+(black territory)
Code:
#219 v elfv2 (400 games)
          wins            black           white
#219      175  43.75%     65  41.67%     110  45.08%
elfv2     225  56.25%     91  58.33%     134  54.92%
total                    156  39.00%     244  61.00%
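Since the confidence interval came up earlier in the thread, here is a rough sketch of one way to put an interval around these win rates. It uses the Wilson score interval with z = 1.96 (about 95%), which is my choice and not something anyone above used, and it assumes independent games with a fixed win probability.

```python
from math import sqrt

def wilson_interval(wins: int, games: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a win rate; z = 1.96 gives roughly 95% coverage."""
    p = wins / games
    denom = 1 + z * z / games
    center = (p + z * z / (2 * games)) / denom
    half = z * sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return center - half, center + half

# Results quoted in this thread: the 400-game match and the 52-game tie
for label, wins, games in [("elfv2 v #219", 225, 400), ("52-game tie", 26, 52)]:
    low, high = wilson_interval(wins, games)
    print(f"{label}: {wins}/{games} -> {low:.3f} to {high:.3f}")
```

With 400 games the interval for elfv2 comes out at roughly 51% to 61%, so it stays above 50%. With the 52-game tie it is roughly 37% to 63%, which is why short matches only give a rough idea.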