Re: LZ's progression

Posted: Thu Aug 16, 2018 6:45 am
by Vargo
moha wrote:One could also test if everything is ok (wrt settings / corrupted network files / etc) by a quick test at equal 1600 visits
Everything seems ok, I did a 20-game check #157 v 1fdfb1 (40b) at --visits=1601.
1fdfb1 won 75% which is reasonable (the "official" score it got was 84.25% : 2018-08-02).
40b is W
40b is B
And 4 additional e2be48 v #157 games at --visits=1601. As expected, e2be48 won them (taking 3 times more time than #157).
40b is W
40b is B
Seeing this, I'm still surprised that 40b doesn't perform better at time parity.

Re: LZ's progression

Posted: Mon Sep 03, 2018 11:37 am
by Vargo
20 game match between LZ0.15#157 and LZ0.15#173
twogtp 1.4.10, 5 min per side and per game, no ponder, komi 7.5 (GPU 1x1080)

#157 wins 14:6 (8 wins as W, 6 wins as B) all games by resignation

Still a lot of catching up to do for the new networks...

If someone wants the games, I can upload them.

Re: LZ's progression

Posted: Tue Sep 04, 2018 10:42 am
by Vargo
20 game match between LZ0.15#157 and LZ0.15#174 (the new official best network is 256x40)
twogtp 1.4.10, time parity : 5 min per side and per game, no ponder, komi 7.5 (GPU 1x1080)

#157 v #174 --> 10:10 (7 wins as W, 3 wins as B) all games by resignation

A good surprise, 256x40 seems much stronger than the 256x20 series...

I'll run some more tests tomorrow, to be sure it's not a fluke ;-)
157_174_174isW.zip
157_174_174isB.zip

Re: LZ's progression

Posted: Wed Sep 05, 2018 12:30 am
by Vargo
40 more games between #157 and #174.

In all, it's a 60 game match, at time parity (5 min per game, GPU: 1x1080, komi 7.5, no pondering)

Final result : #157 wins 35:25 (58% , 17 wins as W, 18 wins as B)

So, maybe #174(256x40) is not as strong as #157, but it seems stronger than the 256x20 networks.
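
One rough way to put a number on "not a fluke" for results like these is an exact binomial test against a 50/50 null. The sketch below is only that: it assumes the games are independent and ignores colour effects, and the function name is made up for illustration.

```python
from math import comb


def two_sided_binomial_p(wins: int, games: int) -> float:
    """Exact two-sided p-value for `wins` out of `games` under a fair 50/50 null.

    Assumption: every game is an independent coin flip if the two
    networks are equally strong at this time control.
    """
    # Use the larger of the two tails so the test is symmetric.
    k = max(wins, games - wins)
    tail = sum(comb(games, i) for i in range(k, games + 1)) / 2 ** games
    return min(1.0, 2 * tail)


# Match scores quoted in the posts above, from #157's point of view.
for label, wins, games in [
    ("#157 v #173", 14, 20),
    ("#157 v #174 (first 20)", 10, 20),
    ("#157 v #174 (all 60)", 35, 60),
]:
    print(f"{label}: {wins}/{games}, p = {two_sided_binomial_p(wins, games):.3f}")
```

A p-value well above 0.05 only says the match is too short to separate the two networks, not that they are equal.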


The 40 more games :
157v174_174isW.rar
157v174_174isB.rar

Re: LZ's progression

Posted: Wed Sep 05, 2018 6:38 am
by Gomoto
Vargo is doing time parity

Matches on https://zero.sjeng.org/ are played at 1600 visits

Re: LZ's progression

Posted: Wed Sep 05, 2018 7:26 am
by Uberdude
Even without the extra time advantage the larger networks get with equal playouts in test matches, the Leelo scale from successive 55% promotions is highly inflated. Based on comparisons to Elf, I estimated the inflation at around a factor of 5. So if one network is 500 above another, the Elo formula says it would win 95%, but in reality the gap is more likely around 100, for 65%. (I don't have the resources to actually do a test; it's possible LZ is particularly bad against Elf compared to old versions of itself.) Another way to get a similar ball-park figure: top pros are 3600 on goratings, which is kind of a continuation of EGF ratings where a beginner is about 0, whilst LZ is about 12000 now and started at 0 for random (lower than beginner), and 12000 / 3600 is about 5 too.
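
For reference, the win percentages above follow from the standard logistic Elo formula. A minimal sketch, with helper names invented here for illustration:

```python
from math import log10


def elo_expected_score(diff: float) -> float:
    """Expected score of the stronger side for a rating gap of `diff` Elo."""
    return 1.0 / (1.0 + 10 ** (-diff / 400.0))


def elo_diff_from_winrate(winrate: float) -> float:
    """Inverse of the formula above: rating gap implied by a win rate."""
    return 400.0 * log10(winrate / (1.0 - winrate))


print(f"500 Elo gap -> {elo_expected_score(500):.1%}")
print(f"100 Elo gap -> {elo_expected_score(100):.1%}")
print(f"55% win rate -> {elo_diff_from_winrate(0.55):.1f} Elo")
```

The last line shows what a single promotion at the 55% gate is worth on paper, which is how a long chain of promotions can add up to a large nominal rating.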

Re: LZ's progression

Posted: Wed Sep 05, 2018 7:41 am
by Gomoto
Have a look at the recent matches, the network is now the official top dog

Re: LZ's progression

Posted: Wed Sep 05, 2018 7:42 am
by Vargo
A little postscriptum :

5 min per game with 1x1080 is roughly equivalent to --visits=3201 for #157, and to --visits=801 for #174

At time parity, #157 has 4 times more visits and wins.
At visits parity, #174 takes 4 times more time than #157 and wins.

I still feel it would be more natural to determine Elo at time parity (but maybe it would be difficult to do ?)
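
The conversion between the two kinds of parity is simple arithmetic once the relative cost per visit is known. A small sketch, assuming the roughly 4x slowdown implied by the 3201 and 801 figures above (the function name and the constant are illustrative, and real numbers depend on the GPU):

```python
def visits_at_time_parity(reference_visits: int, slowdown: float) -> int:
    """Visits a slower network gets in the time the faster one spends on
    `reference_visits`, given how many times slower each visit is."""
    if slowdown <= 0:
        raise ValueError("slowdown must be positive")
    return round(reference_visits / slowdown)


# Assumption: one visit of the 40-block net costs about 4x one visit of #157.
SLOWDOWN_40B = 4.0

print(visits_at_time_parity(3201, SLOWDOWN_40B))
```

At visit parity the same factor applies the other way round: the slower network simply takes `slowdown` times longer per move.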

Re: LZ's progression

Posted: Wed Sep 05, 2018 8:19 am
by Gomoto
Does anybody know why visits are used instead of time?

I think because different hardware does not matter this way.

I also think this is a possible source of error that could hold back further improvement of the networks.

Re: LZ's progression

Posted: Thu Sep 06, 2018 2:25 pm
by jokkebk
Match games are run with visit parity because the architecture was originally designed for fixed size net, so time and visit parity would be essentially the same.

With the Leela Zero project, there has been a size upgrade every few months or so, and as the change is usually done manually, it doesn't matter since all games after that are again at time parity. It breaks the Elo graph though (or not, if you want visit parity). The self-play Elo graph is not absolute in any case but relative, so this is probably not seen as such a huge issue.

Re: LZ's progression

Posted: Wed Sep 12, 2018 9:52 am
by Vargo
40 games between #157 and #176.

Time parity, 5 min per game, GPU: 1x1080, komi 7.5, no pondering.

#157 wins 29:11 (17 wins as W, 12 wins as B)

Well, almost 2 months since #157... am I the only one to be so disappointed in the new networks ?

Could someone run a 157 v 176 match (at time parity, with no pondering), just to be sure of these results?
176isW.zip
176isB.zip

Re: LZ's progression

Posted: Wed Sep 12, 2018 1:50 pm
by moha
Vargo wrote:Well, almost 2 months since #157... am I the only one to be so disappointed in the new networks ?
You probably won't see a fast improvement in these 1s/move games, even if the slower networks are getting stronger and stronger, because that strength still needs a meaningfully sized search tree to do its work. Below a certain limit more search beats smarter search, no way around that.

But this doesn't mean the new networks are weaker at "time parity" in general - just in these very fast games.

Re: LZ's progression

Posted: Thu Sep 13, 2018 2:10 am
by Vargo
All these time parity matches are 5min per game and per side, so, it's a bit more than 2 sec per move (with 1x1080), which corresponds roughly to 800 visits for #176, and to 3200 visits for #157.
The matches I ran with longer time settings for other networks had similar results...
Anyway, I've begun a new 157 v 176 match, with 2x1080Ti, 12800 visits for 157 and 3200 visits for 176, which should be approximately at time parity (I'll check, with the .dat)

Re: LZ's progression

Posted: Thu Sep 13, 2018 3:42 am
by moha
Vargo wrote:All these time parity matches are 5min per game and per side, so, it's a bit more than 2 sec per move (with 1x1080), which corresponds roughly to 800 visits for #176, and to 3200 visits for #157.
I think in this visit range more search still beats better search. There are around 300 candidate MOVES in a position, so (even if most of them are pruned) this doesn't mean a significantly deep search. Looking a bit deeper is more valuable than the order you FIRST look at the moves (which is all network strength is about).

So in this range I won't expect to see a spectacular improvement between successive networks even if they actually improve. Just lowering official matches from 3200 to 1600 visits had a noticeable negative effect (results more random, promotions scarcer even on a new size).
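
To illustrate the point about move ordering, here is a sketch of an AlphaGo Zero style PUCT selection step, where the network's prior decides which moves get looked at first and visit counts gradually take over. This is an assumption-laden toy, not Leela Zero's actual code: the `Child` class, the `c_puct` value and the `+ 1` under the square root are all choices made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Child:
    move: str
    prior: float          # policy output of the network for this move
    visits: int = 0       # how often search has gone through this move
    value_sum: float = 0.0  # sum of evaluations backed up through it


def select_child(children: list[Child], c_puct: float = 1.0) -> Child:
    """Pick the child maximising Q + U (PUCT-style)."""
    parent_visits = sum(c.visits for c in children)

    def score(c: Child) -> float:
        q = c.value_sum / c.visits if c.visits else 0.0
        # Exploration bonus: large for high-prior, rarely visited moves.
        u = c_puct * c.prior * math.sqrt(parent_visits + 1) / (1 + c.visits)
        return q + u

    return max(children, key=score)


# With no visits yet, the order is purely the network's prior.
fresh = [Child("D4", 0.6), Child("Q16", 0.3), Child("K10", 0.1)]
print(select_child(fresh).move)

# Once the favourite has been searched and looks poor, others get a turn.
searched = [Child("D4", 0.6, visits=10, value_sum=2.0), Child("Q16", 0.3)]
print(select_child(searched).move)
```

With only a few hundred visits spread over a wide position, most of the budget goes to the first few moves the prior suggests, which is why extra visits can matter more than a slightly better prior in this range.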
Vargo wrote:Anyway, I've begun a new 157 v 176 match, with 2x1080Ti, 12800 visits for 157 and 3200 visits for 176, which should be approximately at time parity (I'll check, with the .dat)
Thanks, this will be interesting. I never saw a statistically significant 40b vs 15b match at more realistic time controls (someone posted similar results on github but also only 1-2 sec/move).

Of course testing like this is faster, but someone using LZ for serious analysis would probably allow at least 10-20 sec per move on a 1080ti (nearly 10k visits, which is where search quality should start to overcome the visit disadvantage). And the tournaments where these high-block nets were first used saw many more visits. On the other hand it is good to know that users with weaker hardware are better off with 15 blocks for now. There probably will be unofficial 15b nets trained on 40 block selfplay data in the future as well.

Re: LZ's progression

Posted: Thu Sep 13, 2018 4:33 am
by Knotwilg
Has LZ also built up a model of the game for itself? Has AlphaGo? I'm confused as to the AI aspect. I understand how it uses MCTS and NN to solve the computation problem, but there's no AI in there, is there? Do these programs always rebuild their intelligence particular to the game? Or has LZ also trained itself like AG has? And what has been the result of AG's training? Did it have an impact only on the MCTS and NN parameters? Or did it rebuild some domain knowledge for itself?

Any articles on that?

Sorry to sound confused.