LZ's progression

nbc44 · Post by **nbc44** » Sat Apr 13, 2019 10:44 pm

hoa803 wrote:It might provoke an interesting discussion - the folks on GitHub don't feel the time parity matches are a good measure of engine strength, but rather visits. I don't claim to totally understand the reasoning but it might be worth looking into.

I think these visits tests are a "spherical horse in a vacuum" (сферический конь в вакууме), known in the West as a spherical cow.

P.S. For https://github.com/leela-zero/leela-zer ... -482729494 right now:
#219 (-v 1600) vs elfv2 (-v 3200)
10 wins, 40 losses
50 games played.

EDIT 1.

P.P.S. For https://github.com/leela-zero/leela-zer ... -482957781 right now:

#219 (-v 1600) vs elfv2 (-v 3200)
22 wins, 125 losses
157 games played.

to be continued...

nbc44 · Post by **nbc44** » Sun Apr 14, 2019 12:09 am

Time parity match with statistically significant result

(part II).
LZ0v17 #219 vs Elfv2
2x1080ti, 10s per move.

#219 v elfv2 ( 400 games)
           wins        black       white
#219   155 38.75%   64 37.21%   91 39.91%
elfv2  245 61.25%  108 62.79%  137 60.09%
                   172 43.00%  228 57.00%

(part III).
LZ0v17 #219 vs Elfv2
2x1080ti, 3s per move.

#219 v elfv2 ( 403 games)
           wins        black       white
#219   153 37.97%   58 35.37%   95 39.75%
elfv2  250 62.03%  106 64.63%  144 60.25%
                   164 40.69%  239 59.31%
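
Whether results like these are "statistically significant" can be checked directly from the win counts. The sketch below is not from the thread: it assumes every game is an independent trial with a fixed win probability, tests the totals from parts II and III against a 50% win rate with an exact two-sided binomial test, and reports a Wilson score interval. The function names are illustrative.

```python
from math import comb, sqrt

def two_sided_p(wins, games):
    """Exact two-sided binomial p-value against a 50% win rate."""
    k = min(wins, games - wins)
    # Probability mass of the smaller tail, doubled because p0 = 0.5 is symmetric.
    tail = sum(comb(games, i) for i in range(k + 1))
    return min(1.0, 2 * tail / 2 ** games)

def wilson_interval(wins, games, z=1.96):
    """Wilson score interval for the win rate (z = 1.96 is roughly 95%)."""
    p = wins / games
    denom = 1 + z * z / games
    centre = (p + z * z / (2 * games)) / denom
    half = z * sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return centre - half, centre + half

# Totals for #219 taken from the two time parity tables above.
for label, wins, games in [("10s per move", 155, 400), ("3s per move", 153, 403)]:
    lo, hi = wilson_interval(wins, games)
    print(f"{label}: #219 scored {wins}/{games}, "
          f"95% interval {lo:.3f} to {hi:.3f}, p = {two_sided_p(wins, games):.2g}")
```

Under these assumptions, an interval that stays below 0.5 indicates that #219 scored worse than elfv2 in that setup, and says nothing about other hardware or time settings.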

hoa803 · Post by **hoa803** » Sun Apr 14, 2019 4:36 am

My suspicion is that time parity might be correct with two entirely separate machines, with the same hardware, and using ponder. Basically what seems to have been done in the AlphaZero paper? If I recall, they used 90 mins main time and 15s/move byo-yomi. I'm not as sure about doing it on a single machine with --noponder, however.

The reason I wonder is because I know that each move the NN makes is not independent of what it calculated on the previous move(s). We also know that the number of visits calculated on each position will vary wildly for a given amount of time. That will definitely add some serious randomness to the performance of an engine throughout a game. What I don't know is, does that even matter? (see: comment)

Ultimately I think it may be like football. Take the best teams in the world, say Manchester City, Barcelona, etc. Now change the rules of the game in some fundamental way. Maybe some other team will now be stronger.

That may be a poor analogy, but when we declare an engine stronger given test XYZ, all we're really saying is that the statement is true under those exact conditions only, especially when the engines are very similar in strength, like elf and lz appear to be at this point.

maf · Post by **maf** » Mon Apr 15, 2019 7:02 am

of course, just an inconvenient truth

nbc44 · Post by **nbc44** » Mon Apr 15, 2019 1:51 pm

hoa803 wrote:My suspicion is that time parity might be correct with two entirely separate machines, with the same hardware, and using ponder.

What do you think about a test with one computer, ponder, and one dedicated GPU for each side? I believe that this would be a more or less honest test.

splee99 · Post by **splee99** » Mon Apr 15, 2019 4:41 pm

I think one computer has only one interface bus between the CPU and the GPU. So that part is actually shared, and the bot using less interface time will gain an advantage.

nbc44 · Post by **nbc44** » Mon Apr 15, 2019 8:08 pm

splee99 wrote:I think one computer has only one interface bus between the CPU and the GPU. So that part is actually shared, and the bot using less interface time will gain an advantage.

For 3200 visits and a GPU(!) client? It's funny, isn't it?

nbc44 · Post by **nbc44** » Mon Apr 15, 2019 11:03 pm

Visit parity match with statistically significant result

LZ0v17 #219 (1600 visits) vs Elfv2 (3200 visits) 2x1080ti

#219 v elfv2 ( 400 games)
           wins        black       white
#219   129 32.25%   60 31.58%   69 32.86%
elfv2  271 67.75%  130 68.42%  141 67.14%
                   190 47.50%  210 52.50%

In my case, everything is very bad.

nbc44 · Post by **nbc44** » Wed Apr 17, 2019 12:26 am

Visit parity match
LZ0v17 #219 (1600 visits) vs Elfv2 (3200 visits) 1x1080ti per side + ponder
Part1 - #219 (GPU0) vs Elfv2 (GPU1)

#219 v elfv2 ( 208 games)
           wins        black       white
#219    74 35.58%   34 34.69%   40 36.36%
elfv2  134 64.42%   64 65.31%   70 63.64%
                    98 47.12%  110 52.88%

Part2 - #219 (GPU1) vs Elfv2 (GPU0)

#219 v elfv2 ( 208 games)
           wins        black       white
#219    81 38.94%   43 39.45%   38 38.38%
elfv2  127 61.06%   66 60.55%   61 61.62%
                   109 52.40%   99 47.60%

Summary:

#219 vs elfv2 (37.26%)
+155-261=0
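
A pooled score like the one in the summary can be turned into an approximate rating gap. The sketch below assumes the standard logistic Elo model (an assumption, the thread does not state one) and pools the two 208-game halves. `elo_diff` is an illustrative name.

```python
from math import log10

def elo_diff(score):
    """Elo difference implied by an expected score under the logistic Elo model."""
    return -400 * log10(1 / score - 1)

# Pool Part1 and Part2 (GPU assignment swapped between the halves).
wins = 74 + 81
losses = 134 + 127
score = wins / (wins + losses)

print(f"+{wins}-{losses}: {100 * score:.2f}% -> {elo_diff(score):+.0f} Elo for #219")
```

The figure describes only this configuration (1600 vs 3200 visits, one 1080ti per side, ponder on) and carries the sampling error of a 416-game match.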

Aram · Post by **Aram** » Wed Apr 17, 2019 2:14 am

Is the difference in speed between the ELF network and the 40b network really 2x for you?
I know that theoretically that could be true, but if I've understood correctly, the difference isn't nearly that large in practice?

If you load the 40b network in leela, and write netbench 50000 and then load the elf network and write netbench 50000,
do you really play those 50,000 playouts in half the time with the ELF network?

nbc44 · Post by **nbc44** » Wed Apr 17, 2019 3:35 am

Aram wrote:Is the difference in speed between the ELF network and the 40b network really 2x for you?
I know that theoretically that could be true, but if I've understood correctly, the difference isn't nearly that large in practice?

If you load the 40b network in leela, and write netbench 50000 and then load the elf network and write netbench 50000,
do you really play those 50,000 playouts in half the time with the ELF network?

1). viewtopic.php?p=242577#p242577

2).

c:\apps\l0gpu17\leelaz.exe --precision single -t 24 --gpu 0 --gpu 1  -w C:\APPS\net\00ff08eb.gz

Leela: netbench 50000
50000 evaluations in 58.73 seconds -> 851 n/s

c:\apps\l0gpu17\leelaz.exe --precision single -t 24 --gpu 0 --gpu 1  -w C:\APPS\net\05dbca15.gz

Leela: netbench 50000
50000 evaluations in 29.81 seconds -> 1677 n/s
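
The two netbench runs above give the speed ratio directly. The sketch below does the arithmetic, assuming the first run is the 40b network and the second is the ELF network (the order of the question), and treating netbench evaluations per second as a rough proxy for visits per second during search, which is an approximation.

```python
EVALS = 50000

# Timings copied from the two netbench runs above.
seconds_40b = 58.73   # assumed to be the 40b network (00ff08eb)
seconds_elf = 29.81   # assumed to be the ELF network (05dbca15)

rate_40b = EVALS / seconds_40b
rate_elf = EVALS / seconds_elf
speed_ratio = rate_elf / rate_40b

# Rough time per move at the visit counts used in the matches.
time_40b_1600 = 1600 / rate_40b
time_elf_3200 = 3200 / rate_elf

print(f"40b: {rate_40b:.0f} n/s, ELF: {rate_elf:.0f} n/s, ratio {speed_ratio:.2f}x")
print(f"1600 visits on 40b: about {time_40b_1600:.2f} s")
print(f"3200 visits on ELF: about {time_elf_3200:.2f} s")
```

On this hardware the ratio comes out close to 2x, which is the basis for pairing 1600 visits against 3200 visits as a stand-in for equal time.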

hoa803 · Post by **hoa803** » Wed Apr 17, 2019 4:46 pm

NBC, if you're using visit parity you shouldn't use ponder. The time to reach that number of visits varies by position. Time parity matches can use ponder on separate hardware though, similar to how AlphaGo was tested.

There's a thread on GitHub with a visit "parity" (1600 vs 3200) match between 220 and elfv2. The result was inconclusive, which seems to indicate they're about the same strength at that visit count.

Early in the match LZ appeared to be stronger with over 95% confidence, but by the end the result evened out.

Edit: a word
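
An early lead that crosses 95% confidence and later evens out is what repeated checking of a running score tends to produce. The simulation below is a hypothetical illustration, not a reconstruction of the GitHub match: it plays two exactly equal engines against each other and counts how often the running result looks "significant" at some point, compared with how often it does at the final game. The match length, the first check at game 30, and the normal approximation are all assumptions.

```python
import random
from math import sqrt

def crosses_95(wins, games):
    """True if the win count is more than 1.96 standard errors away from 50%."""
    return abs(wins - games / 2) > 1.96 * sqrt(games) / 2

def simulate(matches=2000, games=400, first_check=30, seed=0):
    """Return (rate of ever crossing 95%, rate of crossing at the final game)."""
    rng = random.Random(seed)
    ever = final = 0
    for _ in range(matches):
        wins = 0
        crossed = False
        for g in range(1, games + 1):
            wins += rng.random() < 0.5   # equal engines: each game is a coin flip
            if g >= first_check and crosses_95(wins, g):
                crossed = True
        ever += crossed
        final += crosses_95(wins, games)
    return ever / matches, final / matches

peek_rate, final_rate = simulate()
print(f"crossed 95% at some point: {peek_rate:.1%}")
print(f"crossed 95% at the end:    {final_rate:.1%}")
```

Fixing the number of games in advance, or using a sequential test designed for early stopping, avoids reading too much into a temporary lead.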

splee99 · Post by **splee99** » Wed Apr 17, 2019 8:20 pm

My observation is that elfv2 is well trained to make sharp attacks in the early stage of a game. However, it does have many blind spots in complicated life-and-death situations that LZ can take advantage of.

And · Post by **And** » Tue Apr 23, 2019 10:22 am

What is the strongest 15b network? edb61bc2, 0a963117 or another?

hoa803 · Post by **hoa803** » Tue Apr 23, 2019 4:48 pm

And wrote:What is the strongest 15b network? edb61bc2, 0a963117 or another?

There was a GitHub thread about a 15b trained on 40b a while back. Unsure if anyone is still doing it.

https://github.com/leela-zero/leela-zero/issues/2192

Life In 19x19
