Life In 19x19

Posted: **Thu May 17, 2018 3:51 pm**

dfan wrote: There is no particular reason that winning percentages have to be related in this exact mathematical way.

For example, Alice, Bob and Carol all play the classic game "Whose random number is bigger?". Alice is a beginner and picks integers from 1 to 100 uniformly at random. Bob is more experienced and picks integers from 51 to 150 uniformly at random. Carol is an expert and picks integers from 101 to 200 uniformly at random (she's very good at this game, though you can probably imagine even better strategies).

How often does Bob beat Alice? How often does Carol beat Bob? How often does Carol beat Alice?

Bob beats Alice ⅞ of the time, with a win/loss ratio of 7/1. Carol beats Bob ⅞ of the time, with a win/loss ratio of 7/1. Carol always beats Alice. If we estimate Carol's win/loss ratio against Alice as (7/1) × (7/1) = 49/1, then of course the win/loss ratio is off by infinity. However, the winning percentage is off by only 2%.
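For anyone who wants to check the arithmetic, here is a minimal sketch that enumerates every pair of picks exactly. The helper name `win_tie_loss` is made up for illustration. Since the picks are integers, ties are possible, so the exact figures come out slightly different from ⅞: Bob wins 87.25% of the time, ties 0.5% and loses 12.25%.

```python
from fractions import Fraction
from itertools import product

def win_tie_loss(first, second):
    """Exact (win, tie, loss) probabilities for `first` against `second`,
    each picking uniformly and independently from its own range."""
    pairs = list(product(first, second))
    n = len(pairs)
    wins = sum(a > b for a, b in pairs)
    ties = sum(a == b for a, b in pairs)
    return Fraction(wins, n), Fraction(ties, n), Fraction(n - wins - ties, n)

alice, bob, carol = range(1, 101), range(51, 151), range(101, 201)

bob_vs_alice = win_tie_loss(bob, alice)
carol_vs_bob = win_tie_loss(carol, bob)
carol_vs_alice = win_tie_loss(carol, alice)

# Chain the odds the way the post describes (ties ignored).
odds = bob_vs_alice[0] / bob_vs_alice[2]      # a little over 7 to 1
chained = odds * odds                          # estimate for Carol vs. Alice
estimated_pct = chained / (1 + chained)        # implied winning percentage

print(bob_vs_alice, carol_vs_alice, float(estimated_pct))
```

The chained estimate lands at about 98%, against a true value of 100%, which is the "off by only 2%" point.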

Posted: **Thu May 17, 2018 4:24 pm**

Suppose we play a hold'em game. We have 3 possible starting hands. You choose a hand first, I choose second. We then play out the flop, turn and river with no betting.

We bet an amount on the outcome. Surely you are bound to win, as you choose first?

The 3 possible hands are:
The red 2s
6, 7 of spades
Ace of spades, king of clubs.

Posted: **Thu May 17, 2018 5:56 pm**

dfan wrote:
Bill Spight wrote: How is it an example? Doesn't it depend upon the structure of the game and a presumed definition of expertise at it, rather than the distribution of the game results per se?
Your point that there is no necessary relationship between the win rates of A vs. B, B vs. C, and A vs. C is well taken. But I don't think that is what moha is saying.
I thought it was an example of a "shape of the actual distribution of single game performances" (uniform rather than Gaussian or somesuch), but maybe I was misinterpreting moha's phrase. For one thing, I was interpreting "game performance" as being a function of a single player, and whoever has the better performance wins; perhaps something else was meant.

I think we are talking about the same thing, and your example seems good (edit: or maybe not? Bill's billiard example may be more interesting - exponential? - but the difference is still more normal). I assumed that there is an individual performance distribution for each player (pointwise for simplicity - verifiable in go), and that the game result distribution is a function of those two and can be different (though with normal individual distributions the difference will be normal as well). In the A>B>C 60%+60% situation it may even be possible to design games with individual shapes for either extreme (A beats C 61% or 99% of the time), though this may need distorting factors, since the difference will usually be a bit more normal in shape, like in dfan's example.

About "there is no necessary relationship between the win rates of A vs. B, B vs. C, and A vs. C": it seems to me that, assuming the simplest case (no correlations, independent player performances, etc.), there is a relationship, which depends on the individual performance distributions (and thus varies by game type).
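To put a number on that relationship for one specific shape, here is a rough sketch. It assumes each player's single-game performance is normal with the same variance and independent of the opponent's, which is my simplification, not something established in the thread. Under that assumption the performance difference is normal too, so the standardized strength gaps simply add. The function names are made up for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, sd 1; the common scale cancels out

def chained_win_rate_normal(p_ab, p_bc):
    """A-vs-C win rate under the equal-variance normal assumption:
    convert each win rate to a standardized gap, add the gaps, convert back."""
    gap = STD_NORMAL.inv_cdf(p_ab) + STD_NORMAL.inv_cdf(p_bc)
    return STD_NORMAL.cdf(gap)

def chained_win_rate_odds(p_ab, p_bc):
    """A-vs-C win rate obtained by multiplying win/loss ratios."""
    odds = (p_ab / (1 - p_ab)) * (p_bc / (1 - p_bc))
    return odds / (1 + odds)

normal_estimate = chained_win_rate_normal(0.6, 0.6)
odds_estimate = chained_win_rate_odds(0.6, 0.6)
print(f"normal: {normal_estimate:.4f}, odds: {odds_estimate:.4f}")
```

For the 60%+60% case the two come out within a fraction of a percentage point of each other (a bit over 69% either way), so with normal-shaped performances the oddswise approach is a good approximation.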

Posted: **Fri May 18, 2018 6:06 am**

drmwc: Your example seems to be a second-player-advantage game, not quite the same as the A>B>C 60%+60% question.
For the latter, a slightly similar and interesting example with pure uncorrelated distributions:

Player A picks number 30 (20% of cases) or 3.
Player B picks number 20 (50% of cases) or 2.
Player C picks number 10 (80% of cases) or 1.

But this still seems an r-p-s like situation, which can be considered a distorting factor (individual distributions differ in shape).
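Working the three-player example through exactly (a small sketch; the dictionaries just restate the picks listed above, and `win_prob` is a name made up here):

```python
from fractions import Fraction
from itertools import product

# Each player's possible picks as {number: probability}.
A = {30: Fraction(1, 5), 3: Fraction(4, 5)}
B = {20: Fraction(1, 2), 2: Fraction(1, 2)}
C = {10: Fraction(4, 5), 1: Fraction(1, 5)}

def win_prob(x, y):
    """Probability that a pick from x is larger than an independent pick from y."""
    return sum(px * py
               for (vx, px), (vy, py) in product(x.items(), y.items())
               if vx > vy)

a_beats_b = win_prob(A, B)
b_beats_c = win_prob(B, C)
a_beats_c = win_prob(A, C)
print(a_beats_b, b_beats_c, a_beats_c)
```

A beats B 60% of the time and B beats C 60% of the time, but A beats C only 36% of the time, so C is actually the favourite against A.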

Posted: **Fri May 18, 2018 6:51 am**

moha wrote: drmwc: Your example seems to be a second player advantage game, not quite the same as the A>B>C 60%+60% question.

Apologies to drmwc for butting in, but I take it as an example of non-transitivity.

For the latter, a slightly similar and interesting example with pure uncorrelated distributions:

Player A picks number 30 (20% of cases) or 3.
Player B picks number 20 (50% of cases) or 2.
Player C picks number 10 (80% of cases) or 1.

But this still seems an r-p-s like situation, which can be considered a distorting factor (individual distributions differ in shape).

I'm not sure, but your idea of individual distributions seems to be related to what I am calling different strengths and weaknesses of different players (multi-dimensionality). Which can also cause non-transitivity.

Posted: **Fri May 18, 2018 7:19 am**

Bill Spight wrote: I'm not sure, but your idea of individual distributions seems to be related to what I am calling different strengths and weaknesses of different players (multi-dimensionality). Which can also cause non-transitivity.

Yes, but this is similar to rock-paper-scissors. I tried to exclude such distortions, and assumed uncorrelated performances, and that the players only differ in strength (position and variance but not shape of distribution). Then that shape still seems to matter for the accuracy of the oddswise approach.
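As a rough illustration of the shape mattering, here is a sketch for one non-normal shape. It assumes every player's performance is uniform over an interval of the same width and that players differ only by a shift in position; both the setup and the function names are my own simplification. The difference of two such uniforms is triangular, which gives a closed form.

```python
from math import sqrt

def uniform_win_rate(shift):
    """P(X + shift > Y) for X, Y independent uniform on [0, 1], 0 <= shift <= 1."""
    return 1 - (1 - shift) ** 2 / 2

def uniform_shift_for(p):
    """Shift that produces win rate p (inverse of uniform_win_rate), 0.5 <= p <= 1."""
    return 1 - sqrt(2 * (1 - p))

shift_ab = uniform_shift_for(0.6)                # gap that makes A beat B 60%
a_vs_c_uniform = uniform_win_rate(2 * shift_ab)  # same gap again between B and C
a_vs_c_odds = 2.25 / (1 + 2.25)                  # (6/4) * (6/4) as a win rate

print(f"uniform: {a_vs_c_uniform:.4f}, oddswise: {a_vs_c_odds:.4f}")
```

With this shape A beats C a little under 69% of the time, slightly below the oddswise figure of about 69.2%. The gap is small here, but it grows as the individual win rates get more lopsided, as in dfan's ⅞ example.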

Posted: **Mon May 21, 2018 8:37 pm**

Just wondering: there is a rather sharp upturn in the strength graph at Elo 10800. Does that correspond to introducing ELF OpenGo to train LZ?

Posted: **Tue May 22, 2018 5:21 am**

chut wrote: Just wondering: there is a rather sharp upturn in the strength graph at Elo 10800. Does that correspond to introducing ELF OpenGo to train LZ?

Yes. (You can see the ELF network in the graph in multiple places; it's the gray cross with hash code starting with 62b.)

Posted: **Tue May 22, 2018 7:12 am**

In this series of matches with Haylee, there are no 3,3 point invasions even with a 2-stone handicap, so is the network weight used the 'human' one?

https://www.youtube.com/watch?v=hExYHwtsra8

I find the zero human network quite bad with handicap games. Leela 11 would trash me with a 4-stone handicap, but Zero would start by invading all the 3,3 points, making the game much easier and much less interesting.

I am wondering whether there is a way to tweak the MCTS for handicap games, for example to favor branches that may not be the best, but that have the highest number of near-optimal sub-branches.

Posted: **Tue May 22, 2018 7:23 am**

People on the LZ team have been thinking about how to make it play handicap games better: https://github.com/gcp/leela-zero/issues/1313

Posted: **Fri May 25, 2018 7:39 am**

TWOGTP matches:

1) LZ's networks
Matches between networks #0, #10, #20, #30, ..., #140.
For each network, two 100-game matches against network #70, which is the reference point.
For example, #0 never wins against #70 (1st run = 0 wins out of 100 games, and 2nd run = 0 wins),
and #140 almost always wins against #70 (1st run = 99 wins out of 100 games, and 2nd run = 99 wins).
twogtp, with LZ015, --visits=51 --noponder
For example, the line 60-70, 29, 17 means network #60 won 29 games out of 100 against #70 in the first match, and 17 games out of 100 in the second 100-game match.
Two odd things:
#20 won 1 game against #70!
For #60, the two results vary a lot (29 and 17).

Attachment: netw.jpg (94.5 KiB)
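On the 29 vs. 17 result for #60: a quick way to judge whether two 100-game runs really disagree is a pooled two-proportion z-test. This is only a sketch, and it assumes the games are independent, which may not hold at low visit counts. The function name is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(wins1, wins2, games):
    """Pooled two-proportion z statistic and two-sided p-value for two
    matches of `games` games each."""
    p1, p2 = wins1 / games, wins2 / games
    pooled = (wins1 + wins2) / (2 * games)
    std_err = sqrt(pooled * (1 - pooled) * 2 / games)
    z = (p1 - p2) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p_value = two_proportion_z(29, 17, 100)
print(f"z = {z:.2f}, two-sided p = {p_value:.3f}")
```

The difference is about two standard errors, so it is borderline on its own. With around fifteen networks each measured twice, one pair differing this much is not very surprising.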

2) Zen7 vs LZ with networks #...
--visits=3201 --noponder for LZ and
-t 12 -T 1 -s 850 (gtp4zen)
Each match is 20 games (10 as B, 10 as W)
Zen takes about twice as much time as LZ

Attachment: zen.jpg (67.53 KiB)

Posted: **Sat May 26, 2018 1:02 am**

For an idea of what these win ratios mean in terms of (weaker) human rank difference, check https://senseis.xmp.net/?EGFWinningStatistics. E.g. a 3d beats a 6d about 8% of the time, whilst a 4d beats a 7d about 3% of the time.

Posted: **Sat May 26, 2018 10:11 pm**

Network #144 (9e88) was promoted against #143 (057a) by winning 54.84% of its 403 games.

Is it more or less reproducible? (I think official matches are played with 3201 visits.)
Here are five twogtp matches, 403 games per match (--visits=xxxx, --noponder).
Up to 200 visits, the win% fluctuates wildly (75% at 0 visits, and then less than 50% with few visits).
Then at 3201 visits, it's 52.35%, which is not bad.
(I won't run many of these 400-game matches with 3200 visits, because it takes a long time, even with a good GPU.)
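To see how much a single 403-game match can pin down, here is a sketch of a 95% Wilson interval for the two results. The win counts 221 and 211 are inferred from the quoted percentages (54.84% and 52.35% of 403), and the games are assumed to be independent.

```python
from math import sqrt
from statistics import NormalDist

def wilson_interval(wins, games, confidence=0.95):
    """Wilson score interval for a win rate estimated from `wins` out of `games`."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = wins / games
    denom = 1 + z * z / games
    centre = (p + z * z / (2 * games)) / denom
    half = z * sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return centre - half, centre + half

interval_promotion = wilson_interval(221, 403)  # the 54.84% promotion match
interval_rerun = wilson_interval(211, 403)      # the 52.35% rerun at 3201 visits
print(interval_promotion, interval_rerun)
```

Both intervals are roughly ten percentage points wide and overlap heavily, so 54.84% and 52.35% are entirely compatible with each other.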

Posted: **Sun May 27, 2018 6:05 am**

This may be a good time for buying a lottery ticket.

Luck will usually be there, as that is still an easy way towards promotion. Most networks with >55% winrates will in fact be around 52% or so.

Of course, this assumes promotion is rare (it is a kind of survivor bias). And the difference scales somewhat with the number of sims, so the advantage of the stronger net will likely be bigger in deeper searches.
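The survivor-bias effect can be sketched with a small Bayes calculation. Everything specific here is an assumption for illustration: a flat 55% gate over a 400-game match (a simplification, not the actual promotion procedure), and a hypothetical prior in which candidate networks' true win rates are bell-shaped around 50% with a spread of 2 percentage points.

```python
from math import comb, exp

GAMES = 400        # assumed match length, close to the 403 games above
THRESHOLD = 220    # assumed gate: at least 55% of 400 games won

def promotion_prob(p):
    """Exact binomial probability of at least THRESHOLD wins in GAMES games
    for a network whose true win rate is p."""
    return sum(comb(GAMES, k) * p ** k * (1 - p) ** (GAMES - k)
               for k in range(THRESHOLD, GAMES + 1))

# Hypothetical prior over true win rates: grid from 40% to 60%, peaked at 50%.
grid = [0.40 + 0.005 * i for i in range(41)]
prior = [exp(-0.5 * ((p - 0.50) / 0.02) ** 2) for p in grid]

# Posterior over true win rates, given that the network passed the gate.
weights = [w * promotion_prob(p) for w, p in zip(prior, grid)]
posterior_mean = sum(p * w for p, w in zip(grid, weights)) / sum(weights)
print(f"mean true win rate of promoted networks: {posterior_mean:.3f}")
```

Under these assumptions the promoted networks' true win rate averages well below the 55% they showed in the match. The exact figure depends on the assumed prior.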

Posted: **Mon May 28, 2018 2:36 am**

The 74.68% win rate of network 9e88 against network 057a (with --visits=1) seemed weird…
Was it due to the relatively small sample (403 games) or to --visits=1?

Here are 9 twogtp matches (3x403 games, 3x1000 and 3x10000). I've kept the result reports generated by twogtp; here is one of them.

Attachment: 9e88_057v1_1.zip (26.19 KiB)

If someone is interested, I can upload the other ones.

Attachment: 9e88.jpg (161.29 KiB)

I was expecting the max variation to decrease as the number of games increased… But going from 35% to 66%???
Am I doing something wrong? Has someone tried something similar?
Parameters:
--gtp --weights=xxx --visits=1 --noponder -r 10 and
-games xxxxx -sgffile C:\... -auto -komi 7.5

Curiously, the overall win% is around...52%
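For reference, this is how much spread pure sampling noise would give at each match length, if the games were independent with a true win rate of 52% (both are assumptions, and the function name is made up here):

```python
from math import sqrt

def win_rate_std_error(p, games):
    """Standard error of an observed win rate over `games` independent games."""
    return sqrt(p * (1 - p) / games)

# Two standard errors either side covers roughly 95% of outcomes.
spread = {games: 2 * win_rate_std_error(0.52, games)
          for games in (403, 1000, 10000)}

for games, width in spread.items():
    print(f"{games:>6} games: about +/- {width:.1%}")
```

That works out to roughly 5, 3 and 1 percentage points respectively. A swing as wide as 35% to 66% between long matches is far outside those bands, which points at something other than sampling noise, for example games within a match not being independent of each other.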

LZ's progression
