 Post subject: Re: Exploring LZ's search algorithm: worked examples
Post #41 Posted: Thu Jan 16, 2020 6:39 am 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
dfan wrote:
Bill Spight wrote:
xela wrote:
LZ's neural nets are not symmetric.

Why not? I can guess, but I'd rather not.

How could they be? Can you define the mathematical constraint you'd like to put on the convolutional kernels?


Well, considering that the square go board, unlike the chess board, has no essentially identifiable sides, or quadrants, or semi-quadrants, and thus has eightfold symmetry, why shouldn't the neural nets have eightfold symmetry?

Quote:
One thing you could theoretically do to handle symmetries is to canonicalize every input position to the network. Create a total ordering on board positions (e.g., sort positions by the state of the A1 point, then the A2 point if the two positions are the same at A1, then the A3 point, etc.). Then when you want to evaluate a position, take the eight transformations of it (including the identity), select the one that results in the "smallest" position, apply that transformation, send it through the network, and then apply the inverse transformation to the policy. That sounds like kind of a waste of energy, though.

Similarly, you could define the output of the policy network to be the average of all eight policies on the transformed positions, but now you're effectively doing eight times as much work to evaluate each position.


I don't know what assumptions you are making, to make things so difficult.

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.

 Post subject: Re: Exploring LZ's search algorithm: worked examples
Post #42 Posted: Thu Jan 16, 2020 6:47 am 
Lives in sente

Posts: 757
Liked others: 114
Was liked: 916
Rank: maybe 2d
There are known ways of enforcing symmetry (search for "equivariant neural nets"). They are quite bad compared to simply randomly applying one of the 8 symmetries to your training data on every data sample as you feed the data to the neural net. With the latter method, the neural net of course won't be fully symmetric in practice, as it will start off not symmetric, and will randomly be learning the different symmetries for each position in a different order, etc. The "true" ideal target you're training it to converge to in the limit of infinite training is symmetric though.
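
For concreteness, here's a minimal sketch of that second method - randomly transforming each training sample - assuming boards and policy targets are stored as NumPy arrays. This is just an illustration of the idea, not LZ's actual training code:

Code:
import numpy as np

def random_symmetry(planes, policy):
    # planes: (C, 19, 19) input feature planes; policy: (19, 19) move targets.
    # Pick one of the 8 dihedral symmetries at random and apply the SAME
    # transform to both the inputs and the training targets.
    k = np.random.randint(8)
    rot, flip = k % 4, k >= 4
    planes = np.rot90(planes, rot, axes=(1, 2))
    policy = np.rot90(policy, rot)
    if flip:
        planes = planes[:, :, ::-1]   # mirror the board columns
        policy = policy[:, ::-1]
    return planes, policy

# Call this on every sample as it is fed to the net; over a long training run
# each position is seen in all 8 orientations.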

A fun consequence of this is that usually the more thoroughly the neural net "understands" a position, the more closely all 8 symmetries will agree on that position, so you can use the divergence between the 8 symmetries as a crude indicator of the neural net's certainty on a position.
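
As a rough sketch of how you might measure that (net.value below is just a stand-in name for however you query the value head, not a real LZ interface):

Code:
import numpy as np

def symmetry_spread(net, planes):
    # Evaluate all 8 symmetries of a position; a large spread in the
    # returned winrates suggests the net is less certain here.
    values = []
    for k in range(8):
        p = np.rot90(planes, k % 4, axes=(1, 2))
        if k >= 4:
            p = p[:, :, ::-1]
        values.append(net.value(p))
    return max(values) - min(values)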

Why is enforcing symmetry bad?

Intuition: Perhaps it's better to allow the neural net to learn potentially asymmetric things on its way to converging to the ideal symmetric function; it's less likely to get stuck on a local hill, since the asymmetric states can act as extra bridges that let the neural net transition between different nearly-symmetric states.

Also a practical issue: The current specific ways of enforcing symmetry via equivariant nets enforce a certain property of "equivariance" in the weights of the neural net at each layer. This construction has the neural net apply symmetry-rotated/reflected versions of every set of weights internally at every layer. If any isolated bit of the weights that the neural net would have ended up learning would have been roughly symmetric anyway (e.g. maybe in an early layer the neural net has a "pseudoeye" detector that looks for an empty point surrounded by four adjacent stones of the same color - a fully symmetric local shape), then, holding the computational cost of the net fixed, this results in a 2-, 4-, or 8-fold waste of that bit of the neural net's capacity on symmetrized copies of weights that were already naturally symmetric.

Maybe you think: instead of trying some complicated equivariance thing involving mirrored copies of asymmetric weights, why not just directly enforce that the convolutional filters at every layer are symmetric? This is even worse. Many symmetric functions are most naturally expressed as combinations of asymmetric functions - e.g. count liberties northward, count liberties westward, count liberties eastward, count liberties southward, then add them all together. All four operations individually are asymmetric, since each one only counts in a specific direction, but their combination is symmetric. If you enforce literal symmetry at every internal layer, you prevent the neural net from doing things like this. "Equivariance" is what we call it when you still allow the net to do this, where north, west, east, and south can all individually compute asymmetric things temporarily, even over many layers, but are symmetric to *each other*.
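
As a toy illustration of that north/west/east/south point (plain NumPy, nothing to do with the net's real internals): each directional feature below is asymmetric on its own, but their sum transforms with the board - rotate or reflect the position and the combined feature map rotates or reflects the same way, which is exactly the "equivariance" being described:

Code:
import numpy as np

def empty_to_the_north(board):
    # Toy directional feature: 1 where the point immediately above is empty.
    # Asymmetric by itself - it only looks north.
    feat = np.zeros(board.shape)
    feat[1:, :] = (board[:-1, :] == 0)
    return feat

def four_direction_sum(board):
    # Apply the same northward feature to the four rotations of the board and
    # rotate each result back before summing. Every term is asymmetric, but
    # the sum counts all four directions, so it respects the board symmetries.
    total = np.zeros(board.shape)
    for r in range(4):
        total += np.rot90(empty_to_the_north(np.rot90(board, r)), -r)
    return total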


This post by lightvector was liked by 5 people: Bill Spight, dfan, ez4u, Maharani, marvin
 Post subject: Re: Exploring LZ's search algorithm: worked examples
Post #43 Posted: Thu Jan 16, 2020 7:55 am 
Gosei

Posts: 1590
Liked others: 886
Was liked: 528
Rank: AGA 3k Fox 3d
GD Posts: 61
KGS: dfan
Bill Spight wrote:
dfan wrote:
Bill Spight wrote:
Why not? I can guess, but I'd rather not.

How could they be? Can you define the mathematical constraint you'd like to put on the convolutional kernels?

Well, considering that the square go board, unlike the chess board, has no essentially identifiable sides, or quadrants, or semi-quadrants, and thus has eightfold symmetry, why shouldn't the neural nets have eightfold symmetry?

I was speaking from an implementation point of view (thus my question about the form of the constraint on the convolutional kernels), rather than the mathematics of the desired output (which I agree "should be" symmetrical). Anyway, lightvector answered this (at least) as well as I can.

Quote:
Quote:
One thing you could theoretically do to handle symmetries is to canonicalize every input position to the network. Create a total ordering on board positions (e.g., sort positions by the state of the A1 point, then the A2 point if the two positions are the same at A1, then the A3 point, etc.). Then when you want to evaluate a position, take the eight transformations of it (including the identity), select the one that results in the "smallest" position, apply that transformation, send it through the network, and then apply the inverse transformation to the policy. That sounds like kind of a waste of energy, though.

Similarly, you could define the output of the policy network to be the average of all eight policies on the transformed positions, but now you're effectively doing eight times as much work to evaluate each position.

I don't know what assumptions you are making, to make things so difficult.

I'm not sure what this sentence means, but I'm also not sure it's worth pursuing! If you want to spell it out more I will try to respond, but I also don't mind just dropping it.
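
For the record, here's roughly what the canonicalization idea quoted above would look like as a sketch. The helper names (transform, net.policy) are hypothetical, and for simplicity it orders positions by comparing the raw byte encodings of the transformed planes rather than point by point:

Code:
import numpy as np

def transform(planes, k):
    # Apply the k-th of the 8 board symmetries to (C, 19, 19) feature planes.
    p = np.rot90(planes, k % 4, axes=(1, 2))
    return p[:, :, ::-1] if k >= 4 else p

def canonical_policy(net, planes):
    # Pick whichever of the 8 transformed positions has the "smallest"
    # encoding, evaluate only that one, then map the policy back.
    k = min(range(8), key=lambda j: transform(planes, j).tobytes())
    policy = net.policy(transform(planes, k))   # (19, 19) move probabilities
    if k >= 4:                                  # undo the reflection first,
        policy = policy[:, ::-1]
    return np.rot90(policy, -(k % 4))           # then undo the rotation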


This post by dfan was liked by: Bill Spight
 Post subject: Re: Exploring LZ's search algorithm: worked examples
Post #44 Posted: Thu Jan 16, 2020 8:12 am 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
Thanks to you, lightvector, and dfan for your replies. :)

I said I'd rather not guess, but here is my guess, anyway. Enforcing symmetry is quite likely to slow learning. For instance, suppose that altering the weight for Q-17 on the full board gives the most improvement for a training game or set of training games. But if you applied the same alteration to all 8 3-4 points, it might even make the player play worse. OTOH, slower learning might not be, in the end, a bad thing. For instance, enforcing symmetry would reduce path dependency by joining many paths into one.


lightvector wrote:
There are known ways of enforcing symmetry (search for "equivariant neural nets"). They are quite bad compared to simply randomly applying one of the 8 symmetries to your training data on every data sample as you feed the data to the neural net. With the latter method, the neural net of course won't be fully symmetric in practice, as it will start off not symmetric, and will randomly be learning the different symmetries for each position in a different order, etc. The "true" ideal target you're training it to converge to in the limit of infinite training is symmetric though.

A fun consequence of this is that usually the more thoroughly the neural net "understands" a position, the more closely all 8 symmetries will agree on that position, so you can use the divergence between the 8 symmetries as a crude indicator of the neural net's certainty on a position.


Very interesting. :cool:

Quote:
Why is enforcing symmetry bad?

Intuition: Perhaps it's better to allow the neural net to learn potentially asymmetric things on its way to converging to the ideal symmetric function; it's less likely to get stuck on a local hill, since the asymmetric states can act as extra bridges that let the neural net transition between different nearly-symmetric states.


Interesting. On its face that is quite different from my intuition that enforcing symmetry would be likely to smooth out the fitness landscape. ;)

Quote:
Also a practical issue: The current specific ways of enforcing symmetry via equivariant nets enforce a certain property of "equivariance" in the weights of the neural net at each layer. This construction has the neural net apply symmetry-rotated/reflected versions of every set of weights internally at every layer. If any isolated bit of the weights that the neural net would have ended up learning would have been roughly symmetric anyway (e.g. maybe in an early layer the neural net has a "pseudoeye" detector that looks for an empty point surrounded by four adjacent stones of the same color - a fully symmetric local shape), then, holding the computational cost of the net fixed, this results in a 2-, 4-, or 8-fold waste of that bit of the neural net's capacity on symmetrized copies of weights that were already naturally symmetric.

Maybe you think: instead of trying some complicated equivariance thing involving mirrored copies of asymmetric weights, why not just directly enforce that the convolutional filters at every layer are symmetric? This is even worse. Many symmetric functions are most naturally expressed as combinations of asymmetric functions - e.g. count liberties northward, count liberties westward, count liberties eastward, count liberties southward, then add them all together. All four operations individually are asymmetric, since each one only counts in a specific direction, but their combination is symmetric. If you enforce literal symmetry at every internal layer, you prevent the neural net from doing things like this. "Equivariance" is what we call it when you still allow the net to do this, where north, west, east, and south can all individually compute asymmetric things temporarily, even over many layers, but are symmetric to *each other*.


Now that you bring up counting liberties, I have wondered how the networks learn to do so. I kind of doubt whether they learn a counting operation. And even if they did, how do they learn how to order the counting? After all, there are 120 ways to count 5 dame. ;) Ordering the counting is necessary for efficiency. I wondered if they count somewhat like parrots do. Parrots can count to 6 in the sense that they can distinguish between there being 5 objects in a small enough space and there being 6 objects there, but the difference between 6 objects and 7 objects gives them trouble. I have imagined that the bots gradually learn to distinguish between N and N-1 dame as N increases.

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.

 Post subject: Re: Exploring LZ's search algorithm: worked examples
Post #45 Posted: Thu Jan 16, 2020 8:37 am 
Lives in sente

Posts: 1037
Liked others: 0
Was liked: 180
Bill Spight wrote:
I wondered if they count somewhat like parrots do. Parrots can count to 6 in the sense that they can distinguish between there being 5 objects in a small enough space and there being 6 objects there, but the difference between 6 objects and 7 objects gives them trouble. I have imagined that the bots gradually learn to distinguish between N and N-1 dame as N increases.


Not just parrots. We humans can count small numbers of things if we want to, because we have learned counting as a method for when the number of things is too great for "number recognition". But we do not ordinarily count small numbers of objects. We recognize how many (just like the parrots). I believe that there have been recent papers about AI neural nets being able to learn this ability. Keep in mind, OUR brains ARE neural nets.

 Post subject: Re: Exploring LZ's search algorithm: worked examples
Post #46 Posted: Thu Jan 16, 2020 8:38 am 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
dfan wrote:
Bill Spight wrote:
Well, considering that the square go board, unlike the chess board, has no essentially identifiable sides, or quadrants, or semi-quadrants, and thus has eightfold symmetry, why shouldn't the neural nets have eightfold symmetry?

I was speaking from an implementation point of view (thus my question about the form of the constraint on the convolutional kernels), rather than the mathematics of the desired output (which I agree "should be" symmetrical).
Quote:
I don't know what assumptions you are making, to make things so difficult.

I'm not sure what this sentence means, but I'm also not sure it's worth pursuing! If you want to spell it out more I will try to respond, but I also don't mind just dropping it.

I've got the general idea, thanks. :)

As far as I am concerned, going into detail about implementation would not mean much to me until I get my new axe. Then I can play around with different ideas. :)

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.


This post by Bill Spight was liked by: dfan
 Post subject: Re: Exploring LZ's search algorithm: worked examples
Post #47 Posted: Thu Jan 16, 2020 8:40 am 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
Mike Novack wrote:
Bill Spight wrote:
I wondered if they count somewhat like parrots do. Parrots can count to 6 in the sense that they can distinguish between there being 5 objects in a small enough space and there being 6 objects there, but the difference between 6 objects and 7 objects gives them trouble. I have imagined that the bots gradually learn to distinguish between N and N-1 dame as N increases.


Not just parrots. We humans can count small numbers of things if we want to, because we have learned counting as a method for when the number of things is too great for "number recognition". But we do not ordinarily count small numbers of objects. We recognize how many (just like the parrots). I believe that there have been recent papers about AI neural nets being able to learn this ability. Keep in mind, OUR brains ARE neural nets.


Rainman excepted, I think humans may not in general be as good as parrots. For instance, there are human languages with no numbers greater than three. :o

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.

 Post subject: Re: Exploring LZ's search algorithm: worked examples
Post #48 Posted: Thu Jan 16, 2020 10:04 pm 
Lives in gote

Posts: 586
Location: Adelaide, South Australia
Liked others: 208
Was liked: 265
Rank: Australian 2 dan
GD Posts: 200
Bill Spight wrote:
Rainman excepted, I think humans may not in general be as good as parrots. For instance, there are human languages with no numbers greater than three. :o

Can't think of too many parrot languages with numbers greater than three either.


This post by xela was liked by 3 people: Bill Spight, ez4u, Waylon
 Post subject: Re: Exploring LZ's search algorithm: worked examples
Post #49 Posted: Thu Jan 16, 2020 10:39 pm 
Lives with ko

Posts: 237
Location: Pasadena, USA
Liked others: 79
Was liked: 12
Rank: OGS 9 kyu
OGS: Maharani
xela wrote:
Bill Spight wrote:
Rainman excepted, I think humans may not in general be as good as parrots. For instance, there are human languages with no numbers greater than three. :o

Can't think of too many parrot languages with numbers greater than three either.

Lol :D

 Post subject: Re: Exploring LZ's search algorithm: worked examples
Post #50 Posted: Fri Jan 17, 2020 12:49 am 
Lives in gote

Posts: 445
Liked others: 0
Was liked: 37
NNs like to start from random (non-random inits learn slower, if at all), and enforcing symmetries reduces randomness. Some papers have shown that most NN convergences are direct results of separable, luckily initialized weight subsets (the rest may even be left out). And because of the randomised use of the net in self-play, if just one symmetry learns or finds out something useful, that will soon also be taught to the other symmetries.


This post by jann was liked by 2 people: Bill Spight, xela
 Post subject: Re: Exploring LZ's search algorithm: worked examples
Post #51 Posted: Fri Jan 17, 2020 9:12 pm 
Lives in gote

Posts: 586
Location: Adelaide, South Australia
Liked others: 208
Was liked: 265
Rank: Australian 2 dan
GD Posts: 200
Bill has suggested an interesting position for analysis. It's an example of "LZ ignores a good move".

The game is Sakaguchi-Hayashi, GoGoD 1861-12-18e. After :w30: there's an interesting fight happening in the bottom left.

[go]$$Bc Black to play
$$ +---------------------------------------+
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . X . . . . . . . . . O . . . . |
$$ | . . . . . . . . . . . . . . . . X . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . O O O X . . . . . . . . . . . . |
$$ | . O b O X X X X . . . . . . . . . . . |
$$ | . a X X O O . . . . . . . . . . X . . |
$$ | . . X O . O . O X . . . . . O . . . . |
$$ | . . X O O . c . X . . . . . . O . . . |
$$ | . . e . . f d . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ +---------------------------------------+[/go]

Looking at the policy network, a is the "first instinct" move (70-80%), and all of b through d are interesting (3-10%). Possibly e and f are worth a quick look too (around 1%). LZ-258, ELF and KataGo 1.2 all broadly agree on the policy values. (Installing KataGo 1.3 is on my to-do list!)

What would you play here? Spoilers behind the cut.
Hayashi actually played a. He went on to lose the game: black soon tenukis, the bottom left is left either unresolved or beyond my reading ability, and there's a big fight at the top left.
According to the ELF commentary, c is a bit better (assuming 7.5 komi, whereas in fact it's a no komi game). This is based on about 2,800 playouts for a (winrate 41.6%) and 155,000 playouts for c (winrate 50.5%).

LZ-258 on a few thousand playouts from the starting position will choose a as the best move, but if you put c on the board it will get a higher evaluation. So here's a case where it can see in hindsight that there's a better move but it won't find the move on its own. Although for LZ-258 the difference isn't quite as big, 46% for c versus 41% for a.

(Actually, LZ-258 may find the right move after about half a million playouts, depending on the random number seed. With a "bad" seed, it will in principle still get there, but I haven't checked how many million playouts are needed.)

LZ using the ELF network will replicate what's in the official ELF commentary, and goes to c almost right away.

For what it's worth, I'm not convinced that Hayashi's move is a "mistake" as such. KataGo 1.2 analysing without komi has black about 8 or 9 points ahead and sees both a and c as winning moves. It's like LZ-258 in that left to its own devices it will prefer a, whereas c gets a higher score in hindsight. But the winrate difference is only about 3%. KataGo would have us believe that black lost the game later, with a blunder at move 55.

Here's the full game for your entertainment. More detailed AI analysis to come soon...



Attachments:
sakaguchi-hayashi-1861-12-18.sgf [763 Bytes]
 Post subject: Re: Exploring LZ's search algorithm: worked examples
Post #52 Posted: Sat Jan 18, 2020 6:12 am 
Lives in gote

Posts: 586
Location: Adelaide, South Australia
Liked others: 208
Was liked: 265
Rank: Australian 2 dan
GD Posts: 200
Here's a trace of 10,000 playouts (SGF summary only; the CSV files for this many playouts are huge!). I'll give you three versions:
  • LZ-258 from the starting position: it spends most of its time exploring B5, and only gives 43 playouts to G3. (Playing with different random number seeds, I can persuade it to give over 60 visits to G3, but the evaluations don't change much.)
  • LZ-258 analysing the position after G3 has been played.
  • LZ with the ELF network from the starting position. It still looks at B5 first, but starts paying serious attention to G3 after about 2,000 playouts, and looks at G3 exclusively from playout 4,319 onwards.







Remember that the difference between LZ-258 and LZ-ELF here is only the network. It's still the same software running the same search algorithm. So how do the different networks affect the choice of B5 or G3?

The policy values tell us that both networks will look at B5 first. But that's not the full story. LZ-258 gets to G3 at playout 93, and on a first look thinks that it's a 60% winrate for white, not that much different from the initial 59% for B5. But after a few playouts, G3 starts to look worse, so LZ-258 gives up on it. LZ-ELF starts off similarly, looking at G3 from playout 21, and scoring G3 at 46% for white compared with 49% for B5, again in the same ballpark. But on more playouts, LZ-ELF rates B5 as getting a lot worse for black, while the rating for G3 doesn't change much. So the difference isn't between those two positions specifically, but further down the tree.

Looking at the variations after B5, I haven't yet found a massive difference between 258 and ELF. But here's something interesting for the G3 variations:

[go]$$Bc
$$ | . . . . . . . . . .
$$ | . . . O O O X . . .
$$ | a O 6 O X X X X . .
$$ | . 5 X X O O . . . .
$$ | b . X O . O . O X .
$$ | . . X O O . 1 . X .
$$ | . 3 2 4 . . . . . .
$$ | . c . . . . . . . .
$$ +--------------------[/go]


Both networks have a as the first instinct. But it's not actually a good move. ELF spends 15 playouts on it, and then gives 994 playouts to c (and has another look at a later on). Meanwhile it's been spending a lot of time on B5 at the start, so it's not until around playout number 3,000 that c is seen to be clearly better than a in this diagram.

LZ-258 gives 18 playouts to a, but then its second choice is b, which also doesn't work well (and is discarded after four playouts). Starting from the initial position, LZ-258 will never even look at c (the policy value is around 1%). By playout number 5,000 it's given up on G3 (at least for the time being; it will come back by playout number 100,000 if you let it run that long). Maybe this is a little blind spot in LZ-258's network that skews the evaluations slightly. That's not the full story, though; there are still a lot of other variations to look at.

Finally, G3 in the initial position has a policy value of around 7% for LZ-258, or around 10% for ELF. Does that 3% difference have a big influence on the search? I suspect not -- we've seen examples where 1% moves can get a lot of playouts if the evaluations are promising -- but it's hard to say for sure (I've tried scribbling down some equations based on the UCT formulae, and the maths gets pretty messy). I might come back to that later. Or I might get distracted by something else...
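
For reference, the move-selection rule behind those equations is roughly the AlphaGo Zero style "PUCT" formula. This is a simplified sketch (LZ's real code adds details like first-play urgency), but it shows how the policy prior enters the search:

Code:
import math

def puct_score(q, child_visits, parent_visits, prior, c_puct=1.0):
    # q:     average winrate of this child so far
    # prior: the network's policy value for the move (e.g. 0.07 vs 0.10)
    # The prior is multiplied by a term that shrinks as the child accumulates
    # visits, so a small prior mostly just delays exploration; a move with a
    # 1% prior can still soak up playouts later if its winrate stays high.
    exploration = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q + exploration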


Attachments:
trace_1861-12-18e-move31-LZ-ELF.sgf [1.6 MiB]
trace_1861-12-18e-move32-LZ258.sgf [1.7 MiB]
trace_1861-12-18e-move31-LZ258.sgf [1.75 MiB]

This post by xela was liked by: Bill Spight
 Post subject: Re: Exploring LZ's search algorithm: worked examples
Post #53 Posted: Tue Mar 03, 2020 5:01 pm 
Lives in gote

Posts: 586
Location: Adelaide, South Australia
Liked others: 208
Was liked: 265
Rank: Australian 2 dan
GD Posts: 200
I was going to post some ladder positions here, but it got complicated, so I started a separate thread for them.
