A new paper from DeepMind about how they tuned the hyper-parameters of AlphaGo:
https://arxiv.org/abs/1812.06855
I just skimmed it, but something that jumped out at me was that it was this automated tuning process that suggested they stop using rollouts and rely on the value network alone.
Paper: Bayesian Optimization in AlphaGo
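For anyone curious what "Bayesian optimization" means in practice, here is a minimal sketch in the spirit of the paper: a Gaussian-process surrogate plus an upper-confidence-bound acquisition rule tunes a single hyper-parameter. The objective below is a made-up stand-in peaking at 0.7 (the paper optimizes AlphaGo's actual self-play win rate, which obviously can't be run here), and all function names are mine, not DeepMind's.

```python
import numpy as np

def objective(x):
    # Hypothetical "win rate" as a function of one hyper-parameter,
    # peaking at x = 0.7. Stand-in for an expensive match evaluation.
    return 0.5 + 0.3 * np.exp(-((x - 0.7) ** 2) / 0.02)

def rbf(a, b, length=0.15):
    # Squared-exponential kernel on 1-D inputs.
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)

def gp_posterior(X, y, Xs, noise=1e-4):
    # Posterior mean and std-dev of a zero-mean GP at candidate points Xs.
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)             # shape (n, m)
    A = np.linalg.solve(K, Ks)  # K^{-1} Ks
    mu = A.T @ y
    var = np.clip(1.0 - np.sum(Ks * A, axis=0), 1e-12, None)
    return mu, np.sqrt(var)

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, 3)        # a few initial random evaluations
y = objective(X)
grid = np.linspace(0, 1, 201)   # candidate hyper-parameter values

for _ in range(10):
    mu, sigma = gp_posterior(X, y, grid)
    ucb = mu + 2.0 * sigma      # upper-confidence-bound acquisition
    x_next = grid[np.argmax(ucb)]
    X = np.append(X, x_next)
    y = np.append(y, objective(x_next))

best = X[np.argmax(y)]
print(f"best hyper-parameter: {best:.3f}")  # typically lands near 0.7
```

The point of the GP surrogate is that each expensive evaluation (a full match series, in AlphaGo's case) updates a model of the whole curve, so the next trial goes where the expected payoff plus uncertainty is highest instead of on a blind grid.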
-
Uberdude