Calvin Clark wrote: This definition is tricky. First, a human probability is not the same as an AI one. Second, attempts to do this crudely in chess unfairly punish more tactical players, because go strength is not just about making fewer mistakes but also about provoking your opponent to make bigger ones.
Indeed, when researchers have evaluated historical chess players by having computers rate their moves, Capablanca comes out better than expected (not that he was a slouch in the first place), because his simple style meant that he had fewer opportunities to make mistakes than, say, someone like Kasparov, who played in a maximally dynamic style.
Elf OpenGo paper released
-
dfan
- Gosei
- Posts: 1598
- Joined: Wed Apr 21, 2010 8:49 am
- Rank: AGA 2k Fox 3d
- GD Posts: 61
- KGS: dfan
- Has thanked: 891 times
- Been thanked: 534 times
Re: Elf OpenGo paper released
-
Bill Spight
- Honinbo
- Posts: 10905
- Joined: Wed Apr 21, 2010 1:24 pm
- Has thanked: 3651 times
- Been thanked: 3373 times
Re: Elf OpenGo paper released
Uberdude wrote: once I had a big lead there's no room for change in winrate to reveal any subsequent big or small mistakes from him, and only huge mistakes from me will take me away from 99% win.
That's why the log of the odds ratio is a more informative measure.
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins
Visualize whirled peas.
Everything with love. Stay safe.
-
Uberdude
- Judan
- Posts: 6727
- Joined: Thu Nov 24, 2011 11:35 am
- Rank: UK 4 dan
- GD Posts: 0
- KGS: Uberdude 4d
- OGS: Uberdude 7d
- Location: Cambridge, UK
- Has thanked: 436 times
- Been thanked: 3718 times
Re: Elf OpenGo paper released
Uberdude wrote: once I had a big lead there's no room for change in winrate to reveal any subsequent big or small mistakes from him, and only huge mistakes from me will take me away from 99% win.
Bill Spight wrote: That's why the log of the odds ratio is a more informative measure.
Even then I think the quality and precision of the bot's suggestions are reduced: if it says one move is 99.675 and another is 99.784, can you really believe those sig figs? It just wants a safe win (0.5 points as seen from move 100 could be safe) and can play slack moves, whereas a human might want to keep pressing the advantage for a comfier margin. A better approach would be to add in some dynamic komi to get the winrate back near 50% and then analyse from that modified board state. Unfortunately the Elf weights converted to LZ format (at least v0 / v1) don't play nicely with the dynamic komi modified version of the LZ engine.
-
Uberdude
- Judan
- Posts: 6727
- Joined: Thu Nov 24, 2011 11:35 am
- Rank: UK 4 dan
- GD Posts: 0
- KGS: Uberdude 4d
- OGS: Uberdude 7d
- Location: Cambridge, UK
- Has thanked: 436 times
- Been thanked: 3718 times
Re: Elf OpenGo paper released
What is the story behind this graph? It shows the averaged biggest mistake (in win % drop) for all players over time. There probably wasn't much data back in 1700, and then we have them making bigger mistakes to around 1780, then getting better down to a trough of mistakes around 1860 to 1895. Is this seen as a golden age of Japanese go? You've got the last few years of Shusaku at the start and Shuei at the end, though I presume he individually was a small part of the corpus. Checking the stats for just him, he averaged around 24%, quite a bit lower than the trough at 27%. Then we have mistakes getting bigger again at the turn of the century: is this the collapse of the Go houses? As ez4u mentioned, the peak in mistakes in the 1930s is the Shin Fuseki era BEFORE WW2. And in modern times the reduction in biggest mistakes seems quite nicely correlated with reducing time limits.
-
John Fairbairn
- Oza
- Posts: 3724
- Joined: Wed Apr 21, 2010 3:09 am
- Has thanked: 20 times
- Been thanked: 4672 times
Re: Elf OpenGo paper released
I'm not sure the above graph tells us much. My own background in statistics is not much more than reading books like Freakonomics on long-haul flights, but I have hugely more knowledge of how the database was constructed and what it contains. Both of those things tell me to treat the graph with a great amount of caution.
Among the factors that potentially affect the results are these. This is not a complete list but does run roughly chronologically.
1. The very early games include a large number of Chinese games under ancient rules. Apart from the fact that these usually start with 4 stones on the board, which restricts the style of play somewhat, and are without komi, they also have group tax, which may be a distorting element. But the single biggest distortion is that the dates of these games are usually unknown (and even the dates of the players can be quite unknown). I therefore catalogued the games under the date, or estimated date, of the publications in which I found the games. This means, for example, that there are very many games labelled 1700. That's not when they were played.
2. Old games in general are likewise affected by being with no komi and often with handicaps. These handicaps include not just stones but the series type of handicaps (e.g. taking Black in 2 games out of 3). Since a series handicap was defined by current grades only initially, but could then change between those two same players (not to mention that grades were set largely on the basis of politics) and not others, there must have been many cases where the wrong handicap was used. In general, too, no komi is not just a problem with training bots but I expect it also encourages White to make wilder moves, and thus bigger potential mistakes.
3. At times such as the late Edo/Meiji period in Japan, there were fewer games because of less sponsorship and other external factors. But also there may be gaps in the record. For example, I have not got round to doing the complete games of Shuwa yet.
4. At any period with the older players, the corpus is likely to focus on the star players, via their collected games. This means many games from very early in their careers. Nowadays the proportion of games by weaker players is likely to be much less because there are just so many games by strong players to collect instead.
5. In the 1930s, as has been observed, there was a spike that can be considered to coincide with Shin Fuseki. Intuitively, I suppose we would expect many mistakes then as players started experimenting. But there may be a further factor. That period has been of special interest to me and so over the past 2-3 years I have been adding lots of games from this era. These are generally by weaker players (more mistake-prone? More experimental?) and so in this period we get both more data and also data covering a much wider range of players (e.g. the Oteai B Section) than in other historical eras. There is also a trend towards the use of komi in this era, but weird ones such as 2.5 points.
6. As regards the war period, there is actually very little data. Apart from disruption from bombs, and players being sent off to fight, paper was scarce and publication of games was minimal (many now known were reconstructed later from personal records of the players).
7. I don't think time limits have much to do with anything in this graph. For a start, as one example, in the days of 13 hours each, there were distortions at both ends of the scale. On the one hand many players would use the extra time to take a shower, pop out to the shops or have a snooze upstairs. At the other end of the scale, where a player did use all his time, it became apparent that this carried significant health risks, presumably making mistakes likelier, and so time limits were reduced significantly without any external pressure from sponsors or public.
8. Since I get to choose what goes in the database and I don't like games at Mickey Mouse time limits, I have at times tended to ignore these (at any era).
9. If we look at very recent times, several factors leap out at me for initial consideration. One is that the proportion of Chinese and Korean players represented has increased very significantly. Whether you accept Cho Hun-hyeon's view that this has coincided with a horde of programmatically trained clones is up to you, but I think what is beyond question is that the level of play has not just increased but differences in both strength and styles between players have become smaller, so that there are much fewer mismatches (with their potential for bigger mistakes?) than in the past.
10. I suspect there is also a flattening effect in modern times due to increases in komi. And of course Elf is trained on current komi, so is perhaps more likely, on average, to find big mistakes in games which are not at 7.5 komi? (Or even to report as mistakes moves in games at other komis which were not really wrong?)
-
Bill Spight
- Honinbo
- Posts: 10905
- Joined: Wed Apr 21, 2010 1:24 pm
- Has thanked: 3651 times
- Been thanked: 3373 times
Re: Elf OpenGo paper released
Uberdude wrote: once I had a big lead there's no room for change in winrate to reveal any subsequent big or small mistakes from him, and only huge mistakes from me will take me away from 99% win.
Bill Spight wrote: That's why the log of the odds ratio is a more informative measure.
Uberdude wrote: Even then I think the quality and precision of the bot's suggestions is reduced: if it says one move is 99.675 and another is 99.784 does can you really believe those sig figs?
Well, I don't really believe that win rates are win rates, anyway.
But if we take the log odds ratios we get 5.73 and 6.14, respectively, for a difference of 0.41. By comparison the log odds ratio for 60% is 0.41 and the log odds ratio for 50% is, of course, 0. So the play with a win rate of 99.675% instead of 99.784% could be just as bad a mistake as a play with a win rate of 50% instead of 60%. Quien sabe?
But, as you say, surely the errors are larger as we approach 100% or 0%.
Edit: Also, the use of changes in win rate estimates between moves instead of comparison of win rate estimates for the same possible moves introduces the complication that the estimates should approach 0 or 1 as the game continues. That's probably a small factor early on, but in the endgame it could be significant.
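The arithmetic in this post can be checked with a few lines of Python. This is only a sketch: the helper name log_odds is made up for illustration, and it uses the natural logarithm, which is what reproduces the 5.73, 6.14 and 0.41 figures quoted above.

```python
import math

def log_odds(p):
    """Natural log of the odds p / (1 - p) for a win rate p strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

# Each pair is (win rate of the better move, win rate of the move played).
pairs = [(0.99784, 0.99675), (0.60, 0.50)]

for better, played in pairs:
    gap = log_odds(better) - log_odds(played)
    print(f"{better:.5f} vs {played:.5f}: "
          f"log odds {log_odds(better):.2f} vs {log_odds(played):.2f}, gap {gap:.2f}")
```

Both pairs come out with a gap of about 0.41, even though the raw win-rate differences are roughly 0.1 and 10 percentage points.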
-
dfan
- Gosei
- Posts: 1598
- Joined: Wed Apr 21, 2010 8:49 am
- Rank: AGA 2k Fox 3d
- GD Posts: 61
- KGS: dfan
- Has thanked: 891 times
- Been thanked: 534 times
Re: Elf OpenGo paper released
Bill Spight wrote: That's why the log of the odds ratio is a more informative measure.
By the way, the log of the odds ratio is what these networks actually produce under the hood. As people have noted here, you don't want to have to expend lots of energy making your network produce values bounded between 0 and 1 that precisely hit targets like 0.98 or 0.99. So instead you have the network produce an unbounded value, let's call it x, and then run x through the sigmoid function 1/(1 + e^(-x)) to produce a probability p between 0 and 1. Solving for x, you end up with x = log (p/(1-p)), which is the log odds.
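As a small illustration of the round trip described above, here is a sketch in plain Python. The function names sigmoid and logit and the sample values of x are chosen for illustration only; this is not code from ELF or Leela Zero.

```python
import math

def sigmoid(x):
    """Squash an unbounded value x into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    """Inverse of sigmoid: recover x = log(p / (1 - p)) from a probability p."""
    return math.log(p / (1.0 - p))

# Illustrative raw outputs; a real network would produce x from a board position.
for x in (-2.0, 0.0, 0.41, 4.0, 6.0):
    p = sigmoid(x)
    print(f"x = {x:5.2f} -> p = {p:.4f} -> logit(p) = {logit(p):5.2f}")
```

Equal steps in x turn into ever smaller steps in p as p approaches 1, which is why differences between win rates near 99% look so tiny.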
-
Bill Spight
- Honinbo
- Posts: 10905
- Joined: Wed Apr 21, 2010 1:24 pm
- Has thanked: 3651 times
- Been thanked: 3373 times
Re: Elf OpenGo paper released
Bill Spight wrote: That's why the log of the odds ratio is a more informative measure.
dfan wrote: By the way, the log of the odds ratio is what these networks actually produce under the hood.
Great minds think alike.
- smartgo
- Dies with sente
- Posts: 110
- Joined: Sat Apr 24, 2010 4:23 pm
- Rank: AGA 3 dan
- GD Posts: 0
- Has thanked: 11 times
- Been thanked: 120 times
Re: Elf OpenGo paper released
If you want to look at ELF’s analysis of pro games but don’t want to download gigs of data, you can now download the annotated SGF for a specific game:
https://smartgo.com/gogod.html
Anders Kierulf
@smartgo
-
And
- Gosei
- Posts: 1464
- Joined: Tue Sep 25, 2018 10:28 am
- GD Posts: 0
- Has thanked: 212 times
- Been thanked: 215 times
Re: Elf OpenGo paper released
The ELF OpenGo Windows binary has changed: https://dl.fbaipublicfiles.com/elfopeng ... ngo_v2.zip
I did not find a change log, but the size has decreased to 1 GB and it now includes the 43 version of Sabaki. With the previous version, only elf_cpu_full worked on my computer, and compared to elf_v2 + lz it plays much weaker. The new version does not work at all. Has anyone tried the ELF OpenGo Windows binary?
-
Bill Spight
- Honinbo
- Posts: 10905
- Joined: Wed Apr 21, 2010 1:24 pm
- Has thanked: 3651 times
- Been thanked: 3373 times
Re: Elf OpenGo paper released
Oh, one more thing about Elf and the New Fuseki. Elf likes the New Fuseki. Yes, it found big mistakes then, but it also found a large number of matches in the first 60 moves to its own choices, peaking in the 1930s. Then the number of such early matches dropped until after WWII, then rebounded gradually into the 1970s and 80s. No wonder AlphaGo seemed like the ghost of Go Seigen.
-
Bill Spight
- Honinbo
- Posts: 10905
- Joined: Wed Apr 21, 2010 1:24 pm
- Has thanked: 3651 times
- Been thanked: 3373 times
Re: Elf OpenGo paper released
BTW, sports fans. I was unable to download the Elf analysis files, here: - Annotated SGFs:
s3://dl.fbaipublicfiles.com/elfopengo/analysis/data/varied_models_commentary_sgfs.gzip
Safari complains about the s3: prefix.
-
vier
- Dies with sente
- Posts: 91
- Joined: Wed Oct 30, 2013 7:04 am
- GD Posts: 0
- Has thanked: 8 times
- Been thanked: 29 times
Re: Elf OpenGo paper released
Bill Spight wrote: I was unable to download
s3://dl.fbaipublicfiles.com/elfopengo/analysis/data/varied_models_commentary_sgfs.gzip
Safari complains about the s3: prefix.
For me replacing s3: by https: worked.
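For anyone scripting the download, the prefix swap can be sketched with the Python standard library. The helper name s3_to_https is hypothetical, and the swap is only known to work for this host because it was reported to work in this thread; it is not a general rule for s3: addresses.

```python
from urllib.parse import urlsplit

def s3_to_https(url):
    """Return url with an s3: scheme swapped for https:, leaving other URLs untouched."""
    parts = urlsplit(url)
    if parts.scheme != "s3":
        return url
    # Host and path are kept exactly as given; only the scheme changes.
    return parts._replace(scheme="https").geturl()

original = ("s3://dl.fbaipublicfiles.com/elfopengo/analysis/data/"
            "varied_models_commentary_sgfs.gzip")
print(s3_to_https(original))
```

The printed address can then be pasted into a browser or handed to any download tool.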
-
Bill Spight
- Honinbo
- Posts: 10905
- Joined: Wed Apr 21, 2010 1:24 pm
- Has thanked: 3651 times
- Been thanked: 3373 times
Re: Elf OpenGo paper released
Bill Spight wrote: I was unable to download
s3://dl.fbaipublicfiles.com/elfopengo/analysis/data/varied_models_commentary_sgfs.gzip
Safari complains about the s3: prefix.
vier wrote: For me replacing s3: by https: worked.
Many thanks.
Maybe Facebook is hiring.
-
Bill Spight
- Honinbo
- Posts: 10905
- Joined: Wed Apr 21, 2010 1:24 pm
- Has thanked: 3651 times
- Been thanked: 3373 times
Re: Elf OpenGo paper released
After replacing s3: with https: and unzipping, I got 78 folders, each with the same games from 2011 to 2015.
Edit: Apparently the commentary on the GoGoD files is elsewhere. Attempting download now.
Edit2: Success!