Re: Statistical Approach and Efficiency of Play

Posted: Thu Oct 25, 2012 10:47 am
by SmoothOper
Mef wrote: I'm inclined to believe it might be, since it would perhaps allow for comparison of play between a game with two large moyos (lots of points and few dame) vs. a fighting game with many groups (lots of dame few points).
That is an interesting point that the overall score of the game may bias the results of any evaluation of tesuji if the points per stone metric is used without any adjustment.

Re: Statistical Approach and Efficiency of Play

Posted: Thu Oct 25, 2012 11:14 am
by Bill Spight
Mef wrote:
Bill Spight wrote:On this general topic, I wondered about the question of whole board liberties and efficiency. To try to make a start, I looked at a few games at the end. OC, the proper index of efficiency is the score. :) However, I identified two distinct types of liberties: shared liberties and territory liberties. Assume that the shared liberties (outside of seki) are filled and any necessary repair moves are made. That leaves two types of territory, liberties and non-liberties.

I suspect that there is a correlation between non-liberty territory and score. I. e., efficiency means making territory at a distance. :)
To see if I follow what you're saying...
[go]$$
$$ -----------
$$ . . a . . .
$$ . . b . . .
$$ X X X X X X
$$ . . c . . .
$$ O O O O O O[/go]
A is non-liberty territory, B is territory liberty, C is shared liberties? If I'm following right it seems like you're trying to (more or less) figure out average "points per territory-making stone" (let's go ahead and call a territory-making stone a stone with at least one territory liberty).
You got the empty point classification right. :)
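For anyone who wants to count these on real positions, here is a minimal sketch of the classification in Python. It rests on assumptions that are not in the posts above: the board is given as rows of space-separated `X`, `O` and `.` characters, all stones are alive, and an empty region counts as territory only if it borders a single colour. The function name `classify_empty_points` and the extra `neutral` label (for points in a two-colour region that touch no stone) are made up for the illustration.

```python
from collections import deque


def classify_empty_points(rows):
    """Label every empty point of a finished position.

    Labels: 'non-liberty territory', 'territory liberty', 'shared',
    and 'neutral' (a point in a two-colour region touching no stone).
    Assumes all stones on the board are alive.
    """
    board = [row.split() for row in rows]
    height, width = len(board), len(board[0])

    def neighbours(r, c):
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < height and 0 <= nc < width:
                yield nr, nc

    labels, seen = {}, set()
    for r in range(height):
        for c in range(width):
            if board[r][c] != "." or (r, c) in seen:
                continue
            # Flood-fill one connected empty region and record
            # which stone colours border it.
            region, border = [], set()
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                pr, pc = queue.popleft()
                region.append((pr, pc))
                for nr, nc in neighbours(pr, pc):
                    if board[nr][nc] == ".":
                        if (nr, nc) not in seen:
                            seen.add((nr, nc))
                            queue.append((nr, nc))
                    else:
                        border.add(board[nr][nc])
            one_colour = len(border) == 1
            for pr, pc in region:
                touches_stone = any(
                    board[nr][nc] != "." for nr, nc in neighbours(pr, pc)
                )
                if one_colour:
                    labels[(pr, pc)] = (
                        "territory liberty" if touches_stone
                        else "non-liberty territory"
                    )
                else:
                    labels[(pr, pc)] = "shared" if touches_stone else "neutral"
    return labels


# The diagram above, with a, b and c written as plain empty points.
diagram = [
    ". . . . . .",
    ". . . . . .",
    "X X X X X X",
    ". . . . . .",
    "O O O O O O",
]
labels = classify_empty_points(diagram)
```

On the diagram, the point marked a is `(0, 2)`, b is `(1, 2)` and c is `(3, 2)` in (row, column) terms, and they come out as non-liberty territory, territory liberty and shared respectively. A real counter would also have to deal with dead stones and seki, which this sketch ignores.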

But what I was musing about was not so specific. SmoothOper pointed out the importance of liberties in tactical situations and then asked about whole board liberties and efficiency. What I am suggesting is that, at the level of the whole board, efficiency is perhaps more related to non-liberty territory.

Re: Statistical Approach and Efficiency of Play

Posted: Fri Oct 26, 2012 6:43 am
by Mef
SmoothOper wrote: That is an interesting point that the overall score of the game may bias the results of any evaluation of tesuji if the points per stone metric is used without any adjustment.

Well, with any statistical comparison it is always important to properly normalize your results -- A recent example from baseball: (a digression about the triple crown, stats, and baseball history for those who care to read it).
Miguel Cabrera just won the Triple Crown with a .330 batting average, 44 home runs, and 139 RBIs. This has not been done since Carl Yastrzemski did it in 1967 with a .326 average, 44 home runs, and 121 RBIs. You might be tempted to make a direct comparison of these numbers and conclude that, while their seasons were similar, Cabrera's season was a bit better.

If you look beyond those numbers there's a much different story. One "normalized" way of measuring an offensive player's production is oWAR (Wins Above a Replacement Player - isolated for offensive stats only). In 1967, Yaz is said to have produced 9.5 oWAR, while in 2012 Cabrera generated "only" 7.4. This gives us a hint that Yaz's achievement might be better, but even this doesn't capture the whole story. You see, WAR looks at the expected production of a "replacement level player", and the replacement left fielder that Yastrzemski is being compared to is expected to be a significantly better hitter than the third baseman who would replace Cabrera (this is because more teams will accept a poor-hitting third baseman who has strong fielding ability).

If you want to properly quantify the difference just for hitting, you can look at a normalized hitting statistic like OPS+. OPS is On Base Percentage + Slugging percentage, a statistic that has been gaining in popularity over the last decade because it's a quick way to compare players who walk/hit for average against players who hit for power. OPS+ takes this one step further: it compares each player's OPS to the league average, and further normalizes this based on what ballparks the players play in. A league average player will have an OPS+ of 100. In 1967, Yastrzemski's OPS+ was 193 (that's 93% higher than league average!) vs. 2012 Cabrera's 165. So while Cabrera certainly played great this season, Yastrzemski's 1967 season was phenomenal -- even though the numbers for the triple crown are better for Cabrera.

Why is that, you might ask?

Well, back in 1967 life wasn't so good as a batter...pitchers were absolutely dominant. In the early 60s there had been lots of power hitting (1961 is when Mantle and Maris go after the home run title and Maris hits 61). To compensate, they did things like expand the strike zone, and the result was that it swung the other way. It got so bad that in 1969 they changed the rules, lowering the pitcher's mound and shrinking the strike zone. Also, even though the rules haven't technically been changing, it's been argued that the strike zone has been effectively shrinking even more since then, based on how umpires are actually calling games.

Long story long, context is everything. One player putting up hitting numbers during the peak of a pitcher dominated era isn't the same as another player doing the same today.
To bring it back to go, you'd need to find a way to normalize each move, perhaps using "relative gain" (profit?), or maybe "points secured vs. total points scored", or perhaps combine the two -- "profit relative to total points scored". After all, the name of the game in go isn't figuring out how to maximize your number of points, it's about maximizing the probability you have more points than your opponent at the end of the game. Generally that means finding ways to make points while your opponent isn't. It might be through forcing them to make life while you secure territory, it might mean forcing them to connect two groups across dame points, etc. Ideally a statistic you develop would be able to quantify this...though the further you go down this line it sounds more and more just like a fancy name for miai counting.
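To make the "profit relative to total points scored" idea concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration: the function name `normalized_profit`, the choice to define profit as own gain minus the opponent's gain over the same sequence, and all of the numbers, which are invented and not taken from any real game.

```python
def normalized_profit(own_gain, opponent_gain, total_points):
    """Profit of a sequence as a fraction of all points scored in the game.

    own_gain      -- points the player secured with the sequence
    opponent_gain -- points the opponent secured in the same exchange
    total_points  -- sum of both players' final scores (the normalizer)
    """
    if total_points <= 0:
        raise ValueError("total_points must be positive")
    return (own_gain - opponent_gain) / total_points


# Hypothetical numbers only.
# A moyo game: lots of points on the board, few dame.
moyo = normalized_profit(own_gain=10, opponent_gain=4, total_points=140)

# A fighting game: many groups, lots of dame, few points.
fighting = normalized_profit(own_gain=5, opponent_gain=1, total_points=60)

print(f"moyo game:     raw profit 6, normalized {moyo:.3f}")
print(f"fighting game: raw profit 4, normalized {fighting:.3f}")
```

With these made-up inputs the raw profit is larger in the moyo game (6 vs. 4), but the normalized figure is larger in the fighting game, which is the kind of reversal the baseball example is about. Whether total score is the right normalizer is exactly the open question.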
Bill Spight wrote: You got the empty point classification right. :)

But what I was musing about was not so specific. SmoothOper pointed out the importance of liberties in tactical situations and then asked about whole board liberties and efficiency. What I am suggesting is that, at the level of the whole board, efficiency is perhaps more related to non-liberty territory.
For tactical stability, perhaps it would be worth quantifying N-th order liberties*, where an N-th order liberty is a liberty you can potentially add to your group with N moves? The simplest example of such is of course a net - reducing the potential to get liberties rather than reducing liberties directly.

Or perhaps more along the lines of your definitions, exclusive liberties - liberties that one side can obtain if needed that are unavailable to the opponent (this seems to me like it would be some function of territory liberties and non-liberty territory?).


*I had originally put secondary and tertiary liberties here, because that is how I had always heard the term used, but a quick Sensei's Library search shows that the term secondary liberty is used for something else.
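One possible way to count N-th order liberties, sketched in Python. This is only one reading of the definition above, and a crude one: the order of an empty point is taken to be its distance from the group when walking through empty points only, so ordinary liberties are order 1. Opponent replies, captures and connections to friendly stones are all ignored. The function name `liberties_by_order` and the board encoding (rows of space-separated `X`, `O`, `.`) are assumptions for the illustration.

```python
from collections import deque


def liberties_by_order(rows, start, max_order):
    """Count empty points by their distance (through empty points) from a group.

    rows      -- board as strings of space-separated 'X', 'O', '.'
    start     -- (row, col) of any stone in the group of interest
    max_order -- largest order to count
    Returns a dict {order: number of points}.
    """
    board = [row.split() for row in rows]
    height, width = len(board), len(board[0])
    colour = board[start[0]][start[1]]
    if colour == ".":
        raise ValueError("start must be a stone")

    def neighbours(r, c):
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < height and 0 <= nc < width:
                yield nr, nc

    # Collect the connected group of same-coloured stones.
    group, stack = {start}, [start]
    while stack:
        r, c = stack.pop()
        for nr, nc in neighbours(r, c):
            if board[nr][nc] == colour and (nr, nc) not in group:
                group.add((nr, nc))
                stack.append((nr, nc))

    # Breadth-first search outward through empty points.
    order, frontier = {}, deque()
    for r, c in group:
        for nr, nc in neighbours(r, c):
            if board[nr][nc] == "." and (nr, nc) not in order:
                order[(nr, nc)] = 1
                frontier.append((nr, nc))
    while frontier:
        r, c = frontier.popleft()
        if order[(r, c)] == max_order:
            continue
        for nr, nc in neighbours(r, c):
            if board[nr][nc] == "." and (nr, nc) not in order:
                order[(nr, nc)] = order[(r, c)] + 1
                frontier.append((nr, nc))

    counts = {}
    for n in order.values():
        counts[n] = counts.get(n, 0) + 1
    return counts


# A lone stone in the middle of an otherwise empty 5x5 board.
lone_stone = [
    ". . . . .",
    ". . . . .",
    ". . X . .",
    ". . . . .",
    ". . . . .",
]
counts = liberties_by_order(lone_stone, start=(2, 2), max_order=2)
```

For the lone stone this gives 4 first-order and 8 second-order liberties. In a net, the netted stones would still show their ordinary liberties at order 1, but the higher orders would be cut off by the surrounding stones, which is roughly the effect described above.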

Re: Statistical Approach and Efficiency of Play

Posted: Fri Oct 26, 2012 7:17 am
by Bill Spight
Mef wrote:
Bill Spight wrote: You got the empty point classification right. :)

But what I was musing about was not so specific. SmoothOper pointed out the importance of liberties in tactical situations and then asked about whole board liberties and efficiency. What I am suggesting is that, at the level of the whole board, efficiency is perhaps more related to non-liberty territory.
For tactical stability, perhaps it would be worth quantifying N-th order liberties*, where an N-th order liberty is a liberty you can potentially add to your group with N moves? The simplest example of such is of course a net - reducing the potential to get liberties rather than reducing liberties directly.

Or perhaps more along the lines of your definitions, exclusive liberties - liberties that one side can obtain if needed that are unavailable to the opponent (this seems to me like it would be some function of territory liberties and non-liberty territory?).
IIRC, Tajima uses the idea of Nth order liberties in his paper about the Possible Omission Number of a group. (The PON is the number of times you can tenuki and still save the group.) Exclusive liberties sound something like outside liberties in a semeai. :)
*I had originally put secondary and tertiary liberties here, because that is how I had always heard the term used, but a quick Sensei's Library search shows that the term secondary liberty is used for something else.
SL is not a reliable source for English go usage. Nobody who wrote there about terminology was or is a lexicographer, as far as I know. The only place a number of terms appear in the literature is on SL. They are what somebody came up with on their own. (Not that there is anything wrong with coining terminology. But when you do that you are not reflecting actual usage.)

Re: Statistical Approach and Efficiency of Play

Posted: Fri Oct 26, 2012 11:03 am
by SmoothOper
I was thinking about efficiency of play in a couple of different ways. Relating it to another sport, basketball: offensive efficiency is the number of points scored per possession, and defensive efficiency is the number of points permitted per possession. The most efficient teams over a season are generally also the winningest teams. In this context efficiency would be related to sente and gote. As a hypothesis, maintaining more liberties across the board would help in both sente and gote.

Also I am thinking about tempo in go, and what that term means. Is tempo always better? I suspect that fast development is not necessarily the most efficient development, however efficiency ends up defined.