Extending SGF?

Amtiskaw
Dies in gote
Posts: 38
Joined: Sun Apr 17, 2016 5:22 am
GD Posts: 0
Has thanked: 4 times
Been thanked: 20 times

Re: Extending SGF?

Post by Amtiskaw »

SGF properties can have multiple values attached to them, e.g. AB[dd][dp][pd][pp] so if one wants to have a winrate for a node by multiple bots, you could have something like:

VR[48.6:1600:7.5:Leela Zero][43.6:900:7.5:Elf]

e.g. Leela thinks Black is 48.6% to win, Elf thinks Black is 43.6% to win...
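
A minimal sketch of how a reader might split such composed values. VR is only a proposed property, and the post confirms only the first field (Black's winrate) and, by implication, the last (the bot name). The `visits` and `komi` field names below are guesses at what 1600 and 7.5 stand for, and `BotEstimate` and `parseEstimate` are hypothetical names.

```typescript
// Hypothetical record for one value of the proposed VR property.
// Only blackWinrate and bot are backed by the post; visits and komi
// are assumed meanings for the two middle fields.
interface BotEstimate {
  blackWinrate: number;
  visits: number;
  komi: number;
  bot: string;
}

// Split one composed value such as "48.6:1600:7.5:Leela Zero".
function parseEstimate(value: string): BotEstimate {
  const parts = value.split(":");
  if (parts.length < 4) {
    throw new Error(`expected 4 colon-separated fields, got: ${value}`);
  }
  const [winrate, visits, komi, ...rest] = parts;
  return {
    blackWinrate: Number(winrate),
    visits: Number(visits),
    komi: Number(komi),
    // Rejoin the tail so a bot name containing ":" survives.
    bot: rest.join(":"),
  };
}

// The two values of VR[48.6:1600:7.5:Leela Zero][43.6:900:7.5:Elf]
const estimates: BotEstimate[] = [
  "48.6:1600:7.5:Leela Zero",
  "43.6:900:7.5:Elf",
].map(parseEstimate);
```

Because every value is still a plain string at the SGF level, a viewer that does not know VR can carry it along untouched.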

Re: Extending SGF?

Post by Amtiskaw »

John Fairbairn wrote:Not all sgf editors read sgf files correctly. In fact, I'm told that very, very few do. Even the Eidogo one used on this forum seems to choke on quite a few things.

So extending the format seems to be a recipe for more confusion. A fresh start using e.g. xml may be wiser.
By the way, I want to very strongly object to this idea. SGF is an absolute delight both to parse and to write.

The only real flaw in SGF is that files in certain encodings can confuse a parser due to the presence of \ or ] bytes inside multi-byte characters. This can be avoided if UTF-8 is considered mandatory, as UTF-8 never produces those bytes for multi-byte characters.

Aside from that, you will experience no major problems writing an SGF parser, as long as you understand that a node has 0 or more keys, a key has 1 or more values, keys use only characters A-Z, and values use any UTF-8 character, with ] and \ characters escaped.

The core of my own SGF parser is just 100 lines of Golang.

Of course, what you do with the property values is a different issue; they often need to be converted from strings to something else to be useful; perhaps most problems are coming at that stage. But the core logic of variations, as well as important properties like B, W, AB, AW, AE, PL are all easy to understand and implement.
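
To make the description above concrete, here is a rough sketch of such a core parser. It is not the Golang parser mentioned in the post; the names `SgfNode`, `SgfTree` and `parseSgf` are made up for illustration. It assumes the input is already a decoded (UTF-8) string, keeps every value as a string, and takes whatever follows a `\` literally, which skips the spec's finer rules about line breaks.

```typescript
// A node maps each key (property name) to its list of string values.
type SgfNode = Map<string, string[]>;

interface SgfTree {
  nodes: SgfNode[];    // the sequence of nodes in this tree
  children: SgfTree[]; // variations branching off after the last node
}

function parseSgf(text: string): SgfTree {
  let pos = 0;

  function skipWhitespace(): void {
    while (pos < text.length && /\s/.test(text.charAt(pos))) pos++;
  }

  function parseValue(): string {
    pos++; // skip "["
    let value = "";
    while (pos < text.length && text.charAt(pos) !== "]") {
      if (text.charAt(pos) === "\\") pos++; // next character is literal
      value += text.charAt(pos);
      pos++;
    }
    pos++; // skip "]"
    return value;
  }

  function parseNode(): SgfNode {
    pos++; // skip ";"
    const node: SgfNode = new Map();
    for (;;) {
      skipWhitespace();
      let key = "";
      while (/[A-Z]/.test(text.charAt(pos))) key += text.charAt(pos++);
      if (key === "") return node; // no further keys in this node
      const values = node.get(key) ?? [];
      skipWhitespace();
      while (text.charAt(pos) === "[") {
        values.push(parseValue());
        skipWhitespace();
      }
      node.set(key, values);
    }
  }

  function parseTree(): SgfTree {
    pos++; // skip "("
    const tree: SgfTree = { nodes: [], children: [] };
    while (pos < text.length) {
      skipWhitespace();
      const c = text.charAt(pos);
      if (c === ";") tree.nodes.push(parseNode());
      else if (c === "(") tree.children.push(parseTree());
      else if (c === ")") { pos++; return tree; }
      else throw new Error(`unexpected character "${c}" at offset ${pos}`);
    }
    throw new Error("unexpected end of input");
  }

  skipWhitespace();
  if (text.charAt(pos) !== "(") throw new Error("SGF must start with (");
  return parseTree();
}

// Two nodes, an escaped "]" in a comment, then two variations.
const game = parseSgf("(;GM[1]SZ[19]AB[dd][pp];B[qd]C[a \\] b](;W[dc])(;W[cd]))");
```

Interpreting the strings (coordinates, numbers, composed values) is deliberately left to a later stage, as the post suggests.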
deungsan
Dies in gote
Posts: 32
Joined: Thu Jan 24, 2019 5:23 pm
GD Posts: 0
Been thanked: 9 times

Re: Extending SGF?

Post by deungsan »

Amtiskaw wrote:SGF properties can have multiple values attached to them, e.g. AB[dd][dp][pd][pp] so if one wants to have a winrate for a node by multiple bots, you could have something like:

VR[48.6:1600:7.5:Leela Zero][43.6:900:7.5:Elf]

e.g. Leela thinks Black is 48.6% to win, Elf thinks Black is 43.6% to win...
What if AB[dd][dp][pd][pp] are taken as separate nodes AB[dd]AB[dp]AB[pd]AB[pp]?

Re: Extending SGF?

Post by Amtiskaw »

deungsan wrote:What if AB[dd][dp][pd][pp] are taken as separate nodes AB[dd]AB[dp]AB[pd]AB[pp]?
A semi-colon ; is how SGF specifies a new node has started.

A node with 4 AB values is written like ;AB[dd][dp][pd][pp]

4 nodes, with 1 AB value each, are written like ;AB[dd];AB[dp];AB[pd];AB[pp] (notice the 4 semi-colons)

Finally, ;AB[dd]AB[dp]AB[pd]AB[pp] is actually invalid because of duplicated keys within a node, although many SGF readers will accept it as being 1 node with 4 values.

All of this is just a description of how SGF works already, not a proposal. The AB property for instance is used to set handicap stones.

Re: Extending SGF?

Post by deungsan »

Amtiskaw wrote: A semi-colon ; is how SGF specifies a new node has started.
There are lots of semicolons in an SGF file, but there is no node in SGF. Perhaps the concept of nodes is an invention of some programmers, used in their SGF parsers.
Amtiskaw wrote: Finally, ;AB[dd]AB[dp]AB[pd]AB[pp] is actually invalid because of duplicated keys within a node, although many SGF readers will accept it as being 1 node with 4 values.
AB or AW is not a key but a property name in SGF. In my understanding, SGF is a collection of property name and property value pairs. SGF can be parsed without using nodes or keys. In my program, AB[dd]AB[dp]AB[pd]AB[pp] is not "invalid". By the way, on what basis do you call this valid or invalid?

Re: Extending SGF?

Post by Amtiskaw »

https://www.red-bean.com/sgf/sgf4.html
"SGF is a text-only format (not a binary format). It contains game trees, with all their nodes and properties, and nothing more."

[...]

Only one of each property is allowed per node, e.g. one cannot have two comments in one node:
... ; C[comment1] B [dg] C[comment2] ; ...
There's some terminology confusion that maybe I'm contributing to, but property names are what I'm calling keys.

You cannot do ;AB[dd]AB[pp] -- same key twice in a node is disallowed. Each key in a node should be unique.

But you can do ;AB[dd][pp] -- keys can have more than 1 value. Any handicap game will prove this. It's also used for markup, e.g. TR, MA, SQ, CR, AR, and that sort of thing which indicate triangles, crosses, etc. If you want multiple triangles in a node you'll need a key (property name) attached to multiple values.

Many SGF parsers (including mine) will tolerate the first form and treat it like the second.

As a final note, the SGF specs are annoying and will tell you that [dd][pp] is specifying a single value of type "list". While you can think about it that way, it is much, much simpler to consider this form as indicating multiple values. You can dispose of the notion of values having types (every value is a string, really) until you need to interpret them. This is pretty normal to do, e.g. Sabaki's sgf API considers a property to be a key which retrieves an array of strings (i.e. the values).
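
A small sketch of the tolerant behaviour described above, assuming a tokenizer has already produced a list of (key, values) pairs for one node. The function name `mergeNodeProperties` and its return shape are illustrative only and are not taken from Sabaki or any other library.

```typescript
// Merge repeated keys within one node, remembering which keys were repeated
// so that a stricter tool could warn about them.
function mergeNodeProperties(
  properties: Array<[string, string[]]>
): { merged: Map<string, string[]>; duplicated: string[] } {
  const merged = new Map<string, string[]>();
  const duplicated: string[] = [];
  for (const [key, values] of properties) {
    const existing = merged.get(key);
    if (existing === undefined) {
      merged.set(key, [...values]); // first time this key is seen
    } else {
      existing.push(...values); // treat AB[dd]AB[pp] like AB[dd][pp]
      if (duplicated.indexOf(key) === -1) duplicated.push(key);
    }
  }
  return { merged, duplicated };
}

// ;AB[dd]AB[pp] as a tokenizer might report it: the same key twice.
const tolerant = mergeNodeProperties([
  ["AB", ["dd"]],
  ["AB", ["pp"]],
]);
```

Keeping the list of duplicated keys lets a program accept the file while still telling the user that it is not strictly valid.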
spook
Lives with ko
Posts: 151
Joined: Thu Jul 24, 2014 1:34 pm
Rank: 2d
GD Posts: 0
KGS: LordVader
Location: Belgium
Has thanked: 11 times
Been thanked: 48 times

Re: Extending SGF?

Post by spook »

I personally think it's perhaps a good time to drop SGF, and to introduce a new format.

Here is what ZBaduk uses internally : https://gist.github.com/bvandenbon/56e3 ... d7d902d6dc
It's almost a direct translation of SGF to JSON.

One minor tweak in this format, is that it groups those traditional properties in objects.
(This is just a quick copy of the typescript file)

Code:

  public moveProperties: MoveProperties = new MoveProperties();
  public setupProperties: SetupProperties = new SetupProperties();
  public nodeAnnotationProperties: NodeAnnotationProperties = new NodeAnnotationProperties();
  public moveAnnotationProperties: MoveAnnotationProperties = new MoveAnnotationProperties();
  public markupProperties: MarkupProperties = new MarkupProperties();
  public timingProperties: TimingProperties = new TimingProperties();
  public miscProperties: MiscProperties = new MiscProperties();
  public scoreEstimateProperties: ScoreEstimateProperties = new ScoreEstimateProperties();
In fact, I could provide TypeScript interfaces (definition) for the entire thing.
And perhaps that could be the start of an open standard, as an alternative to hard-to-parse SGF.
On top of that, I could provide an SGF parser that loads SGF files to the same structure,
making it 100% compatible.

As for AI properties, I propose this addition:
We could just add a statisticsProperties to the list.
And that object could contain properties like: bot, version, stats (which is an array).
A single stat should contain properties like: winrate, playouts, visits, ...

And that's where I think, these properties don't belong in SGF, but should only exist in this new format.
It's just too hard to create layers (e.g. collections) in the traditional SGF format.
Enjoy LeeLaZero and KataGo from your webbrowser, without installing anything !
https://www.zbaduk.com
lightvector
Lives in sente
Posts: 759
Joined: Sat Jun 19, 2010 10:11 pm
Rank: maybe 2d
GD Posts: 0
Has thanked: 114 times
Been thanked: 916 times

Re: Extending SGF?

Post by lightvector »

If there is going to be a new standard, I hope it will stop using the silly alphabetic encoding for coordinates that only permits board sizes up to 25x25, and just start using zero-indexed or one-indexed integers. That part of the SGF spec confuses me, it specifically reduces the futureproofness and flexibility of the format, increases parsing difficulty (you have to skip the letter 'i'!), and offers at best a slight gain in human readability, when end users rarely try to read an SGF to the point of mentally parsing coordinates anyway.

While we're at it, explicit support for non-square board sizes in the format would also be nice, specifying width and height separately.

Re: Extending SGF?

Post by spook »

My personal goal and motivation:
I would like a file format that can be used by Lizzie, ZBaduk, Sabaki, Go Review Partner, and all those AI viewers.

And I have the impression that what holds us back is all about the syntax: ";(X[])".

There have been many attempts to come up with XML formats, but I've never seen a successful one. That's why I don't want to make it too innovative either.
I personally, just want to change the structure, not the tags or element names per se.

I could live with numeric coordinates though.

TL;DR:
But if we do make the coordinates numeric, then I propose a 0-based numeric format.
And then reserve (-1, -1) for a pass. I think that would be reasonable.
lightvector wrote:While we're at it, explicit support for non-square board sizes in the format would also be nice, specifying width and height separately.
My fear is that once you start messing with the shape of the board, less than 1% of software will implement it. The same goes for 3-color-go, circular boards, boards without edges, not to mention 3D shaped boards, ...
It's only April 1st, one day a year. So, it's a lot of effort for something that will rarely ever be used.

Re: Extending SGF?

Post by Amtiskaw »

lightvector wrote:If there is going to be a new standard, I hope it will stop using the silly alphabetic encoding for coordinates that only permits board sizes up to 25x25, and just start using zero-indexed or one-indexed integers. That part of the SGF spec confuses me, it specifically reduces the futureproofness and flexibility of the format, increases parsing difficulty (you have to skip the letter 'i'!)
This isn't so. Firstly, SGF supports up to size 52, e.g. try (;SZ[52];B[XX];W[dW];B[cc];W[Wd]) in a good viewer, e.g. Sabaki.

Secondly, one does not "skip the i" when parsing. The viewer itself may or may not display coordinates like that, but the SGF format has no concept whatsoever that i is skipped. Try (;SZ[19];B[ii])

Is it possible you're talking about GTP rather than SGF? The two aren't related.
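
The difference can be sketched in a few lines. The SGF side simply maps a-z to 0..25 and A-Z to 26..51, with no letter skipped; the GTP side is where the missing "I" and the 25x25 limit come from. The function names are made up for this example.

```typescript
// Map one SGF coordinate letter to a 0-based index: a-z -> 0..25, A-Z -> 26..51.
function letterToIndex(letter: string): number {
  const code = letter.charCodeAt(0);
  if (code >= 97 && code <= 122) return code - 97; // 'a'..'z'
  if (code >= 65 && code <= 90) return code - 65 + 26; // 'A'..'Z'
  throw new Error(`not an SGF coordinate letter: ${letter}`);
}

// Decode a two-letter SGF point such as "dd" into a 0-based column (x)
// counted from the left and row (y) counted from the top.
function decodePoint(point: string): { x: number; y: number } {
  if (point.length !== 2) throw new Error(`expected two letters, got: ${point}`);
  return { x: letterToIndex(point.charAt(0)), y: letterToIndex(point.charAt(1)) };
}

// GTP column letters: 25 letters, with "I" left out.
const GTP_COLUMNS = "ABCDEFGHJKLMNOPQRSTUVWXYZ";

// Convert an SGF point to a GTP vertex such as "D16" (rows count from the bottom).
function toGtp(point: string, boardSize: number): string {
  const { x, y } = decodePoint(point);
  if (boardSize > 25 || x >= boardSize || y >= boardSize) {
    throw new Error(`point ${point} does not fit a GTP board of size ${boardSize}`);
  }
  return GTP_COLUMNS.charAt(x) + String(boardSize - y);
}

const corner = decodePoint("XX"); // the B[XX] move from the SZ[52] example
const middle = toGtp("ii", 19);   // the B[ii] move from the SZ[19] example
```

So B[ii] is a perfectly ordinary SGF point; it is only when the same move is sent over GTP that it becomes a vertex in column J.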

As for spook's ideas, certainly JSON is far nicer to deal with than XML but I'm not exactly seeing the need. SGF is not "hard to parse", it is easy to parse. It is not hard to create collections, it is trivial.
Last edited by Amtiskaw on Fri May 17, 2019 5:08 pm, edited 1 time in total.
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: Extending SGF?

Post by Bill Spight »

lightvector wrote:While we're at it, explicit support for non-square board sizes in the format would also be nice, specifying width and height separately.

(;FF[4]ST[2]GM[1]CA[UTF-8]AP[GOWrite:3.0.15]SZ[5:7]PM[2]FG[259:]PB[ ]PW[ ]GN[ ]
)

5x7 board. :)
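
For readers writing a parser, a sketch of how the SZ value might be interpreted. It assumes the composed form lists columns first and rows second; `parseBoardSize` is a hypothetical helper name.

```typescript
// Interpret an SZ value: "19" means a square board, "5:7" a rectangular one.
// Assumption: in the composed form the first number is columns, the second rows.
function parseBoardSize(value: string): { columns: number; rows: number } {
  const parts = value.split(":").map((p) => Number(p));
  if (parts.length > 2) throw new Error(`bad SZ value: ${value}`);
  const columns = parts[0];
  const rows = parts.length === 2 ? parts[1] : parts[0];
  if (
    columns === undefined || rows === undefined ||
    !Number.isInteger(columns) || !Number.isInteger(rows) ||
    columns < 1 || rows < 1
  ) {
    throw new Error(`bad SZ value: ${value}`);
  }
  return { columns, rows };
}

const rectangular = parseBoardSize("5:7"); // from the SZ[5:7] example above
const square = parseBoardSize("19");
```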
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.

Re: Extending SGF?

Post by lightvector »

Amtiskaw wrote:
lightvector wrote:If there is going to be a new standard, I hope it will stop using the silly alphabetic encoding for coordinates that only permits board sizes up to 25x25, and just start using zero-indexed or one-indexed integers. That part of the SGF spec confuses me, it specifically reduces the futureproofness and flexibility of the format, increases parsing difficulty (you have to skip the letter 'i'!)
This isn't so. Firstly, SGF supports up to size 52, e.g. try (;SZ[52];B[XX];W[dW];B[cc];W[Wd]) in a good viewer, e.g. Sabaki.

Secondly, one does not "skip the i" when parsing. The viewer itself may or may not display coordinates like that, but the SGF format has no concept whatsoever that i is skipped. Try (;SZ[19];B[ii])

Is it possible you're talking about GTP rather than SGF? The two aren't related.

As for spook's ideas, certainly JSON is far nicer to deal with than XML but I'm not exactly seeing the need. SGF is not "hard to parse", it is easy to parse. It is not hard to create collections, it is trivial.
Bill Spight wrote:
lightvector wrote:While we're at it, explicit support for non-square board sizes in the format would also be nice, specifying width and height separately.

(;FF[4]ST[2]GM[1]CA[UTF-8]AP[GOWrite:3.0.15]SZ[5:7]PM[2]FG[259:]PB[ ]PW[ ]GN[ ]
)

5x7 board. :)
Oh, sorry, yep that would be GTP rather than SGF. But yeah, the fact that GTP has these encoding restrictions is really annoying. Looks like SGF is actually a bit nicer, so ignore my previous post. :)

Re: Extending SGF?

Post by spook »

For sure, SGF is simple and is made to store sequences. And it may appear like SGF is just what we need.
Amtiskaw wrote:As for spook's ideas, certainly JSON is far nicer to deal with than XML but I'm not exactly seeing the need. SGF is not "hard to parse", it is easy to parse. It is not hard to create collections, it is trivial.
If we get a little more technical it may become obvious.
What LeeLa Zero returns is something like this:

Code:

info move Q16 visits 33 winrate 4346 prior 1673 lcb 4299 order 0 pv Q16 D4 D16 Q4 C6 C14 R6 R14 O3 info move D16 visits 33 winrate 4347 prior 1662 lcb 4298 order 1 pv D16 Q4 Q16 D4 C6 C14 R6 R14 O3 info move Q4 visits 33 winrate 4349 prior 1663 lcb 4293 order 2 pv Q4 D16 D4 Q16 R14 R6 C14 C6 O17 info move D4 visits 29 winrate 4340 prior 1644 lcb 4280 order 3 pv D4 Q16 D16 Q4 O17 F17 O3 F3 R6
This data is an array in itself.
Each element of this array has the following properties: move, visits, winrate, prior (called priority in the proposal below), lcb, order and pv, a sequence of moves (itself a list).
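
As an illustration, a line like the one above could be turned into such an array as follows. This is only a sketch based on the sample output shown in this post: `MoveStat` and `parseAnalysis` are invented names, winrate and lcb are divided by 100 to get percentages (as in the JSON proposal in this post), and prior is kept as the raw number.

```typescript
interface MoveStat {
  move: string;
  visits: number;
  winrate: number; // percent, e.g. 43.46 for the raw value 4346
  prior: number;   // raw value as printed by the engine
  lcb: number;     // percent
  order: number;
  pv: string[];    // the predicted sequence of moves
}

function parseAnalysis(line: string): MoveStat[] {
  return line
    .split("info ")
    .map((chunk) => chunk.trim())
    .filter((chunk) => chunk.length > 0)
    .map((chunk): MoveStat => {
      const tokens = chunk.split(/\s+/);
      const pvStart = tokens.indexOf("pv");
      const end = pvStart === -1 ? tokens.length : pvStart;
      // Everything before "pv" is a flat list of name/value pairs.
      const fields = new Map<string, string>();
      for (let i = 0; i + 1 < end; i += 2) {
        const name = tokens[i];
        const value = tokens[i + 1];
        if (name !== undefined && value !== undefined) fields.set(name, value);
      }
      const num = (name: string): number => Number(fields.get(name) ?? "NaN");
      return {
        move: fields.get("move") ?? "",
        visits: num("visits"),
        winrate: num("winrate") / 100,
        prior: num("prior"),
        lcb: num("lcb") / 100,
        order: num("order"),
        pv: pvStart === -1 ? [] : tokens.slice(pvStart + 1),
      };
    });
}

// A shortened version of the sample line (pv sequences truncated for brevity).
const stats = parseAnalysis(
  "info move Q16 visits 33 winrate 4346 prior 1673 lcb 4299 order 0 pv Q16 D4 D16 " +
  "info move D16 visits 33 winrate 4347 prior 1662 lcb 4298 order 1 pv D16 Q4"
);
```

Fields that another engine does not report would come out as NaN here; a real implementation would more likely make them optional.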

We have to keep in mind that other AIs will have overlap, but may have more or less properties. I don't think you want to define a standard and dedicate it to 1 bot.
So, it should be very flexible.

So, what I propose in JSON is:

Code:

stats: [
{
  move: "Q16",
  visits: 33,
  winrate: 43.46,
  priority: 1673,
  lcb: 42.99,
  order: 0,
  prediction: ["Q16", "D4", "D16", "Q4", "C6", "C14", "R6", "R14", "O3"]
},
{
  move: "D16",
  visits: 33,
  winrate: 43.47,
  priority: 1662,
  lcb: 42.98,
  order: 1,
  prediction: ["D16", "Q4", "Q16", "D4", "C6", "C14", "R6", "R14", "O3"]
},
...
]
Let's assume that we want to store information about a different kind of bot. (e.g. AlphaGo)
If it only mentions winrates, it could look like this:

Code:

stats: [
{
  move: "Q16",
  winrate: 43.46,
},
{
  move: "D16",
  winrate: 43.47,
},
...
]
Now, let's continue and make things just a little more complicated.
In the future, you may want to go one step further and store statistics of multiple bots inside the same file, while still keeping them separate:
So, for each move you would have:

Code:

botStats: [
{
  bot: "LeeLa Zero",
  version: "0.16 weightXyZ",
  stats: [ ... ]
},
{
  bot: "AlphaGo",
  version: "Master",
  stats: [ ... ]
}
]
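
Written out as TypeScript interfaces, the structure above might look like the sketch below. The field names follow the examples in this post; which fields are optional is an assumption, made so that a bot reporting only winrates still fits. The sample values are the illustrative ones from this post, not real engine output.

```typescript
// One candidate move as reported by one bot. Only move and winrate are
// required here; everything else is optional (an assumption of this sketch).
interface MoveStat {
  move: string;
  winrate: number;
  visits?: number;
  priority?: number;
  lcb?: number;
  order?: number;
  prediction?: string[];
}

// All statistics one bot produced for one position.
interface BotStats {
  bot: string;
  version: string;
  stats: MoveStat[];
}

const botStats: BotStats[] = [
  {
    bot: "LeeLa Zero",
    version: "0.16 weightXyZ",
    stats: [
      {
        move: "Q16",
        visits: 33,
        winrate: 43.46,
        priority: 1673,
        lcb: 42.99,
        order: 0,
        prediction: ["Q16", "D4", "D16"],
      },
    ],
  },
  {
    bot: "AlphaGo",
    version: "Master",
    stats: [{ move: "Q16", winrate: 43.46 }],
  },
];

// Any standard JSON library can write and read this without a custom parser.
const serialized = JSON.stringify(botStats);
const restored = JSON.parse(serialized) as BotStats[];
```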

There is nothing in SGF that resembles this even a little. This is a totally new kind of structure. On top of that, it would be hard to keep SGF backwards compatible. Software developers aren't supposed to write their own XML or JSON parser. Nevertheless, each Baduk related software project has its own SGF parser. And as a result there are over 100 implementations of SGF parsers. The problem being: each one of these has small variations and trade-offs in how they handle ";[]\/()" characters in comments. So, if you try to create a new structure, there is a reasonable chance that you will break existing software. It's a minefield.

Re: Extending SGF?

Post by spook »

... or basically what John says: :bow:
John Fairbairn wrote:Not all sgf editors read sgf files correctly. In fact, I'm told that very, very few do. Even the Eidogo one used on this forum seems to choke on quite a few things.

So extending the format seems to be a recipe for more confusion. A fresh start using e.g. xml may be wiser.

One simple solution for the OP seems to be to use the C[ ] property. The info he wants can be considered a kind of comment anyway, but he can also bracket it in some coded way within the C[] text so that the info can be used or extracted programmatically.
Last year I might have gone for XML.
But now I would go for JSON. :)

(PS: If you only need it programmatically, encoding with base64 is also a good trick to avoid conflicts.)

Re: Extending SGF?

Post by Amtiskaw »

spook wrote:each one of these has small variations and trade-offs in how they handle ";[]\/()" characters in comments.
When writing, only ] and \ need to be escaped. When reading, just accept whatever follows a \ character as being part of the comment.
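
In code, that rule is roughly the following sketch. `escapeValue` and `unescapeValue` are illustrative names, and the reader side ignores the spec's special handling of a backslash followed by a line break.

```typescript
// Writing: put a backslash in front of every "]" and "\" in the raw text.
function escapeValue(raw: string): string {
  return raw.replace(/[\]\\]/g, (ch) => "\\" + ch);
}

// Reading: whatever character follows a backslash is taken literally.
function unescapeValue(escaped: string): string {
  return escaped.replace(/\\([\s\S])/g, "$1");
}

// JSON is a handy stress test because arrays contain "]".
const analysis = JSON.stringify({ bot: "Elf", pv: ["Q16", "D4"] });
const commentProperty = "C[" + escapeValue(analysis) + "]";
const roundTripped = unescapeValue(escapeValue(analysis));
```

Characters such as ; ( ) [ and / pass through untouched, which is why a reader that follows this rule does not need to treat them specially inside a value.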

Of course there are additional problems when the data isn't UTF-8, but mandating UTF-8 would fix SGF's biggest problems.

Still, I agree SGF has no nice way to record bot analysis, especially from 2 or more bots at once.