SGF Grammar Ambiguous?

For discussing go computing, software announcements, etc.
hyperpape
Tengen
Posts: 4382
Joined: Thu May 06, 2010 3:24 pm
Rank: AGA 3k
GD Posts: 65
OGS: Hyperpape 4k
Location: Caldas da Rainha, Portugal
Has thanked: 499 times
Been thanked: 727 times

SGF Grammar Ambiguous?

Post by hyperpape »

I've been interested in parsing lately, and was looking at the SGF spec. Am I right in thinking that the grammar is ambiguous? For a PropertyIdentifier that the spec doesn't define, the PropertyValue could (so long as it has the right form) be either text (Text or SimpleText) or one of the other property value types.

This is not a particularly serious ambiguity: one could just parse the value as Text and pass it along, leaving the consumer of that information to decide how to handle it.
Kirby
Honinbo
Posts: 9553
Joined: Wed Feb 24, 2010 6:04 pm
GD Posts: 0
KGS: Kirby
Tygem: 커비라고해
Has thanked: 1583 times
Been thanked: 1707 times

Re: SGF Grammar Ambiguous?

Post by Kirby »

When you say "unspecified PropertyIdentifier", do you mean the identifier for a private property? Because the spec says that private properties are defined by the application, and that SGF reader implementations should skip unknown property types (and issue a warning).

In that case, you know the type of your own private property, since you defined it.
be immersed
HermanHiddema
Gosei
Posts: 2011
Joined: Tue Apr 20, 2010 10:08 am
Rank: Dutch 4D
GD Posts: 645
Universal go server handle: herminator
Location: Groningen, NL
Has thanked: 202 times
Been thanked: 1086 times

Re: SGF Grammar Ambiguous?

Post by HermanHiddema »

Actually, unknown properties should be preserved: http://www.red-bean.com/sgf/sgf4.html#2.2.3

The best way to go about it is indeed to treat it as Text. Only Text and SimpleText are allowed to be in any specific character encoding (CA property), and Text is more permissive on line breaks and such, so if you parse it as Text, using the given CA, then you should be fine and it should be safe to write the value out again even if you use a different character encoding.

Re: SGF Grammar Ambiguous?

Post by Kirby »

HermanHiddema wrote:Actually, unknown properties should be preserved: http://www.red-bean.com/sgf/sgf4.html#2.2.3

The best way to go about it is indeed to treat it as Text. Only Text and SimpleText are allowed to be in any specific character encoding (CA property), and Text is more permissive on line breaks and such, so if you parse it as Text, using the given CA, then you should be fine and it should be safe to write the value out again even if you use a different character encoding.

Good call, my bad.
be immersed

Re: SGF Grammar Ambiguous?

Post by HermanHiddema »

BTW, if you are using regular expressions to parse things, here's the one I used to parse PropValues:

\[(?:\\.|[^\\\]])*\]

This matches from an opening bracket [ to the closing bracket ], skipping over escaped closing brackets \] in between, which is rather trickier than you'd think in regular expressions. :)
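For anyone who wants to try the pattern, here is a minimal sketch in Python (my choice of language, the post doesn't name one). One assumption worth flagging: in Python `.` does not match a newline by default, so a backslash followed by a line break (a soft line break) would stop the match unless the pattern is compiled with `re.DOTALL`. The names `PROP_VALUE`, `PLAIN`, `sample` and `soft` are made up for the example.

```python
import re

# The pattern from the post, compiled with DOTALL so that "\\." also
# matches a backslash followed by a newline (a soft line break).
PROP_VALUE = re.compile(r"\[(?:\\.|[^\\\]])*\]", re.DOTALL)

# Same pattern without DOTALL, kept only for comparison.
PLAIN = re.compile(r"\[(?:\\.|[^\\\]])*\]")

# One property with two values; the first contains an escaped "]" and an
# escaped backslash right before the real closing bracket.
sample = r"C[see [1\] and \\][next]"
print(PROP_VALUE.findall(sample))
# ['[see [1\\] and \\\\]', '[next]']

# A value containing a soft line break (backslash + newline).
soft = "C[line one\\\nline two]"
print(PLAIN.search(soft))               # None: "." refuses the newline
print(PROP_VALUE.search(soft).group())  # the whole bracketed value
```

Other regex engines spell the "dot matches newline" option differently, so check yours before relying on this.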
Li Kao
Lives in gote
Posts: 643
Joined: Wed Apr 21, 2010 10:37 am
Rank: KGS 3k
GD Posts: 0
KGS: LiKao / Loki
Location: Munich, Germany
Has thanked: 115 times
Been thanked: 102 times

Re: SGF Grammar Ambiguous?

Post by Li Kao »

From what I remember, treating SimpleText as Text and treating Text as SimpleText both break some corner cases. The way I'd handle it is to use a RawText type that preserves all linebreaks as-is, and only convert to Text or SimpleText once you know which one is applicable.

The two problem cases I remember were `:`, which sometimes separates the parts of a composite property and sometimes is a literal `:`, and a backslash followed by a linebreak, which is removed in Text properties (soft linebreak) but gets replaced by a space in SimpleText (at least that's how I read the spec; it might not be what was intended).

So I'd keep property values as raw text, and offer three helper methods:

1. ToSimpleText
2. ToText
3. SplitComplex, which splits on unescaped `:` and unescapes the escaped `:`s.
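A rough sketch of those three helpers in Python, under some assumptions of mine: the function names, the regex, and the choice to convert tabs and other non-linebreak whitespace to a space are not from the post. Since the reading of the spec for a backslash plus linebreak in SimpleText is uncertain, `to_simple_text` takes the replacement as a parameter (default: remove it, as in Text; pass `" "` for the other reading).

```python
import re

# One pass over the raw value: an escaped char (group 1), an unescaped
# linebreak (group 2), or other non-space whitespace (group 3).
_TOKEN = re.compile(r"\\(\r\n|\n\r|\r|\n|.)|(\r\n|\n\r|\r|\n)|([\t\v\f])",
                    re.DOTALL)

def _convert(raw, soft_break, hard_break):
    def repl(match):
        escaped, linebreak, _other_ws = match.groups()
        if escaped is not None:
            # Escaped linebreak = soft linebreak; anything else is literal.
            return soft_break if escaped[0] in "\r\n" else escaped
        if linebreak is not None:
            return hard_break
        return " "
    return _TOKEN.sub(repl, raw)

def to_text(raw):
    """Text: soft linebreaks vanish, real linebreaks are kept."""
    return _convert(raw, "", "\n")

def to_simple_text(raw, soft_break=""):
    """SimpleText: no newlines survive; soft_break is an assumption."""
    return _convert(raw, soft_break, " ")

def split_compose(raw):
    """Split on unescaped ':'; turn '\\:' into ':'; leave other escapes raw."""
    parts, current, i = [], [], 0
    while i < len(raw):
        ch = raw[i]
        if ch == "\\" and i + 1 < len(raw):
            nxt = raw[i + 1]
            current.append(":" if nxt == ":" else ch + nxt)
            i += 2
        elif ch == ":":
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    return parts

raw = "one\\\ntwo\nthree\\]"
print(repr(to_text(raw)))                # 'onetwo\nthree]'
print(repr(to_simple_text(raw)))         # 'onetwo three]'
print(repr(to_simple_text(raw, " ")))    # 'one two three]'
print(split_compose(r"CGoban\:3:1.0"))   # ['CGoban:3', '1.0']
```

The raw string is what gets written back out, so a round trip never depends on which conversion was applied.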
Sanity is for the weak.

Re: SGF Grammar Ambiguous?

Post by hyperpape »

Thanks everyone.

A related question: has anyone compiled an SGF "bestiary" to do tests on? I suppose I might need to move this thread to the computer go list.

Re: SGF Grammar Ambiguous?

Post by hyperpape »

There is another pain point: Compose values are split on ":", but in Text or SimpleText values ":" need not be escaped. So there is no way to reliably distinguish a Compose value from either kind of text value, except by making your parser know which properties take Compose values.
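To make that concrete, a small Python sketch of the "parser knows which properties are Compose" approach. The set below is only an illustrative subset of the FF[4] properties with Compose values, and `first_unescaped_colon` and `interpret` are hypothetical names, not anything from the spec.

```python
# Illustrative subset of FF[4] properties whose values are Compose types
# (e.g. AP is SimpleText:SimpleText, LB is Point:SimpleText).
COMPOSE_PROPERTIES = {"AP", "AR", "FG", "LB", "LN"}

def first_unescaped_colon(raw):
    """Index of the first ':' not preceded by an escaping backslash, or -1."""
    i = 0
    while i < len(raw):
        if raw[i] == "\\":
            i += 2          # skip the backslash and the char it escapes
            continue
        if raw[i] == ":":
            return i
        i += 1
    return -1

def interpret(ident, raw):
    """Split only when the property is known to be Compose."""
    if ident in COMPOSE_PROPERTIES:
        idx = first_unescaped_colon(raw)
        if idx != -1:
            return (raw[:idx], raw[idx + 1:])
    # Text, SimpleText, and unknown properties keep the value whole.
    return raw

print(interpret("AP", "CGoban:3"))             # ('CGoban', '3')
print(interpret("C", "Note: black is ahead"))  # 'Note: black is ahead'
print(interpret("XX", "a:b"))                  # 'a:b' (unknown, kept whole)
```

The same bytes `a:b` come out as a pair or as a single string depending only on the identifier, which is exactly the ambiguity described above.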