All times are UTC - 8 hours [ DST ]




 Post subject: SGF Grammar Ambiguous?
Post #1 Posted: Sun Sep 30, 2012 8:00 pm 
Tengen

Posts: 4382
Location: Caldas da Rainha, Portugal
Liked others: 499
Was liked: 733
Rank: AGA 3k
GD Posts: 65
OGS: Hyperpape 4k
I've been interested in parsing lately, and was looking at the SGF spec. Am I right in thinking that the grammar is ambiguous, because for an unspecified PropertyIdentifier, the PropertyValue could (so long as it has the right form) be either Text (of either form) or one of the other property value types?

This is not a particularly serious ambiguity: one could just parse the value as Text and pass it along, leaving the consumer of that information to decide how to handle it.

_________________
Occupy Babel!

 Post subject: Re: SGF Grammar Ambiguous?
Post #2 Posted: Sun Sep 30, 2012 9:06 pm 
Honinbo

Posts: 9552
Liked others: 1602
Was liked: 1712
KGS: Kirby
Tygem: 커비라고해
When you say "unspecified propertyidentifier", do you mean the identifier for a private property? Because the spec says that private properties are defined by the application, and that sgf reader implementations should skip unknown property types (and issue a warning).

In this case, you know the type of your own defined private property.

_________________
be immersed

 Post subject: Re: SGF Grammar Ambiguous?
Post #3 Posted: Mon Oct 01, 2012 1:17 am 
Gosei

Posts: 2011
Location: Groningen, NL
Liked others: 202
Was liked: 1087
Rank: Dutch 4D
GD Posts: 645
Universal go server handle: herminator
Actually, unknown properties should be preserved: http://www.red-bean.com/sgf/sgf4.html#2.2.3

The best way to go about it is indeed to treat it as Text. Only Text and SimpleText are allowed to be in any specific character encoding (CA property), and Text is more permissive on line breaks and such, so if you parse it as Text, using the given CA, then you should be fine and it should be safe to write the value out again even if you use a different character encoding.
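A minimal sketch of that approach in Python, assuming the body of a single node (the text after one ";") has already been isolated. The names parse_node and write_node are hypothetical, and the decoding step with the CA value is left out: every value is kept as a raw string and written back unchanged, so unknown properties survive a round trip.

```python
import re

# An identifier (uppercase letters) followed by one or more bracketed values.
# DOTALL lets "\\." also cover a backslash followed by a newline.
PROPERTY = re.compile(r"([A-Z]+)((?:\s*\[(?:\\.|[^\\\]])*\])+)", re.DOTALL)
VALUE = re.compile(r"\[((?:\\.|[^\\\]])*)\]", re.DOTALL)

def parse_node(node_body):
    """Return [(ident, [raw_value, ...]), ...] without interpreting any value."""
    return [(m.group(1), VALUE.findall(m.group(2)))
            for m in PROPERTY.finditer(node_body)]

def write_node(props):
    """Serialize properties back, leaving the raw values untouched."""
    return "".join(ident + "".join("[" + v + "]" for v in values)
                   for ident, values in props)

body = r"B[dd]XX[private \] value]C[hi]"
props = parse_node(body)
roundtrip = write_node(props)
```

Here XX stands in for an unknown private property; its value is never unescaped, so the escaped bracket is written out exactly as it was read.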


This post by HermanHiddema was liked by: Kirby
 Post subject: Re: SGF Grammar Ambiguous?
Post #4 Posted: Mon Oct 01, 2012 4:32 am 
Honinbo

Posts: 9552
Liked others: 1602
Was liked: 1712
KGS: Kirby
Tygem: 커비라고해
HermanHiddema wrote:
Actually, unknown properties should be preserved: http://www.red-bean.com/sgf/sgf4.html#2.2.3

The best way to go about it is indeed to treat it as Text. Only Text and SimpleText are allowed to be in any specific character encoding (CA property), and Text is more permissive on line breaks and such, so if you parse it as Text, using the given CA, then you should be fine and it should be safe to write the value out again even if you use a different character encoding.



Good call, my bad.

_________________
be immersed

 Post subject: Re: SGF Grammar Ambiguous?
Post #5 Posted: Mon Oct 01, 2012 4:51 am 
Gosei

Posts: 2011
Location: Groningen, NL
Liked others: 202
Was liked: 1087
Rank: Dutch 4D
GD Posts: 645
Universal go server handle: herminator
BTW, if you are using regular expressions to parse things, here's the one I used to parse PropValues:

\[(?:\\.|[^\\\]])*\]

This matches from an opening bracket [ until a closing bracket ], ignoring escaped closing brackets \] in between, which is rather trickier than you'd think in regular expressions. :)
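For illustration, here is how that pattern might be used from Python. The function name prop_values is hypothetical, and the re.DOTALL flag is an addition that is not in the post: without it, "." does not match a newline, so a value containing a backslash followed by a line break (a soft line break) would fail to match.

```python
import re

# "[" then any run of (escaped character | character that is neither a
# backslash nor "]"), then the closing "]".
PROP_VALUE = re.compile(r"\[(?:\\.|[^\\\]])*\]", re.DOTALL)

def prop_values(sgf_fragment):
    """Return the raw values in the fragment, with the outer brackets stripped."""
    return [m.group(0)[1:-1] for m in PROP_VALUE.finditer(sgf_fragment)]

values = prop_values(r"C[see [a\] here]AB[dd][pq]")
soft_break = prop_values("C[line one\\\nline two]")
```

The values come back still escaped; unescaping is a separate step that depends on the value type.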

 Post subject: Re: SGF Grammar Ambiguous?
Post #6 Posted: Mon Oct 01, 2012 5:21 am 
Lives in gote

Posts: 643
Location: Munich, Germany
Liked others: 115
Was liked: 102
Rank: KGS 3k
KGS: LiKao / Loki
From what I remember, treating SimpleText as Text and treating Text as SimpleText both break some corner cases. The way I'd handle it is to use a RawText type that preserves all linebreaks as is, and only convert to Text or SimpleText once you know which one is applicable.

The two problem cases I remember were `:`, which sometimes separates composite properties and sometimes is a literal `:`, and a backslash followed by a linebreak, which is removed in Text properties (soft linebreak) but gets replaced by a space in SimpleText (at least that's how I read the spec; it might not be what was intended).

So I'd keep properties as a raw text, and offer three helper methods:

1. ToSimpleText
2. ToText
3. SplitComplex, which splits on unescaped `:` and unescapes the escaped `:`s.
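A rough Python sketch of two of these helpers, working on the raw (still escaped) value. The names to_text and split_compose are hypothetical stand-ins for ToText and SplitComplex. Line breaks are assumed to be normalized to "\n" already, and other whitespace conversion is not handled. ToSimpleText is left out because its treatment of escaped line breaks is the point in question above.

```python
def to_text(raw):
    """Unescape a raw value as Text: drop soft line breaks, strip escapes."""
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == "\\" and i + 1 < len(raw):
            nxt = raw[i + 1]
            if nxt != "\n":       # backslash + newline is a soft break: removed
                out.append(nxt)   # any other escaped character is kept literally
            i += 2
            continue
        out.append(ch)
        i += 1
    return "".join(out)

def split_compose(raw):
    """Split on the first unescaped ':'; return (left, right) or (raw, None)."""
    i = 0
    while i < len(raw):
        if raw[i] == "\\":
            i += 2                # skip the escaped character
            continue
        if raw[i] == ":":
            return raw[:i], raw[i + 1:]
        i += 1
    return raw, None

left, right = split_compose(r"a\:b:c")
label = to_text(left)
```

Splitting happens before unescaping, so an escaped `:` stays inside its half and only becomes a plain `:` once that half is converted.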

_________________
Sanity is for the weak.

 Post subject: Re: SGF Grammar Ambiguous?
Post #7 Posted: Mon Oct 01, 2012 6:00 am 
Tengen

Posts: 4382
Location: Caldas da Rainha, Portugal
Liked others: 499
Was liked: 733
Rank: AGA 3k
GD Posts: 65
OGS: Hyperpape 4k
Thanks everyone.

A related question: has anyone compiled an SGF "bestiary" to do tests on? I suppose I might need to move this thread to the computer go list.

_________________
Occupy Babel!

 Post subject: Re: SGF Grammar Ambiguous?
Post #8 Posted: Wed Oct 24, 2012 8:57 am 
Tengen

Posts: 4382
Location: Caldas da Rainha, Portugal
Liked others: 499
Was liked: 733
Rank: AGA 3k
GD Posts: 65
OGS: Hyperpape 4k
There is another pain point: Compose values are split on ":", but in Text or SimpleText values ":" need not be escaped, so there is no way to reliably distinguish a Compose value from either kind of text value, except by making your parser know which properties take Compose values.
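One way to picture this in Python: the parser carries a table of properties known to take Compose values and only splits for those. The table COMPOSE_PROPS and the function interpret are hypothetical names, and the set lists only a few FF[4] properties as an example, not the full list.

```python
import re

# A few FF[4] properties whose values are Compose types (incomplete, for illustration).
COMPOSE_PROPS = {"AP", "AR", "LN", "LB", "FG"}

# Left half: escaped characters, or anything that is not a backslash or ":".
COMPOSE = re.compile(r"((?:\\.|[^\\:])*):(.*)", re.DOTALL)

def interpret(ident, raw):
    """Split raw into (left, right) only when ident is known to be Compose."""
    if ident in COMPOSE_PROPS:
        m = COMPOSE.fullmatch(raw)
        if m:
            return m.group(1), m.group(2)
    return raw  # unknown or non-Compose property: keep the raw value

label = interpret("LB", "dd:A")
comment = interpret("C", "score: 3:2")
unknown = interpret("XX", "dd:A")
```

The last two calls show the problem: "score: 3:2" and an unknown property's "dd:A" both contain ":", and nothing in the value itself says whether to split.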

_________________
Occupy Babel!
