Life In 19x19 http://www.lifein19x19.com/
SGF Grammar Ambiguous? http://www.lifein19x19.com/viewtopic.php?f=18&t=6860
Author: hyperpape [ Sun Sep 30, 2012 8:00 pm ]
Post subject: SGF Grammar Ambiguous?
I've been interested in parsing lately and was looking at the SGF spec. Am I right in thinking that the grammar is ambiguous? For a PropertyIdentifier that the spec does not define, the PropertyValue could (so long as it has the right form) be either Text (of either form) or one of the other property value types. This is not a particularly serious ambiguity: one could just parse the value as Text and pass it along, leaving the consumer of that information to decide how to handle it.
Author: Kirby [ Sun Sep 30, 2012 9:06 pm ]
When you say "unspecified PropertyIdentifier", do you mean the identifier of a private property? The spec says that private properties are defined by the application, and that SGF reader implementations should skip unknown property types (and issue a warning). In that case, you know the type of your own private property.
Author: HermanHiddema [ Mon Oct 01, 2012 1:17 am ]
Actually, unknown properties should be preserved: http://www.red-bean.com/sgf/sgf4.html#2.2.3 The best way to go about it is indeed to treat the value as Text. Only Text and SimpleText values are allowed to be in a specific character encoding (the CA property), and Text is more permissive about line breaks and such. So if you parse the value as Text, using the given CA, you should be fine, and it should be safe to write the value out again even if you use a different character encoding.
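One way to read the advice above, as a minimal sketch: keep the unknown property's value as-is, decode it with the charset named by CA, and re-encode it on output. The function name `transcode_unknown_value` is hypothetical, and the sketch assumes the CA value is a charset name that Python's codecs recognise.

```python
def transcode_unknown_value(raw: bytes, source_ca: str, target_ca: str) -> bytes:
    """Decode an unknown property's value with the file's CA charset and
    re-encode it for output. Escapes and line breaks are left untouched."""
    return raw.decode(source_ca).encode(target_ca)


# A value read from a file that declares CA[ISO-8859-1], escaped "]" included.
latin1_value = "caf\u00e9\\]".encode("ISO-8859-1")
utf8_value = transcode_unknown_value(latin1_value, "ISO-8859-1", "UTF-8")
```

Because nothing is unescaped or reformatted, the value can be written back between brackets without the program knowing what the property means.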
Author: Kirby [ Mon Oct 01, 2012 4:32 am ]
HermanHiddema wrote: Actually, unknown properties should be preserved: http://www.red-bean.com/sgf/sgf4.html#2.2.3 The best way to go about it is indeed to treat it as Text. Only Text and SimpleText are allowed to be in any specific character encoding (CA property), and Text is more permissive on line breaks and such, so if you parse it as Text, using the given CA, then you should be fine and it should be safe to write the value out again even if you use a different character encoding.

Good call, my bad.
Author: HermanHiddema [ Mon Oct 01, 2012 4:51 am ]
BTW, if you are using regular expressions to parse things, here's the one I used to parse PropValues: \[(?:\\.|[^\\\]])*\] This matches from an opening bracket [ to the closing bracket ], skipping over any escaped closing brackets \] in between, which is rather trickier in regular expressions than you'd think.
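For illustration, here is that pattern used from Python's `re` module. The helper name `find_prop_values` and the sample string are made up for this sketch. The `re.DOTALL` flag is an addition, not part of the post: in Python, `.` does not match a newline by default, so without the flag a backslash followed by a line break inside a value would make the match fail.

```python
import re

# The pattern from the post, as a raw string. re.DOTALL lets "\\." also
# consume a backslash followed by a newline.
PROP_VALUE = re.compile(r"\[(?:\\.|[^\\\]])*\]", re.DOTALL)


def find_prop_values(sgf_fragment: str) -> list:
    """Return every bracketed PropValue in the fragment, brackets included."""
    return PROP_VALUE.findall(sgf_fragment)


sample = r"C[see [1\] for details]AB[aa][bb]"
values = find_prop_values(sample)
# values holds three strings: the comment (with its escaped "]" intact),
# then "[aa]" and "[bb]".
```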
Author: Li Kao [ Mon Oct 01, 2012 5:21 am ]
From what I remember, treating SimpleText as Text and treating Text as SimpleText both break some corner cases. The way I'd handle it is to use a RawText type that preserves all linebreaks as they are, and only convert to Text or SimpleText once you know which one is applicable.

The two problem cases I remember were `:`, which sometimes separates the parts of a composite property and sometimes is a literal `:`, and a backslash followed by a linebreak, which is removed in Text properties (soft linebreak) but gets replaced by a space in SimpleText. (At least that's how I read the spec; it might not be what was intended.)

So I'd keep properties as raw text and offer three helper methods:
1. ToSimpleText
2. ToText
3. SplitComplex, which splits on unescaped `:` and unescapes the escaped `:`s.
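A rough Python sketch of those three helpers, operating on a raw string. The snake_case names, the token regex, and the whitespace handling are assumptions made for this example. Since the post is unsure how SimpleText should treat a backslash followed by a linebreak, that choice is exposed as the `soft_break` parameter; its default follows the post's reading (a space), and passing `""` removes the break instead.

```python
import re

LINEBREAK = r"(?:\r\n|\n\r|\n|\r)"
# One token per match: escaped linebreak, escaped char, bare linebreak, other char.
TOKEN = re.compile(
    r"\\(" + LINEBREAK + r")|\\(.)|(" + LINEBREAK + r")|(.)", re.DOTALL
)


def _convert(raw, soft_break, hard_break):
    out = []
    for match in TOKEN.finditer(raw):
        soft, escaped, hard, plain = match.groups()
        if soft is not None:        # backslash + linebreak
            out.append(soft_break)
        elif escaped is not None:   # backslash + any other char: keep the char
            out.append(escaped)
        elif hard is not None:      # unescaped linebreak
            out.append(hard_break)
        else:                       # other whitespace becomes a space
            out.append(" " if plain in "\t\v\f" else plain)
    return "".join(out)


def to_text(raw):
    """Text: soft linebreaks vanish, hard linebreaks are kept."""
    return _convert(raw, soft_break="", hard_break="\n")


def to_simple_text(raw, soft_break=" "):
    """SimpleText: hard linebreaks become spaces; soft_break is configurable."""
    return _convert(raw, soft_break=soft_break, hard_break=" ")


def split_compose(raw):
    """Split on unescaped ':' and unescape '\\:'; other escapes stay raw."""
    parts, current, i = [], [], 0
    while i < len(raw):
        ch = raw[i]
        if ch == "\\" and i + 1 < len(raw):
            nxt = raw[i + 1]
            current.append(":" if nxt == ":" else ch + nxt)
            i += 2
        elif ch == ":":
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    return parts


raw = "line one\\\nstill line one\nline two"
as_text = to_text(raw)           # soft break removed, hard break kept
as_simple = to_simple_text(raw)  # everything on one line
```

Keeping the stored value raw means the decision between the two conversions can wait until the property's type is known, which is the point of the RawText idea.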
Author: hyperpape [ Mon Oct 01, 2012 6:00 am ]
Thanks everyone. A related question: has anyone compiled an SGF "bestiary" to run tests against? I suppose I might need to move this thread to the computer go list.
Author: hyperpape [ Wed Oct 24, 2012 8:57 am ]
There is another pain point: since Compose values are split on ":", and ":" need not be escaped in Text or SimpleText values, there is no way to reliably distinguish a Compose value from either kind of text value, except by making your parser know which properties take Compose values.
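A small sketch of that last point: the same raw string is split or left whole depending only on the property identifier. The `COMPOSE_PROPERTIES` set below is an illustrative, non-exhaustive subset chosen for the example, and `interpret` and `first_unescaped_colon` are hypothetical names. The returned pieces are still raw (escapes are not resolved).

```python
# Illustrative subset of properties whose values are Compose values.
COMPOSE_PROPERTIES = {"AP", "LB", "AR", "LN"}


def first_unescaped_colon(raw):
    """Index of the first ':' not preceded by an escaping backslash, or -1."""
    i = 0
    while i < len(raw):
        if raw[i] == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if raw[i] == ":":
            return i
        i += 1
    return -1


def interpret(ident, raw):
    """Split only when the identifier is known to take a Compose value."""
    if ident in COMPOSE_PROPERTIES:
        k = first_unescaped_colon(raw)
        if k >= 0:
            return (raw[:k], raw[k + 1:])
    return raw


comment = interpret("C", "dd:A")   # stays one text value
label = interpret("LB", "dd:A")    # becomes a (point, text) pair
```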
All times are UTC - 8 hours [ DST ]