Beautiful SGF indentation

anazawa · #1

Hi,

I'm writing a JavaScript SGF parser/stringifier that supports FF[1], FF[3] and FF[4].
The stringify method takes an optional "space" parameter that is used to insert white space into
the output SGF string, though it's not implemented yet.

I want the output SGF string to satisfy the SGF specification:

FF[1]

Quote:

Spaces, tabs, line breaks and so on can be inserted anywhere between properties and are also ignored.
http://www.red-bean.com/sgf/ff1_3/ff1.html

FF[3]

Quote:

"White space" (Spaces, tabs, carriage return, line feed, line breaks, vertical tab and so on) can be inserted before the first opening parenthesis or anywhere between properties and are ignored when reading a file.
http://www.red-bean.com/sgf/ff1_3/ff3.html

FF[4]

Quote:

White space (space, tab, carriage return, line feed, vertical tab and so on) may appear anywhere between PropValues, Properties, Nodes, Sequences and GameTrees.
http://www.red-bean.com/sgf/sgf4.html

[off topic]

By the way, the above description is a little bit ambiguous to me.
I'm not sure exactly where I can and can't insert white space.
Do all of the following strings satisfy the specification?

FF[1]

Code:

( ; B [pd] ( ; W [qp] ) )

FF[3]

Code:

( ; FF [3] B [pd] ( ; W [qp] ) )

FF[4]

Code:

( ; FF [4] B [pd] ( ; W [qp] ) )

[/off topic]

What is your favorite SGF indentation?
My favorite one is something like this, though I'm not sure whether it's valid SGF or not:

Code:

(
  ;FF[4]
   C[root]
   B[pd]
  ;C[a]
   W[qp]
  (
    ;C[b]
     B[cd]
    ;C[c]
     W[dp]
  )
)

DrStraw · #2

I played around with this a few years ago. The following is how I formatted the first of my counting lessons. I have replaced all comments with simple text and truncated most sequences, so this is not a real game file, but it shows how I formatted it.

YeGO · #3

Very cool project!

By the way, I've been working on a JavaScript SGF editor and as part of that, I've also implemented an SGF parser. I will let you know when my project is ready (planning on a free, open-source release soon).

Based on the statement "White space (space, tab, carriage return, line feed, vertical tab and so on) may appear anywhere between PropValues, Properties, Nodes, Sequences and GameTrees." that you quoted from the spec for FF[4], it would seem that all of the SGF examples that you gave below have valid syntax. The specs for FF[3] and FF[1] are somewhat ambiguous, but I think the same was intended.

I've also been thinking about how to do line wrap and potentially indentation for SGF composition. My current SGF composer just produces ugly output with no line breaks at the moment. I think your example looks nice, but here are some thoughts/tweaks:
- Perhaps the ordering of properties could be tweaked (e.g., placing the move first, then markup, then comment).
- Perhaps some line breaks between properties within a node could be omitted (e.g., for the root node, I like the idea of just putting FF, GM, CA, AP, ST, SZ all on the first line together).
- For comments, one has to consider how to line wrap those. The SGF standard allows for "soft line breaks" to break up a text value across multiple lines, which may be helpful.

I also had some comments on the structure produced by the parser. The output structure seems to simply mirror the grammatical structure of the SGF file, which might make it a little tedious to traverse the tree structure of the game. An alternative output structure could simplify away the "GameTree" substructure and provide nodes in a tree structure that directly mirrors the structure of the game and its variations. For example:

Code:

Collection:
  [
    [Node], // Root nodes of gametrees
    [Node],
    ...
    [Node]
  ]

Node:
  { // Members are properties of node
    FF: 4,
    B: "pd",
    children:
      [ // List of child nodes (0 or more)
        [Node],
        [Node],
        ...
        [Node]
      ]
  }

YeGO · #4

To expand upon my previous post, I'm suggesting that parsing this SGF file (example from http://www.red-bean.com/sgf/sgf4.html#1):

Code:

(;FF[4]C[root](;C[a];C[b](;C[c])
(;C[d];C[e]))
(;C[f](;C[g];C[h];C[i])
(;C[j])))

Could result in something like the below structure, which may be easier to traverse and obviates the need for the game tree substructure:

Code:

{
  FF: 4,
  C: "root",
  children:
    [
      {
        C: "a",
        children:
          [
            {
              C: "b",
              children:
                [
                  {
                    C: "c",
                    children: []
                  },
                  {
                    C: "d",
                    children:
                      [
                        {
                          C: "e",
                          children: []
                        }
                      ]
                  },
                ]
            }
          ]
      },
      {
        C: "f",
        children:
          [
            {
              C: "g",
              children:
                [
                  {
                    C: "h",
                    children:
                      [
                        {
                          C: "i",
                          children: []
                        }
                      ]
                  }
                ]
            },
            {
              C: "j",
              children: []
            }
          ]
      }
    ]
}
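
A minimal sketch of a parser that produces this kind of structure. The function name `parseSgf` is hypothetical, and two choices are assumptions rather than anything stated in the thread: each property is stored as an array of raw (still escaped) values under its upper-case name, and white space is tolerated between all tokens, including between a property name and its value.

```javascript
// Sketch only: parse a single SGF GameTree into a plain tree of nodes.
// Assumed node shape: upper-case keys hold arrays of raw values,
// `children` holds the child nodes.
function parseSgf(text) {
  let pos = 0;
  const ident = /[A-Z]+/y; // sticky: must match exactly at lastIndex

  function skipWhitespace() {
    while (pos < text.length && /\s/.test(text[pos])) pos++;
  }

  function expect(ch) {
    skipWhitespace();
    if (text[pos] !== ch) {
      throw new SyntaxError(`expected "${ch}" at offset ${pos}`);
    }
    pos++;
  }

  // PropValue: "[" ... "]", where "\" escapes the following character.
  function parseValue() {
    expect("[");
    let value = "";
    while (pos < text.length && text[pos] !== "]") {
      if (text[pos] === "\\") value += text[pos++]; // keep the escape as-is
      value += text[pos++];
    }
    expect("]");
    return value;
  }

  // Node: ";" followed by zero or more properties.
  function parseNode() {
    expect(";");
    const node = { children: [] };
    for (;;) {
      skipWhitespace();
      ident.lastIndex = pos;
      const match = ident.exec(text);
      if (!match) return node;
      pos += match[0].length;
      const values = [];
      do {
        values.push(parseValue());
        skipWhitespace();
      } while (text[pos] === "[");
      node[match[0]] = values;
    }
  }

  // GameTree: "(" Sequence GameTree* ")". A sequence becomes a chain of
  // single children; sub-trees become children of the last node.
  function parseTree() {
    expect("(");
    const first = parseNode();
    let last = first;
    for (;;) {
      skipWhitespace();
      if (text[pos] === ";") {
        const next = parseNode();
        last.children.push(next);
        last = next;
      } else if (text[pos] === "(") {
        last.children.push(parseTree());
      } else {
        break;
      }
    }
    expect(")");
    return first;
  }

  return parseTree();
}

// The compact and the spaced-out spelling yield the same tree.
const compact = parseSgf("(;FF[4]B[pd](;W[qp]))");
const spaced = parseSgf("( ; FF [4] B [pd] ( ; W [qp] ) )");
console.log(JSON.stringify(compact) === JSON.stringify(spaced)); // true
```

Because the "GameTree" level disappears while parsing, walking the main line is just a matter of following `children[0]` from the root.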

anazawa · #5

DrStraw wrote:

I played around with this a few years ago. The following is how I formatted the first of my counting lessons. I have replaced all comments with simple text and truncated most sequences, so this is not a real game file, but it shows how I formatted it.

Thanks for your comment. I didn't know SGF is so readable.
Move properties are placed at the head of Node, and so we can easily follow the sequence.
The root node is also impressive. GM, FF and CA properties summarize the SGF file itself,
and so they are placed at the very beginning of the file.

I also didn't know the length of a Move property (B[]/W[]) is exactly equal to 5 (except for pass move).
I believe this fact helps you indent CR, LB and C properties consistently.

Thanks a lot!

anazawa · #6

@YeGO

Thanks for your suggestion.

Quote:

Based on the statement "White space (space, tab, carriage return, line feed, vertical tab and so on) may appear anywhere between PropValues, Properties, Nodes, Sequences and GameTrees." that you quoted from the spec for FF[4], it would seem that all of the SGF examples that you gave below have valid syntax. The specs for FF[3] and FF[1] are somewhat ambiguous, but I think the same was intended.

I hope so, too. Since FF[3] and FF[1] are ambiguous compared to FF[4],
what counts as a valid SGF indentation depends on the reader. When we generate FF[3]/FF[1] files,
we should handle white space conservatively, i.e. emit none.

Quote:

Perhaps the ordering of properties could be tweaked (e.g., placing the move first, then markup, then comment).

I think so, too. DrStraw's SGF is very readable because properties are sorted nicely.

Quote:

Perhaps some line breaks between properties within a node could be omitted (e.g., for the root node, I like the idea of just putting FF, GM, CA, AP, ST, SZ all on the first line together).

I like that style, too. When I write a SGF by hand, I'll do so.
But when I write a program that generates a SGF file, I will not implement the feature.
I think that's the job of a "pretty-print" program. Sorting properties may be part of that job.
One-property-per-line would be helpful for debugging the SGF generator, though.
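
The property ordering discussed here (move first, then markup, then comment) could be sketched as a stable sort over property names. The rank table and the name `orderProperties` are assumptions for illustration only; the SGF spec does not prescribe any order.

```javascript
// Sketch only: rank 0 = move, rank 1 = everything else, rank 2 = comment.
const PROPERTY_RANK = new Map([
  ["B", 0],
  ["W", 0],
  ["C", 2],
]);

function orderProperties(names) {
  const rank = (name) => PROPERTY_RANK.get(name) ?? 1;
  // Array.prototype.sort is stable, so ties keep their original order.
  return [...names].sort((a, b) => rank(a) - rank(b));
}

console.log(orderProperties(["C", "CR", "B", "LB"])); // [ 'B', 'CR', 'LB', 'C' ]
```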

Quote:

For comments, one has to consider how to line wrap those. The SGF standard allows for "soft line breaks" to break up a text value across multiple lines, which may be helpful.

Handling soft line breaks is the SGF parser/composer's job because they mean nothing to the users.
I like one-property-per-line, and so I'll get rid of them. What do you think?

(continued in next post)

anazawa · #7

YeGO wrote:

I also had some comments on the structure produced by the parser. The output structure seems to simply mirror the grammatical structure of the SGF file, which might make it a little tedious to traverse the tree structure of the game. An alternative output structure could simplify away the "GameTree" substructure and provide nodes in a tree structure that directly mirrors the structure of the game and its variations.

That's true. The data structure that I proposed is far from a usual tree structure.
While I'm not really an expert on a tree structure, I believe the data structure
that you proposed can represent any n-ary tree.

As I wrote in HISTORY of README, it's based on a Perl module written in 2008.
I've been a user of that module, and so I wrote a JavaScript port of it,
though the JavaScript one is not compatible with the Perl one. I like the Perl module
because it's optimized for SGF, I mean, the data structure mirrors the concept of
Sequence defined by SGF. Most of SGF files look like this:

Code:

(
  ;FF[4]
   B[pd]
  ;W[qp]
  ;B[cd]
  ...
)

It's just a sequence of nodes, and so it should be represented by a list of nodes.
That's the whole idea. However, it also has its cons. In fact, you may find it difficult
to edit the data structure (BTW, I wanna see your SGF editor so that I can learn
something from your code :).

Anyway, the users should not touch the raw data structure. I'm writing the visitor/
iterator to encapsulate the structure. If I felt difficulty in writing the code,
I might modify the structure. I might adopt yours. I don't know.

YeGO · #8

Quote:

BTW, I wanna see your SGF editor so that I can learn
something from your code

Thanks for the interest in my coding project. My editor is still a work in progress, but once it is ready for release, I will announce it on this forum, and be happy to see people learn from it and build off it. I plan to release my code under a free, copy-left style license (most likely GPL or AGPL). I will certainly notify you when my code has been released.

Quote:

Handling soft line breaks is the SGF parser/composer's job because they mean nothing to the users.
I like one-property-per-line, and so I'll get rid of them. What do you think?

What I meant is that dealing with very long comments without line breaks or ones that inherently contain new lines poses some additional challenges and design considerations for indenting SGF. It's quite common to see paragraphs of text within comments for annotated games. In the first case, most text editors/viewers would wrap long lines, which visually disrupts your indentation scheme. In the second case, one can't remove the hard line breaks, which are presumably desired by the user. Also, some text editors/viewers struggle with very long lines of text that are not naturally broken up with line breaks. I was suggesting that soft line breaks could be one way to address this last issue, but that also raises some further design questions.
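
To make the soft line break idea concrete, here is a rough sketch of escaping a comment while wrapping it, plus the reverse step. In FF[4] a soft line break is a line break preceded by "\" and is dropped on reading, and "]" and "\" must be escaped inside Text values. The names `encodeText` and `decodeText` and the `width` parameter are hypothetical, and the sketch ignores the FF[4] rule that other white space in Text is converted to a space.

```javascript
// Sketch only: escape a comment for SGF and insert soft line breaks.
// `width` limits the escaped characters per line; the trailing "\" of a
// soft break is not counted, so output lines can be width + 1 long.
function encodeText(text, width = 72) {
  let out = "";
  let column = 0;
  for (const ch of text) {
    if (ch === "\n") {
      out += ch; // hard line break: keep it and restart the column count
      column = 0;
      continue;
    }
    const piece = ch === "]" || ch === "\\" ? "\\" + ch : ch;
    if (column + piece.length > width) {
      out += "\\\n"; // soft line break: backslash followed by newline
      column = 0;
    }
    out += piece;
    column += piece.length;
  }
  return out;
}

// Reverse step: drop soft line breaks and resolve the remaining escapes.
function decodeText(raw) {
  return raw.replace(/\\(\r\n|\n\r|\r|\n|[\s\S])/g, (_, ch) =>
    /[\r\n]/.test(ch) ? "" : ch
  );
}

console.log(JSON.stringify(encodeText("abcdef", 3))); // "abc\\\ndef"
console.log(decodeText(encodeText("abcdef", 3))); // abcdef
```

Escaping and wrapping happen in one pass so that a soft break can never land between a backslash and the character it escapes.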

Quote:

I like the Perl module because it's optimized for SGF, I mean, the data structure mirrors the concept of
Sequence defined by SGF. Most of SGF files look like this ... It's just a sequence of nodes, and so it should be represented by a list of nodes.
That's the whole idea. However, it also has its cons.

If all SGF files were simply a linear record with zero variations, then of course it would be most natural to represent everything as a list of nodes. However, as soon as you allow for variations, you have to deal with an inherent tree structure. Then you are forced to create a hybrid tree of lists of nodes, which is more complex to traverse than a simple tree of nodes.

The structure of a game record is inherently a tree and can be easily represented as just a tree of nodes. Even a game record that is just one linear sequence with no variations is still a tree (just with each node having exactly one child, except for the last node which has no children). SGF is just representing a tree of nodes in pre-order while dropping unnecessary nesting parentheses.

I think the original Perl module, and other projects inspired by it, have made a design error in misunderstanding the grammar of SGF to imply a more complex data structure than intended, which is a simple tree of nodes (see http://www.red-bean.com/sgf/var.htm). The lack of unnecessary parentheses (obviated by the use of semi-colons prefixing each node) perhaps is the source of this confusion.

Consider these example grammatical structures for trees represented in pre-order:

Code:

(root(a(b(c)(d(e))))(f(g(h(i)))(j)))
(;root(;a(;b(;c)(;d;e)))(;f(;g;h;i)(;j)))

Both represent the same tree structure, but the first uses nested parentheses, whereas the second simply has removed the parentheses made unnecessary by the semi-colon prefixes. In this example, since there is a high branching factor, the second format (which is the one used by SGF) turns out to be slightly less efficient, but it has clear benefits in clarity (in removing some unnecessary parentheses) and efficiency for trees with lower branching factors.

Quote:

While I'm not really an expert on a tree structure, I believe the data structure that you proposed can represent any n-ary tree.

Yes, of course, the basic tree structure can handle general (non-binary) trees as well; I was just using a simple binary example (taken from http://www.red-bean.com/sgf/sgf4.html#1) to illustrate.

The Eidogo code gives an example of how to parse into a basic tree structure:
https://github.com/jkk/eidogo/blob/mast ... /js/sgf.js

Also, here are some other related open-source projects:
https://github.com/Kashomon/glift
https://github.com/IlyaKirillov/GoProject

YeGO · #9

anazawa wrote:

Anyway, the users should not touch the raw data structure. I'm writing the visitor/
iterator to encapsulate the structure. If I felt difficulty in writing the code,
I might modify the structure. I might adopt yours. I don't know.

Why shouldn't the users (presumably those using this code as a library to parse SGF files), directly have access to the data structure? If a further layer, provided by a visitor/iterator, is needed to encapsulate the structure, then the output is inherently this encapsulation. Does this encapsulation basically provide a tree of nodes type of abstraction? If that's the case, why not just make the underlying data structure a tree of nodes and give that directly to the user?

Note that it is also very easy to convert a tree of nodes back into the SGF format, so keeping the underlying data structure something that closely resembles the grammar is not necessary for simple composition.
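
As a rough illustration of that conversion, the sketch below walks a tree of nodes and emits SGF, either compact or indented in the one-property-per-line style from the first post. The name `composeSgf`, the `space` parameter (modelled loosely on JSON.stringify) and the node shape (upper-case keys holding arrays of already-escaped values, plus `children`) are all assumptions, not an existing API.

```javascript
// Sketch only: turn a tree of nodes back into SGF text.
// space = 0 gives compact output; space > 0 indents by that many columns.
function composeSgf(root, space = 0) {
  const pretty = space > 0;
  const newline = pretty ? "\n" : "";

  function composeNode(node, indent) {
    const props = Object.keys(node)
      .filter((key) => /^[A-Z]+$/.test(key))
      .map((key) => key + node[key].map((v) => `[${v}]`).join(""));
    // Continuation lines get one extra column so they line up after ";".
    const separator = pretty ? "\n" + indent + " " : "";
    return indent + ";" + props.join(separator) + newline;
  }

  function composeTree(node, depth) {
    const outer = " ".repeat(space * depth);
    const inner = " ".repeat(space * (depth + 1));
    let out = outer + "(" + newline;
    let current = node;
    // A run of single children is written as one Sequence.
    for (;;) {
      out += composeNode(current, inner);
      if (current.children.length !== 1) break;
      current = current.children[0];
    }
    // Two or more children each open their own sub-GameTree.
    for (const child of current.children) {
      out += composeTree(child, depth + 1);
    }
    return out + outer + ")" + newline;
  }

  return composeTree(root, 0);
}

const game = {
  FF: ["4"],
  B: ["pd"],
  children: [{ W: ["qp"], children: [] }],
};
console.log(composeSgf(game)); // (;FF[4]B[pd];W[qp])
console.log(composeSgf(game, 2));
// (
//   ;FF[4]
//    B[pd]
//   ;W[qp]
// )
```

Parentheses are only opened where a node has two or more children, which is exactly the "dropping unnecessary nesting parentheses" described earlier in the thread.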

anazawa · #10

Thanks for your interesting suggestion :)

YeGO wrote:

What I meant is that dealing with very long comments without line breaks or ones that inherently contain new lines poses some additional challenges and design considerations for indenting SGF. It's quite common to see paragraphs of text within comments for annotated games. In the first case, most text editors/viewers would wrap long lines, which visually disrupts your indentation scheme. In the second case, one can't remove the hard line breaks, which are presumably desired by the user. Also, some text editors/viewers struggle with very long lines of text that are not naturally broken up with line breaks. I was suggesting that soft line breaks could be one way to address this last issue, but that also raises some further design questions.

That's true, though I personally can accept the problem. I feel the following SGF is not so ugly:

Code:

<- editor width ->

(
  ;FF[4]
   C[line 1
line 2
looooooooooooooong
line wrapped by
editor]
   B[pd]
  ;W[qp]
)

How do you feel about the above case? We may be able to learn something from
JSON.stringify that has to handle a long string.

YeGO wrote:

If all SGF files were simply a linear record with zero variations, then of course it would be most natural to represent everything as a list of nodes. However, as soon as you allow for variations, you have to deal with an inherent tree structure. Then you are forced to create a hybrid tree of lists of nodes, which is more complex to traverse than a simple tree of nodes.

The structure of a game record is inherently a tree and can be easily represented as just a tree of nodes. Even a game record that is just one linear sequence with no variations is still a tree (just with each node having exactly one child, except for the last node which has no children). SGF is just representing a tree of nodes in pre-order while dropping unnecessary nesting parentheses.

I think the original Perl module, and other projects inspired by it, have made a design error in misunderstanding the grammar of SGF to imply a more complex data structure than intended, which is a simple tree of nodes (see http://www.red-bean.com/sgf/var.htm). The lack of unnecessary parentheses (obviated by the use of semi-colons prefixing each node) perhaps is the source of this confusion.

I prefer practicality to universality in this case since we're not handling a generic tree
structure but a go game record played by human beings. Though I'm not sure how computers think
about the next move, I think a sequence is the unit of the game record, not a single move.
Joseki is a good example. It's a set of sequences. I believe we can rebuild SGF considering
a sequence as the node of the game tree. At the risk of being misunderstood, I think the Perl
module adopted a sequence-oriented data structure. That's why I like it. Note that the Perl
data structure can always be converted into the data structure that you proposed, and vice versa.

YeGO wrote:

Consider these example grammatical structures for trees represented in pre-order:

Code:

(root(a(b(c)(d(e))))(f(g(h(i)))(j)))
(;root(;a(;b(;c)(;d;e)))(;f(;g;h;i)(;j)))

Both represent the same tree structure, but the first uses nested parentheses, whereas the second simply has removed the parentheses made unnecessary by the semi-colon prefixes. In this example, since there is a high branching factor, the second format (which is the one used by SGF) turns out to be slightly less efficient, but it has clear benefits in clarity (in removing some unnecessary parentheses) and efficiency for trees with lower branching factors.

That's totally true. Thanks to your clear explanation, we can understand why SGF was designed so.

YeGO wrote:

The Eidogo code gives an example of how to parse into a basic tree structure:
https://github.com/jkk/eidogo/blob/mast ... /js/sgf.js

Also, here are some other related open-source projects:
https://github.com/Kashomon/glift
https://github.com/IlyaKirillov/GoProject

Yeah, I've already read (part of) the code. They are great projects :)
My parser is something like this:
https://github.com/anazawa/sgf.js/blob/master/sgf.js#L951

YeGO wrote:

Why shouldn't the users (presumably those using this code as a library to parse SGF files), directly have access to the data structure? If a further layer, provided by a visitor/iterator, is needed to encapsulate the structure, then the output is inherently this encapsulation. Does this encapsulation basically provide a tree of nodes type of abstraction? If that's the case, why not just make the underlying data structure a tree of nodes and give that directly to the user?

Because it's like touching Rack's response array directly instead of using its middleware or Ruby on Rails, which is
a wrapper around Rack. And furthermore, SGF property names such as PB or PW are far from user friendly.
I'm talking about the user experience. If the user were a SGF expert like you, he/she would think
it's unnecessary to encapsulate the data structure. However, most of them, including me, are not.

Bantari · #11

I am not sure I understand this issue... you want to make SGF code more human-readable? Is that it?

Here is what I think:
SGF, as a format, is not designed to be human readable, but parser-readable.
If you really want something that a human can load into a text editor and easily read, why not just create some kind of YAML-like format (GAML?) for SGF and a parser that converts GAML to SGF and back. Then you can worry how to make this beautiful.

Personally, even if I look at an SGF file, with everything aligned and "beautiful" - I still cannot follow the game. The only reason I can possibly think of for doing that would be to try to manually enter comments. But then - why not just use an SGF editor to do that?

I honestly see no value in making the indentation "beautiful" - whatever that word means in this context.

As a coder myself, I can see where you can take pleasure when your generated code is clean and proper, so I sort-of get what you feel in an abstract sense. But from a practical point of view, I really scratch my head and wonder: why would somebody ask such a question?

Having said that, from a coder's perspective, DrStraw's format looks the cleanest to me. And so this is what I would strive for. If I had too much time on my hands.

anazawa · #12

@Bantari

Thanks for your suggestion

SGF is a text-only format and certainly allows us to indent the text.
This means, given a game record, there should exist various styles to indent
the game record, not only one style. If so, I thought there should exist
the well-used/well-known/popular styles. Though I googled this kind of
discussion, I couldn't find the discussion itself. It seems no one cares
about a SGF indentation. I was so sad..., and so asked here, the last hope
of the internet. That's the story.

What a beautiful SGF indentation is depends on the reader/writer,
and so there is no correct answer. Thus my question is:
what is your own *favorite* SGF indentation? By collecting those styles,
we may be able to find common patterns among them. Maybe not.
I just wanted to try.

Bantari wrote:

Here is what I think:
SGF, as a format, is not designed to be human readable, but parser-readable.
If you really want something that a human can load into a text editor and easily read, why not just create some kind of YAML-like format (GAML?) for SGF and a parser that converts GAML to SGF and back. Then you can worry how to make this beautiful.

How will you debug the converter that converts YAML to SGF with no indentation?
It would be painful at least for me.

Bantari wrote:

As a coder myself, I can see where you can take pleasure when your generated code is clean and proper, so I sort-of get what you feel in an abstract sense. But from a practical point of view, I really scratch my head and wonder: why would somebody ask such a question?

My question is neither practical nor productive. I just asked for fun.
It's important for me to have fun, especially in go-related cases.

DrStraw · #13

Bantari wrote:

I honestly see no value in making the indentation "beautiful" - whatever that word means in this context.

I did it for a practical reason. I was writing an SGF merge utility and needed to have a clearer view of how the file was organized.

Bantari · #14

DrStraw wrote:

Bantari wrote:

I honestly see no value in making the indentation "beautiful" - whatever that word means in this context.

I did it for a practical reason. I was writing an SGF merge utility and needed to have a clearer view of how the file was organized.

I can see the value of that in the software design phase. But in general - not really.

Bantari · #15

anazawa wrote:

@Bantari

Thanks for your suggestion

SGF is a text-only format and certainly allows us to indent the text.
This means, given a game record, there should exist various styles to indent
the game record, not only one style. If so, I thought there should exist
the well-used/well-known/popular styles.

I don't think any of that follows at all. But that's just me.

anazawa wrote:

Bantari wrote:

Here is what I think:
SGF, as a format, is not designed to be human readable, but parser-readable.
If you really want something that a human can load into a text editor and easily read, why not just create some kind of YAML-like format (GAML?) for SGF and a parser that converts GAML to SGF and back. Then you can worry how to make this beautiful.

How will you debug the converter that converts YAML to SGF with no indentation?
It would be painful at least for me.

As I answered to DrStraw - there might be a need for some more human-readable format during software development/debug phase. I can see that. But even for that purpose it seems clearly immaterial if the indentation is "beautiful" or widely accepted or popular - it just has to make sense to you, as the software developer. It might be that for different kinds of development, different styles of SGF presentation/indentation would be appropriate.

So I can see you asking yourself "what do I think is appropriate in the specific case of the software I develop?" rather than asking everybody "hey, what do you guys like, what is pretty?"

What if everybody reaches a consensus that "pretty" is something absolutely not appropriate for you?

anazawa wrote:

Bantari wrote:

As a coder myself, I can see where you can take pleasure when your generated code is clean and proper, so I sort-of get what you feel in an abstract sense. But from a practical point of view, I really scratch my head and wonder: why would somebody ask such a question?

My question is neither practical nor productive. I just asked for fun.
It's important for me to have fun, especially in go-related cases.

Now - this is a good reason! I was puzzled.

anazawa · #16

[off topic]

SGF representation of HTML:

Code:

(
  ;FF[4]                  <-- required
   HTML[lang:en]          <-- Compose type
  (
    ;HEAD[]               <-- None
    (
      ;META[charset:utf8] <-- Compose type
    )
    (
      ;TITLE[]            <-- None
       C[Not Found]
    )
    (
      ;SCRIPT[type:text/javascript]
       C[function sum (a, b) {
  return a + b;
}
]
    )
  )
  (
    ;BODY[]
    (
      ;HI[]               <-- H1 is illegal
       C[Not Found]
    )
    (
      ;P[class:text-muted]
       C[The requested URL was not found on this server.]
    )
  )
)

How should I handle "<p>foo<br>bar</p>"?

[/off topic]

anazawa · #17

@Bantari

Thanks for your kind words

I was thinking about the use cases of SGF files:

1. A SGF file that is transferred by HTTP and opened by MultiGo or something via web browsers
2. A SGF file that is opened by CUI viewers such as less/more to debug
3. A SGF file that is attached to email and sent to his/her friends/students/teachers
4. A SGF file that is bundled by SGF applications such as SmartGo
5. A SGF file that is generated by SGF editors such as MultiGo, CGoban or SmartGo
...

I may be missing something, though.

It seems a SGF indentation is not required in most cases. Like you said,
even when I'm debugging SGF generators, the indentation has to make sense to only me
if I'm developing the software alone.

Anyway, thinking about a SGF indentation is fun at least for me
because it's not a binary format but text one. We are allowed to think about
how to indent the SGF file. In addition, I don't want to invent the styles
whenever I debug the generator, but rather choose one of them.

anazawa · #18

KGS (CGoban) style:

Code:

(;GM[1]FF[4]CA[UTF-8]AP[CGoban:3]ST[2]
RU[Japanese]SZ[19]KM[6.50]TM[0]OT[80x10 byo-yomi]
PW[foo]PB[bar]WR[6d]BR[6d]DT[2015-02-26]PC[The KGS Go Server at http://www.gokgs.com/]RE[B+Resign]
;B[pd]BL[10]OB[80]C[foo [6d\]: hi
bar [6d\]: hi
]
;W[qp]WL[10]OW[80]
...
)

Line 1: Root properties that summarize the SGF file itself
Line 2: Rules of the game (a part of gameinfo properties)
Line 3: Player name/rank, date, place, game result, etc. (the rest of gameinfo properties?)
Line 4-: The main line of play

- one node per line except for the root node
- no indentation
