All times are UTC - 8 hours [ DST ]
 Post subject: An SGF collection
Post #1 Posted: Fri Oct 09, 2015 1:27 pm 
Dies with sente

Posts: 99
Liked others: 5
Was liked: 33
Reacting to a post by DrStraw and to reactions by others:
Yes, there are hundreds of games at
http://homepages.cwi.nl/~aeb/go/games/games/

You can download 25000+ via
http://homepages.cwi.nl/~aeb/go/games/games.7z

They all come from the net. As follows.
1. Send out spiders and scrape. Find blogs, go websites,
newsgroups, images, pdfs of old go journals, newspapers.
2. Extract the information. Do OCR on the diagrams,
convert files from SJIS or Big5 or so to UTF-8,
convert file formats from ugi, ngf or whatever to sgf.
Keep only the games that look like they might be pro games
(look at names, tournament, player ranks, site).
This yields for example 2500000 game records.
Full of mistakes, because the original data was faulty,
or because of problems with OCR, removal of markup, etc.
Some sources also insert errors intentionally.
3. Remove garbage. Compare the reasonable files and decide
which ones are versions of the same game. Choose the best
version, or create a better version by merging.
This yields for example 150000 different games.
4. Fix the game records. Correct illegal moves. Correct the
spelling of the player names. Compare with tournament tables.
A reasonable game record may turn out to be between different players,
or with player colors reversed, or belong to a different tournament.
Fix dates, and consider that the date might be a broadcast date
or have been a lunar date.
5. Almost all of the above is fully automated; it would not
be possible to handle a million game records by hand.
But in the end human inspection is needed, and that takes time.
So far I have published 25000 game records, generally of good quality.
All Japanese pro games. I think this is the largest such collection
in the public domain.
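
The character-set conversion in step 2 might look roughly like the sketch below. It is only an illustration, not the software used for this collection: the function name `decode_record` and the order of the candidate encodings are assumptions.

```python
# Hypothetical helper: try candidate encodings in order and keep the first
# one that decodes the raw bytes without error.
CANDIDATE_ENCODINGS = ("utf-8", "shift_jis", "big5")


def decode_record(raw, encodings=CANDIDATE_ENCODINGS):
    """Return (text, encoding_name) for the first encoding that decodes cleanly."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Nothing fit: keep the text readable and flag the file for inspection.
    return raw.decode("utf-8", errors="replace"), "unknown"


def to_utf8(raw):
    """Re-encode a game record as UTF-8 bytes."""
    text, _ = decode_record(raw)
    return text.encode("utf-8")
```

UTF-8 is tried first because legacy multi-byte text rarely happens to be valid UTF-8. The Shift_JIS and Big5 byte ranges overlap, so a file can decode without error under the wrong one; a real pipeline would need further checks, for example looking for known player names in the result.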

Of course there is no reason to expect completeness.
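
Deciding which files are versions of the same game (step 3) can be sketched as follows. This is a simplified illustration under stated assumptions, not the method actually used: it reads only `;B[..]` and `;W[..]` nodes on a 19x19 board with a regular expression, ignores variations and passes, and the names `fingerprint` and `group_duplicates` are hypothetical.

```python
import re
from collections import defaultdict

# Matches simple move nodes such as ";B[pd]". A move-like string inside a
# comment would also match, so this is not a real SGF parser.
MOVE_RE = re.compile(r";\s*([BW])\[([a-s]{2})\]")


def moves(sgf):
    """Return the moves as (colour, x, y) tuples with 0-based coordinates."""
    return [(c, ord(p[0]) - 97, ord(p[1]) - 97) for c, p in MOVE_RE.findall(sgf)]


def symmetries(x, y, size=19):
    """All 8 rotations and reflections of a board point."""
    m = size - 1
    return [(x, y), (m - x, y), (x, m - y), (m - x, m - y),
            (y, x), (m - y, x), (y, m - x), (m - y, m - x)]


def fingerprint(sgf, prefix=50):
    """Smallest of the 8 symmetric images of the first `prefix` moves."""
    seq = moves(sgf)[:prefix]
    variants = [tuple((c,) + symmetries(x, y)[k] for c, x, y in seq)
                for k in range(8)]
    return min(variants)


def group_duplicates(records):
    """Map {filename: sgf_text} to lists of filenames sharing a fingerprint."""
    groups = defaultdict(list)
    for name, sgf in records.items():
        fp = fingerprint(sgf)
        if fp:  # records without any recognised moves are not compared
            groups[fp].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]
```

Taking the minimum over the eight board symmetries means a rotated or mirrored copy of a record lands in the same group. Exact matching on a prefix will miss versions that differ by an early transcription error, so it only covers the easy cases.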

 Post subject: Re: An SGF collection
Post #2 Posted: Fri Oct 09, 2015 3:10 pm 
Oza

Posts: 3656
Liked others: 20
Was liked: 4631
Quote:
They all come from the net.


You may have got them from the net, but that's not where they come from in the first place. You are most definitely putting out some GoGoD files, judging by the Shusai list referenced in the other thread - and encouraging others to do the same.

If you are doing all this checking, you presumably have at least some inkling where the files originate?

And you have been in touch with me, yet never thought to ask? Nothing to do with legality, just good manners.

To use your analogy, you invite me to dinner at your house - and feed me food you took from my fridge.

What's for dessert?


This post by John Fairbairn was liked by: DrStraw
 Post subject: Re: An SGF collection
Post #3 Posted: Fri Oct 09, 2015 4:01 pm 
Oza

Posts: 2180
Location: ʍoquıɐɹ ǝɥʇ ɹǝʌo 'ǝɹǝɥʍǝɯos
Liked others: 237
Was liked: 662
Rank: AGA 5d
GD Posts: 4312
Online playing schedule: Every tenth February 29th from 20:00-20:01 (if time permits)
Please do not mention my name in this conversation. I had no idea my original reference referred to material pirated from GoGoD. I do not condone this at all. It was merely an innocent query prompted by what I assumed to be a legitimate source. I have purchased GoGoD in the past and believe that they should be supported without being illegally copied elsewhere.

_________________
Still officially AGA 5d but I play so irregularly these days that I am probably only 3d or 4d over the board (but hopefully still 5d in terms of knowledge, theory and the ability to contribute).


Last edited by DrStraw on Sat Oct 10, 2015 10:14 am, edited 1 time in total.
 Post subject: Re: An SGF collection
Post #4 Posted: Fri Oct 09, 2015 5:03 pm 
Dies with sente

Posts: 99
Liked others: 5
Was liked: 33
John Fairbairn wrote:
If you are doing all this checking, you presumably have at least some inkling where the files originate?

To JF:
See, I knew that you would answer and say this.
I have often thought of buying your collection,
but never did, to avoid the possible complaint that I copied from it.
So, no, I do not possess GoGoD, and have never seen it.

Where do these files originate? They have hundreds of authors,
hundreds of sources. Most game records will be results of comparing
and merging information from several sources, most of them Japanese
or Chinese or Korean.

I asked Jan van Rongen for permission to distribute his Friday Night Files
and got it. His files form a large part of the present Cho Chikun collection.
If you give me your files, I will ask you what I can do with them.
As it is, I do not have your files, and the question does not arise.

That also makes it difficult to give credit. However, I conjecture
that GoGoD uses dates as filenames. And that you introduced the OH label,
and use roman numerals to represent lunar dates.
So these can be taken as signs that it may be that GoGoD was involved.
I fixed many errors in such files. Had I had a copy of GoGoD,
I would have mailed you with suggestions for corrections.
As it is, I only wrote about the Go Seigen collection
"A very large part of this was probably derived from GoGoD".


To DrStraw:
I am not quite sure what you mean. I would find it unreasonable if you called
my activity piracy. But maybe you meant that my sources pirated these games?
That is of course quite possible, even likely, but today these games can be found
on many websites, in many different languages and character sets and file formats,
and with minor differences. They are certainly not all straight copies from a single source.
Here is a European source: http://www.romaniango.org/partide/

 Post subject: Re: An SGF collection
Post #5 Posted: Fri Oct 09, 2015 11:58 pm 
Oza

Posts: 3656
Liked others: 20
Was liked: 4631
Quote:
That also makes it difficult to give credit. However, I conjecture
that GoGoD uses dates as filenames. And that you introduced the OH label,
and use roman numerals to represent lunar dates.
So these can be taken as signs that it may be that GoGoD was involved.
I fixed many errors in such files.


So you are admitting such files have passed through your hands and that you were aware of their likely provenance. Thank you, though as your suspicions were aroused would it not have been appropriate to check with us? There are several other ways of identifying GoGoD files, incidentally, and not just on your site.

Many of Jan van Rongen's files came from GoGoD BTW (and he gave us many in return). We also did such swaps with MasterGo and others. There used to be (I hope I can say) an upstanding community of database workers. O tempora, o mores!

 Post subject: Re: An SGF collection
Post #6 Posted: Sat Oct 10, 2015 8:04 am 
Dies with sente

Posts: 99
Liked others: 5
Was liked: 33
John Fairbairn wrote:
O tempora, o mores!

Ha, you mutter, but I think you are mistaken.

"Gave us many in return" - that suggests that I did not give the
go community anything. But I provide thousands of game records. For free.
Maybe my collection contains 5000 games not in go4go today.
(And there is other go-related stuff as well.)

"Check with us?" In fact we exchanged some email, and I offered help,
I have powerful software. Listed a few handfuls of corrections
to logan's games, which possibly are derived from GoGoD.
You had no time for such interaction. Personal reasons.
But I am completely happy to open our email conversation again.
Also to discuss ownership of game records, and copyright and the like.
In fact, maybe we already had some of that discussion.

"Files have passed through your hands" - That could only be relevant if I published such files,
which I did not. What is published is a merged version from many sources.
Some sources know the date, some the place, some have more moves.
Sometimes the merging process fixes errors that occur in all sources,
so that the file I publish is the only known good copy of the game record.
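
As a rough illustration of merging several versions of one game, here is a minimal sketch. It assumes each version has already been parsed into a dict with `props` (SGF properties) and `moves` (a list of coordinates); that data layout and the name `merge_versions` are hypothetical, and simple majority voting stands in for whatever the real software does.

```python
from collections import Counter


def merge_versions(versions):
    """Combine versions of one game by majority vote.

    Each version is {"props": {name: value}, "moves": [coordinate, ...]}.
    """
    merged_props = {}
    keys = {k for v in versions for k in v["props"]}
    for key in sorted(keys):
        # Only sources that actually know this property get a vote.
        values = [v["props"][key] for v in versions if v["props"].get(key)]
        if values:
            merged_props[key] = Counter(values).most_common(1)[0][0]

    merged_moves = []
    longest = max(len(v["moves"]) for v in versions)
    for i in range(longest):
        # Sources that stop early simply do not vote on later moves.
        candidates = [v["moves"][i] for v in versions if len(v["moves"]) > i]
        merged_moves.append(Counter(candidates).most_common(1)[0][0])

    return {"props": merged_props, "moves": merged_moves}
```

Voting cannot repair an error that occurs in every source; that needs knowledge of the game itself, such as checking that a move is legal. Voting on each move independently is also naive, since it can mix two versions that diverge.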

Games by famous players like Go Seigen may have fifty sources with minor differences.
It is not really possible or meaningful to start investigating
which of these sources got their material directly or indirectly via GoGoD.
Discarding some sources will have little impact on the final result.
But I am quite happy to remove some OH labels and/or comments,
if that is what you would like.

Funnily enough, I recognize my own collection in many sources.
That is good and bad. Good that others use this work.
Bad that I can no longer use other sites as confirmation of my own guesses.

 Post subject: Re: An SGF collection
Post #7 Posted: Sat Oct 10, 2015 10:04 am 
Oza

Posts: 3656
Liked others: 20
Was liked: 4631
Quote:
"Files have passed through your hands" - That could only be relevant if I published such files,
which I did not.


You did. They are on your site. Some have [OH] in. Others are identifiable in other ways. In fact, on the basis of the small sample I took, I suspect all your Shusai games came from GoGoD's (especially Mark's) work. Minor differences in various sources do not necessarily mean other people have done any significant work on the files. It is common for people to start with GoGoD and omit comments but retain EV[] etc, or to reformat dates, all with a view to disguising the source. And of course GoGoD itself makes changes all the time, either corrections or extra data, and so re-issues sources.

For the record, you did offer help, but only on the basis of getting GoGoD games to publish free in return. We politely agreed to disagree on that, but you never thought to mention that you already suspected you had GoGoD games. I find that odd.

The GoGoD operation has changed now, because of Mark's death and because my own circumstances have changed to leave me little time. But when it was running as originally intended, we cleared the operation with the relevant pro bodies, we gave acknowledgement for all help, and we gave all those who helped a free subscription. The number of free subscriptions almost always exceeded paid subscriptions, I believe. But even those who paid were being subsidised in effect, because we spent so much on sources. For example, I have an almost complete collection of Japanese, Korean and Chinese year books, and many sets of collected games. Mark had a similar collection, but unlike me kept his Amex invoices to prove what they cost - £600 for a collection of Kaizen games, for example, which gave us about 100 new records. Mark always regretted not also keeping a precise record of how many games he had done, but we made an estimate once towards the end, and it was perhaps as many as 30,000 games. With marking up as well as transcription, that accounts for 30,000 hours. I must have spent a similar amount of time (only possible because we both took early retirement at the same time). That's why Mark and I always got shirty about people trying to steal the moral high ground by offering our stuff for free. It's like politicians who always love to claim credit for spending other people's money.

There are those who think GoGoD is a gold mine. Nowhere near it. Go players don't like spending money on go. The demise of Go World and of go booksellers tells you that. Of course there are exceptions. For example, my latest book on Senchi has hit the bestseller charts and has already sold 9 copies. Two years on, Life of Shuei has even hit the 100 mark. The new Merc is on order.


This post by John Fairbairn was liked by 2 people: Bill Spight, wineandgolover
 Post subject: Re: An SGF collection
Post #8 Posted: Sat Oct 10, 2015 10:38 am 
Lives in sente

Posts: 866
Liked others: 318
Was liked: 345
John Fairbairn wrote:
There are those who think GoGoD is a gold mine. Nowhere near it. Go players don't like spending money on go. The demise of Go World and of go booksellers tells you that. Of course there are exceptions. For example, my latest book on Senchi has hit the bestseller charts and has already sold 9 copies. Two years on, Life of Shuei has even hit the 100 mark. The new Merc is on order.


John,

Thank you very much for the time and effort you and Mark have put into this over the years.

But, nine copies, and 100 copies? Damn. You are redefining "go saint."

Be well!

_________________
- Brady
Want to see videos of low-dan mistakes and what to learn from them? Brady's Blunders

 Post subject: Re: An SGF collection
Post #9 Posted: Sat Oct 10, 2015 11:32 am 
Dies with sente

Posts: 99
Liked others: 5
Was liked: 33
John Fairbairn wrote:
For the record, you did offer help, but only on the basis of getting GoGoD games to publish free in return.

Please stay honest. We can have a polite conversation without inventing untruths.
I offered help without any conditions. And when you voiced your suspicion "I infer that you may want games from us"
I replied "that is not what I had in mind".

John Fairbairn wrote:
...because we spent so much on sources.

You are telling us that you did (and are doing) an impressive job. And you did. Very impressive.
Now read the first post of this thread. Also that was an impressive job. Of an entirely different kind.
It resulted in almost 150000 pro games that I may or may not publish over time after suitably polishing them.
Since I understand that you are going to publish 100000 pro games, it will not be surprising if
between half and two-thirds of my collection is also in your collection.

John Fairbairn wrote:
I suspect all your Shusai games came from GoGoD's (especially Mark's) work.

That is possible. The sources were mostly in Chinese. I did this Shusai part long ago, when I still wrote RE[0] instead of RE[Jigo]. If you want to see details, I can show what corrections were made to these games. I recall my satisfaction when for one of these games my programs automatically decided between two versions on the grounds that the wrong move would leave a group with only one eye.
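
Bringing older records in line with the RE[Jigo] spelling can be sketched in a few lines. This is an illustration only: the function name `normalize_result` is hypothetical, and treating RE[Draw] the same way as RE[0] is an assumption.

```python
import re

# RE[0] or RE[Draw], but not when RE is the tail of a longer property name.
DRAW_RE = re.compile(r"(?<![A-Z])RE\[(?:0|Draw)\]")


def normalize_result(sgf):
    """Rewrite drawn-game results to the RE[Jigo] spelling."""
    return DRAW_RE.sub("RE[Jigo]", sgf)
```

Other results, such as RE[B+R], are left untouched.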

On the other hand, I object to "came from", which suggests that there is a single source. In the Shusai case there were many sources. None of these covers my entire set of games, although the romaniango site I pointed at almost covers it.

 Post subject: Re: An SGF collection
Post #10 Posted: Sat Oct 10, 2015 11:54 am 
Oza

Posts: 2180
Location: ʍoquıɐɹ ǝɥʇ ɹǝʌo 'ǝɹǝɥʍǝɯos
Liked others: 237
Was liked: 662
Rank: AGA 5d
GD Posts: 4312
Online playing schedule: Every tenth February 29th from 20:00-20:01 (if time permits)
Would it be better if this private conversation were taken to PM?

_________________
Still officially AGA 5d but I play so irregularly these days that I am probably only 3d or 4d over the board (but hopefully still 5d in terms of knowledge, theory and the ability to contribute).

 Post subject: Re: An SGF collection
Post #11 Posted: Sat Oct 10, 2015 3:04 pm 
Oza

Posts: 3656
Liked others: 20
Was liked: 4631
Quote:
Would it be better if this private conversation were taken to PM?

It was raised in public, and what's the point of a public forum if things can't be discussed?

But I think I've made my point that sgf games do not appear on the web out of thin air - somebody has to do the work. Not necessarily GoGoD - some of the Romanian site games look to me like they may be of MasterGo origin, and we know Yutopian, Bob Terry and Jan van der Steen were victims of cuckoos in the nest (including Chinese cuckoos), as I believe is well known. Internet scrapers and trawlers should allow for that.

And I've already spent far more time on it than I should have. So I will now desist - dispirited rather than triumphalist - and leave the last word to whoever wants it.

 Post subject: Re: An SGF collection
Post #12 Posted: Sat Oct 10, 2015 4:52 pm 
Honinbo

Posts: 9545
Liked others: 1600
Was liked: 1711
KGS: Kirby
Tygem: 커비라고해
Interesting discussion - feels like John and aeb have some history.

John Fairbairn wrote:

But I think I've made my point that sgf games do not appear on the web out of thin air - somebody has to do the work.


Regarding "the work", I feel that it doesn't come from one person, but perhaps becomes easier over time.

The ones that have done the most work for an SGF, in my view, are the pros that created the game. In that sense, anyone that distributes SGF for benefit already profits from the pros that made the game.

Perhaps after that come people that have spent hours compiling these collections. I'm sure that takes a lot of work, too. So I can understand the feeling that others have made profit from your work.

As technology advances, it takes less work to make these compilations, so it is a lot easier to profit starting with what others have done as a basis. Probably this is inevitable.

I don't really know what point I'm trying to make here... Maybe just that the collections we have today are produced by many people, starting from the pros. So it's quite tricky to pinpoint "ownership".

_________________
be immersed
