Reacting to a post by DrStraw and to reactions by others:
Yes, there are hundreds of games at
http://homepages.cwi.nl/~aeb/go/games/games/
You can download 25000+ via
http://homepages.cwi.nl/~aeb/go/games/games.7z
They all come from the net. As follows.
1. Send out spiders and scrape. Find blogs, go websites,
newsgroups, images, pdfs of old go journals, newspapers.
2. Extract the information. Do OCR on the diagrams,
convert files from SJIS, Big5 and the like to UTF-8,
convert file formats from ugi, ngf or whatever to sgf.
Keep only the games that look like they might be pro games
(look at names, tournament, player ranks, site).
This yields for example 2500000 game records.
Full of mistakes, because the original data was faulty,
or because of problems with OCR, removal of markup, etc.
Some sources also insert errors intentionally.
3. Remove garbage. Compare the reasonable files and decide
which ones are versions of the same game. Choose the best
version, or create a better version by merging.
This yields for example 150000 different games.
4. Fix the game records. Correct illegal moves. Correct the
spelling of the player names. Compare with tournament tables.
A reasonable game record may turn out to be between different players,
or with player colors reversed, or belong to a different tournament.
Fix dates, and consider that the date might be a broadcast date
or have been a lunar date.
5. Almost all of the above is fully automated; it would not
be possible to handle a million game records by hand.
But in the end human inspection is needed, and that takes time.
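To make steps 2 and 3 a bit more concrete, here is a minimal Python sketch of decoding a scraped file and grouping likely duplicates. It is only an illustration under stated assumptions, not the actual pipeline: the encoding list, the helper names (decode_record, move_signature, group_duplicates) and the "same first 50 moves" heuristic are all made up for this example. Real duplicate detection would also have to cope with records that contain mistakes, so exact matching on an opening prefix is at best a first pass.

```python
import re
from collections import defaultdict

# Encodings to try, in order. The list and its order are assumptions;
# trying codecs in sequence is a crude heuristic and can misdetect.
CANDIDATE_ENCODINGS = ("utf-8", "shift_jis", "big5")

def decode_record(raw: bytes) -> str:
    """Decode a scraped file by trying each candidate encoding in turn."""
    for enc in CANDIDATE_ENCODINGS:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Last resort: decode as UTF-8 and replace undecodable bytes.
    return raw.decode("utf-8", errors="replace")

# Matches SGF move nodes such as ";B[pd]" or "; W[dp]" on a 19x19 board.
MOVE_RE = re.compile(r";\s*([BW])\[([a-s]{2})\]")

def move_signature(sgf: str, prefix: int = 50) -> tuple:
    """Return the first `prefix` moves as (color, coordinates) pairs."""
    return tuple(MOVE_RE.findall(sgf)[:prefix])

def group_duplicates(records: dict) -> list:
    """Group file names whose records share the same opening moves."""
    groups = defaultdict(list)
    for name, sgf in records.items():
        signature = move_signature(sgf)
        if signature:  # skip records in which no moves were found
            groups[signature].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]

if __name__ == "__main__":
    records = {
        "a.sgf": "(;GM[1]PB[X]PW[Y];B[pd];W[dp];B[pp])",
        "b.sgf": "(;GM[1]PB[X Y]\n;B[pd]; W[dp];B[pp])",
        "c.sgf": "(;GM[1];B[qd];W[dp])",
    }
    print(group_duplicates(records))
```

Each group returned is a set of candidates for "versions of the same game"; choosing the best version or merging them is a separate step that this sketch does not attempt.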
So far I have published 25000 game records, generally of good quality.
All Japanese pro games. I think this is the largest such collection
in the public domain.
Of course there is no reason to expect completeness.
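As an illustration of the name correction in step 4, here is a small Python sketch using fuzzy matching from the standard library's difflib. The canonical name list, the function name canonical_name and the 0.8 cutoff are assumptions chosen for the example; a real list would have to come from tournament tables or a player database, and doubtful matches would still need a human look.

```python
import difflib

# Illustrative canonical list (an assumption); a real one would be
# much longer and built from tournament tables.
CANONICAL_NAMES = ["Cho Chikun", "Kobayashi Koichi", "Takemiya Masaki"]

def canonical_name(raw: str, cutoff: float = 0.8):
    """Return the closest canonical name, or None if nothing is close enough."""
    cleaned = " ".join(raw.split()).lower()  # collapse whitespace, ignore case
    by_lower = {name.lower(): name for name in CANONICAL_NAMES}
    matches = difflib.get_close_matches(cleaned, list(by_lower), n=1, cutoff=cutoff)
    return by_lower[matches[0]] if matches else None

if __name__ == "__main__":
    print(canonical_name("Cho  Chikum"))
    print(canonical_name("Unknown Amateur"))
```

Returning None instead of a best guess keeps unrecognized names visible, so they can be set aside for inspection rather than silently "corrected".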