
Mass downloader??? Help!

Posted: Mon Oct 05, 2015 8:32 pm
by MP4Life
In Fuseki Info for Tygem, there are over 200,000 games in the database.

What I want to know is: how were all these games collected?

I'm searching for a way to mass-download all the recent data.

I asked baduk.org via e-mail but received no answer.

I also asked Tygem about the database, but they said the only way they know of is to download each game manually... which would take a scary amount of time and energy.



Also, what pattern-search program do you recommend? Kombilo? Drago? SmartGo? I'm willing to pay big bucks for the best program if I need to. :bow:

Re: Mass downloader??? Help!

Posted: Mon Oct 05, 2015 8:43 pm
by Kirby
Either curl or wget will work. Look them up.

Re: Mass downloader??? Help!

Posted: Mon Oct 05, 2015 9:08 pm
by MP4Life
Kirby wrote: Either curl or wget will work. Look them up.
WOW, you're my savior, man. Though I didn't look them up yet, I'm sure they will work like a charm!

Re: Mass downloader??? Help!

Posted: Mon Oct 05, 2015 9:25 pm
by MP4Life
Kirby wrote: Either curl or wget will work. Look them up.
Well... I'm such a terrible tech guy that this seems scary.

Have you used them before to download Tygem game data? If so, any tips on that matter would be greatly appreciated :salute:

Re: Mass downloader??? Help!

Posted: Mon Oct 05, 2015 9:31 pm
by Kirby
MP4Life wrote:
Kirby wrote: Either curl or wget will work. Look them up.
Well... I'm such a terrible tech guy that this seems scary.

Have you used them before to download Tygem game data? If so, any tips on that matter would be greatly appreciated :salute:
These programs can be used to download a URL, so just make a script to download the page, parse the links, and download each link.
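Kirby's outline (fetch an index page, parse its links, fetch each link) might look roughly like the Python sketch below. It assumes a hypothetical index page whose game records are linked as `.sgf` files; the helper names and the `delay_seconds` pause are illustrative choices, not anything Tygem or fuseki.info actually publishes.

```python
import time
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the absolute URL of every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url, suffix=".sgf"):
    """Return unique links whose URL ends with `suffix` (assumed: .sgf), in page order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return list(dict.fromkeys(l for l in collector.links if l.endswith(suffix)))


def fetch(url):
    """Download one URL and return the raw bytes."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def download_all(index_url, delay_seconds=5.0):
    """Fetch the index page, then each matching link, strictly one at a time."""
    html = fetch(index_url).decode("utf-8", errors="replace")
    files = {}
    for link in extract_links(html, index_url):
        files[link] = fetch(link)
        time.sleep(delay_seconds)  # pause so the server is never flooded
    return files
```

Downloading sequentially with a pause is deliberately slow; the point of the sketch is that the request rate is under your control rather than as fast as the network allows.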

If the site has a robots.txt file, you might want to follow it.
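Python's standard library can read robots.txt for you via `urllib.robotparser.RobotFileParser`. A minimal sketch, assuming your script identifies itself with some user-agent string of your choosing; `polite_delay` and its 5-second fallback are illustrative, not part of any standard.

```python
import urllib.robotparser


def load_rules(robots_url):
    """Download and parse a live robots.txt (requires network access)."""
    rules = urllib.robotparser.RobotFileParser()
    rules.set_url(robots_url)
    rules.read()
    return rules


def parse_rules(robots_text):
    """Parse robots.txt content you already have as a string."""
    rules = urllib.robotparser.RobotFileParser()
    rules.parse(robots_text.splitlines())
    return rules


def polite_delay(rules, user_agent, default=5.0):
    """Honour the site's Crawl-delay if it sets one, else fall back to `default` seconds."""
    delay = rules.crawl_delay(user_agent)
    return float(delay) if delay is not None else default
```

Before each download, check `rules.can_fetch(user_agent, url)` and skip the URL if it returns `False`; sleep for `polite_delay(rules, user_agent)` seconds between requests.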

Re: Mass downloader??? Help!

Posted: Mon Oct 05, 2015 11:25 pm
by macelee
I hate to see people attempting a mass download without consulting the owner of the service. At the moment, I even hate to see people discussing it on this forum.

When you do this, things can go wrong. And often things will go wrong.

Last night at approximately 22:48 UK time, go4go.net was apparently brought down by a badly written script. My early analysis shows that the script sent 150 or so requests to my server within minutes to grab some data-intensive pages, causing the server to run out of memory. The traffic was from a host in Amazon EC2's network 52.89.xxx.xxx (damn, I have to protect the privacy of whoever did this).
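A burst like the one macelee describes can be spotted by counting requests per client per minute in the web server's access log. This sketch assumes the log is in Common Log Format; the 30-requests-per-minute threshold is a hypothetical cut-off, not a figure from the post.

```python
import re
from collections import Counter
from datetime import datetime

# Common Log Format: host ident user [dd/Mon/yyyy:HH:MM:SS zone] "request" status size
CLF = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\]')


def requests_per_minute(log_lines):
    """Count requests per (client host, minute) bucket; unparseable lines are skipped."""
    counts = Counter()
    for line in log_lines:
        match = CLF.match(line)
        if not match:
            continue
        host, stamp = match.groups()
        when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        counts[(host, when.replace(second=0))] += 1
    return counts


def heavy_hitters(log_lines, threshold=30):
    """Clients that made more than `threshold` requests within a single minute."""
    buckets = requests_per_minute(log_lines)
    return sorted({host for (host, _), n in buckets.items() if n > threshold})
```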

Re: Mass downloader??? Help!

Posted: Tue Oct 06, 2015 12:01 am
by Kirby
macelee wrote: I hate to see people attempting a mass download without consulting the owner of the service. At the moment, I even hate to see people discussing it on this forum.
1. I didn't say he should make a script without consulting the owner of the service, and 'MP4Life' didn't say that he was doing this either.
2. IMO, contacting the owner of the service is a courtesy more than an obligation.
3. I also suggested following 'robots.txt', even though that's not an obligation.

Since we're all friends here in the go community, yeah, sure. Try to be courteous.

But "I hate to see people" trying to suggest that I've done something morally wrong here by pointing 'MP4Life' to publicly available information.

Re: Mass downloader??? Help!

Posted: Tue Oct 06, 2015 12:04 am
by macelee
Kirby, thanks for your reply; I withdraw my comments. Please understand my frustration: having to waste an hour on such a matter first thing in the morning.

Re: Mass downloader??? Help!

Posted: Tue Oct 06, 2015 1:34 am
by HermanHiddema
Kirby wrote: But "I hate to see people" trying to suggest that I've done something morally wrong here by pointing 'MP4Life' to publicly available information.
Just because information is publicly available does not automatically make it morally right to point people to it. If someone goes online and asks for information on how to make their own fireworks, I'd expect people to at least include a warning about the dangers of doing so when they're providing information. IMO, it is pretty obvious that MP4Life has very little expertise on the matter, and if he does manage to cobble together a mass downloader, he is quite likely to severely impact someone's server and make someone's morning miserable. I think it is appropriate to warn him of the risks and suggest alternatives, rather than just throwing some information his way and washing your hands of it.

Re: Mass downloader??? Help!

Posted: Tue Oct 06, 2015 2:08 am
by Jujube
Suggesting writing a curl script to a "terrible tech guy" is a bit like asking them to lick their own elbows (maybe it's not impossible, but it's entertaining to watch). How about a code review session? :cool:

Curl is a great tool, though. I use it to download .asx files from Wbaduk so I can see the hidden URL for lecture videos, which I capture with VLC for offline viewing.
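An .asx file is a small XML-style playlist, and the media address usually sits in a `<Ref href="..."/>` element. Assuming that layout (with double-quoted attributes; tag case varies between files), a downloaded playlist could be scanned like this. Nothing here is specific to Wbaduk.

```python
import re

# ASX playlists point at media with <Ref href="..."/>. Files in the wild mix
# <Ref>, <REF> and <ref>, so match case-insensitively.
REF_HREF = re.compile(r'<ref\b[^>]*?\bhref\s*=\s*"([^"]+)"', re.IGNORECASE)


def media_urls(asx_text):
    """Return every media URL referenced by an .asx playlist, in order."""
    return REF_HREF.findall(asx_text)
```

Read the downloaded file into a string, pass it to `media_urls`, and hand the resulting URL to a player such as VLC.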
I think it is appropriate to warn him of the risks and suggest alternatives
Like, Googling what a web scraper is?

Re: Mass downloader??? Help!

Posted: Tue Oct 06, 2015 3:34 am
by Uberdude
MP4Life wrote:Also, what pattern-search program do you recommend? Kombilo? Drago? SmartGo? I'm willing to pay big bucks for the best program if I need to. :bow:
fuseki.info seems associated with BiGo (http://bigo.baduk.org/index.html), so why not buy that to get the huge database and pattern-searching software*? Or do you want the very latest Tygem games, from yesterday, that they don't have? Or do you actively want the intellectual/programming challenge of writing a web scraper?

*One answer might be that they pinched it from GoGoD or some other source (not an accusation backed by any evidence, just hypothesising), and you might not wish to give your money to people you consider ethically dubious (e.g. the MoyoGo saga).

Re: Mass downloader??? Help!

Posted: Tue Oct 06, 2015 6:35 am
by Mike Novack
Look, I've said my piece on related topics. It doesn't even have to be an ERROR in a script, just an error in conception. So it might work OK in testing (making 100 access requests against that database) but bring the service to a halt when you dump in 100,000 (the server might only be able to handle about 40,000 per day if each takes a couple of seconds).
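The back-of-envelope numbers above are easy to check. This assumes, as the post implies, that the server answers one request at a time and that each takes about 2 seconds.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_capacity(seconds_per_request):
    """Requests a server can finish in one day if it handles them one at a time."""
    return SECONDS_PER_DAY / seconds_per_request


def days_to_finish(total_requests, seconds_per_request):
    """Days a one-at-a-time server stays fully busy answering a batch of requests."""
    return total_requests * seconds_per_request / SECONDS_PER_DAY


print(daily_capacity(2))                     # 43200.0
print(round(days_to_finish(100_000, 2), 1))  # 2.3
```

So the 100-request test run occupies the server for a few minutes, while the full 100,000-request run would tie it up for more than two days, during which ordinary visitors are queued behind the script.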

a) You should be experienced before tackling something like this, ideally with real-world experience.

b) It is not just courtesy. Those whose database it is have a perfect right to consider something bringing down their database an attack. They may have a way for you to do this safely, and may be willing to let you do it (run against a copy loaded from their backup, or download their flat backup copy and load it on your own hardware).

In the real world I came from, we had full-size "test" versions of our production databases, and it's the test version that programmers used when developing software, so if something went wrong, it didn't affect production.

Understand? When they make a database available for public access, they mean people sitting at a terminal entering keystrokes on a keyboard. They can estimate how many people there are and how fast a person can type, and size the system to be adequate for that volume of activity. But a program could be firing off requests MUCH faster than that.
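One way to keep a script closer to human speed is to enforce a minimum gap between requests on the client side. A minimal sketch: `Throttle` is a hypothetical helper, not a library class, and the clock and sleep functions are injectable only so the behaviour can be tested without real waiting.

```python
import time


class Throttle:
    """Ensures at least `min_interval` seconds pass between successive wait() calls."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self.last = None  # time of the previous request, if any

    def wait(self):
        now = self.clock()
        if self.last is not None:
            remaining = self.min_interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)  # block until the gap is long enough
                now = self.clock()
        self.last = now
```

Call `throttle.wait()` immediately before each download; for example, `Throttle(5.0)` caps the script at 12 requests per minute, regardless of how fast the network or the server answers.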

Re: Mass downloader??? Help!

Posted: Tue Oct 06, 2015 6:55 am
by Kirby
HermanHiddema wrote: I think it is appropriate to warn him of the risks and suggest alternatives, rather than just throwing some information his way and washing your hands of it.
Then feel free to do so. I don't personally feel any such moral obligation.

Re: Mass downloader??? Help!

Posted: Tue Oct 06, 2015 6:59 am
by Kirby
Mike Novack wrote:
b) It is not just courtesy. Those whose database it is have a perfect right to consider something bringing down their database an attack.
Then the db owners should take those precautions. Nobody has "attacked" anyone here, and knowing about tools like wget and curl is quite useful.
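On the server side, "taking precautions" usually means limiting how fast any single client can make requests. This sketch keeps a sliding window of timestamps per client; the limit and window sizes are hypothetical, and a real site would more likely configure this in its web server or proxy than in application code.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allows at most `limit` requests per client in any `window`-second span."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.history = defaultdict(deque)  # client id -> times of accepted requests

    def allow(self, client):
        now = self.clock()
        stamps = self.history[client]
        # Forget requests that have slid out of the window.
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()
        if len(stamps) >= self.limit:
            return False  # caller would answer with HTTP 429 Too Many Requests
        stamps.append(now)
        return True
```

With, say, `SlidingWindowLimiter(30, 60)` keyed by client IP, a script like the one macelee saw would be refused after its 30th request in a minute instead of exhausting the server's memory.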

Re: Mass downloader??? Help!

Posted: Tue Oct 06, 2015 7:26 am
by HermanHiddema
Kirby wrote:...we're all friends here in the go community...
Kirby wrote:I don't personally feel any such moral obligation.
Moral obligation is a strong term, but IMO some effort to prevent an inexperienced developer from accidentally causing grief to others in the go community would certainly be friendly. As macelee's example shows, mass downloading causes real frustration and real work for fellow players.