Life In 19x19
http://www.lifein19x19.com/

Contribute to Katago training using google colab
http://www.lifein19x19.com/viewtopic.php?f=18&t=18076
Page 1 of 2

Author:  seventeen [ Fri Feb 26, 2021 3:24 pm ]
Post subject:  Contribute to Katago training using google colab

I've made a Google Colab notebook image that can contribute to KataGo training.
You can now contribute without having a GPU of your own.
Just open the link below.
https://colab.research.google.com/drive ... sp=sharing

Attachments:
20210224_colab.png

Author:  wineandgolover [ Fri Feb 26, 2021 5:36 pm ]
Post subject:  Re: Contribute to Katago training using google colab

This looks cool. I’d love to know more. To start...

What GPUs does this use?

Is it free? For how long?

Is there some sort of limitation that might affect other Google services?

Does it run in the background?

Thanks.

Author:  go4thewin [ Fri Feb 26, 2021 6:21 pm ]
Post subject:  Re: Contribute to Katago training using google colab

edit: Is it possible to make a script like this to train the 15b net on the new s663 40b data? thanks!

Author:  wineandgolover [ Fri Feb 26, 2021 8:40 pm ]
Post subject:  Re: Contribute to Katago training using google colab

go4thewin wrote:
Really simple and fun to use. It uses a T4 and is free with usage limits. It turns off after 12 hours or less if you paste the following in the Google Chrome console (F12):
Code:
function ClickConnect(){
  console.log("Connect Clicked - Start");
  document.querySelector("#top-toolbar > colab-connect-button").shadowRoot.querySelector("#connect").click();
  console.log("Connect Clicked - End");
}
setInterval(ClickConnect, 60000);


In 12 hours, you get more than 250 training games, 13000 rows, and a few rating games, which is really nice. It will turn off in 90 minutes or less without the code above. If you close the browser, it will turn off. If you use it 12 hours every day, you might get kicked off for a couple of months, not sure. Every other day might be OK. It will not affect other Google services. Thanks seventeen!


If one is non-technical, has never messed with the Chrome console, and doesn't want to screw things up, where exactly in the Chrome console should one paste this? Top, bottom, embedded somewhere, or doesn't it matter?

Also, I assume you mean "it turns off after 12 hours or less UNLESS you paste..."?

Finally, I agree that it is really simple. Even I got it running easily. If you want to help make the strongest open-sourced go engine even better, please run this in the background when you use your computer. Highly recommended!!!!!

Author:  go4thewin [ Sat Feb 27, 2021 3:03 am ]
Post subject:  Re: Contribute to Katago training using google colab

Yes, sorry about that. In the picture below, paste the following code where the bottommost > sign is. I'll delete the previous redundant post.

Code:
// Click Colab's connect button once a minute (60000 ms) to keep the session alive.
function ClickConnect(){
    console.log("Clicked on connect button");
    document.querySelector("colab-connect-button").click();
}
setInterval(ClickConnect, 60000);

Author:  ez4u [ Sat Feb 27, 2021 3:42 am ]
Post subject:  Re: Contribute to Katago training using google colab

I am trying this out also. It is indeed simple to do. :tmbup:
I am running it in Firefox and for whatever reason, it does not shut down by itself after 90 minutes. Just now I came back after about 3 hours and it's still running. Great job! Thanks

Author:  wineandgolover [ Sat Feb 27, 2021 11:57 am ]
Post subject:  Re: Contribute to Katago training using google colab

wineandgolover wrote:
This looks cool. I’d love to know more. To start...

1. What GPUs does this use?

2. Is it free? For how long?

3. Is there some sort of limitation that might affect other Google services?

4. Does it run in the background?

Thanks.


To answer my own questions.

1. I’ve connected to a Tesla T4 each time, getting around 390 nn evals/second.

2. It’s completely free. It uses Google Colab, a free machine learning tool. It runs in the browser, so it’s platform independent.

3. It does not affect other Google services. There is a limitation within Colab in that it will stop working for heavy users. Looking on Discord, it seems running it for twelve hours every other day avoids any problems.

4. Yes, it runs in the background. Google provides the CPUs and GPUs. All you need is a Google Drive account and a browser.

If you’d like to run it more stably, for longer, and probably get assigned a better GPU, you can consider Colab Pro, which costs $10 per month with a US or Canadian address. That seems pretty reasonable versus buying and powering your own Tesla V100. I might try Pro soon to check out its performance. I’d love to hear if anybody else has already done so.

Again, I encourage anyone who wishes to help make KataGo stronger to consider running this completely free utility in the background.

Author:  deungsan [ Sun Feb 28, 2021 3:48 pm ]
Post subject:  Re: Contribute to Katago training using google colab

I followed your instructions and got the following error...

Starting KataGo training...
2021-02-28 22:39:59+0000: Distributed Self Play Engine starting...
2021-02-28 22:39:59+0000: Attempting to connect to server
2021-02-28 22:39:59+0000: isSSL: true
2021-02-28 22:39:59+0000: host: katagotraining.org
2021-02-28 22:39:59+0000: port: 443
2021-02-28 22:39:59+0000: baseResourcePath: /
2021-02-28 22:39:59+0000: KataGo v1.8.0
2021-02-28 22:39:59+0000: Git revision: 8ffda1fe05c69c67342365013b11225d443445e8
2021-02-28 22:39:59+0000: Running tiny net to sanity-check that GPU is working
2021-02-28 22:39:59+0000: nnRandSeed0 = 10486611865130445872
2021-02-28 22:39:59+0000: After dedups: nnModelFile0 = katago_contribute/kata1/tmpTinyModel.bin.gz useFP16 auto useNHWC auto
terminate called after throwing an instance of 'StringError'
what(): OpenCL error at /home/dwugcloud/data/kata/cpp/neuralnet/openclhelpers.cpp, func err, line 263, error CL_PLATFORM_NOT_FOUND_KHR

Author:  ez4u [ Sun Feb 28, 2021 4:42 pm ]
Post subject:  Re: Contribute to Katago training using google colab

When the process is running on colab, I am seeing these security warnings constantly in the console
Code:
Content Security Policy: Ignoring “'report-sample'” within script-src: ‘strict-dynamic’ specified
Content Security Policy: Ignoring “https:” within script-src: ‘strict-dynamic’ specified
Content Security Policy: Ignoring “http:” within script-src: ‘strict-dynamic’ specified
Content Security Policy: Ignoring “'unsafe-inline'” within script-src: ‘strict-dynamic’ specified
Content Security Policy: Ignoring “https://www.google.com/js/bg/” within script-src: ‘strict-dynamic’ specified
Content Security Policy: Ignoring “https://www.google.com/recaptcha/” within script-src: ‘strict-dynamic’ specified

Is this something that should be fixed or can we just ignore it?

Meanwhile I am currently increasing the "maxSimultaneousGames" at the bottom of the script. Going from 8 (default) to 12 jumped "nn evals" from around 380/second to around 470/second.

Author:  ez4u [ Sun Feb 28, 2021 4:53 pm ]
Post subject:  Re: Contribute to Katago training using google colab

deungsan wrote:
I followed your instructions and got the following error...

Starting KataGo training...
2021-02-28 22:39:59+0000: Distributed Self Play Engine starting...
2021-02-28 22:39:59+0000: Attempting to connect to server
2021-02-28 22:39:59+0000: isSSL: true
2021-02-28 22:39:59+0000: host: katagotraining.org
2021-02-28 22:39:59+0000: port: 443
2021-02-28 22:39:59+0000: baseResourcePath: /
2021-02-28 22:39:59+0000: KataGo v1.8.0
2021-02-28 22:39:59+0000: Git revision: 8ffda1fe05c69c67342365013b11225d443445e8
2021-02-28 22:39:59+0000: Running tiny net to sanity-check that GPU is working
2021-02-28 22:39:59+0000: nnRandSeed0 = 10486611865130445872
2021-02-28 22:39:59+0000: After dedups: nnModelFile0 = katago_contribute/kata1/tmpTinyModel.bin.gz useFP16 auto useNHWC auto
terminate called after throwing an instance of 'StringError'
what(): OpenCL error at /home/dwugcloud/data/kata/cpp/neuralnet/openclhelpers.cpp, func err, line 263, error CL_PLATFORM_NOT_FOUND_KHR


My startup looks like this...
Code:
Starting KataGo training...
2021-02-28 21:51:52+0000: Distributed Self Play Engine starting...
2021-02-28 21:51:52+0000: Attempting to connect to server
2021-02-28 21:51:52+0000: isSSL: true
2021-02-28 21:51:52+0000: host: katagotraining.org
2021-02-28 21:51:52+0000: port: 443
2021-02-28 21:51:52+0000: baseResourcePath: /
2021-02-28 21:51:52+0000: KataGo v1.8.0
2021-02-28 21:51:52+0000: Git revision: 8ffda1fe05c69c67342365013b11225d443445e8
2021-02-28 21:51:52+0000: Running tiny net to sanity-check that GPU is working
2021-02-28 21:51:52+0000: nnRandSeed0 = 1331183443207076973
2021-02-28 21:51:52+0000: After dedups: nnModelFile0 = katago_contribute/kata1/tmpTinyModel.bin.gz useFP16 auto useNHWC auto
2021-02-28 21:51:52+0000: Cuda backend thread 0: Found GPU Tesla T4 memory 15843721216 compute capability major 7 minor 5
2021-02-28 21:51:52+0000: Cuda backend thread 0: Model version 9 useFP16 = true useNHWC = true
2021-02-28 21:51:52+0000: Cuda backend thread 0: Model name: rect15-b2c16-s13679744-d94886722
2021-02-28 21:51:54+0000: Tiny net sanity check complete

As far as I understand what we are doing here (questionable right there! :blackeye: ), you should not be using OpenCL for anything. You should be using CUDA instead.
At the very beginning of the output from your run do you see...
Code:
Using Katago Backend :  CUDA
GPU :  TeslaT4
/content
Cloning into 'katago-colab'...

This is what I get every time.

Author:  deungsan [ Sun Feb 28, 2021 6:16 pm ]
Post subject:  Re: Contribute to Katago training using google colab

My errors were fixed by changing the notebook settings. Setting the hardware accelerator to "GPU" lets Colab use the Tesla T4.

Now it works fine.
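
For anyone hitting the same CL_PLATFORM_NOT_FOUND_KHR error, a quick check in a notebook cell can confirm whether a GPU is attached before KataGo starts. This is only a sketch and not part of seventeen's notebook: the function names are made up for illustration, and stripping spaces from the name is an assumption based on the "GPU :  TeslaT4" line in ez4u's output. It relies on the nvidia-smi tool, which should be present on GPU runtimes.

```python
import shutil
import subprocess
from typing import Optional


def normalize_gpu_name(raw: str) -> str:
    """Remove all whitespace, so 'Tesla T4' becomes 'TeslaT4' (assumed format)."""
    return "".join(raw.split())


def detect_gpu_name() -> Optional[str]:
    """Return the first GPU name reported by nvidia-smi, or None if there is no GPU."""
    if shutil.which("nvidia-smi") is None:
        return None  # No NVIDIA tooling: the runtime is probably CPU-only.
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    names = [line.strip() for line in result.stdout.splitlines() if line.strip()]
    return normalize_gpu_name(names[0]) if names else None


gpu = detect_gpu_name()
if gpu is None:
    print("No GPU found: set the notebook's hardware accelerator to GPU.")
else:
    print("GPU :", gpu)
```

If this prints the "No GPU found" message, changing the hardware accelerator setting as described above is the first thing to try.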

Author:  ez4u [ Sun Feb 28, 2021 6:23 pm ]
Post subject:  Re: Contribute to Katago training using google colab

ez4u wrote:
...

Meanwhile I am currently increasing the "maxSimultaneousGames" at the bottom of the script. Going from 8 (default) to 12 jumped "nn evals" from around 380/second to around 470/second.

The story so far...
Code:
"maxSimultaneousGames" 08  "nn evals"  ~380/sec
"maxSimultaneousGames" 12  "nn evals"  ~470/sec
"maxSimultaneousGames" 16  "nn evals"  ~525/sec
"maxSimultaneousGames" 24  "nn evals"  ~545/sec
"maxSimultaneousGames" 32  "nn evals"  ~555/sec
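
To make the diminishing returns in these numbers easier to see, here is a small sketch that computes how many extra nn evals/sec each additional simultaneous game bought between consecutive measurements. The figures are the approximate ones from the table above; the function name is only illustrative.

```python
# Approximate measurements from the table: maxSimultaneousGames -> nn evals/sec.
MEASUREMENTS = [(8, 380), (12, 470), (16, 525), (24, 545), (32, 555)]


def marginal_gains(measurements):
    """Return (from_games, to_games, extra evals/sec per added game) for each step."""
    gains = []
    for (g0, e0), (g1, e1) in zip(measurements, measurements[1:]):
        gains.append((g0, g1, (e1 - e0) / (g1 - g0)))
    return gains


for g0, g1, gain in marginal_gains(MEASUREMENTS):
    print(f"{g0:2d} -> {g1:2d} games: +{gain:.2f} nn evals/sec per extra game")
```

On these numbers, each extra game is worth about 22.5 evals/sec going from 8 to 12, but only about 1.25 going from 24 to 32.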

Author:  lightvector [ Mon Mar 01, 2021 5:45 am ]
Post subject:  Re: Contribute to Katago training using google colab

Yeah, GPUs really like it when you have large batches to run in parallel, and more games helps with that.

The one thing I would caution is - please don't make the number of simultaneous games too large compared to the number of games you are playing in a given run before you shut it down or it shuts itself down - ideally make sure the total number of games you're getting per session would be at least 10x or 20x the number of simultaneous games.

The reason is if the total is too small, such that the games are coming in relatively few "waves" before it gets killed, it will create a bias towards short games in the data - because in the last wave, disproportionately short games will be the ones that finish and get uploaded and not the longer ones. Games that were on small boards, or that had fewer fights and were more peaceful, or that initialized starting from later positions, etc. will be favored over the configured and desired distribution.
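
As a rough illustration of the 10x to 20x guideline above, the sketch below turns an expected number of finished games per session into an upper limit for maxSimultaneousGames. The helper name is hypothetical, and the 250-game figure is just the 12-hour estimate quoted earlier in this thread, so treat the output as a ballpark rather than a recommendation.

```python
def max_simultaneous_games(games_per_session: int, safety_factor: int = 10) -> int:
    """Largest setting such that games_per_session >= safety_factor * setting."""
    if games_per_session < 0 or safety_factor <= 0:
        raise ValueError("games_per_session must be >= 0 and safety_factor > 0")
    return games_per_session // safety_factor


games = 250  # Rough 12-hour total reported earlier in the thread.
for factor in (10, 20):
    limit = max_simultaneous_games(games, factor)
    print(f"{factor}x rule: keep maxSimultaneousGames at or below {limit}")
```

With 250 games per session this gives a limit of 25 under the 10x rule and 12 under the 20x rule. Shorter sessions shrink the limit proportionally.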

Author:  wineandgolover [ Tue Mar 02, 2021 2:08 pm ]
Post subject:  Re: Contribute to Katago training using google colab

ez4u wrote:
As far as I understand what we are doing here (questionable right there! :blackeye: ), you should not be using OpenCL for anything. You should be using CUDA instead.
At the very beginning of the output from your run do you see...
Code:
Using Katago Backend :  CUDA
GPU :  TeslaT4
/content
Cloning into 'katago-colab'...

This is what I get every time.


Yeah, I also saw that every time until last night.

I ran the Colab script very late last night, and it assigned me to an A100, so I was excited. (Note, this information is wrong, and corrected in subsequent posts) But it didn't use CUDA, instead opting for OpenCL. I waited a half hour and it hadn't finished any games, so I quit and went to bed. I can't guarantee that my interpretation of what happened is perfect, but I think I'm right. Maybe the cat hit a kill switch. I should have taken a screenshot, sorry.

I assume the A100 should support CUDA, right?

Is there a known reason why OpenCL should fail on Colab?

Should I change the very beginning of the script, currently set to KATAGO_BACKEND="AUTO", to read KATAGO_BACKEND="CUDA"?

Anyway, today I'm back on a good old T4 and it chose CUDA again. Following recommendations, I increased the number of games to 16, and it's chugging along nicely (530-ish nn evals/sec).

Thanks!

Author:  ez4u [ Tue Mar 02, 2021 8:02 pm ]
Post subject:  Re: Contribute to Katago training using google colab

At the beginning of the script, probably we can take this
Code:
  if gpu_name == "TeslaT4":
    KATAGO_BACKEND="CUDA"
  else:
    KATAGO_BACKEND="OPENCL"

and make it this?
Code:
  if gpu_name == "TeslaT4":
    KATAGO_BACKEND="CUDA"
  elif gpu_name == "A100":
    KATAGO_BACKEND="CUDA"
  else:
    KATAGO_BACKEND="OPENCL"
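
If more GPU models end up needing the same treatment, the chain of elif branches could be replaced by a lookup in a set. The following is only a sketch of that idea and not the notebook's actual code: the function and set names are made up, and whether the notebook's gpu_name would ever be exactly "A100" is an untested guess.

```python
# GPU names that should use the CUDA backend. "TeslaT4" matches the
# notebook output quoted in this thread; "A100" is an unverified guess.
CUDA_GPUS = {"TeslaT4", "A100"}


def choose_backend(gpu_name: str) -> str:
    """Return "CUDA" for GPUs listed in CUDA_GPUS and "OPENCL" otherwise."""
    return "CUDA" if gpu_name in CUDA_GPUS else "OPENCL"


KATAGO_BACKEND = choose_backend("TeslaT4")
print("Using Katago Backend : ", KATAGO_BACKEND)
```

Adding another model then only means adding one string to the set, and every unlisted GPU still falls back to OpenCL as in the original snippet.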

Author:  wineandgolover [ Tue Mar 02, 2021 9:22 pm ]
Post subject:  Re: Contribute to Katago training using google colab

After a few searches, I now suspect I was connected to a P100, not A100. A huge difference of course.

I'd still guess that P100 supports CUDA though.

Author:  wineandgolover [ Thu Mar 04, 2021 12:39 pm ]
Post subject:  Re: Contribute to Katago training using google colab

wineandgolover wrote:
After a few searches, I now suspect I was connected to a P100, not A100. A huge difference of course.

I'd still guess that P100 supports CUDA though.

Again to answer my own question.

I was so excited to contribute to KataGo that I used the script a lot and Google put the brakes on my use of Colab. (Don't run it all the time if you want to use it for free, folks.)

I've upgraded to Colab Pro, because I'm still excited to help. And paying $10/month for the duration of the project is a hell of a lot cheaper than buying and running a powerful GPU.

Since upgrading, I've been assigned a P100 GPU 100% of the time, which I believe is supposed to be better than the T4. Perhaps that is so, but not for this job, with this Colab script.

After several tests on the P100, in which I forced it to use CUDA, it worked, but was significantly slower than OpenCL. So the script is right to use OpenCL.

Unfortunately, the P100 running OpenCL is also slower than the T4 GPU running CUDA. I am currently getting around 340 nnevals/second with 8 simultaneous games, and 385 nnevals/second with 16. (versus 380 and 525 nnevals/sec for the T4 running CUDA)
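
To put those figures side by side, this short sketch computes the P100/OpenCL throughput as a fraction of the T4/CUDA throughput at each setting. The numbers are the approximate ones quoted above; the variable and function names are only illustrative.

```python
# Approximate nn evals/sec by number of simultaneous games, as quoted above.
T4_CUDA = {8: 380, 16: 525}
P100_OPENCL = {8: 340, 16: 385}


def relative_speed(games: int) -> float:
    """P100/OpenCL throughput divided by T4/CUDA throughput for one setting."""
    return P100_OPENCL[games] / T4_CUDA[games]


for games in sorted(T4_CUDA):
    print(f"{games:2d} games: P100/OpenCL runs at {relative_speed(games):.0%} of T4/CUDA")
```

This prints 89% for 8 games and 73% for 16 games, so the gap widens as the number of simultaneous games goes up.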

Unfortunately, with Colab, I believe you have to take the GPU you are given. And because I am a good paying customer, I consistently get the "superior" P100.

The good news is that I seem to be able to run scripts in two browser windows, and it disconnects far less often. And so far, Google hasn't told me to back off.

I still strongly recommend every go player try it. It's kind of like the SETI project for go players, except that Google is running the GPUs, so all you need is a simple browser tab. It's also easy: I'm running the script from the first post, just with my user name and password, no other changes. And it's free for those of you less obsessive than me.

Author:  go4thewin [ Mon Mar 08, 2021 7:17 am ]
Post subject:  Re: Contribute to Katago training using google colab

If you don't want to train KataGo, but just play rating games, would this work?
https://github.com/portkata/KataGo/blob ... ames.ipynb
You would probably have to replace cuda with auto.

Author:  seventeen [ Mon Mar 15, 2021 12:50 am ]
Post subject:  Re: Contribute to Katago training using google colab

The notebook image was updated today to use KataGo v1.8.1.

Author:  seventeen [ Mon Apr 19, 2021 1:28 am ]
Post subject:  Re: Contribute to Katago training using google colab

KataGo v1.8.2 engines were released today.

So I've updated the Colab notebook image to use the new KataGo version.

You may need to copy the Colab notebook image again.

Thanks.

All times are UTC - 8 hours [ DST ]