Contribute to Katago training using google colab
-
seventeen
- Dies in gote
- Posts: 25
- Joined: Mon Sep 16, 2019 7:29 pm
- Rank: 18 kyu
- GD Posts: 0
- Been thanked: 12 times
Contribute to Katago training using google colab
I've made a Google Colab notebook that can contribute to KataGo training.
You can now join the contribution effort without a GPU of your own.
Just check the link below.
https://colab.research.google.com/drive ... sp=sharing
- Attachments
-
- 20210224_colab.png (274.27 KiB) Viewed 15905 times
- wineandgolover
- Lives in sente
- Posts: 866
- Joined: Sun Jul 25, 2010 6:05 am
- GD Posts: 0
- Has thanked: 318 times
- Been thanked: 345 times
Re: Contribute to Katago training using google colab
This looks cool. I’d love to know more. To start...
What gpu's does this use?
Is it free? For how long?
Is there some sort of limitation that might affect other google services?
Does it run in the background?
Thanks.
- Brady
Want to see videos of low-dan mistakes and what to learn from them? Brady's Blunders
-
go4thewin
- Lives with ko
- Posts: 150
- Joined: Thu Jan 23, 2020 6:09 am
- Rank: 25 kyu
- GD Posts: 0
- Has thanked: 200 times
- Been thanked: 30 times
Re: Contribute to Katago training using google colab
edit: Is it possible to make a script like this to train the 15b net on the new s663 40b data? thanks!
Last edited by go4thewin on Mon Mar 01, 2021 7:26 am, edited 2 times in total.
- wineandgolover
- Lives in sente
- Posts: 866
- Joined: Sun Jul 25, 2010 6:05 am
- GD Posts: 0
- Has thanked: 318 times
- Been thanked: 345 times
Re: Contribute to Katago training using google colab
go4thewin wrote:Really simple and fun to use. It uses a T4, and it is free with usage limits. It turns off after 12 hours or less if you paste the following in the Google Chrome console (F12):
Code: Select all
function ClickConnect(){
  console.log("Connnect Clicked - Start");
  document.querySelector("#top-toolbar > colab-connect-button").shadowRoot.querySelector("#connect").click();
  console.log("Connnect Clicked - End");
};
setInterval(ClickConnect, 60000)
In 12 hours, you get more than 250 training games, 13000 rows, and a few ratings games, which is really nice. It will turn off in 90 minutes or less without the code above. If you close the browser, it will turn off. If you use it 12 hours every day, you might get kicked off for a couple of months, not sure. Every other day might be OK. It will not affect other Google services. Thanks seventeen!
If one is non-technical, and has never messed with the Chrome console, and doesn't want to screw things up, where exactly in the Chrome console should one paste this? Top? Bottom? Embedded somewhere? Doesn't matter?
Also, I assume you mean "it turns off after 12 hours or less UNLESS you paste..."?
Finally, I agree that it is really simple. Even I got it running easily. If you want to help make the strongest open-source Go engine even better, please run this in the background when you use your computer. Highly recommended!!!!!
- Brady
Want to see videos of low-dan mistakes and what to learn from them? Brady's Blunders
-
go4thewin
- Lives with ko
- Posts: 150
- Joined: Thu Jan 23, 2020 6:09 am
- Rank: 25 kyu
- GD Posts: 0
- Has thanked: 200 times
- Been thanked: 30 times
Re: Contribute to Katago training using google colab
Yes, sorry about that. In the picture below, paste the following code where the bottom-most > sign is. I'll delete the previous redundant post.
Code: Select all
// Click Colab's connect button every 60 seconds to keep the session alive.
function ClickConnect(){
  console.log("Clicked on connect button");
  document.querySelector("colab-connect-button").click();
}
setInterval(ClickConnect, 60000);
- ez4u
- Oza
- Posts: 2414
- Joined: Wed Feb 23, 2011 10:15 pm
- Rank: Jp 6 dan
- GD Posts: 0
- KGS: ez4u
- Location: Tokyo, Japan
- Has thanked: 2351 times
- Been thanked: 1332 times
Re: Contribute to Katago training using google colab
I am trying this out also. It is indeed simple to do.
I am running it in Firefox and for whatever reason, it does not shut down by itself after 90 minutes. Just now I came back after about 3 hours and it's still running. Great job! Thanks
Dave Sigaty
"Short-lived are both the praiser and the praised, and rememberer and the remembered..."
- Marcus Aurelius; Meditations, VIII 21
- wineandgolover
- Lives in sente
- Posts: 866
- Joined: Sun Jul 25, 2010 6:05 am
- GD Posts: 0
- Has thanked: 318 times
- Been thanked: 345 times
Re: Contribute to Katago training using google colab
wineandgolover wrote:This looks cool. I’d love to know more. To start...
1. What gpu's does this use?
2. Is it free? For how long?
3. Is there some sort of limitation that might affect other google services?
4. Does it run in the background?
Thanks.
To answer my own questions:
1. I’ve connected to a Tesla T4 each time, getting around 390 nn evals/second.
2. It’s completely free. It uses Google Colab, a free machine learning tool. It runs in the browser, so it’s platform independent.
3. It does not affect other Google services. There is a limitation within Colab in that it will stop working for heavy users. From what I see on Discord, running it for twelve hours every other day seems to avoid any problems.
4. Yes, it runs in the background. Google provides the CPUs and GPUs. All you need is a Google Drive account and a browser.
If you’d like to run it more stably, for longer, and probably get assigned a better GPU, you can consider Colab Pro, which costs $10 per month with a US or Canadian address. That seems pretty reasonable versus buying and powering your own Tesla V100. I might try Pro soon to check out its performance. I’d love to hear if anybody else has already done so.
Again, I encourage anyone who wishes to help make katago stronger to consider running this completely free utility in the background.
- Brady
Want to see videos of low-dan mistakes and what to learn from them? Brady's Blunders
Re: Contribute to Katago training using google colab
I followed your instructions and got the following error...
Starting KataGo training...
2021-02-28 22:39:59+0000: Distributed Self Play Engine starting...
2021-02-28 22:39:59+0000: Attempting to connect to server
2021-02-28 22:39:59+0000: isSSL: true
2021-02-28 22:39:59+0000: host: katagotraining.org
2021-02-28 22:39:59+0000: port: 443
2021-02-28 22:39:59+0000: baseResourcePath: /
2021-02-28 22:39:59+0000: KataGo v1.8.0
2021-02-28 22:39:59+0000: Git revision: 8ffda1fe05c69c67342365013b11225d443445e8
2021-02-28 22:39:59+0000: Running tiny net to sanity-check that GPU is working
2021-02-28 22:39:59+0000: nnRandSeed0 = 10486611865130445872
2021-02-28 22:39:59+0000: After dedups: nnModelFile0 = katago_contribute/kata1/tmpTinyModel.bin.gz useFP16 auto useNHWC auto
terminate called after throwing an instance of 'StringError'
what(): OpenCL error at /home/dwugcloud/data/kata/cpp/neuralnet/openclhelpers.cpp, func err, line 263, error CL_PLATFORM_NOT_FOUND_KHR
- ez4u
- Oza
- Posts: 2414
- Joined: Wed Feb 23, 2011 10:15 pm
- Rank: Jp 6 dan
- GD Posts: 0
- KGS: ez4u
- Location: Tokyo, Japan
- Has thanked: 2351 times
- Been thanked: 1332 times
Re: Contribute to Katago training using google colab
When the process is running on Colab, I am seeing these security warnings constantly in the console:
Code: Select all
Content Security Policy: Ignoring “'report-sample'” within script-src: ‘strict-dynamic’ specified
Content Security Policy: Ignoring “https:” within script-src: ‘strict-dynamic’ specified
Content Security Policy: Ignoring “http:” within script-src: ‘strict-dynamic’ specified
Content Security Policy: Ignoring “'unsafe-inline'” within script-src: ‘strict-dynamic’ specified
Content Security Policy: Ignoring “https://www.google.com/js/bg/” within script-src: ‘strict-dynamic’ specified
Content Security Policy: Ignoring “https://www.google.com/recaptcha/” within script-src: ‘strict-dynamic’ specified
Is this something that should be fixed or can we just ignore it?
Meanwhile I am currently increasing the "maxSimultaneousGames" at the bottom of the script. Going from 8 (default) to 12 jumped "nn evals" from around 380/second to around 470/second.
Dave Sigaty
"Short-lived are both the praiser and the praised, and rememberer and the remembered..."
- Marcus Aurelius; Meditations, VIII 21
- ez4u
- Oza
- Posts: 2414
- Joined: Wed Feb 23, 2011 10:15 pm
- Rank: Jp 6 dan
- GD Posts: 0
- KGS: ez4u
- Location: Tokyo, Japan
- Has thanked: 2351 times
- Been thanked: 1332 times
Re: Contribute to Katago training using google colab
deungsan wrote:I followed your instructions and got the following error...
Starting KataGo training...
2021-02-28 22:39:59+0000: Distributed Self Play Engine starting...
2021-02-28 22:39:59+0000: Attempting to connect to server
2021-02-28 22:39:59+0000: isSSL: true
2021-02-28 22:39:59+0000: host: katagotraining.org
2021-02-28 22:39:59+0000: port: 443
2021-02-28 22:39:59+0000: baseResourcePath: /
2021-02-28 22:39:59+0000: KataGo v1.8.0
2021-02-28 22:39:59+0000: Git revision: 8ffda1fe05c69c67342365013b11225d443445e8
2021-02-28 22:39:59+0000: Running tiny net to sanity-check that GPU is working
2021-02-28 22:39:59+0000: nnRandSeed0 = 10486611865130445872
2021-02-28 22:39:59+0000: After dedups: nnModelFile0 = katago_contribute/kata1/tmpTinyModel.bin.gz useFP16 auto useNHWC auto
terminate called after throwing an instance of 'StringError'
what(): OpenCL error at /home/dwugcloud/data/kata/cpp/neuralnet/openclhelpers.cpp, func err, line 263, error CL_PLATFORM_NOT_FOUND_KHR
My startup looks like this...
Code: Select all
Starting KataGo training...
2021-02-28 21:51:52+0000: Distributed Self Play Engine starting...
2021-02-28 21:51:52+0000: Attempting to connect to server
2021-02-28 21:51:52+0000: isSSL: true
2021-02-28 21:51:52+0000: host: katagotraining.org
2021-02-28 21:51:52+0000: port: 443
2021-02-28 21:51:52+0000: baseResourcePath: /
2021-02-28 21:51:52+0000: KataGo v1.8.0
2021-02-28 21:51:52+0000: Git revision: 8ffda1fe05c69c67342365013b11225d443445e8
2021-02-28 21:51:52+0000: Running tiny net to sanity-check that GPU is working
2021-02-28 21:51:52+0000: nnRandSeed0 = 1331183443207076973
2021-02-28 21:51:52+0000: After dedups: nnModelFile0 = katago_contribute/kata1/tmpTinyModel.bin.gz useFP16 auto useNHWC auto
2021-02-28 21:51:52+0000: Cuda backend thread 0: Found GPU Tesla T4 memory 15843721216 compute capability major 7 minor 5
2021-02-28 21:51:52+0000: Cuda backend thread 0: Model version 9 useFP16 = true useNHWC = true
2021-02-28 21:51:52+0000: Cuda backend thread 0: Model name: rect15-b2c16-s13679744-d94886722
2021-02-28 21:51:54+0000: Tiny net sanity check complete
At the very beginning of the output from your run do you see...
Code: Select all
Using Katago Backend : CUDA
GPU : TeslaT4
/content
Cloning into 'katago-colab'...
Dave Sigaty
"Short-lived are both the praiser and the praised, and rememberer and the remembered..."
- Marcus Aurelius; Meditations, VIII 21
Re: Contribute to Katago training using google colab
My errors were fixed by changing the notebook settings. Setting the hardware accelerator to "GPU" lets Colab use a Tesla T4.
Now it works fine.
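For anyone hitting the same CL_PLATFORM_NOT_FOUND_KHR error, one way to confirm that the runtime really has a GPU attached is to ask nvidia-smi from a notebook cell before starting KataGo. This is only a sketch and not part of seventeen's notebook: the helper names are made up for illustration, and stripping the space from the reported name is an assumption to match the "TeslaT4" spelling seen in the notebook output.

```python
import shutil
import subprocess


def normalize_gpu_name(raw: str) -> str:
    """Take the first line of nvidia-smi output and remove all whitespace.

    Assumption: the notebook's "TeslaT4" is "Tesla T4" with the space removed.
    """
    lines = raw.strip().splitlines()
    first_line = lines[0] if lines else ""
    return "".join(first_line.split())


def detect_gpu() -> str:
    """Return the normalized GPU name, or "" if no GPU is visible."""
    if shutil.which("nvidia-smi") is None:
        return ""  # no NVIDIA tooling on this runtime
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return ""  # the tool exists but could not talk to a GPU
    return normalize_gpu_name(result.stdout)


if __name__ == "__main__":
    name = detect_gpu()
    if name:
        print("GPU :", name)
    else:
        print('No GPU found. Set the hardware accelerator to "GPU" first.')
```

If this reports no GPU, changing the hardware accelerator setting as described above and rerunning the cell should be the first thing to try.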
Last edited by deungsan on Sun Feb 28, 2021 6:33 pm, edited 2 times in total.
- ez4u
- Oza
- Posts: 2414
- Joined: Wed Feb 23, 2011 10:15 pm
- Rank: Jp 6 dan
- GD Posts: 0
- KGS: ez4u
- Location: Tokyo, Japan
- Has thanked: 2351 times
- Been thanked: 1332 times
Re: Contribute to Katago training using google colab
ez4u wrote:...
Meanwhile I am currently increasing the "maxSimultaneousGames" at the bottom of the script. Going from 8 (default) to 12 jumped "nn evals" from around 380/second to around 470/second.
The story so far...
Code: Select all
"maxSimultaneousGames" 08 "nn evals" ~380/sec
"maxSimultaneousGames" 12 "nn evals" ~470/sec
"maxSimultaneousGames" 16 "nn evals" ~525/sec
"maxSimultaneousGames" 24 "nn evals" ~545/sec
"maxSimultaneousGames" 32 "nn evals" ~555/sec
Dave Sigaty
"Short-lived are both the praiser and the praised, and rememberer and the remembered..."
- Marcus Aurelius; Meditations, VIII 21
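To make the diminishing returns in ez4u's numbers easier to see, here is a small sketch that computes how many extra nn evals/sec each additional simultaneous game bought between measurements. The figures are the approximate ones from the post above; the function name is made up for illustration.

```python
# (maxSimultaneousGames, approximate nn evals per second) as reported above
measurements = [(8, 380), (12, 470), (16, 525), (24, 545), (32, 555)]


def marginal_gains(points):
    """Return (games, evals_per_sec, extra evals/sec per added game) rows."""
    rows = []
    for (prev_games, prev_rate), (games, rate) in zip(points, points[1:]):
        gain_per_game = (rate - prev_rate) / (games - prev_games)
        rows.append((games, rate, gain_per_game))
    return rows


for games, rate, gain in marginal_gains(measurements):
    print(f"{games:>2} games: ~{rate}/sec, +{gain:.2f} evals/sec per extra game")
```

On these numbers most of the gain has arrived by 16 games; going from 16 to 32 adds only about 30 evals/sec in total.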
-
lightvector
- Lives in sente
- Posts: 759
- Joined: Sat Jun 19, 2010 10:11 pm
- Rank: maybe 2d
- GD Posts: 0
- Has thanked: 114 times
- Been thanked: 916 times
Re: Contribute to Katago training using google colab
Yeah, GPUs really like it when you have large batches to run in parallel, and more games helps with that.
The one thing I would caution is - please don't make the number of simultaneous games too large compared to the number of games you are playing in a given run before you shut it down or it shuts itself down - ideally make sure the total number of games you're getting per session would be at least 10x or 20x the number of simultaneous games.
The reason is if the total is too small, such that the games are coming in relatively few "waves" before it gets killed, it will create a bias towards short games in the data - because in the last wave, disproportionately short games will be the ones that finish and get uploaded and not the longer ones. Games that were on small boards, or that had fewer fights and were more peaceful, or that initialized starting from later positions, etc. will be favored over the configured and desired distribution.
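The short-game bias lightvector describes can be illustrated with a toy simulation. This is a sketch under loud assumptions: game lengths are drawn uniformly from 50 to 350 time units (a made-up distribution, true mean 200), every slot plays games back to back, and any game still in progress when the session is killed is never uploaded. It is not how KataGo schedules games; it only shows why few "waves" per session skew the uploaded data toward short games.

```python
import random


def simulate_session(simultaneous_games, session_length, seed=0):
    """Simulate one session and return the lengths of the games that finished.

    Each of the `simultaneous_games` slots plays games one after another.
    A game still running when the session ends is discarded.
    Lengths are uniform on [50, 350], an illustrative assumption only.
    """
    rng = random.Random(seed)
    finished = []
    for _ in range(simultaneous_games):
        clock = 0.0
        while True:
            length = rng.uniform(50, 350)
            if clock + length > session_length:
                break  # this game is cut off and never uploaded
            clock += length
            finished.append(length)
    return finished


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


slots = 1000  # many slots only so the averages are stable
for session_length in (700, 20000):
    games = simulate_session(slots, session_length)
    print(f"session length {session_length}: "
          f"{len(games) / slots:.1f} finished games per slot, "
          f"mean finished length {mean(games):.1f} (true mean is 200)")
```

With only a few finished games per slot, the mean length of the finished games should come out noticeably below 200, while the long session should stay close to it. That matches the advice to keep the total number of games per session at 10x or 20x the number of simultaneous games.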
- wineandgolover
- Lives in sente
- Posts: 866
- Joined: Sun Jul 25, 2010 6:05 am
- GD Posts: 0
- Has thanked: 318 times
- Been thanked: 345 times
Re: Contribute to Katago training using google colab
ez4u wrote: As far as I understand what we are doing here (questionable right there!), you should not be using OpenCL for anything. You should be using CUDA instead.
At the very beginning of the output from your run do you see...
Code: Select all
Using Katago Backend : CUDA
GPU : TeslaT4
/content
Cloning into 'katago-colab'...
This is what I get every time.
Yeah, I also saw that every time until last night.
I ran the Colab script very late last night, and it assigned me to an A100, so I was excited. (Note, this information is wrong, and corrected in subsequent posts) But it didn't use CUDA, instead opting for OpenCL. I waited a half hour and it hadn't finished any games, so I quit and went to bed. I can't guarantee that my interpretation of what happened is perfect, but I think I'm right. Maybe the cat hit a kill switch. I should have taken a screenshot, sorry.
I assume the A100 should support CUDA, right?
Is there a known reason why OpenCL should fail on Colab?
Should I change the very beginning of the script, currently set to KATAGO_BACKEND="AUTO", to read KATAGO_BACKEND="CUDA"?
Anyway, today I'm back on a good old T4 and it chose CUDA again. Following recommendations, I increased the number of simultaneous games to 16, and it's chugging along nicely (530-ish nn evals/sec).
Thanks!
Last edited by wineandgolover on Tue Mar 16, 2021 3:35 pm, edited 1 time in total.
- Brady
Want to see videos of low-dan mistakes and what to learn from them? Brady's Blunders
- ez4u
- Oza
- Posts: 2414
- Joined: Wed Feb 23, 2011 10:15 pm
- Rank: Jp 6 dan
- GD Posts: 0
- KGS: ez4u
- Location: Tokyo, Japan
- Has thanked: 2351 times
- Been thanked: 1332 times
Re: Contribute to Katago training using google colab
At the beginning of the script, probably we can take this
Code: Select all
if gpu_name == "TeslaT4":
    KATAGO_BACKEND="CUDA"
else:
    KATAGO_BACKEND="OPENCL"
and make it this?
Code: Select all
if gpu_name == "TeslaT4":
    KATAGO_BACKEND="CUDA"
elif gpu_name == "A100":
    KATAGO_BACKEND="CUDA"
else:
    KATAGO_BACKEND="OPENCL"
Dave Sigaty
"Short-lived are both the praiser and the praised, and rememberer and the remembered..."
- Marcus Aurelius; Meditations, VIII 21
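If more GPU types need the CUDA backend later, the if/elif chain proposed above could be written as a lookup instead. This is a sketch, not the notebook's actual code: the function and set names are made up, and "A100" is only the name guessed in the post above, so the exact string the notebook reports for that GPU is unverified.

```python
# GPU names that should use the CUDA backend. "A100" is a guess taken from
# the post above; the string the notebook really reports may differ.
CUDA_GPUS = {"TeslaT4", "A100"}


def choose_backend(gpu_name: str) -> str:
    """Pick the KataGo backend for a GPU name, falling back to OpenCL."""
    return "CUDA" if gpu_name in CUDA_GPUS else "OPENCL"


gpu_name = "TeslaT4"  # placeholder; the notebook detects this itself
KATAGO_BACKEND = choose_backend(gpu_name)
print(KATAGO_BACKEND)
```

Adding another GPU then means adding one string to the set rather than another elif branch.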
