All times are UTC - 8 hours [ DST ]




 Post subject: KataGo self-play on a Macbook Pro
Post #1 Posted: Sat Feb 22, 2020 11:13 am 
Dies in gote

Posts: 25
Liked others: 0
Was liked: 2
Rank: AGA 6D
Hi,

I would like to try KataGo self-play on my MacBook Pro. My goal is not to create a very strong NN, but one that is playable at amateur level. I noticed that the README recommends running on 4 machines, but I don't have the recommended setup. What will happen if I run everything on the same machine? How do I tweak the config to make the programs play nicely with each other?

Thank you for any suggestions.
Cao

 Post subject: Re: KataGo self-play on a Macbook Pro
Post #2 Posted: Sat Feb 22, 2020 1:55 pm 
Lives in sente

Posts: 757
Liked others: 114
Was liked: 916
Rank: maybe 2d
There are two options.

* One is to still try to run them all simultaneously. In this case, you'll need to give train.py appropriate command line options to use less memory, since TensorFlow will tend to hog the entire GPU's memory unless you specify otherwise. You will also probably want to slow down train.py by forcing it to wait extra time between training epochs doing nothing, because otherwise training will outpace selfplay. Commonly, I think you want 5x-50x as much of your compute on selfplay as on training, at least on 19x19. You'll also probably need a ton of RAM.

* Perhaps a better option is to run everything sequentially. For selfplay, you can specify a fixed number of games to play in the .cfg file. Take a look at how the shuffle loop works in the script and run shuffle as a single invocation rather than in a loop, calling it once each time after you finish a new set of games. For train.py, as of the tip of the master branch on Github, you can specify a number of epochs to train before the script terminates. Similarly, run model exporting not in a loop, but rather only once after train.py finishes saving the next model and quits. If you choose to use gatekeeper, gatekeeper also has a command line option to terminate once it has nothing to do.
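
As a rough illustration of the sequential option, here is a minimal Python sketch of a driver that runs each stage to completion before starting the next and stops at the first failure. It is not part of KataGo: the function name `run_pipeline_once` and the placeholder stage commands are made up for this example, and you would substitute your own selfplay, shuffle, train, and export invocations.

```python
import subprocess
import sys
from typing import Sequence


def run_pipeline_once(stages: Sequence[Sequence[str]]) -> int:
    """Run each stage to completion, in order, stopping at the first failure.

    Returns the number of stages that completed successfully.
    """
    completed = 0
    for command in stages:
        print("running:", " ".join(command), flush=True)
        result = subprocess.run(command)
        if result.returncode != 0:
            print("stage failed with exit code", result.returncode, file=sys.stderr)
            break
        completed += 1
    return completed


if __name__ == "__main__":
    # Placeholder stages (hypothetical): each one only prints a message.
    # Replace them with your real selfplay / shuffle / train / export commands.
    placeholder_stages = [
        [sys.executable, "-c", "print('selfplay: play a fixed number of games')"],
        [sys.executable, "-c", "print('shuffle: a single invocation, no loop')"],
        [sys.executable, "-c", "print('train: a fixed number of epochs')"],
        [sys.executable, "-c", "print('export: once, after training exits')"],
    ]
    done = run_pipeline_once(placeholder_stages)
    print(f"{done} of {len(placeholder_stages)} stages finished")
```

Stopping at the first non-zero exit code means a broken selfplay or shuffle step is noticed before any time is spent training on missing data.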

Either way, run the various KataGo commands and Python scripts with "-help" for more info about the relevant command line options, and don't be afraid to look at the implementation of any of the bash scripts. Be prepared to dig into some of these details and get your hands dirty. There will be parameters you'll want to tweak based on your setup: choose how many shuffle threads to use depending on how many CPU cores you have, reduce the shuffle and training batch sizes if they take too much GPU memory, set selfplay to use fewer playouts if you want short-term learning speed and don't mind weaker long-term strength, etc.

Start only one piece at a time. Before moving on to the next, make sure it's working and producing data, check its log files for obvious errors, and watch a system monitor to confirm that you aren't running your machine out of memory and that the CPU/GPU is actually being used as expected.


This post by lightvector was liked by: Bill Spight
 Post subject: Re: KataGo self-play on a Macbook Pro
Post #3 Posted: Sat Feb 22, 2020 3:33 pm 
Lives in gote

Posts: 586
Location: Adelaide, South Australia
Liked others: 208
Was liked: 265
Rank: Australian 2 dan
GD Posts: 200
gcao wrote:
My goal is not to create a very strong NN, but to create a NN that is playable on amateur level.

By the way, there are some weaker KataGo networks at https://d3dndmfyhecmj0.cloudfront.net/g ... index.html . I think the first of the 10-block networks is already at least high dan level. You might want to try some of the 6-block networks.

 Post subject: Re: KataGo self-play on a Macbook Pro
Post #4 Posted: Sat Feb 22, 2020 4:14 pm 
Lives in sente

Posts: 757
Liked others: 114
Was liked: 916
Rank: maybe 2d
Even 6 blocks 96 channels can reach high amateur dan level. However, 6 blocks is also few enough that the neural net will do very strange things with large dragons, simply because it is incapable of perceiving the whole group, which causes very non-human-like errors. (6 blocks = ~12 layers, so the maximum distance at which any stone can influence another is about 12-14, less if some nontrivial computation needs to be done, and less again if the group winds around a little.)

You might consider something like 10 blocks 64 channels or 10 blocks 48 channels or something like that, to get wider board perception but also still have few parameters and remain at weaker levels.
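
To put rough numbers on that trade-off, here is a back-of-the-envelope Python sketch. It assumes a plain residual trunk in which every block has two 3x3 convolutions, each extending the reach by one point, and it counts only the trunk's convolution weights (no heads, biases, normalization, or any other layers). Those simplifications and the function names are assumptions for illustration, not a description of KataGo's actual architecture.

```python
def max_reach(blocks: int, convs_per_block: int = 2, extra_convs: int = 0) -> int:
    """Farthest distance (in board points) at which one stone can influence
    another, assuming each 3x3 convolution extends the reach by one point."""
    return blocks * convs_per_block + extra_convs


def trunk_conv_weights(blocks: int, channels: int, convs_per_block: int = 2) -> int:
    """Number of weights in the 3x3 convolutions of the residual trunk only."""
    return blocks * convs_per_block * 3 * 3 * channels * channels


if __name__ == "__main__":
    for blocks, channels in [(6, 96), (10, 64), (10, 48)]:
        reach = max_reach(blocks)
        weights = trunk_conv_weights(blocks, channels)
        print(f"{blocks} blocks x {channels} channels: "
              f"reach ~{reach}, trunk conv weights {weights:,}")
```

Under these assumptions, 6 blocks x 96 channels gives a reach of about 12 with 995,328 trunk weights, while 10 blocks x 64 channels gives a reach of about 20 with 737,280, and 10 blocks x 48 channels the same reach with 414,720. Two points in the same row of a 19x19 board are at most 18 apart, so the 10-block nets can in principle span the board while having fewer parameters.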

 Post subject: Re: KataGo self-play on a Macbook Pro
Post #5 Posted: Sat Feb 22, 2020 10:15 pm 
Dies in gote

Posts: 25
Liked others: 0
Was liked: 2
Rank: AGA 6D
Thanks a lot for the information. Will try it out and see how it goes.

 Post subject: Re: KataGo self-play on a Macbook Pro
Post #6 Posted: Sun Feb 23, 2020 6:17 am 
Dies in gote

Posts: 25
Liked others: 0
Was liked: 2
Rank: AGA 6D
Thanks a lot for the information. It's good to know it's possible to run the selfplay+training on one machine. I'll give it a try and see how it goes.

BTW, my end goal is to adapt the program to handle Daoqi / Toroidal Go (https://senseis.xmp.net/?ToroidalGo). Before that, I want to get training to work on my computers.

 Post subject: Re: KataGo self-play on a Macbook Pro
Post #7 Posted: Sun Feb 23, 2020 7:51 am 
Dies in gote

Posts: 25
Liked others: 0
Was liked: 2
Rank: AGA 6D
I'm getting this error:
Training data json file does not exist, waiting and trying again later: .../KataGo/shared/shuffleddata/20200222-143315/train.json

After searching the whole codebase, I didn't find where train.json is created. I've run self-play and shuffle/export before the training step. Did I miss anything?

Here are the commands I ran
Code:
    cpp/katago selfplay -output-dir shared/selfplay -models-dir shared/models -config-file cpp/configs/selfplay1.cfg

    ./selfplay/shuffle_and_export_loop.sh CAO ../shared/ ../shared/scratch 4 1

    ./selfplay/train.sh ../shared/ CAO b6c96 main -lr-scale 1.0

 Post subject: Re: KataGo self-play on a Macbook Pro
Post #8 Posted: Sun Feb 23, 2020 8:54 am 
Lives in sente

Posts: 757
Liked others: 114
Was liked: 916
Rank: maybe 2d
That's normal, it's just letting you know that the shuffler has not produced any shuffled data yet. Since there is no data, training will not proceed, instead it will wait until there is data. If all is working, and you're running in asynchronous everything-at-once mode, then eventually there should be enough data. Or there could be an error, and there will never be enough.

The *reason* there is no data - you can investigate. Did the self-play actually generate data? Take a look at the logs, and take a look at the directory where it should have output data.

And did it generate enough data? Take a look at the shuffler's logs (the shuffle loop script should write an "outshuffle.txt" file where you ran it) to see how much data the shuffler thinks it found. By default the shuffler is configured to start out with a training window of 250K rows. With fewer than that it will not proceed. Training windows that are too small will lead to more overfitting, but you can still decrease the initial window size if you want to proceed anyway with less data than that.
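
A quick first check on whether selfplay wrote anything at all can be scripted. The Python sketch below only counts files and bytes under an output directory. It assumes the training data is written as ".npz" files somewhere under the selfplay output directory (shared/selfplay in the commands above); the suffix and the function name are assumptions for illustration, and this does not count rows the way the shuffler does.

```python
import sys
from pathlib import Path
from typing import Tuple


def summarize_data_files(root: str, suffix: str = ".npz") -> Tuple[int, int]:
    """Return (file count, total bytes) for files under root ending in suffix.

    Returns (0, 0) if root is not an existing directory.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        return 0, 0
    files = [p for p in root_path.rglob(f"*{suffix}") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


if __name__ == "__main__":
    # Default path taken from the -output-dir used earlier in this thread.
    root = sys.argv[1] if len(sys.argv) > 1 else "shared/selfplay"
    count, size = summarize_data_files(root)
    print(f"{count} data files, {size} bytes under {root}")
```

If this reports zero files, the problem is upstream of the shuffler, and the selfplay logs are the place to look.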

I'll make the message about the train.json slightly clearer. Sorry about the confusion. Generally when debugging it helps to be looking at the logs and/or output all the way along, not just at the step you think isn't working.

(edit: some clarifications, less stupid phrasing)

 Post subject: Re: KataGo self-play on a Macbook Pro
Post #9 Posted: Sun Feb 23, 2020 10:04 am 
Lives in sente

Posts: 757
Liked others: 114
Was liked: 916
Rank: maybe 2d
Pushed some improvements to the docs:
https://github.com/lightvector/KataGo/c ... 7f6f9e4d67

Hope that helps. Let me know if you have further questions! :study:


This post by lightvector was liked by: xela
 Post subject: Re: KataGo self-play on a Macbook Pro
Post #10 Posted: Sun Mar 01, 2020 7:27 am 
Dies in gote

Posts: 25
Liked others: 0
Was liked: 2
Rank: AGA 6D
Thank you!

I'm able to run self-play/shuffle/export/train sequentially after I changed a few parameters (games=100, move-per-games=500, min-rows=1). I understand this won't create a playable model. I just want to get this process to run end-to-end first.

I noticed that even with only 100 games played, the training takes a very long time (much longer than the other steps). One epoch took 5 hours, while the other steps took only several minutes. I have a very low-end GPU (Intel Iris Pro with 1.5GB of memory), which may have slowed down the training. However, I wonder whether I missed anything else.

I inspected train.py as well. It looks like it runs an endless loop. Is that true? If I want to run it as part of a play-shuffle-train process, do I just remove the "while True" loop?


Attachments:
File comment: Log including self-play, shuffle, training
katago.txt [67.45 KiB]
 Post subject: Re: KataGo self-play on a Macbook Pro
Post #11 Posted: Sun Mar 01, 2020 11:01 am 
Lives in sente

Posts: 757
Liked others: 114
Was liked: 916
Rank: maybe 2d
You can configure train.py to stop after a certain number of epochs. Run it with -help to see all the arguments.

Or look here, this is the argument you want:
https://github.com/lightvector/KataGo/b ... ain.py#L48

An epoch is defined as 1 million training steps by default; it does not depend on the amount of data. It really couldn't, since in normal operation more data is continuously being generated as you train, so there's no well-defined notion of when you're "done" passing over the data. So the 5 hours were probably spent making a very large number of passes over the tiny amount of data you have, and it didn't matter that the amount was small. You can also change how many steps are considered one epoch with another of the command line flags.
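
To make the arithmetic concrete, here is a rough Python sketch. It treats the 1 million figure as the number of training rows consumed per epoch and assumes at most one row per move; both are assumptions (if each step consumes a whole batch, the number of passes is larger by a factor of the batch size). The function name is made up for illustration.

```python
def passes_over_data(samples_per_epoch: int, rows_available: int) -> float:
    """How many times one epoch cycles through a fixed pool of training rows."""
    if rows_available <= 0:
        raise ValueError("rows_available must be positive")
    return samples_per_epoch / rows_available


if __name__ == "__main__":
    games = 100              # from the settings mentioned in this thread
    max_moves_per_game = 500
    # Upper bound on rows, assuming at most one training row per move.
    max_rows = games * max_moves_per_game
    passes = passes_over_data(1_000_000, max_rows)
    print(f"at most {max_rows} rows, so at least {passes:.0f} passes per epoch")
```

With 100 games capped at 500 moves there are at most 50,000 rows, so under these assumptions a single epoch goes over the same data at least 20 times.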
