Life In 19x19
http://www.lifein19x19.com/

KataGo gatekeeper throws error CUBLAS_STATUS_NOT_SUPPORTED
http://www.lifein19x19.com/viewtopic.php?f=18&t=18750

Author:  gcao [ Tue May 24, 2022 8:24 am ]
Post subject:  KataGo gatekeeper throws error CUBLAS_STATUS_NOT_SUPPORTED

Hi @lightvector,

Hope this finds you well!

I'm not sure whether you remember me. Two years ago I spent a few months trying to set up KataGo on my laptop to train a model to play Go, and I also worked on adapting KataGo to play Daoqi, one of Go's variants. However, I wasn't able to get very far because I didn't have a decent GPU and it was too expensive to get one.

Now, two years later, GPUs are more affordable, so I built a brand-new machine with an AMD Ryzen 9 5900X, an Nvidia GeForce RTX 3080 Ti (12 GB), and 64 GB of RAM. I installed Ubuntu 20.04 with CUDA 11.7.1, cuDNN 8.4.0, Python 3.7, TensorFlow 1.15, etc. I was able to compile KataGo with the CUDA backend and run synchronous_loop.sh. The selfplay, shuffle, and train steps worked fine, but the gatekeeper throws the error below. I understand that gatekeeper is optional, but I guess this error might also occur when I run the model. What should I do to fix it? Any help would be highly appreciated.

Code:
...
2022-05-24 10:57:03-0400: Game loop thread 127 starting game testing candidate: mbp-s656768-d204361
terminate called after throwing an instance of 'StringError'
  what():  CUBLAS Error, for ginputw file /home/gcao/KataGo2/cpp/neuralnet/cudabackend.cpp, func cublasHgemm( cudaHandles->cublas, CUBLAS_OP_N, CUBLAS_OP_N, outChannels, batchSize, inChannels, alpha, (const half*)matBuf,outChannels, (const half*)inputBuf,inChannels, beta, (half*)outputBuf,outChannels ), line 663, error CUBLAS_STATUS_NOT_SUPPORTED
Aborted (core dumped)

Author:  lightvector [ Wed May 25, 2022 4:26 pm ]
Post subject:  Re: KataGo gatekeeper throws error CUBLAS_STATUS_NOT_SUPPORT

That's a little surprising. I don't know. Some thoughts:

* I have never tested KataGo with CUDA 11.7.1. You may notice the release is back at 11.1 or 11.2 (https://github.com/lightvector/KataGo/r ... ag/v1.11.0), but I've also successfully used CUDA 11.4 (along with cuDNN 8.2.4). Does installing a side-by-side downgraded CUDA 11.4 and cuDNN 8.2.4 and using that instead work for you?

(As a side note, if you're on Linux, although slightly out of date, https://www.iridescent.io/tech-blogs-in ... right-way/ is a good guide to installing cuda in a way that won't bork future attempts to upgrade/downgrade, easily allows having multiple side-by-side versions installed at once, etc. In general the secret is to use the runfile version - I've used the deb version in the past and it always leaves apt packages in a messy state when I try to change versions. Indeed, the runfile version is also the one you can do without sudo: https://stackoverflow.com/questions/674 ... thout-sudo, i.e. you can do it in an entirely local and self-contained way)

* Does KataGo's OpenCL version work for you and use your GPU successfully? (this might distinguish a GPU/GPU-driver issue from a CUDA-library-level issue).

* Instead of running gatekeeper right away, how about just running plain old KataGo benchmark, or hooking up to any popular game analysis GUI and just doing plain game analysis?

* Does it work if you disable FP16 in the config? (e.g. cudaUseFP16 = false in the config)

* There is some chance some other user in the discord https://discord.gg/45EWcZu7 will have seen a similar error and can help you troubleshoot.
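
One way to separate an FP16-specific problem from a general CUDA problem is to run the benchmark twice, once with FP16 forced on and once with it forced off. The sketch below only builds the two command lines. The helper name build_benchmark_cmd and the file paths are hypothetical, and it assumes the KataGo build accepts the -override-config option; if it does not, editing cudaUseFP16 in the config file has the same effect.

```python
import shlex


def build_benchmark_cmd(katago_bin, model_path, config_path, use_fp16=None):
    """Return the argv list for a `katago benchmark` run.

    use_fp16=None leaves the config untouched. True or False appends an
    override for cudaUseFP16 (assumption: the build supports
    -override-config; otherwise set the value in the config file).
    """
    cmd = [katago_bin, "benchmark", "-model", model_path, "-config", config_path]
    if use_fp16 is not None:
        value = "true" if use_fp16 else "false"
        cmd += ["-override-config", "cudaUseFP16=" + value]
    return cmd


if __name__ == "__main__":
    # Hypothetical paths; substitute your own binary, model, and config.
    for fp16 in (True, False):
        argv = build_benchmark_cmd("./katago", "model.bin.gz", "gtp.cfg", fp16)
        print(" ".join(shlex.quote(arg) for arg in argv))
```

If the run with cudaUseFP16=true fails with CUBLAS_STATUS_NOT_SUPPORTED and the run with cudaUseFP16=false succeeds, the half-precision cublasHgemm path is the likely culprit rather than the GPU driver as a whole.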

Author:  gcao [ Thu May 26, 2022 6:07 am ]
Post subject:  Re: KataGo gatekeeper throws error CUBLAS_STATUS_NOT_SUPPORT

Thanks a lot. I did try running the benchmark and got the same error. I'll try the downgrade and the other suggestions.

Author:  gcao [ Thu May 26, 2022 10:14 am ]
Post subject:  Re: KataGo gatekeeper throws error CUBLAS_STATUS_NOT_SUPPORT

I set cudaUseFP16 to false, and both gatekeeper and benchmark worked fine.
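
For anyone who wants to script the same change across several config files, here is a minimal sketch that sets a key in "key = value" style config text. The function name set_config_value is hypothetical and the sample text is made up for illustration; it assumes one setting per line and that lines starting with # are comments.

```python
import re


def set_config_value(config_text, key, value):
    """Set `key = value` in config text and return the new text.

    Every uncommented line that assigns `key` is rewritten. If there is
    no such line, the setting is appended at the end. Commented-out
    lines (starting with #) are left alone.
    """
    pattern = re.compile(r"^\s*" + re.escape(key) + r"\s*=.*$")
    lines = config_text.splitlines()
    found = False
    for i, line in enumerate(lines):
        if pattern.match(line):
            lines[i] = f"{key} = {value}"
            found = True
    if not found:
        lines.append(f"{key} = {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up sample text, not a real KataGo config file.
    sample = "numSearchThreads = 8\ncudaUseFP16 = auto\n"
    print(set_config_value(sample, "cudaUseFP16", "false"), end="")
```

Disabling FP16 avoids the failing cublasHgemm call, but it is a workaround: it does not explain why that call is rejected on this CUDA version.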

All times are UTC - 8 hours [ DST ]