Life In 19x19 http://www.lifein19x19.com/ |
KataGo gatekeeper throws error CUBLAS_STATUS_NOT_SUPPORTED http://www.lifein19x19.com/viewtopic.php?f=18&t=18750 |
Page 1 of 1 |
Author: | gcao [ Tue May 24, 2022 8:24 am ] |
Post subject: | KataGo gatekeeper throws error CUBLAS_STATUS_NOT_SUPPORTED |
Hi @lightvector,

Hope this finds you well! Not sure whether you remember me. Two years ago I spent a few months trying to set up KataGo on my laptop to train a model to play Go, and I also worked on adapting KataGo to play one of Go's variants, Daoqi. However, I wasn't able to get very far because I didn't have a decent GPU and it was too expensive to get one.

Now, two years later, GPUs are more affordable, so I built a brand new machine with an AMD Ryzen 9 5900X, an Nvidia GeForce RTX 3080 Ti (12GB), and 64GB of RAM. I installed Ubuntu 20.04 with CUDA 11.7.1, cuDNN 8.4.0, Python 3.7, TensorFlow 1.15, etc. I was able to compile KataGo with the CUDA backend and run synchronous_loop.sh. The selfplay, shuffle, train, etc. steps worked fine. However, the gatekeeper is throwing the error below.

I understand the gatekeeper is optional, but I guess this error might also occur when I run the model. I wonder what I should do to fix it. Any help would be highly appreciated.

Code: ...
2022-05-24 10:57:03-0400: Game loop thread 127 starting game testing candidate: mbp-s656768-d204361
terminate called after throwing an instance of 'StringError'
  what(): CUBLAS Error, for ginputw file /home/gcao/KataGo2/cpp/neuralnet/cudabackend.cpp, func cublasHgemm( cudaHandles->cublas, CUBLAS_OP_N, CUBLAS_OP_N, outChannels, batchSize, inChannels, alpha, (const half*)matBuf,outChannels, (const half*)inputBuf,inChannels, beta, (half*)outputBuf,outChannels ), line 663, error CUBLAS_STATUS_NOT_SUPPORTED
Aborted (core dumped) |
Author: | lightvector [ Wed May 25, 2022 4:26 pm ] |
Post subject: | Re: KataGo gatekeeper throws error CUBLAS_STATUS_NOT_SUPPORT |
That's a little surprising. I don't know. Some thoughts:

* I have never tested KataGo with CUDA 11.7.1. You may notice the release is back at 11.1 or 11.2 (https://github.com/lightvector/KataGo/r ... ag/v1.11.0), but I've also successfully used CUDA 11.4 (along with cuDNN 8.2.4). Does installing a side-by-side downgraded CUDA 11.4 and cuDNN 8.2.4 and using that instead work for you? (As a side note, if you're on Linux, although slightly out of date, https://www.iridescent.io/tech-blogs-in ... right-way/ is a good guide to installing CUDA in a way that won't bork future attempts to upgrade/downgrade, easily allows having multiple side-by-side versions installed at once, etc. In general the secret is to use the runfile version. I've used the deb version in the past and it always leaves apt packages in a messy state when I try to change versions. Indeed, the runfile version is also the one you can install without sudo: https://stackoverflow.com/questions/674 ... thout-sudo, i.e. you can do it in an entirely local and self-contained way.)
* Does KataGo's OpenCL version work for you and use your GPU successfully? (This might distinguish a GPU/GPU-driver issue from a CUDA-library-level issue.)
* Instead of running gatekeeper right away, how about just running plain old KataGo benchmark, or hooking up to any popular game analysis GUI and just doing plain game analysis?
* Does it work if you disable FP16 in the config? (e.g. cudaUseFP16 = false in the config)
* There is some chance some other user in the discord https://discord.gg/45EWcZu7 will have seen a similar error and can help you troubleshoot. |
Author: | gcao [ Thu May 26, 2022 6:07 am ] |
Post subject: | Re: KataGo gatekeeper throws error CUBLAS_STATUS_NOT_SUPPORT |
Thanks a lot. I did try running the benchmark and got the same error. I'll try the downgrade and the other suggestions. |
Author: | gcao [ Thu May 26, 2022 10:14 am ] |
Post subject: | Re: KataGo gatekeeper throws error CUBLAS_STATUS_NOT_SUPPORT |
I tried setting cudaUseFP16 to false. With that, both gatekeeper and benchmark worked fine. |
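
For readers who want to script the workaround reported above, here is a minimal Python sketch that sets `cudaUseFP16 = false` in the text of a config file. It assumes the config uses plain `key = value` lines with `#` comments; the helper name `set_config_value`, the sample text, and the `someOtherSetting` key are hypothetical and exist only for illustration. It is not part of KataGo.

```python
import re


def set_config_value(config_text: str, key: str, value: str) -> str:
    """Return config_text with `key = value` set.

    Active (uncommented) assignments of `key` are rewritten in place.
    If none exists, a new assignment is appended at the end.
    Commented-out lines such as `# key = ...` are left untouched.
    """
    # Matches only lines where the key starts the line (after optional
    # whitespace), so lines beginning with '#' never match.
    pattern = re.compile(rf"^\s*{re.escape(key)}\s*=.*$")
    lines = config_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        if pattern.match(line):
            lines[i] = f"{key} = {value}"
            replaced = True
    if not replaced:
        lines.append(f"{key} = {value}")
    return "\n".join(lines) + "\n"


# Illustrative config text (an assumption, not copied from a real KataGo config).
sample = "# GPU settings\nsomeOtherSetting = 1\n# cudaUseFP16 = auto\n"
updated = set_config_value(sample, "cudaUseFP16", "false")
print(updated)
```

To apply this to a real file, read it with `pathlib.Path(...).read_text()`, pass the text through the helper, and write the result back. Keeping a backup copy of the original config first is a sensible precaution.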
All times are UTC - 8 hours [ DST ] |