Life In 19x19

Posted: **Sat May 06, 2023 12:28 am**

For a desktop build with a Ryzen 7700, a B650 Tomahawk and an RTX 4070, I can choose between these RAM options:

- DDR5-5600 CL28-34-34-89 with at most 2 DIMMs (64GB)

- DDR5-4800 CL38-38-38 with up to 4 DIMMs (now 64GB, later optionally 128GB)

The slower RAM has the option of 4 DIMMs.

I think the speed 5600 or 4800 is immaterial, right?

The latency might be more relevant, but is it? Does it make a significant difference for KataGo to use CL28-34-34 or CL38-38-38?

Posted: **Sat May 06, 2023 2:19 am**

I would run the test on the existing hardware, then change the timings and run the test again.

Posted: **Sat May 06, 2023 2:43 am**

Someone might. I can't because first I need to buy the computer:) When doing so, I want to get the more appropriate RAM.

Posted: **Sat May 06, 2023 2:48 am**

If you are interested, I can test at different frequencies. Write me an approximate configuration: the KataGo version and the network.

Posted: **Sat May 06, 2023 2:56 am**

You will most likely run a non-CPU version, so in my opinion the difference will be insignificant.

Posted: **Sat May 06, 2023 5:13 am**

For my Igo Hatsuyôron 120 @ KataGo 60b training project, selfplay is running on Ryzen 5950 / RTX 3090, using about "only" 18 GByte of SYSTEM memory.
Therefore, I do not assume that SYSTEM memory is any limiting factor, especially not for your likely purposes of analysis.

Posted: **Sat May 06, 2023 5:34 am**

As has been reported, ca. 6h of calculating ca. 20 million playouts by two 2080TI can fill ca. 64GB of RAM. As I might also want to study complex global ko fights, many playouts can be useful.

Frequencies are less interesting so please do not spend more than a few minutes on them. I have been hoping for somebody to have done RAM tests before or collected experience with different RAM parameters.

I am more interested in RAM latencies, so a bit more study time for them would be welcome.

Yes, I would run on the GPU rather than nothing but the CPU. That is what graphics cards are for, after all:)

Posted: **Sun May 07, 2023 9:45 am**

What do you expect from 20 million playouts?

Posted: **Sun May 07, 2023 12:35 pm**

Greater confidence than from 20 thousand.

Posted: **Sun May 07, 2023 2:56 pm**

RobertJasiek wrote: Greater confidence than from 20 thousand.

If your net is familiar with the position under examination, 20 thousand playouts are more than enough.
If it is not, even 20 billion playouts will not really help.

The more unrealistic the position (in the sense of "never seen in training so far"), the more likely it is that you will have to train your net accordingly.

Posted: **Mon May 08, 2023 2:36 am**

I ran the KataGo TensorRT build (GeForce GTX 1650, 18b network, 60 seconds per run) three times with DDR4 2400 17-17-17-39: 19098, 15387, 16296 visits,
and three times with DDR4 2133 15-15-15-35: 19735, 21400, 17456 visits.
I started it in Sabaki, reconnected the engine each time, and KataGo made the first move.

Posted: **Mon May 08, 2023 2:59 am**

https://ram.userbenchmark.com/Software

Posted: **Mon May 08, 2023 3:38 am**

| RAM | First latency value (relative speed) | Average | Run 1 | Run 2 | Run 3 |
| --- | --- | --- | --- | --- | --- |
| DDR4 2400 17 17 17 39 | 100% | 100% | 19098 visits (100%) | 15387 visits (100%) | 16296 visits (100%) |
| DDR4 2133 15 15 15 35 | 113.3% | 116.5% | 19735 visits (103.3%) | 21400 visits (139.1%) | 17456 visits (107.1%) |

Supposing only the RAM parameters differed and software details were immaterial, we get the following: the relative speed of the first RAM latency value (113.3%) is close to the average improvement in the number of visits (116.5%). The RAM frequency seems to be immaterial, or is dominated by the minor latencies. Of course, the test sample is small. If, however, we trust it, the conclusion seems to be that RAM latency is a decisive factor!
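The percentages in the table above can be reproduced with a few lines of Python. This is only a sketch: the variable names are made up for illustration, the runs are paired in the order they were reported, and the last two lines add a run-to-run spread within each RAM setting as a rough indication of how noisy three 60-second samples are.

```python
from statistics import mean

# Visit counts copied from the two test series reported above.
visits_2400_cl17 = [19098, 15387, 16296]
visits_2133_cl15 = [19735, 21400, 17456]

# Per-run ratios, pairing the runs in the order they were reported.
run_ratios = [b / a for a, b in zip(visits_2400_cl17, visits_2133_cl15)]
mean_of_ratios = mean(run_ratios)

# Ratio of the first timing value (CL 17 vs CL 15), counted in clock cycles.
cl_ratio = 17 / 15

# Run-to-run spread within one RAM setting: largest count over smallest.
spread_2400 = max(visits_2400_cl17) / min(visits_2400_cl17)
spread_2133 = max(visits_2133_cl15) / min(visits_2133_cl15)

print([f"{r:.1%}" for r in run_ratios])
print(f"mean of ratios: {mean_of_ratios:.1%}, CL ratio: {cl_ratio:.1%}")
print(f"spread within a setting: {spread_2400:.1%} and {spread_2133:.1%}")
```

If the spread within a single setting turns out larger than the difference between the two settings, more runs would be needed before trusting the comparison.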

Posted: **Mon May 08, 2023 7:48 am**

I suspect you are drawing incorrect conclusions.

Maybe one device is faster, but a CL of 15 or 17 is the CAS latency in clock cycles, which needs to be divided by the bus clock in hertz to get the CAS latency in seconds (usually reported in nanoseconds). When a CL 15 device runs at a bus clock of 1066.67 MHz and a CL 17 device runs at 1200 MHz (which is supposed to be the standard specification), the CAS latency is 14.063 ns and 14.167 ns respectively, a difference of about 0.74%.

The experiment compares a device with a lower data rate and a slightly lower latency against a device with a higher data rate and a slightly higher latency, or so it would appear. I don't know whether that is actually the case: possibly the DDR4 2133 was running at the same bus speed as the DDR4 2400. RAM and motherboards usually come with documentation for recommended settings. In that case the DDR4 2133 15-15-15-35 was effectively a DDR4 2400 15-15-15-35, and its CAS latency would be 12.5 ns. These RAM devices use an external clock, and usually there is more than one recommended setting.
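The arithmetic in the two paragraphs above can be sketched in Python. The function name `cas_latency_ns` is made up for illustration; the only assumption is the usual DDR convention that the bus clock is half the nominal data rate.

```python
def cas_latency_ns(cl_cycles: int, data_rate_mt_s: float) -> float:
    """Convert a CAS latency given in clock cycles to nanoseconds."""
    # DDR memory transfers data twice per clock, so the bus clock in MHz
    # is half the nominal data rate in MT/s (e.g. DDR4-2400 -> 1200 MHz).
    bus_clock_mhz = data_rate_mt_s / 2
    # cycles / MHz gives microseconds; multiply by 1000 for nanoseconds.
    return cl_cycles / bus_clock_mhz * 1000


# The two kits at their nominal speeds.
ddr4_2133_cl15 = cas_latency_ns(15, 2 * 1066.67)
ddr4_2400_cl17 = cas_latency_ns(17, 2400)

# The alternative scenario: the CL 15 kit actually running on a 1200 MHz bus.
ddr4_2400_cl15 = cas_latency_ns(15, 2400)

relative_gap = ddr4_2400_cl17 / ddr4_2133_cl15 - 1

print(f"DDR4-2133 CL15: {ddr4_2133_cl15:.3f} ns")
print(f"DDR4-2400 CL17: {ddr4_2400_cl17:.3f} ns")
print(f"DDR4-2400 CL15: {ddr4_2400_cl15:.3f} ns")
print(f"gap between the nominal kits: {relative_gap:.2%}")
```

At nominal speeds the two kits differ by well under 1% in absolute CAS latency, even though the cycle counts 15 and 17 differ by about 13%.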

As a first hypothesis, I'd expect the wall time of a fully memory-bound computation that doesn't (or can't) benefit from prefetching to correspond closely to the difference in latency measured in nanoseconds. If the computation does take advantage of prefetching (from memory into the CPU cache), I'd expect a correspondence to the difference in memory transfer rate. Another possibility is that the behavior is more indicative of the program used to control KataGo. I think I remember that Sabaki is a Python program; I usually use KaTrain, another Python program, and observe that it occasionally uses a lot of memory, CPU and GPU resources (that is, the Graphics Processing Unit, not the Go Processing Unit).
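As a rough illustration of the two hypotheses in the paragraph above, the sketch below computes what each would predict for the DDR4 test, expressed as the speed of the DDR4 2133 CL15 setting relative to DDR4 2400 CL17. The names are hypothetical, both models are deliberately crude first-order estimates, and the observed value is simply the 116.5% average reported earlier in the thread.

```python
def cas_latency_ns(cl_cycles: int, data_rate_mt_s: float) -> float:
    """CAS latency in nanoseconds; the bus clock is half the data rate."""
    return cl_cycles / (data_rate_mt_s / 2) * 1000


# Nominal settings of the two DDR4 configurations that were tested.
slow_rate, slow_cl = 2 * 1066.67, 15   # DDR4 2133 CL15
fast_rate, fast_cl = 2400.0, 17        # DDR4 2400 CL17

# Hypothesis 1 (latency-bound, no prefetching): throughput is inversely
# proportional to the CAS latency in nanoseconds.
latency_bound_speedup = (
    cas_latency_ns(fast_cl, fast_rate) / cas_latency_ns(slow_cl, slow_rate)
)

# Hypothesis 2 (bandwidth-bound, prefetching works): throughput is
# proportional to the transfer rate.
bandwidth_bound_speedup = slow_rate / fast_rate

# Average of the per-run visit ratios as reported in the thread (116.5%).
observed_speedup = 1.165

print(f"latency-bound prediction:   {latency_bound_speedup:.3f}")
print(f"bandwidth-bound prediction: {bandwidth_bound_speedup:.3f}")
print(f"observed in the test:       {observed_speedup:.3f}")
```

Under these assumptions neither model predicts an improvement anywhere near the observed one, which would be consistent with something other than the RAM settings driving the difference.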

The DDR5 devices quoted (far) above are different in that one has both a higher data rate and a lower latency than the other, so it should be better all around. The only issue is the possible cost of upgrading memory later.
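To put numbers on that comparison, here is a small sketch for the two DDR5 kits at their rated speeds. The helper names are made up, and the bandwidth figure assumes a 64-bit (8-byte) data path per DIMM, which gives a theoretical peak only.

```python
def cas_latency_ns(cl_cycles: int, data_rate_mt_s: float) -> float:
    """CAS latency in nanoseconds; the bus clock is half the data rate."""
    return cl_cycles / (data_rate_mt_s / 2) * 1000


def peak_bandwidth_gb_s(data_rate_mt_s: float, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth, assuming an 8-byte wide data path."""
    return data_rate_mt_s * bus_bytes / 1000


# (CL in cycles, data rate in MT/s) for the two kits from the first post.
kits = {
    "DDR5-5600 CL28": (28, 5600),
    "DDR5-4800 CL38": (38, 4800),
}

results = {
    name: (cas_latency_ns(cl, rate), peak_bandwidth_gb_s(rate))
    for name, (cl, rate) in kits.items()
}

for name, (latency, bandwidth) in results.items():
    print(f"{name}: {latency:.2f} ns CAS latency, {bandwidth:.1f} GB/s peak")
```

These figures hold only if each kit actually runs at its rated speed and timings.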

By the way, the official AMD website states that the CPU (the same CPU?) supports 4 DIMMs at DDR5-3600 or less and 2 DIMMs at DDR5-5200 or less. Given this, I'd assume that the 4x DDR5-4800 or 2x DDR5-5600 would operate as DDR5-3600 or DDR5-5200 respectively, since the memory controller is, I believe, built into AMD CPUs. Then again, AMD does somewhat encourage overclocking, although it is supposed to void the warranty.
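That assumption amounts to taking the lower of the kit's rating and the controller's limit for the number of DIMMs installed. A minimal sketch follows, with made-up names, using only the limits quoted above and ignoring overclocking:

```python
# Maximum supported data rate (MT/s) by number of populated DIMMs,
# as quoted from the AMD website in the paragraph above.
SUPPORTED_MAX_RATE = {2: 5200, 4: 3600}


def effective_rate(kit_rate: int, dimm_count: int) -> int:
    """Data rate the memory would run at without overclocking (assumption)."""
    return min(kit_rate, SUPPORTED_MAX_RATE[dimm_count])


two_fast_dimms = effective_rate(5600, 2)    # 2x DDR5-5600
four_slow_dimms = effective_rate(4800, 4)   # 4x DDR5-4800
two_slow_dimms = effective_rate(4800, 2)    # 2x DDR5-4800, before upgrading

print(two_fast_dimms, four_slow_dimms, two_slow_dimms)
```

Which timings the modules would use at a reduced rate is not covered here, since that depends on the board and the kit.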

Edit: I originally wrote "cache latency" above; the correct term is CAS latency (column address strobe latency).

Posted: **Mon May 08, 2023 11:07 am**

Are you effectively trying to say that we do not have enough information to judge whether the RAM with higher frequency or the RAM with lower latency is better for us?

RAM Latency