RAM Latency

RobertJasiek
Judan
Posts: 6272
Joined: Tue Apr 27, 2010 8:54 pm
GD Posts: 0
Been thanked: 797 times

RAM Latency

Post by RobertJasiek »

For a desktop I am going to build (Ryzen 7700, B650 Tomahawk, RTX 4070), I can choose between these RAM options:

- DDR5-5600 CL28-34-34-89 with at most 2 DIMMs (64GB)

- DDR5-4800 CL38-38-38 with up to 4 DIMMs (64GB now, optionally 128GB later)

The slower RAM has the option of 4 DIMMs.

I think the speed, 5600 or 4800, is immaterial, right?

The latency might be more relevant, but is it? Does it make a significant difference for KataGo whether the RAM is CL28-34-34 or CL38-38-38?
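One way to make the question concrete is to convert each timing from clock cycles to nanoseconds. A minimal Python sketch, assuming the usual DDR convention that the memory clock runs at half the data rate (so one cycle lasts 2000 / data rate ns) and reading the three numbers as tCL-tRCD-tRP; `cycles_to_ns` and `timings_ns` are hypothetical helper names:

```python
def cycles_to_ns(cycles: int, data_rate_mts: float) -> float:
    """Convert a DRAM timing in clock cycles to nanoseconds.

    DDR memory transfers data twice per clock, so the clock runs at
    data_rate / 2 MHz and one cycle lasts 2000 / data_rate nanoseconds.
    """
    return cycles * 2000.0 / data_rate_mts


def timings_ns(data_rate_mts: float, timings: tuple[int, ...]) -> tuple[float, ...]:
    """Convert timings (e.g. tCL, tRCD, tRP) to ns, rounded to 0.01 ns."""
    return tuple(round(cycles_to_ns(t, data_rate_mts), 2) for t in timings)


# The two kits from the question: (data rate in MT/s, (tCL, tRCD, tRP)).
KITS = {
    "DDR5-5600 CL28-34-34": (5600, (28, 34, 34)),
    "DDR5-4800 CL38-38-38": (4800, (38, 38, 38)),
}

if __name__ == "__main__":
    for name, (rate, cycles) in KITS.items():
        tcl, trcd, trp = timings_ns(rate, cycles)
        print(f"{name}: tCL={tcl} ns, tRCD={trcd} ns, tRP={trp} ns")
```

Under these assumptions the 5600 kit comes out at about 10.0 ns tCL versus about 15.83 ns for the 4800 kit.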
And
Gosei
Posts: 1464
Joined: Tue Sep 25, 2018 10:28 am
GD Posts: 0
Has thanked: 212 times
Been thanked: 215 times

Re: RAM Latency

Post by And »

I would run a test on the existing hardware, then change the timings and run the test again.

Re: RAM Latency

Post by RobertJasiek »

Someone might. I can't because first I need to buy the computer:) When doing so, I want to get the more appropriate RAM.

Re: RAM Latency

Post by And »

If you are interested, I can test different frequencies. Please write an approximate configuration: the KataGo version and the network.

Re: RAM Latency

Post by And »

You will most likely run a non-CPU version; in my opinion, the difference will be insignificant.
Cassandra
Lives in sente
Posts: 1326
Joined: Wed Apr 28, 2010 11:33 am
Rank: German 1 Kyu
GD Posts: 0
Has thanked: 14 times
Been thanked: 153 times

Re: RAM Latency

Post by Cassandra »

For my Igo Hatsuyôron 120 @ KataGo 60b training project, selfplay is running on Ryzen 5950 / RTX 3090, using about "only" 18 GByte of SYSTEM memory.
Therefore, I do not expect SYSTEM memory to be a limiting factor, especially not for your likely purpose of analysis.
The really most difficult Go problem ever: https://igohatsuyoron120.de/index.htm
Igo Hatsuyōron #120 (really solved by KataGo)

Re: RAM Latency

Post by RobertJasiek »

As has been reported, ca. 6 h of calculating ca. 20 million playouts on two RTX 2080 Ti cards can fill ca. 64GB of RAM. As I might also want to study complex global ko fights, many playouts can be useful.
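The reported figures imply a rough per-playout memory cost. A back-of-the-envelope Python sketch, assuming (hypothetically) that memory use grows linearly with playouts and that "64GB" means 64 × 10^9 bytes; real usage depends on the engine's settings and the position, and the operating system needs RAM too:

```python
# Approximate figures reported in the thread: ~20 million playouts filled ~64 GB.
REPORTED_PLAYOUTS = 20_000_000
REPORTED_BYTES = 64e9  # assumption: decimal gigabytes


def bytes_per_playout(playouts: int = REPORTED_PLAYOUTS,
                      used_bytes: float = REPORTED_BYTES) -> float:
    """Average memory per playout implied by a single observation."""
    return used_bytes / playouts


def playouts_that_fit(ram_bytes: float, reserved_bytes: float = 0.0) -> int:
    """Estimate how many playouts fit, assuming purely linear memory growth.

    reserved_bytes is a hypothetical allowance for the OS and other programs.
    """
    usable = max(ram_bytes - reserved_bytes, 0.0)
    return int(usable // bytes_per_playout())


if __name__ == "__main__":
    print(f"{bytes_per_playout():.0f} bytes per playout")
    for ram_gb in (64, 128):
        print(f"{ram_gb} GB -> about {playouts_that_fit(ram_gb * 1e9):,} playouts")
```

If the linear assumption holds, going from 64GB to 128GB roughly doubles the number of playouts that fit.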

Frequencies are less interesting, so please do not spend more than a few minutes on them. I had hoped that somebody had already done RAM tests or collected experience with different RAM parameters.

I am more interested in RAM latencies, so a bit more testing time for them would be welcome.

Yes, I would run on the GPU rather than nothing but the CPU. That is what graphics cards are for, after all:)

Re: RAM Latency

Post by Cassandra »

What do you expect from 20 million playouts?

Re: RAM Latency

Post by RobertJasiek »

Greater confidence than from 20 thousand.

Re: RAM Latency

Post by Cassandra »

RobertJasiek wrote: Greater confidence than from 20 thousand.
If your net is familiar with the position under examination, 20 thousand playouts are more than enough.
If it is not, even 20 billion playouts will not really help.

The more unrealistic the position (in the sense of "never seen in training so far"), the more likely it is that you will have to train your net accordingly.

Re: RAM Latency

Post by And »

I ran katago_tensorRT (GeForce GTX 1650, 18b network, 60 s) 3 times with DDR4 2400 17-17-17-39: 19098, 15387, 16296 visits
and 3 times with DDR4 2133 15-15-15-35: 19735, 21400, 17456 visits.
I started it in Sabaki, reconnected the engine every time, and KataGo made the first move.
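Because the visit counts vary noticeably between runs with the same setting, it can help to look at the mean and sample standard deviation per setting before comparing settings. A small Python sketch using only the standard-library `statistics` module; the numbers are the six runs reported above, and `summarize` is a hypothetical helper name:

```python
from statistics import mean, stdev

# Visit counts reported above (three runs per RAM setting, 60 s setting).
RUNS = {
    "DDR4 2400 17-17-17-39": [19098, 15387, 16296],
    "DDR4 2133 15-15-15-35": [19735, 21400, 17456],
}


def summarize(visits: list[int]) -> tuple[float, float]:
    """Return (mean, sample standard deviation) of a list of visit counts."""
    return mean(visits), stdev(visits)


if __name__ == "__main__":
    for setting, visits in RUNS.items():
        m, s = summarize(visits)
        print(f"{setting}: mean {m:.0f} visits, std dev {s:.0f} visits")
```

With only three runs per setting, the standard deviations are themselves rough estimates.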
Re: RAM Latency

Post by And »


Re: RAM Latency

Post by RobertJasiek »


Setting                  Rel. CL speed  Avg. rel. visits  Run 1           Run 2           Run 3
DDR4 2400 17-17-17-39    100%           100%              19098 (100%)    15387 (100%)    16296 (100%)
DDR4 2133 15-15-15-35    113.3%         116.5%            19735 (103.3%)  21400 (139.1%)  17456 (107.1%)
Supposing that only the RAM parameters differed and that software details were immaterial, we get the following: the relative speed of the first latency value (CL 17 vs 15, i.e. 113.3%) is close to the average relative increase in the number of visits (116.5%). The RAM frequency seems to be immaterial, or its effect is outweighed by the lower latencies. Of course, the test sample is small. If, however, we trust it, the conclusion seems to be that RAM latency is a decisive factor!
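The percentages in the table can be recomputed as below. Besides the mean of the run-by-run ratios, the sketch also computes the ratio of the two means: the runs are not matched pairs, so pairing run 1 with run 1 (and so on) is an arbitrary choice that the ratio of means avoids. Python, standard library only; `pct` is a hypothetical helper name:

```python
from statistics import mean

# Visit counts from the three runs per setting.
DDR4_2400_CL17 = [19098, 15387, 16296]
DDR4_2133_CL15 = [19735, 21400, 17456]


def pct(x: float) -> float:
    """Express a ratio as a percentage rounded to one decimal place."""
    return round(x * 100, 1)


# Relative speed of the CAS latency in cycles: 17 cycles vs 15 cycles.
cl_ratio = 17 / 15

# Run-by-run ratios (run i of one setting vs run i of the other), then their mean.
paired_ratios = [b / a for a, b in zip(DDR4_2400_CL17, DDR4_2133_CL15)]
mean_of_ratios = mean(paired_ratios)

# Ratio of the average visit counts; independent of how runs are paired.
ratio_of_means = mean(DDR4_2133_CL15) / mean(DDR4_2400_CL17)

if __name__ == "__main__":
    print("CL ratio:", pct(cl_ratio), "%")
    print("per-run ratios:", [pct(r) for r in paired_ratios], "%")
    print("mean of ratios:", pct(mean_of_ratios), "%")
    print("ratio of means:", pct(ratio_of_means), "%")
```

With these numbers the ratio of means comes out at about 115.4%, slightly below the 116.5% mean of the paired ratios.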
kvasir
Lives in sente
Posts: 1040
Joined: Sat Jul 28, 2012 12:29 am
Rank: panda 5 dan
GD Posts: 0
IGS: kvasir
Has thanked: 25 times
Been thanked: 187 times

Re: RAM Latency

Post by kvasir »

I suspect you are drawing incorrect conclusions.

Maybe one device is faster, but a CL of 15 or 17 is the CAS latency in clock cycles, which needs to be divided by the bus speed in hertz to get the CAS latency in seconds (usually reported in nanoseconds). When a CL 15 device runs at a bus speed of 1066.67 MHz and a CL 17 device at 1200 MHz (which is supposed to be the standard specification), the CAS latency is 14.06 ns and 14.17 ns respectively, a difference of about 0.7%.

The experiment compares a device with a lower data rate and a lower latency with a device with a higher data rate and a higher latency, or so it would appear. I don't know whether it's the case, but possibly the DDR4 2133 was actually running at the same bus speed as the DDR4 2400; RAM and motherboards usually come with documentation of recommended settings. That would mean the DDR4 2133 15-15-15-35 was basically a DDR4 2400 15-15-15-35, and the CAS latency would be 12.5 ns. These RAM devices use an external clock, and there is usually more than one recommended setting.
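The cycles-to-nanoseconds conversion described above can be checked with a few lines of Python. This is a sketch assuming that nominal DDR4-2133 is really 2133.33 MT/s (a 1066.67 MHz clock) and that the clock runs at half the data rate; `cas_latency_ns` is a hypothetical helper name:

```python
def cas_latency_ns(cl_cycles: int, data_rate_mts: float) -> float:
    """CAS latency in nanoseconds.

    The DDR clock runs at half the data rate (two transfers per clock),
    and cycles / MHz gives microseconds, hence the factor 1000 for ns.
    """
    clock_mhz = data_rate_mts / 2
    return cl_cycles / clock_mhz * 1000


DDR4_2133 = 6400 / 3  # nominal "2133" is 2133.33 MT/s
DDR4_2400 = 2400

if __name__ == "__main__":
    a = cas_latency_ns(15, DDR4_2133)  # DDR4 2133 CL15
    b = cas_latency_ns(17, DDR4_2400)  # DDR4 2400 CL17
    c = cas_latency_ns(15, DDR4_2400)  # CL15 timings run at 2400
    print(f"2133 CL15: {a:.2f} ns, 2400 CL17: {b:.2f} ns, 2400 CL15: {c:.2f} ns")
    print(f"difference 2133 CL15 vs 2400 CL17: {(b - a) / a * 100:.2f}%")
```

The two CAS latencies differ by well under 1%, while CL15 timings run at 2400 MT/s would give 12.5 ns.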

As a first hypothesis, I'd expect the wall time of a fully memory-bound computation that doesn't (or can't) take advantage of prefetching to correspond closely to the difference in latency measured in nanoseconds. If the computation takes advantage of prefetching (from memory into the CPU cache), I'd expect it to correspond to the difference in memory transfer rate. Another possibility is that the behavior says more about the program used to control KataGo. Sabaki is a JavaScript (Electron) program, and I usually use KaTrain, a Python program, and observe that it occasionally uses a lot of memory, CPU and GPU resources (that is, the Graphics Processing Unit, not the Go Processing Unit).

The DDR5 devices quoted (far) above are different: one has both a higher data rate and a lower latency than the other, so it should be better all around. The only issue is the possible cost of upgrading memory later.

By the way, the official AMD website states that the CPU (the same CPU?) supports 4 DIMMs at DDR5-3600 or less and 2 DIMMs at DDR5-5200 or less. With this information, I'd assume that 4x DDR5-4800 or 2x DDR5-5600 would operate as DDR5-3600 or DDR5-5200 respectively, since the memory controller is, I believe, built into the AMD CPUs. Then again, AMD does somewhat encourage overclocking, although it's supposed to void the warranty.
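The reasoning about DIMM count can be sketched as taking the lower of the kit's rated speed and the platform limit. The limits below are the ones quoted in the post (not independently verified); the bandwidth figure assumes two 64-bit (8-byte) memory channels and is only the theoretical peak transfer rate. Timings the board might pick at the lower speed are not modelled, and the helper names are hypothetical:

```python
# Supported data rates per DIMM count, as quoted in the post above (MT/s).
SUPPORTED_MAX_MTS = {2: 5200, 4: 3600}


def effective_data_rate(kit_rating_mts: int, dimm_count: int) -> int:
    """Data rate the kit would run at without overclocking (per the quoted limits)."""
    return min(kit_rating_mts, SUPPORTED_MAX_MTS[dimm_count])


def peak_bandwidth_gbs(data_rate_mts: int, channels: int = 2,
                       bytes_per_transfer: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s: MT/s x bytes per transfer x channels / 1000."""
    return data_rate_mts * bytes_per_transfer * channels / 1000


if __name__ == "__main__":
    for kit, dimms in ((5600, 2), (4800, 4)):
        rate = effective_data_rate(kit, dimms)
        print(f"{dimms}x DDR5-{kit}: runs at DDR5-{rate}, "
              f"peak {peak_bandwidth_gbs(rate):.1f} GB/s")
```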

===Edit: for some reason I wrote cache latency, but it's called CAS latency (column address strobe latency).
Last edited by kvasir on Mon May 08, 2023 11:23 am, edited 1 time in total.

Re: RAM Latency

Post by RobertJasiek »

Are you effectively trying to say that we do not have enough information to judge whether the RAM with higher frequency or the RAM with lower latency is better for us?