Uberdude wrote:
Or do we just trust the magic extrapolation abilities of neural networks?
No, we don't.

But it's fun to read the tea leaves.
Here are a couple of interesting things. First, the winrates. My scores are only slightly different from gennan's; this could be down to the random number seed or a different number of playouts (I'm using 15,000).
Handicap  Score (pts)  Winrate
       1          9.0    74.6%
       2         28.7    96.9%
       3         44.0    98.8%
       4         56.8    99.2%
       5         59.8    99.1%
So for more than 2 stones, the winrate difference for each extra stone is too small to measure accurately.
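As a quick check on that, here is a small Python sketch that differences consecutive rows of the table to get the score and winrate gained from each extra stone. The numbers are copied from the table; the helper name `increments` is just for illustration.

```python
# Black's estimated lead (points) and winrate (%) from the table above,
# evaluated with komi set to zero.
table = {
    1: (9.0, 74.6),
    2: (28.7, 96.9),
    3: (44.0, 98.8),
    4: (56.8, 99.2),
    5: (59.8, 99.1),
}

def increments(data):
    """Return (stone, score gain, winrate gain) for each extra handicap stone."""
    rows = []
    stones = sorted(data)
    for prev, cur in zip(stones, stones[1:]):
        d_score = round(data[cur][0] - data[prev][0], 1)
        d_win = round(data[cur][1] - data[prev][1], 1)
        rows.append((cur, d_score, d_win))
    return rows

for stone, d_score, d_win in increments(table):
    # Winrate gains are in percentage points.
    print(f"stone {stone}: {d_score:+.1f} pts, {d_win:+.1f} pp winrate")
```

The winrate gain falls from +22.3 percentage points for the 2nd stone to under +2 after that, while the score gain stays in double figures until the 5th stone.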
Second, the numbers above (and, I assume, gennan's too) were obtained by putting the handicap stones on the board, setting komi to zero and asking KataGo to evaluate the position. But if you try different komi values, KataGo explores different moves, and this changes the results.
With 2 stones and 29 points of komi, KataGo puts Black 7 points behind, with a 31% winrate. But with 2 stones and 22 points of komi, Black is only 0.5 points behind, with a 48% winrate.
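To show how far this is from a simple point-for-point komi adjustment, here is a small Python sketch. It assumes, purely as a simplification for comparison, that komi just subtracts from the 28.7-point zero-komi lead, and sets that naive prediction against the figures KataGo actually reported. The names `naive_score` and `reported` are hypothetical.

```python
# Black's lead with 2 stones at komi 0, from the table above.
SCORE_AT_ZERO_KOMI = 28.7

# KataGo's reported Black lead at other komi values, as quoted above.
reported = {22: -0.5, 29: -7.0}

def naive_score(komi, base=SCORE_AT_ZERO_KOMI):
    """Naive model (an assumption): komi subtracts from Black's lead 1:1."""
    return round(base - komi, 1)

for komi, actual in sorted(reported.items()):
    expected = naive_score(komi)
    print(f"komi {komi}: naive {expected:+.1f}, reported {actual:+.1f}, "
          f"gap {actual - expected:+.1f}")
```

Under the naive model the balanced komi for 2 stones would be about 29 points, yet the reported scores put it much nearer 22.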
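For 4 and 5 stones, the most balanced komi is 53 and 63 points respectively. So by this measure the 5th stone might be worth a bit less, but the difference isn't so drastic: a 10-point step between the 4- and 5-stone komi, versus only 3.0 points between the 4- and 5-stone scores in the table.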
Is either measure more accurate (or less meaningless) than the other?
The question I would ask, as you did, is what the proper komi is for each handicap.
IMX, however, one stone per rank is a better measure than X pts. per rank. That is somewhat surprising, and I have only a few examples. But I did once give 40 stones and won by 10 pts.