Improving cloud performance with caching

hhyyrylainen · February 3, 2026, 11:15am

Recently the compound cloud performance has been attempted to be improved. One of those was to introduce caching for cloud to world conversion. That requires 81 vectors in total to handle all combinations, and a dictionary was added to cache this. Then later it was brought up that a FrozenDictionary should be faster. So I decided to uset BenchmarkDotNet so that it would be clear what is faster. For fun I also added a benchmark that recalculated the world vector each time, and the results are a bit surprising.

Here’s the raw table output:

Method	NonSequentialReads	ExtraReaderThreads	Mean	Error	StdDev	Allocated
RecalculateKeyEachTime	0	0	26.93 ms	0.325 ms	0.271 ms	328 B
DictionaryTryGetValue	0	0	41.54 ms	0.380 ms	0.337 ms	2640 B
DictionaryMarshalGetRef	0	0	39.84 ms	0.241 ms	0.226 ms	2640 B
FrozenDictionaryTryGet	0	0	39.71 ms	0.256 ms	0.240 ms	3776 B
FrozenDictionaryGetValueRef	0	0	39.54 ms	0.324 ms	0.303 ms	3776 B
RecalculateKeyEachTime	0	1	26.61 ms	0.125 ms	0.105 ms	896 B
DictionaryTryGetValue	0	1	40.11 ms	0.187 ms	0.175 ms	3080 B
DictionaryMarshalGetRef	0	1	40.21 ms	0.228 ms	0.202 ms	3080 B
FrozenDictionaryTryGet	0	1	37.41 ms	0.206 ms	0.182 ms	4216 B
FrozenDictionaryGetValueRef	0	1	35.49 ms	0.135 ms	0.126 ms	4284 B
RecalculateKeyEachTime	0	2	27.19 ms	0.122 ms	0.115 ms	1216 B
DictionaryTryGetValue	0	2	42.54 ms	0.380 ms	0.356 ms	3456 B
DictionaryMarshalGetRef	0	2	42.06 ms	0.369 ms	0.345 ms	3456 B
FrozenDictionaryTryGet	0	2	37.56 ms	0.332 ms	0.310 ms	4602 B
FrozenDictionaryGetValueRef	0	2	38.64 ms	0.171 ms	0.143 ms	4592 B
RecalculateKeyEachTime	0.01	0	27.77 ms	0.252 ms	0.224 ms	328 B
DictionaryTryGetValue	0.01	0	42.06 ms	0.201 ms	0.178 ms	2640 B
DictionaryMarshalGetRef	0.01	0	42.04 ms	0.317 ms	0.281 ms	2640 B
FrozenDictionaryTryGet	0.01	0	37.06 ms	0.124 ms	0.104 ms	3776 B
FrozenDictionaryGetValueRef	0.01	0	41.23 ms	0.211 ms	0.187 ms	3776 B
RecalculateKeyEachTime	0.01	1	28.50 ms	0.160 ms	0.150 ms	772 B
DictionaryTryGetValue	0.01	1	44.47 ms	0.247 ms	0.219 ms	3080 B
DictionaryMarshalGetRef	0.01	1	45.20 ms	0.349 ms	0.326 ms	3080 B
FrozenDictionaryTryGet	0.01	1	37.46 ms	0.276 ms	0.258 ms	4216 B
FrozenDictionaryGetValueRef	0.01	1	39.89 ms	0.243 ms	0.216 ms	4216 B
RecalculateKeyEachTime	0.01	2	28.77 ms	0.153 ms	0.136 ms	1327 B
DictionaryTryGetValue	0.01	2	42.90 ms	0.322 ms	0.301 ms	3592 B
DictionaryMarshalGetRef	0.01	2	42.45 ms	0.227 ms	0.202 ms	3490 B
FrozenDictionaryTryGet	0.01	2	39.44 ms	0.074 ms	0.066 ms	4592 B
FrozenDictionaryGetValueRef	0.01	2	39.26 ms	0.504 ms	0.472 ms	4592 B
RecalculateKeyEachTime	0.1	0	41.39 ms	0.108 ms	0.096 ms	328 B
DictionaryTryGetValue	0.1	0	58.23 ms	0.312 ms	0.277 ms	2640 B
DictionaryMarshalGetRef	0.1	0	61.12 ms	0.295 ms	0.261 ms	2640 B
FrozenDictionaryTryGet	0.1	0	48.05 ms	0.149 ms	0.124 ms	3776 B
FrozenDictionaryGetValueRef	0.1	0	50.45 ms	0.492 ms	0.436 ms	3776 B
RecalculateKeyEachTime	0.1	1	41.51 ms	0.266 ms	0.236 ms	886 B
DictionaryTryGetValue	0.1	1	61.36 ms	1.043 ms	0.975 ms	3080 B
DictionaryMarshalGetRef	0.1	1	65.37 ms	0.867 ms	0.811 ms	3080 B
FrozenDictionaryTryGet	0.1	1	48.73 ms	0.419 ms	0.392 ms	4216 B
FrozenDictionaryGetValueRef	0.1	1	47.94 ms	0.130 ms	0.115 ms	4216 B
RecalculateKeyEachTime	0.1	2	42.11 ms	0.326 ms	0.289 ms	1234 B
DictionaryTryGetValue	0.1	2	66.24 ms	0.339 ms	0.300 ms	3456 B
DictionaryMarshalGetRef	0.1	2	60.65 ms	0.201 ms	0.168 ms	3456 B
FrozenDictionaryTryGet	0.1	2	48.95 ms	0.203 ms	0.180 ms	4604 B
FrozenDictionaryGetValueRef	0.1	2	49.50 ms	0.231 ms	0.181 ms	4592 B
RecalculateKeyEachTime	0.9	0	101.42 ms	0.276 ms	0.244 ms	328 B
DictionaryTryGetValue	0.9	0	130.20 ms	0.642 ms	0.569 ms	2640 B
DictionaryMarshalGetRef	0.9	0	130.23 ms	0.287 ms	0.254 ms	2640 B
FrozenDictionaryTryGet	0.9	0	86.01 ms	0.336 ms	0.315 ms	3776 B
FrozenDictionaryGetValueRef	0.9	0	86.36 ms	0.318 ms	0.282 ms	3776 B
RecalculateKeyEachTime	0.9	1	103.33 ms	0.147 ms	0.131 ms	768 B
DictionaryTryGetValue	0.9	1	132.43 ms	0.554 ms	0.462 ms	3144 B
DictionaryMarshalGetRef	0.9	1	134.23 ms	0.728 ms	0.608 ms	3080 B
FrozenDictionaryTryGet	0.9	1	87.85 ms	0.150 ms	0.117 ms	4307 B
FrozenDictionaryGetValueRef	0.9	1	88.98 ms	0.824 ms	0.730 ms	4234 B
RecalculateKeyEachTime	0.9	2	107.62 ms	0.572 ms	0.478 ms	1208 B
DictionaryTryGetValue	0.9	2	134.71 ms	1.245 ms	1.039 ms	3524 B
DictionaryMarshalGetRef	0.9	2	134.57 ms	1.533 ms	1.434 ms	3483 B
FrozenDictionaryTryGet	0.9	2	90.13 ms	0.958 ms	0.896 ms	4683 B
FrozenDictionaryGetValueRef	0.9	2	90.46 ms	1.590 ms	1.409 ms	4660 B

It is quite surprising that the recalculation is faster in many cases. However FrozenDictionary clearly wins over normal one. And interestingly enough when there is enough randomness in the reads, it wins over the precalculation approach.

So I guess the ultimate question is if most of our cloud calculations are actually sequential or more random? I would bet that we have quite a lot of sequential reads because cells absorb things in a circle around them thus leading to quite similar positions in a row.

Here’s my draft PR where I have the benchmark code:

github.com/Revolutionary-Games/Thrive

Cloud vector cache improvements

master ← cloud_vector_cache_improvements

opened 11:16AM - 03 Feb 26 UTC

hhyyrylainen

+512 -3

**Brief Description of What This PR Does** WIP changes for the cloud vector c…ache **Related Issues** **Progress Checklist** Note: before starting this checklist the PR should be marked as non-draft. - [ ] PR author has checked that this PR works as intended and doesn't break existing features: https://wiki.revolutionarygamesstudio.com/wiki/Testing_Checklist (this is important as to not waste the time of Thrive team members reviewing this PR) - [ ] Initial code review passed (this and further items should not be checked by the PR author) - [ ] Functionality is confirmed working by another person (see above checklist link) - [ ] Final code review is passed and code conforms to the [styleguide](https://github.com/Revolutionary-Games/Thrive/blob/master/doc/style_guide.md). Before merging all CI jobs should finish on this PR without errors, if there are automatically detected style issues they should be fixed by the PR author. Merging must follow our [styleguide](https://github.com/Revolutionary-Games/Thrive/blob/master/doc/style_guide.md#git).

xfractalino · February 3, 2026, 11:26am

That’s pretty interesting.
It looks like direct computation is actually faster due to CPU caching for sequential access, as the number of different configurations for the calculations is small. If the access is random FrozenDictionary is better because it relies on precomputed keys, making the reading overhead almost zero.

hhyyrylainen · February 3, 2026, 1:25pm

Interestingly by making the random number generations always happen, that changes the results significantly. Now the frozen dictionary always wins, and even over the recalculation each time:

Method	NonSequentialReads	ExtraReaderThreads	Mean	Error	StdDev
RecalculateKeyEachTime	0	0	117.79 ms	0.386 ms	0.361 ms
DictionaryTryGetValue	0	0	98.94 ms	0.738 ms	0.654 ms
DictionaryMarshalGetRef	0	0	98.40 ms	0.396 ms	0.331 ms
FrozenDictionaryTryGet	0	0	93.57 ms	0.515 ms	0.457 ms
FrozenDictionaryGetValueRef	0	0	91.02 ms	0.566 ms	0.502 ms
RecalculateKeyEachTime	0	1	117.64 ms	0.383 ms	0.339 ms
DictionaryTryGetValue	0	1	104.39 ms	0.662 ms	0.587 ms
DictionaryMarshalGetRef	0	1	100.53 ms	0.553 ms	0.491 ms
FrozenDictionaryTryGet	0	1	95.67 ms	0.486 ms	0.431 ms
FrozenDictionaryGetValueRef	0	1	94.42 ms	0.391 ms	0.365 ms
RecalculateKeyEachTime	0	2	122.23 ms	0.586 ms	0.519 ms
DictionaryTryGetValue	0	2	105.33 ms	0.777 ms	0.689 ms
DictionaryMarshalGetRef	0	2	99.32 ms	0.666 ms	0.556 ms
FrozenDictionaryTryGet	0	2	96.77 ms	1.037 ms	0.919 ms
FrozenDictionaryGetValueRef	0	2	110.77 ms	2.179 ms	2.332 ms
RecalculateKeyEachTime	0.01	0	121.49 ms	0.420 ms	0.372 ms
DictionaryTryGetValue	0.01	0	99.49 ms	0.490 ms	0.459 ms
DictionaryMarshalGetRef	0.01	0	100.45 ms	0.893 ms	0.836 ms
FrozenDictionaryTryGet	0.01	0	91.49 ms	0.734 ms	0.686 ms
FrozenDictionaryGetValueRef	0.01	0	96.39 ms	0.334 ms	0.296 ms
RecalculateKeyEachTime	0.01	1	121.24 ms	0.862 ms	0.806 ms
DictionaryTryGetValue	0.01	1	103.98 ms	0.785 ms	0.656 ms
DictionaryMarshalGetRef	0.01	1	103.60 ms	0.593 ms	0.555 ms
FrozenDictionaryTryGet	0.01	1	95.69 ms	0.452 ms	0.423 ms
FrozenDictionaryGetValueRef	0.01	1	95.45 ms	0.364 ms	0.340 ms
RecalculateKeyEachTime	0.01	2	124.16 ms	0.649 ms	0.575 ms
DictionaryTryGetValue	0.01	2	105.92 ms	0.677 ms	0.565 ms
DictionaryMarshalGetRef	0.01	2	104.74 ms	0.760 ms	0.673 ms
FrozenDictionaryTryGet	0.01	2	97.99 ms	1.604 ms	1.500 ms
FrozenDictionaryGetValueRef	0.01	2	100.18 ms	0.588 ms	0.521 ms
RecalculateKeyEachTime	0.1	0	117.61 ms	0.272 ms	0.241 ms
DictionaryTryGetValue	0.1	0	107.77 ms	0.647 ms	0.573 ms
DictionaryMarshalGetRef	0.1	0	108.34 ms	0.449 ms	0.398 ms
FrozenDictionaryTryGet	0.1	0	100.19 ms	0.946 ms	0.838 ms
FrozenDictionaryGetValueRef	0.1	0	96.80 ms	0.695 ms	0.650 ms
RecalculateKeyEachTime	0.1	1	128.40 ms	0.387 ms	0.362 ms
DictionaryTryGetValue	0.1	1	112.01 ms	0.441 ms	0.412 ms
DictionaryMarshalGetRef	0.1	1	112.70 ms	0.414 ms	0.387 ms
FrozenDictionaryTryGet	0.1	1	101.37 ms	2.010 ms	2.064 ms
FrozenDictionaryGetValueRef	0.1	1	99.01 ms	0.262 ms	0.204 ms
RecalculateKeyEachTime	0.1	2	124.14 ms	0.396 ms	0.330 ms
DictionaryTryGetValue	0.1	2	114.56 ms	1.038 ms	0.920 ms
DictionaryMarshalGetRef	0.1	2	113.30 ms	0.721 ms	0.674 ms
FrozenDictionaryTryGet	0.1	2	98.69 ms	0.393 ms	0.328 ms
FrozenDictionaryGetValueRef	0.1	2	101.33 ms	0.510 ms	0.478 ms
RecalculateKeyEachTime	0.9	0	110.35 ms	0.281 ms	0.235 ms
DictionaryTryGetValue	0.9	0	127.40 ms	0.710 ms	0.664 ms
DictionaryMarshalGetRef	0.9	0	127.97 ms	1.092 ms	0.968 ms
FrozenDictionaryTryGet	0.9	0	93.13 ms	0.452 ms	0.377 ms
FrozenDictionaryGetValueRef	0.9	0	94.26 ms	0.779 ms	0.729 ms
RecalculateKeyEachTime	0.9	1	111.31 ms	0.255 ms	0.239 ms
DictionaryTryGetValue	0.9	1	130.53 ms	0.596 ms	0.497 ms
DictionaryMarshalGetRef	0.9	1	131.03 ms	0.776 ms	0.688 ms
FrozenDictionaryTryGet	0.9	1	93.97 ms	0.354 ms	0.314 ms
FrozenDictionaryGetValueRef	0.9	1	95.32 ms	0.436 ms	0.408 ms
RecalculateKeyEachTime	0.9	2	117.28 ms	0.354 ms	0.313 ms
DictionaryTryGetValue	0.9	2	132.47 ms	0.754 ms	0.630 ms
DictionaryMarshalGetRef	0.9	2	133.18 ms	0.490 ms	0.383 ms
FrozenDictionaryTryGet	0.9	2	96.27 ms	0.688 ms	0.574 ms
FrozenDictionaryGetValueRef	0.9	2	96.10 ms	0.312 ms	0.277 ms

I’ll try profiling the game itself a bit to see if Thrive itself is closer to the situation with not much other happening in the loop with the linear scan or with a bunch of interleaved work / random access happening. But I’ll at least change the game to use a frozen dictionary when using a dictionary for the cloud processing.

Edit: looks like the game itself is closer to the more complex scenario where FrozenDictionary wins over recalculation so I’ll update my PR accordingly, which is now also ready for review.

xfractalino · February 3, 2026, 4:09pm

I reclaimed a bit more performance by ditching the dictionary completely and using a flat array.

Method	NonSequentialReads	Reads	Mean	Error	StdDev	Allocated
RecalculateKeyEachTime	0	5000000	230.3 ms	0.37 ms	0.33 ms	328 B
DictionaryTryGetValue	0	5000000	223.7 ms	4.35 ms	7.38 ms	2640 B
DictionaryMarshalGetRef	0	5000000	208.1 ms	4.14 ms	4.43 ms	2640 B
FrozenDictionaryTryGet	0	5000000	187.9 ms	1.09 ms	0.97 ms	3776 B
FrozenDictionaryGetValueRef	0	5000000	189.0 ms	3.46 ms	3.24 ms	3776 B
ArrayFlatIndex	0	5000000	147.3 ms	0.39 ms	0.37 ms	328 B
RecalculateKeyEachTime	0.01	5000000	233.3 ms	2.77 ms	2.31 ms	328 B
DictionaryTryGetValue	0.01	5000000	206.9 ms	3.00 ms	2.81 ms	2640 B
DictionaryMarshalGetRef	0.01	5000000	204.7 ms	4.03 ms	3.96 ms	2640 B
FrozenDictionaryTryGet	0.01	5000000	189.1 ms	2.54 ms	2.38 ms	3776 B
FrozenDictionaryGetValueRef	0.01	5000000	205.8 ms	4.09 ms	5.31 ms	3776 B
ArrayFlatIndex	0.01	5000000	152.1 ms	0.30 ms	0.24 ms	328 B
RecalculateKeyEachTime	0.1	5000000	234.5 ms	0.33 ms	0.29 ms	328 B
DictionaryTryGetValue	0.1	5000000	222.7 ms	4.40 ms	10.28 ms	2640 B
DictionaryMarshalGetRef	0.1	5000000	232.8 ms	4.58 ms	10.34 ms	2640 B
FrozenDictionaryTryGet	0.1	5000000	200.2 ms	3.97 ms	5.16 ms	3776 B
FrozenDictionaryGetValueRef	0.1	5000000	206.1 ms	3.91 ms	3.65 ms	3776 B
ArrayFlatIndex	0.1	5000000	157.4 ms	0.36 ms	0.30 ms	328 B
RecalculateKeyEachTime	0.9	5000000	217.6 ms	1.04 ms	0.97 ms	328 B
DictionaryTryGetValue	0.9	5000000	235.0 ms	4.60 ms	4.73 ms	2640 B
DictionaryMarshalGetRef	0.9	5000000	232.7 ms	4.55 ms	5.24 ms	2640 B
FrozenDictionaryTryGet	0.9	5000000	176.7 ms	2.12 ms	1.98 ms	3776 B
FrozenDictionaryGetValueRef	0.9	5000000	181.6 ms	3.56 ms	3.33 ms	3776 B
ArrayFlatIndex	0.9	5000000	148.4 ms	0.13 ms	0.11 ms	328 B

The benchmark code I added is

[Benchmark]
    public void ArrayFlatIndex()
    {
        var tasks = new Task[ExtraReaderThreads];

        void TryRead(int key)
        {
            var cached = sourceDataArray[key];

            _ = cachedWorldPosition + cached;
        }

        for (int task = 0; task < ExtraReaderThreads; ++task)
        {
            tasks[task] = Task.Run(() =>
            {
                var random = new Random(RandomSeed);
                int xRead = 0;
                int yRead = 0;
                int playerXRead = 0;
                int playerYRead = 0;

                for (int i = 0; i < Reads; ++i)
                {
                    int key = GetRandomKeyArray(random);
                    if (random.NextDouble() < NonSequentialReads)
                    {
                        TryRead(key);
                    }
                    else
                    {
                        int index = GetIndexKeyArray(xRead, yRead, playerXRead, playerYRead);
                        TryRead(index);
                        IncrementRead(ref xRead, ref yRead, ref playerXRead, ref playerYRead);
                    }
                }
            });
        }

        {
            var random = new Random(RandomSeed);
            int xRead = 0;
            int yRead = 0;
            int playerXRead = 0;
            int playerYRead = 0;

            for (int i = 0; i < Reads; ++i)
            {
                int key = GetRandomKeyArray(random);
                if (random.NextDouble() < NonSequentialReads)
                {
                    TryRead(key);
                }
                else
                {
                    int index = GetIndexKeyArray(xRead, yRead, playerXRead, playerYRead);
                    TryRead(index);
                    IncrementRead(ref xRead, ref yRead, ref playerXRead, ref playerYRead);
                }
            }
        }

        Task.WaitAll(tasks);
    }

    private static int GetRandomKeyArray(Random random)
    {
        var key = random.Next(PlaneOffsets.Length) << 6 | random.Next(PlaneOffsets.Length) << 4 |
            random.Next(PlayerPositions.Length) << 2 | random.Next(PlayerPositions.Length);
        return key;
    }

    private static int GetIndexKeyArray(int xRead, int yRead, int playerXRead, int playerYRead)
    {
        return (xRead << 6) | (yRead << 4) | (playerXRead << 2) | playerYRead;
    }

Where sourceDataArray is initialized as a 256 length array.

xfractalino · February 3, 2026, 4:19pm

Forgot about the precalculation

private static Vector2[] PrecalculateWorldShiftVectorsArray()
    {
        var shiftCache = new Vector2[256];
        int worldShift = CLOUD_SIZE / CLOUD_PLANE_SQUARES_PER_SIDE * cloudResolution;

        for (int xi = 0; xi < 3; xi++)
        {
            for (int yi = 0; yi < 3; yi++)
            {
                for (int pX = 0; pX < 3; pX++)
                {
                    for (int pY = 0; pY < 3; pY++)
                    {
                        int x0 = xi * 100;
                        int y0 = yi * 100;

                        int xShift = GetEdgeShift(x0, pX);
                        int yShift = GetEdgeShift(y0, pY);

                        var wholePlaneShift = new Vector2(
                            worldShift * ((4 - pX) % 3 - 1) - CLOUD_SIZE,
                            worldShift * ((4 - pY) % 3 - 1) - CLOUD_SIZE
                        );

                        var edgePlanesShift = new Vector2(xShift * worldShift, yShift * worldShift);
                        int index = (xi << 6) | (yi << 4) | (pX << 2) | pY;

                        shiftCache[index] = wholePlaneShift + edgePlanesShift;
                    }
                }
            }
        }

        return shiftCache;
    }

hhyyrylainen · February 3, 2026, 4:55pm

I thought about using a flat array, however I decided against it as the key-space was seemingly quite sparse (and would require like thousands of useless entries). Looks like it is not the case.

However, I do have one concern which is that if the key space has to be increased then the simple bit-packing approach you showed cannot be expanded (or at least I would lack the skill to converting that to morton order function). So while that approach would work right now, there’s a threat that it might require updating in the future and then it would be a lot harder to do the update for the code…

xfractalino · February 3, 2026, 5:22pm

In principle the general key for n²×m² such loop (where n is the range for x,y and m is the range for playerX and playerY) can be constructed as, if we define

a = ceil(log_2(m))
b = ceil(log_2(n))
key = x << 2*(a+b) | y << 2a+b | playerX << 2a | playerY << a

Anyways, this is not morton code but a simple bitpacking. True morton code requires n*m to be a power of two.

hhyyrylainen · February 4, 2026, 1:40pm

Okay, so I’m done testing now, and here’s the final numbers before I talk about them (I excluded the 0.9 random reads numbers as they tell just more of the same story).

Method	NonSequentialReads	ExtraReaderThreads	Mean	Error	StdDev	Allocated
RecalculateKeyEachTime	0	0	116.23 ms	0.394 ms	0.329 ms	328 B
DictionaryTryGetValue	0	0	98.61 ms	0.792 ms	0.702 ms	2640 B
DictionaryMarshalGetRef	0	0	98.91 ms	0.985 ms	0.873 ms	2640 B
FrozenDictionaryTryGet	0	0	94.52 ms	0.563 ms	0.499 ms	3776 B
FrozenDictionaryGetValueRef	0	0	93.80 ms	0.294 ms	0.261 ms	3776 B
FrozenDictionaryDirectRandomTry	0	0	97.59 ms	0.565 ms	0.529 ms	3776 B
FrozenDictionaryDirectRandomRef	0	0	94.59 ms	0.656 ms	0.613 ms	3776 B
FlatArray	0	0	118.41 ms	0.554 ms	0.491 ms	368 B
FlatArrayDirectRandom	0	0	84.24 ms	0.284 ms	0.266 ms	368 B
RecalculateKeyEachTime	0	1	119.72 ms	0.349 ms	0.310 ms	800 B
DictionaryTryGetValue	0	1	99.64 ms	0.871 ms	0.772 ms	3080 B
DictionaryMarshalGetRef	0	1	101.91 ms	1.209 ms	1.072 ms	3080 B
FrozenDictionaryTryGet	0	1	96.46 ms	1.054 ms	0.934 ms	4216 B
FrozenDictionaryGetValueRef	0	1	97.28 ms	1.606 ms	1.502 ms	4234 B
FrozenDictionaryDirectRandomTry	0	1	95.71 ms	0.304 ms	0.269 ms	4253 B
FrozenDictionaryDirectRandomRef	0	1	96.15 ms	0.832 ms	0.737 ms	4234 B
FlatArray	0	1	119.69 ms	0.301 ms	0.266 ms	808 B
FlatArrayDirectRandom	0	1	121.85 ms	0.703 ms	0.624 ms	869 B
RecalculateKeyEachTime	0	2	121.47 ms	0.590 ms	0.552 ms	1344 B
DictionaryTryGetValue	0	2	101.15 ms	0.391 ms	0.346 ms	3501 B
DictionaryMarshalGetRef	0	2	102.29 ms	1.011 ms	0.844 ms	3456 B
FrozenDictionaryTryGet	0	2	95.61 ms	0.600 ms	0.532 ms	4728 B
FrozenDictionaryGetValueRef	0	2	97.57 ms	1.045 ms	0.816 ms	4728 B
FrozenDictionaryDirectRandomTry	0	2	97.48 ms	0.712 ms	0.631 ms	4592 B
FrozenDictionaryDirectRandomRef	0	2	97.22 ms	0.493 ms	0.411 ms	4637 B
FlatArray	0	2	121.71 ms	0.350 ms	0.310 ms	1184 B
FlatArrayDirectRandom	0	2	121.51 ms	1.107 ms	1.035 ms	1320 B
RecalculateKeyEachTime	0.01	0	118.18 ms	0.513 ms	0.455 ms	328 B
DictionaryTryGetValue	0.01	0	99.17 ms	0.675 ms	0.632 ms	2640 B
DictionaryMarshalGetRef	0.01	0	100.84 ms	0.936 ms	0.875 ms	2640 B
FrozenDictionaryTryGet	0.01	0	93.94 ms	0.909 ms	0.850 ms	3776 B
FrozenDictionaryGetValueRef	0.01	0	94.52 ms	1.053 ms	0.934 ms	3776 B
FrozenDictionaryDirectRandomTry	0.01	0	94.08 ms	0.327 ms	0.273 ms	3776 B
FrozenDictionaryDirectRandomRef	0.01	0	94.38 ms	0.412 ms	0.365 ms	3776 B
FlatArray	0.01	0	119.39 ms	0.615 ms	0.545 ms	368 B
FlatArrayDirectRandom	0.01	0	84.69 ms	0.294 ms	0.245 ms	368 B
RecalculateKeyEachTime	0.01	1	124.17 ms	0.615 ms	0.575 ms	768 B
DictionaryTryGetValue	0.01	1	100.22 ms	1.089 ms	0.966 ms	3080 B
DictionaryMarshalGetRef	0.01	1	103.16 ms	0.539 ms	0.478 ms	3080 B
FrozenDictionaryTryGet	0.01	1	98.96 ms	0.468 ms	0.438 ms	4344 B
FrozenDictionaryGetValueRef	0.01	1	96.06 ms	0.496 ms	0.464 ms	4216 B
FrozenDictionaryDirectRandomTry	0.01	1	95.90 ms	0.624 ms	0.553 ms	4253 B
FrozenDictionaryDirectRandomRef	0.01	1	98.90 ms	1.110 ms	0.984 ms	4237 B
FlatArray	0.01	1	124.48 ms	0.319 ms	0.282 ms	936 B
FlatArrayDirectRandom	0.01	1	124.15 ms	0.880 ms	0.824 ms	1195 B
RecalculateKeyEachTime	0.01	2	122.89 ms	0.546 ms	0.510 ms	1305 B
DictionaryTryGetValue	0.01	2	102.98 ms	0.787 ms	0.657 ms	3456 B
DictionaryMarshalGetRef	0.01	2	104.25 ms	0.595 ms	0.556 ms	3456 B
FrozenDictionaryTryGet	0.01	2	96.95 ms	0.612 ms	0.573 ms	4615 B
FrozenDictionaryGetValueRef	0.01	2	97.15 ms	1.223 ms	1.144 ms	4705 B
FrozenDictionaryDirectRandomTry	0.01	2	97.31 ms	0.526 ms	0.492 ms	4660 B
FrozenDictionaryDirectRandomRef	0.01	2	97.42 ms	0.431 ms	0.360 ms	4728 B
FlatArray	0.01	2	124.16 ms	0.528 ms	0.468 ms	1207 B
FlatArrayDirectRandom	0.01	2	124.14 ms	0.857 ms	0.760 ms	1320 B
RecalculateKeyEachTime	0.1	0	120.52 ms	0.645 ms	0.603 ms	328 B
DictionaryTryGetValue	0.1	0	109.20 ms	0.936 ms	0.782 ms	2640 B
DictionaryMarshalGetRef	0.1	0	109.37 ms	0.772 ms	0.644 ms	2640 B
FrozenDictionaryTryGet	0.1	0	98.26 ms	0.945 ms	0.838 ms	3776 B
FrozenDictionaryGetValueRef	0.1	0	100.45 ms	0.900 ms	0.798 ms	3776 B
FrozenDictionaryDirectRandomTry	0.1	0	97.39 ms	0.495 ms	0.439 ms	3776 B
FrozenDictionaryDirectRandomRef	0.1	0	98.52 ms	0.625 ms	0.554 ms	3776 B
FlatArray	0.1	0	121.64 ms	1.102 ms	1.031 ms	368 B
FlatArrayDirectRandom	0.1	0	90.21 ms	0.406 ms	0.360 ms	368 B
RecalculateKeyEachTime	0.1	1	123.12 ms	0.392 ms	0.367 ms	800 B
DictionaryTryGetValue	0.1	1	111.55 ms	0.500 ms	0.468 ms	3080 B
DictionaryMarshalGetRef	0.1	1	112.78 ms	0.935 ms	0.875 ms	3101 B
FrozenDictionaryTryGet	0.1	1	99.58 ms	0.519 ms	0.460 ms	4323 B
FrozenDictionaryGetValueRef	0.1	1	101.14 ms	0.474 ms	0.396 ms	4216 B
FrozenDictionaryDirectRandomTry	0.1	1	99.09 ms	0.366 ms	0.286 ms	4216 B
FrozenDictionaryDirectRandomRef	0.1	1	100.47 ms	0.530 ms	0.470 ms	4216 B
FlatArray	0.1	1	126.76 ms	0.519 ms	0.485 ms	1067 B
FlatArrayDirectRandom	0.1	1	123.01 ms	0.330 ms	0.275 ms	936 B
RecalculateKeyEachTime	0.1	2	126.19 ms	0.434 ms	0.363 ms	1321 B
DictionaryTryGetValue	0.1	2	113.35 ms	0.620 ms	0.550 ms	3479 B
DictionaryMarshalGetRef	0.1	2	113.39 ms	0.348 ms	0.326 ms	3456 B
FrozenDictionaryTryGet	0.1	2	101.27 ms	0.782 ms	0.693 ms	4705 B
FrozenDictionaryGetValueRef	0.1	2	100.92 ms	0.936 ms	0.781 ms	4637 B
FrozenDictionaryDirectRandomTry	0.1	2	101.67 ms	0.586 ms	0.520 ms	4637 B
FrozenDictionaryDirectRandomRef	0.1	2	101.19 ms	0.615 ms	0.513 ms	4615 B
FlatArray	0.1	2	128.57 ms	0.531 ms	0.470 ms	1320 B
FlatArrayDirectRandom	0.1	2	127.42 ms	0.682 ms	0.569 ms	1320 B

Okay so my final conclusion is that the FrozenDictionary TryGet is actually the fastest option, because when adding one extra reader thread (so 2 at once) the very specific runtime optimizations that the flat array code provided above hits become ineffective. The direct ref get from the dictionary sometimes wins but it is very close in performance to the other option and if I counted right the TryGet variant actually wins slightly more often.

What I noticed is that when I did my own flat array implementation it was actually the slowest option, so calculating the random read values directly is very fast when the runtime optimization hits it (likely due to the avoided array reads). But that optimization seems to only work with a single reader thread and all other tests the array code is very slow. As Thrive is meant to run multiple cloud absorbers /emitters at once for performance reasons, I choose to put more weight on the more reader thread numbers. And also it would be unclear whether in the game the the runtime optimizer would manage to do the same optimization as in the benchmark.

So FrozenDictionary is the fastest of the tested options that doesn’t really care if there’s 0, 1 or 2 extra threads competing for the reads. I’ve updated my PR and as I’ve basically now spent 2 days on this, I’m going to merge it once the CI checks pass.