Recently the compound cloud performance has been attempted to be improved. One of those was to introduce caching for cloud to world conversion. That requires 81 vectors in total to handle all combinations, and a dictionary was added to cache this. Then later it was brought up that a FrozenDictionary should be faster. So I decided to uset BenchmarkDotNet so that it would be clear what is faster. For fun I also added a benchmark that recalculated the world vector each time, and the results are a bit surprising.
Hereās the raw table output:
Method
NonSequentialReads
ExtraReaderThreads
Mean
Error
StdDev
Allocated
RecalculateKeyEachTime
0
0
26.93 ms
0.325 ms
0.271 ms
328 B
DictionaryTryGetValue
0
0
41.54 ms
0.380 ms
0.337 ms
2640 B
DictionaryMarshalGetRef
0
0
39.84 ms
0.241 ms
0.226 ms
2640 B
FrozenDictionaryTryGet
0
0
39.71 ms
0.256 ms
0.240 ms
3776 B
FrozenDictionaryGetValueRef
0
0
39.54 ms
0.324 ms
0.303 ms
3776 B
RecalculateKeyEachTime
0
1
26.61 ms
0.125 ms
0.105 ms
896 B
DictionaryTryGetValue
0
1
40.11 ms
0.187 ms
0.175 ms
3080 B
DictionaryMarshalGetRef
0
1
40.21 ms
0.228 ms
0.202 ms
3080 B
FrozenDictionaryTryGet
0
1
37.41 ms
0.206 ms
0.182 ms
4216 B
FrozenDictionaryGetValueRef
0
1
35.49 ms
0.135 ms
0.126 ms
4284 B
RecalculateKeyEachTime
0
2
27.19 ms
0.122 ms
0.115 ms
1216 B
DictionaryTryGetValue
0
2
42.54 ms
0.380 ms
0.356 ms
3456 B
DictionaryMarshalGetRef
0
2
42.06 ms
0.369 ms
0.345 ms
3456 B
FrozenDictionaryTryGet
0
2
37.56 ms
0.332 ms
0.310 ms
4602 B
FrozenDictionaryGetValueRef
0
2
38.64 ms
0.171 ms
0.143 ms
4592 B
RecalculateKeyEachTime
0.01
0
27.77 ms
0.252 ms
0.224 ms
328 B
DictionaryTryGetValue
0.01
0
42.06 ms
0.201 ms
0.178 ms
2640 B
DictionaryMarshalGetRef
0.01
0
42.04 ms
0.317 ms
0.281 ms
2640 B
FrozenDictionaryTryGet
0.01
0
37.06 ms
0.124 ms
0.104 ms
3776 B
FrozenDictionaryGetValueRef
0.01
0
41.23 ms
0.211 ms
0.187 ms
3776 B
RecalculateKeyEachTime
0.01
1
28.50 ms
0.160 ms
0.150 ms
772 B
DictionaryTryGetValue
0.01
1
44.47 ms
0.247 ms
0.219 ms
3080 B
DictionaryMarshalGetRef
0.01
1
45.20 ms
0.349 ms
0.326 ms
3080 B
FrozenDictionaryTryGet
0.01
1
37.46 ms
0.276 ms
0.258 ms
4216 B
FrozenDictionaryGetValueRef
0.01
1
39.89 ms
0.243 ms
0.216 ms
4216 B
RecalculateKeyEachTime
0.01
2
28.77 ms
0.153 ms
0.136 ms
1327 B
DictionaryTryGetValue
0.01
2
42.90 ms
0.322 ms
0.301 ms
3592 B
DictionaryMarshalGetRef
0.01
2
42.45 ms
0.227 ms
0.202 ms
3490 B
FrozenDictionaryTryGet
0.01
2
39.44 ms
0.074 ms
0.066 ms
4592 B
FrozenDictionaryGetValueRef
0.01
2
39.26 ms
0.504 ms
0.472 ms
4592 B
RecalculateKeyEachTime
0.1
0
41.39 ms
0.108 ms
0.096 ms
328 B
DictionaryTryGetValue
0.1
0
58.23 ms
0.312 ms
0.277 ms
2640 B
DictionaryMarshalGetRef
0.1
0
61.12 ms
0.295 ms
0.261 ms
2640 B
FrozenDictionaryTryGet
0.1
0
48.05 ms
0.149 ms
0.124 ms
3776 B
FrozenDictionaryGetValueRef
0.1
0
50.45 ms
0.492 ms
0.436 ms
3776 B
RecalculateKeyEachTime
0.1
1
41.51 ms
0.266 ms
0.236 ms
886 B
DictionaryTryGetValue
0.1
1
61.36 ms
1.043 ms
0.975 ms
3080 B
DictionaryMarshalGetRef
0.1
1
65.37 ms
0.867 ms
0.811 ms
3080 B
FrozenDictionaryTryGet
0.1
1
48.73 ms
0.419 ms
0.392 ms
4216 B
FrozenDictionaryGetValueRef
0.1
1
47.94 ms
0.130 ms
0.115 ms
4216 B
RecalculateKeyEachTime
0.1
2
42.11 ms
0.326 ms
0.289 ms
1234 B
DictionaryTryGetValue
0.1
2
66.24 ms
0.339 ms
0.300 ms
3456 B
DictionaryMarshalGetRef
0.1
2
60.65 ms
0.201 ms
0.168 ms
3456 B
FrozenDictionaryTryGet
0.1
2
48.95 ms
0.203 ms
0.180 ms
4604 B
FrozenDictionaryGetValueRef
0.1
2
49.50 ms
0.231 ms
0.181 ms
4592 B
RecalculateKeyEachTime
0.9
0
101.42 ms
0.276 ms
0.244 ms
328 B
DictionaryTryGetValue
0.9
0
130.20 ms
0.642 ms
0.569 ms
2640 B
DictionaryMarshalGetRef
0.9
0
130.23 ms
0.287 ms
0.254 ms
2640 B
FrozenDictionaryTryGet
0.9
0
86.01 ms
0.336 ms
0.315 ms
3776 B
FrozenDictionaryGetValueRef
0.9
0
86.36 ms
0.318 ms
0.282 ms
3776 B
RecalculateKeyEachTime
0.9
1
103.33 ms
0.147 ms
0.131 ms
768 B
DictionaryTryGetValue
0.9
1
132.43 ms
0.554 ms
0.462 ms
3144 B
DictionaryMarshalGetRef
0.9
1
134.23 ms
0.728 ms
0.608 ms
3080 B
FrozenDictionaryTryGet
0.9
1
87.85 ms
0.150 ms
0.117 ms
4307 B
FrozenDictionaryGetValueRef
0.9
1
88.98 ms
0.824 ms
0.730 ms
4234 B
RecalculateKeyEachTime
0.9
2
107.62 ms
0.572 ms
0.478 ms
1208 B
DictionaryTryGetValue
0.9
2
134.71 ms
1.245 ms
1.039 ms
3524 B
DictionaryMarshalGetRef
0.9
2
134.57 ms
1.533 ms
1.434 ms
3483 B
FrozenDictionaryTryGet
0.9
2
90.13 ms
0.958 ms
0.896 ms
4683 B
FrozenDictionaryGetValueRef
0.9
2
90.46 ms
1.590 ms
1.409 ms
4660 B
It is quite surprising that the recalculation is faster in many cases. However FrozenDictionary clearly wins over normal one. And interestingly enough when there is enough randomness in the reads, it wins over the precalculation approach.
So I guess the ultimate question is if most of our cloud calculations are actually sequential or more random? I would bet that we have quite a lot of sequential reads because cells absorb things in a circle around them thus leading to quite similar positions in a row.
Hereās my draft PR where I have the benchmark code:
Thatās pretty interesting.
It looks like direct computation is actually faster due to CPU caching for sequential access, as the number of different configurations for the calculations is small. If the access is random FrozenDictionary is better because it relies on precomputed keys, making the reading overhead almost zero.
Interestingly by making the random number generations always happen, that changes the results significantly. Now the frozen dictionary always wins, and even over the recalculation each time:
Method
NonSequentialReads
ExtraReaderThreads
Mean
Error
StdDev
RecalculateKeyEachTime
0
0
117.79 ms
0.386 ms
0.361 ms
DictionaryTryGetValue
0
0
98.94 ms
0.738 ms
0.654 ms
DictionaryMarshalGetRef
0
0
98.40 ms
0.396 ms
0.331 ms
FrozenDictionaryTryGet
0
0
93.57 ms
0.515 ms
0.457 ms
FrozenDictionaryGetValueRef
0
0
91.02 ms
0.566 ms
0.502 ms
RecalculateKeyEachTime
0
1
117.64 ms
0.383 ms
0.339 ms
DictionaryTryGetValue
0
1
104.39 ms
0.662 ms
0.587 ms
DictionaryMarshalGetRef
0
1
100.53 ms
0.553 ms
0.491 ms
FrozenDictionaryTryGet
0
1
95.67 ms
0.486 ms
0.431 ms
FrozenDictionaryGetValueRef
0
1
94.42 ms
0.391 ms
0.365 ms
RecalculateKeyEachTime
0
2
122.23 ms
0.586 ms
0.519 ms
DictionaryTryGetValue
0
2
105.33 ms
0.777 ms
0.689 ms
DictionaryMarshalGetRef
0
2
99.32 ms
0.666 ms
0.556 ms
FrozenDictionaryTryGet
0
2
96.77 ms
1.037 ms
0.919 ms
FrozenDictionaryGetValueRef
0
2
110.77 ms
2.179 ms
2.332 ms
RecalculateKeyEachTime
0.01
0
121.49 ms
0.420 ms
0.372 ms
DictionaryTryGetValue
0.01
0
99.49 ms
0.490 ms
0.459 ms
DictionaryMarshalGetRef
0.01
0
100.45 ms
0.893 ms
0.836 ms
FrozenDictionaryTryGet
0.01
0
91.49 ms
0.734 ms
0.686 ms
FrozenDictionaryGetValueRef
0.01
0
96.39 ms
0.334 ms
0.296 ms
RecalculateKeyEachTime
0.01
1
121.24 ms
0.862 ms
0.806 ms
DictionaryTryGetValue
0.01
1
103.98 ms
0.785 ms
0.656 ms
DictionaryMarshalGetRef
0.01
1
103.60 ms
0.593 ms
0.555 ms
FrozenDictionaryTryGet
0.01
1
95.69 ms
0.452 ms
0.423 ms
FrozenDictionaryGetValueRef
0.01
1
95.45 ms
0.364 ms
0.340 ms
RecalculateKeyEachTime
0.01
2
124.16 ms
0.649 ms
0.575 ms
DictionaryTryGetValue
0.01
2
105.92 ms
0.677 ms
0.565 ms
DictionaryMarshalGetRef
0.01
2
104.74 ms
0.760 ms
0.673 ms
FrozenDictionaryTryGet
0.01
2
97.99 ms
1.604 ms
1.500 ms
FrozenDictionaryGetValueRef
0.01
2
100.18 ms
0.588 ms
0.521 ms
RecalculateKeyEachTime
0.1
0
117.61 ms
0.272 ms
0.241 ms
DictionaryTryGetValue
0.1
0
107.77 ms
0.647 ms
0.573 ms
DictionaryMarshalGetRef
0.1
0
108.34 ms
0.449 ms
0.398 ms
FrozenDictionaryTryGet
0.1
0
100.19 ms
0.946 ms
0.838 ms
FrozenDictionaryGetValueRef
0.1
0
96.80 ms
0.695 ms
0.650 ms
RecalculateKeyEachTime
0.1
1
128.40 ms
0.387 ms
0.362 ms
DictionaryTryGetValue
0.1
1
112.01 ms
0.441 ms
0.412 ms
DictionaryMarshalGetRef
0.1
1
112.70 ms
0.414 ms
0.387 ms
FrozenDictionaryTryGet
0.1
1
101.37 ms
2.010 ms
2.064 ms
FrozenDictionaryGetValueRef
0.1
1
99.01 ms
0.262 ms
0.204 ms
RecalculateKeyEachTime
0.1
2
124.14 ms
0.396 ms
0.330 ms
DictionaryTryGetValue
0.1
2
114.56 ms
1.038 ms
0.920 ms
DictionaryMarshalGetRef
0.1
2
113.30 ms
0.721 ms
0.674 ms
FrozenDictionaryTryGet
0.1
2
98.69 ms
0.393 ms
0.328 ms
FrozenDictionaryGetValueRef
0.1
2
101.33 ms
0.510 ms
0.478 ms
RecalculateKeyEachTime
0.9
0
110.35 ms
0.281 ms
0.235 ms
DictionaryTryGetValue
0.9
0
127.40 ms
0.710 ms
0.664 ms
DictionaryMarshalGetRef
0.9
0
127.97 ms
1.092 ms
0.968 ms
FrozenDictionaryTryGet
0.9
0
93.13 ms
0.452 ms
0.377 ms
FrozenDictionaryGetValueRef
0.9
0
94.26 ms
0.779 ms
0.729 ms
RecalculateKeyEachTime
0.9
1
111.31 ms
0.255 ms
0.239 ms
DictionaryTryGetValue
0.9
1
130.53 ms
0.596 ms
0.497 ms
DictionaryMarshalGetRef
0.9
1
131.03 ms
0.776 ms
0.688 ms
FrozenDictionaryTryGet
0.9
1
93.97 ms
0.354 ms
0.314 ms
FrozenDictionaryGetValueRef
0.9
1
95.32 ms
0.436 ms
0.408 ms
RecalculateKeyEachTime
0.9
2
117.28 ms
0.354 ms
0.313 ms
DictionaryTryGetValue
0.9
2
132.47 ms
0.754 ms
0.630 ms
DictionaryMarshalGetRef
0.9
2
133.18 ms
0.490 ms
0.383 ms
FrozenDictionaryTryGet
0.9
2
96.27 ms
0.688 ms
0.574 ms
FrozenDictionaryGetValueRef
0.9
2
96.10 ms
0.312 ms
0.277 ms
Iāll try profiling the game itself a bit to see if Thrive itself is closer to the situation with not much other happening in the loop with the linear scan or with a bunch of interleaved work / random access happening. But Iāll at least change the game to use a frozen dictionary when using a dictionary for the cloud processing.
Edit: looks like the game itself is closer to the more complex scenario where FrozenDictionary wins over recalculation so Iāll update my PR accordingly, which is now also ready for review.
I thought about using a flat array, however I decided against it as the key-space was seemingly quite sparse (and would require like thousands of useless entries). Looks like it is not the case.
However, I do have one concern which is that if the key space has to be increased then the simple bit-packing approach you showed cannot be expanded (or at least I would lack the skill to converting that to morton order function). So while that approach would work right now, thereās a threat that it might require updating in the future and then it would be a lot harder to do the update for the codeā¦
In principle the general key for n²Ćm² such loop (where n is the range for x,y and m is the range for playerX and playerY) can be constructed as, if we define
a = ceil(log_2(m))
b = ceil(log_2(n))
key = x << 2*(a+b) | y << 2a+b | playerX << 2a | playerY << a
Anyways, this is not morton code but a simple bitpacking. True morton code requires n*m to be a power of two.
Okay, so Iām done testing now, and hereās the final numbers before I talk about them (I excluded the 0.9 random reads numbers as they tell just more of the same story).
Method
NonSequentialReads
ExtraReaderThreads
Mean
Error
StdDev
Allocated
RecalculateKeyEachTime
0
0
116.23 ms
0.394 ms
0.329 ms
328 B
DictionaryTryGetValue
0
0
98.61 ms
0.792 ms
0.702 ms
2640 B
DictionaryMarshalGetRef
0
0
98.91 ms
0.985 ms
0.873 ms
2640 B
FrozenDictionaryTryGet
0
0
94.52 ms
0.563 ms
0.499 ms
3776 B
FrozenDictionaryGetValueRef
0
0
93.80 ms
0.294 ms
0.261 ms
3776 B
FrozenDictionaryDirectRandomTry
0
0
97.59 ms
0.565 ms
0.529 ms
3776 B
FrozenDictionaryDirectRandomRef
0
0
94.59 ms
0.656 ms
0.613 ms
3776 B
FlatArray
0
0
118.41 ms
0.554 ms
0.491 ms
368 B
FlatArrayDirectRandom
0
0
84.24 ms
0.284 ms
0.266 ms
368 B
RecalculateKeyEachTime
0
1
119.72 ms
0.349 ms
0.310 ms
800 B
DictionaryTryGetValue
0
1
99.64 ms
0.871 ms
0.772 ms
3080 B
DictionaryMarshalGetRef
0
1
101.91 ms
1.209 ms
1.072 ms
3080 B
FrozenDictionaryTryGet
0
1
96.46 ms
1.054 ms
0.934 ms
4216 B
FrozenDictionaryGetValueRef
0
1
97.28 ms
1.606 ms
1.502 ms
4234 B
FrozenDictionaryDirectRandomTry
0
1
95.71 ms
0.304 ms
0.269 ms
4253 B
FrozenDictionaryDirectRandomRef
0
1
96.15 ms
0.832 ms
0.737 ms
4234 B
FlatArray
0
1
119.69 ms
0.301 ms
0.266 ms
808 B
FlatArrayDirectRandom
0
1
121.85 ms
0.703 ms
0.624 ms
869 B
RecalculateKeyEachTime
0
2
121.47 ms
0.590 ms
0.552 ms
1344 B
DictionaryTryGetValue
0
2
101.15 ms
0.391 ms
0.346 ms
3501 B
DictionaryMarshalGetRef
0
2
102.29 ms
1.011 ms
0.844 ms
3456 B
FrozenDictionaryTryGet
0
2
95.61 ms
0.600 ms
0.532 ms
4728 B
FrozenDictionaryGetValueRef
0
2
97.57 ms
1.045 ms
0.816 ms
4728 B
FrozenDictionaryDirectRandomTry
0
2
97.48 ms
0.712 ms
0.631 ms
4592 B
FrozenDictionaryDirectRandomRef
0
2
97.22 ms
0.493 ms
0.411 ms
4637 B
FlatArray
0
2
121.71 ms
0.350 ms
0.310 ms
1184 B
FlatArrayDirectRandom
0
2
121.51 ms
1.107 ms
1.035 ms
1320 B
RecalculateKeyEachTime
0.01
0
118.18 ms
0.513 ms
0.455 ms
328 B
DictionaryTryGetValue
0.01
0
99.17 ms
0.675 ms
0.632 ms
2640 B
DictionaryMarshalGetRef
0.01
0
100.84 ms
0.936 ms
0.875 ms
2640 B
FrozenDictionaryTryGet
0.01
0
93.94 ms
0.909 ms
0.850 ms
3776 B
FrozenDictionaryGetValueRef
0.01
0
94.52 ms
1.053 ms
0.934 ms
3776 B
FrozenDictionaryDirectRandomTry
0.01
0
94.08 ms
0.327 ms
0.273 ms
3776 B
FrozenDictionaryDirectRandomRef
0.01
0
94.38 ms
0.412 ms
0.365 ms
3776 B
FlatArray
0.01
0
119.39 ms
0.615 ms
0.545 ms
368 B
FlatArrayDirectRandom
0.01
0
84.69 ms
0.294 ms
0.245 ms
368 B
RecalculateKeyEachTime
0.01
1
124.17 ms
0.615 ms
0.575 ms
768 B
DictionaryTryGetValue
0.01
1
100.22 ms
1.089 ms
0.966 ms
3080 B
DictionaryMarshalGetRef
0.01
1
103.16 ms
0.539 ms
0.478 ms
3080 B
FrozenDictionaryTryGet
0.01
1
98.96 ms
0.468 ms
0.438 ms
4344 B
FrozenDictionaryGetValueRef
0.01
1
96.06 ms
0.496 ms
0.464 ms
4216 B
FrozenDictionaryDirectRandomTry
0.01
1
95.90 ms
0.624 ms
0.553 ms
4253 B
FrozenDictionaryDirectRandomRef
0.01
1
98.90 ms
1.110 ms
0.984 ms
4237 B
FlatArray
0.01
1
124.48 ms
0.319 ms
0.282 ms
936 B
FlatArrayDirectRandom
0.01
1
124.15 ms
0.880 ms
0.824 ms
1195 B
RecalculateKeyEachTime
0.01
2
122.89 ms
0.546 ms
0.510 ms
1305 B
DictionaryTryGetValue
0.01
2
102.98 ms
0.787 ms
0.657 ms
3456 B
DictionaryMarshalGetRef
0.01
2
104.25 ms
0.595 ms
0.556 ms
3456 B
FrozenDictionaryTryGet
0.01
2
96.95 ms
0.612 ms
0.573 ms
4615 B
FrozenDictionaryGetValueRef
0.01
2
97.15 ms
1.223 ms
1.144 ms
4705 B
FrozenDictionaryDirectRandomTry
0.01
2
97.31 ms
0.526 ms
0.492 ms
4660 B
FrozenDictionaryDirectRandomRef
0.01
2
97.42 ms
0.431 ms
0.360 ms
4728 B
FlatArray
0.01
2
124.16 ms
0.528 ms
0.468 ms
1207 B
FlatArrayDirectRandom
0.01
2
124.14 ms
0.857 ms
0.760 ms
1320 B
RecalculateKeyEachTime
0.1
0
120.52 ms
0.645 ms
0.603 ms
328 B
DictionaryTryGetValue
0.1
0
109.20 ms
0.936 ms
0.782 ms
2640 B
DictionaryMarshalGetRef
0.1
0
109.37 ms
0.772 ms
0.644 ms
2640 B
FrozenDictionaryTryGet
0.1
0
98.26 ms
0.945 ms
0.838 ms
3776 B
FrozenDictionaryGetValueRef
0.1
0
100.45 ms
0.900 ms
0.798 ms
3776 B
FrozenDictionaryDirectRandomTry
0.1
0
97.39 ms
0.495 ms
0.439 ms
3776 B
FrozenDictionaryDirectRandomRef
0.1
0
98.52 ms
0.625 ms
0.554 ms
3776 B
FlatArray
0.1
0
121.64 ms
1.102 ms
1.031 ms
368 B
FlatArrayDirectRandom
0.1
0
90.21 ms
0.406 ms
0.360 ms
368 B
RecalculateKeyEachTime
0.1
1
123.12 ms
0.392 ms
0.367 ms
800 B
DictionaryTryGetValue
0.1
1
111.55 ms
0.500 ms
0.468 ms
3080 B
DictionaryMarshalGetRef
0.1
1
112.78 ms
0.935 ms
0.875 ms
3101 B
FrozenDictionaryTryGet
0.1
1
99.58 ms
0.519 ms
0.460 ms
4323 B
FrozenDictionaryGetValueRef
0.1
1
101.14 ms
0.474 ms
0.396 ms
4216 B
FrozenDictionaryDirectRandomTry
0.1
1
99.09 ms
0.366 ms
0.286 ms
4216 B
FrozenDictionaryDirectRandomRef
0.1
1
100.47 ms
0.530 ms
0.470 ms
4216 B
FlatArray
0.1
1
126.76 ms
0.519 ms
0.485 ms
1067 B
FlatArrayDirectRandom
0.1
1
123.01 ms
0.330 ms
0.275 ms
936 B
RecalculateKeyEachTime
0.1
2
126.19 ms
0.434 ms
0.363 ms
1321 B
DictionaryTryGetValue
0.1
2
113.35 ms
0.620 ms
0.550 ms
3479 B
DictionaryMarshalGetRef
0.1
2
113.39 ms
0.348 ms
0.326 ms
3456 B
FrozenDictionaryTryGet
0.1
2
101.27 ms
0.782 ms
0.693 ms
4705 B
FrozenDictionaryGetValueRef
0.1
2
100.92 ms
0.936 ms
0.781 ms
4637 B
FrozenDictionaryDirectRandomTry
0.1
2
101.67 ms
0.586 ms
0.520 ms
4637 B
FrozenDictionaryDirectRandomRef
0.1
2
101.19 ms
0.615 ms
0.513 ms
4615 B
FlatArray
0.1
2
128.57 ms
0.531 ms
0.470 ms
1320 B
FlatArrayDirectRandom
0.1
2
127.42 ms
0.682 ms
0.569 ms
1320 B
Okay so my final conclusion is that the FrozenDictionary TryGet is actually the fastest option, because when adding one extra reader thread (so 2 at once) the very specific runtime optimizations that the flat array code provided above hits become ineffective. The direct ref get from the dictionary sometimes wins but it is very close in performance to the other option and if I counted right the TryGet variant actually wins slightly more often.
What I noticed is that when I did my own flat array implementation it was actually the slowest option, so calculating the random read values directly is very fast when the runtime optimization hits it (likely due to the avoided array reads). But that optimization seems to only work with a single reader thread and all other tests the array code is very slow. As Thrive is meant to run multiple cloud absorbers /emitters at once for performance reasons, I choose to put more weight on the more reader thread numbers. And also it would be unclear whether in the game the the runtime optimizer would manage to do the same optimization as in the benchmark.
So FrozenDictionary is the fastest of the tested options that doesnāt really care if thereās 0, 1 or 2 extra threads competing for the reads. Iāve updated my PR and as Iāve basically now spent 2 days on this, Iām going to merge it once the CI checks pass.