Increasing game performance

We now have an entity cap for how much stuff can spawn in general, and it can be cranked way down. Spawning depends on population, so even with the tiniest limit we should still get a reasonably good representation of all species in the patch. I see limiting the number of species as a cruder alternative, as their populations could get very high and spawn a ton of stuff if there is no global limit. And we basically already have per-patch species limiting, in that auto-evo adds huge penalties to species to keep the count from going above a configured limit (I think it’s maybe 10 or 12 currently).

Another spitball idea I had related to this:

It’s clear that bigger cells cause more performance issues, purely because they have more going on (more nodes, more compound processing, etc.). Currently, though, big cells count for exactly the same as small cells as far as the entity limit is concerned. Could we have some kind of weighting so that the bigger the cell, the more “entities” it represents? Obviously we’d have to change the nomenclature, and this will have severe gameplay implications since fewer big cells will spawn. I can also foresee a few other issues, such as the game refusing to spawn any cells at all if the entity cap is set to tiny but all possible species are too big. But I think it’s worth an investigation at least to see how it affects performance.

That’s certainly possible. Still calling the option “entity limit” makes sense, as different kinds of entities naturally have different performance impacts, so not having a direct 1-to-1 mapping to some specific entity type is not a problem.

There’s already an issue open about making cells in colonies count differently (though I’m a bit unsure if it should be a discount or actually a penalty):

If I remember right, the spawn system spawns stuff as long as the limit has not been reached, so a single spawn is allowed to go over the limit, but after that it should prevent further entities from spawning (with the caveat that there currently seems to be a bug with this).
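To make the weighting idea concrete, here is a minimal sketch in Python (the game itself is C#). It combines a size-based weight with the "spawn while the limit is not yet reached" rule described above. `SpawnBudget`, `cell_weight` and the ratio of 10 organelles per entity are hypothetical names and numbers, not anything from the actual spawn system.

```python
from dataclasses import dataclass


@dataclass
class SpawnBudget:
    """Tracks weighted entity usage against a configurable cap."""

    limit: float
    used: float = 0.0

    def try_spawn(self, weight: float) -> bool:
        # Spawning is allowed whenever the cap has not been reached yet,
        # so one spawn may push the total above the cap.
        if self.used >= self.limit:
            return False
        self.used += weight
        return True

    def despawn(self, weight: float) -> None:
        self.used = max(0.0, self.used - weight)


def cell_weight(organelle_count: int, organelles_per_entity: int = 10) -> float:
    # Hypothetical weighting: one "entity" per 10 organelles, never less than 1
    return max(1.0, organelle_count / organelles_per_entity)
```

Because a spawn is only refused once the cap has already been reached, a tiny cap would still let one oversized cell through in this sketch instead of blocking spawning entirely.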

The entity limit seems to be the most critical to performance. Because of this, I’m confident we should also add this option to the new game settings up front. An accompanying tooltip or text should then let the player know about its benefit.

I don’t think it should be part of the new game settings, since it’s nothing to do with the world of a specific game and is instead relevant across multiple games, and can even be changed partway through a game. But I agree it needs to be more prominent at the start of a game. Maybe a pop up when opening the main menu for the first time prompting the player to visit the options menu to set it?

It’s just a matter of discoverability, and I feel it’s the most intuitive place. This will be like those options in games where you can set them in the new game setup but are also told that they can be changed later at any time. Having a pop up in the main menu would probably be jarring, in my opinion.


I disabled organelle graphics. It only gave me an extra 10 FPS in this save (by lowering the number of draw calls by about 800 per frame):

So while rendering a bunch of individual (transparent) organelles does impact performance, removing that entirely does not fix the performance.

So Godot needing to sort and issue draw calls for the individual organelles doesn’t seem to be our biggest performance problem.

One other thing people have suggested is that we should try to do something about the fact that each individual organelle is a sphere collision shape that is added to the overall shape of a cell. Basically, what we could do is generate a mesh collision from the organelles and use just 2 shapes: one for general collisions and one for pilus collisions (or maybe keeping pilus collisions as they are would be easier and not impact performance that much).

But I think it does add up, so if we were to apply a combination of different optimizations (multimesh + frustum culling + entity limit + reducing collision shapes, etc.), that would potentially give a 20 or 30+ FPS increase, which honestly doesn’t seem bad.

Previously in my engulfment revamp PR I made it possible to form a convex collision shape from the shape of the membrane, I think the code is still in the membrane code so it probably can be useful for this.
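For reference, building a convex outline from a set of 2D membrane points can be sketched with Andrew's monotone chain algorithm. This is an illustrative Python version (the game is C#) and not the code from the engulfment PR. It assumes the membrane outline is available as a list of `(x, y)` tuples.

```python
def convex_hull(points):
    """Return the convex hull of 2D points in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b turns counter-clockwise
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each half is the first point of the other half
    return lower[:-1] + upper[:-1]
```

Interior and collinear points are dropped, so the hull usually has far fewer vertices than the membrane outline it was built from.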


I think it would be definitely worthwhile to combine a set of several optimizations if together they produce a significant improvement in framerate. We only need to keep performance good until the end of the Microbe Stage, after which metaballs will be able to handle performance much better.

Was also thinking… I know it would be a painful decision, but do we need the reproduction system as it currently stands, where every organelle is split one by one until you reproduce? If it takes a lot of computational power to constantly recalculate the size of cells, couldn’t we switch over to a system where cells simply acquire the required nutrients, and then once they have enough they duplicate all their organelles at once? It would mean only one calculation at the end of a cell’s life cycle, instead of constant recalculations every time it absorbs more nutrients.


It may not seem like much, but just 10 fps can make a world of difference if you have particularly low framerates. Imagine running the game steadily at 20 fps: this change would bump you up to 30 or maybe more, making the game go from a slideshow to at least bearable! 30 fps used to be the industry standard, you know.

As everyone else has said, every little bit can help.

Does that system even have an effect on gameplay? Everything should even out, except maybe cell speed.

It doesn’t take very long to simulate a new membrane shape currently. The only reason organelle splitting causes lag spikes is this issue:

Though, this’ll be an entirely different matter when we eventually get 3D membranes with much more expensive computations required.

Also there’s a membrane data caching feature now so each time a membrane is generated, the membrane is stored and can be loaded later. If we don’t care about extra memory use the membrane cache time can be increased to make all intermediate reproduction steps be found in the cache.
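A rough sketch of how such a time-limited cache behaves, in Python rather than the game's C#. `MembraneCache`, its key format and `keep_seconds` are hypothetical and do not mirror the actual cache implementation. The point is only that a longer keep time trades memory for fewer regenerations.

```python
import time


class MembraneCache:
    """Caches generated membranes by layout key and drops stale entries."""

    def __init__(self, keep_seconds, clock=time.monotonic):
        self.keep_seconds = keep_seconds
        self._clock = clock
        self._entries = {}  # key -> (membrane, last_used_time)
        self.misses = 0

    def get(self, key, generate):
        entry = self._entries.get(key)
        if entry is None:
            # Not cached (or already purged): pay the generation cost
            self.misses += 1
            membrane = generate(key)
        else:
            membrane = entry[0]
        # Using an entry refreshes its age
        self._entries[key] = (membrane, self._clock())
        return membrane

    def purge(self):
        now = self._clock()
        stale = [key for key, (_, used) in self._entries.items()
                 if now - used > self.keep_seconds]
        for key in stale:
            del self._entries[key]
```

With a keep time longer than a typical reproduction cycle, every intermediate growth step would still be in the cache the next time a cell of the same species reaches it.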

I suppose this is a fair point, but the reason why I’m chasing really large improvements is that these kinds of small things will at most allow the game to have like 10 more cells in the game at once, so a performance gain that allows 80 cells instead of 70 actually seems pretty insignificant to me…

Yes. Currently the player has to survive as a partially duplicated cell. Switching to all duplication happening at once basically needs to be made into a reproduction animation that plays out over time, instead of doing all of the current operations at once (as that would cause big lag spikes if it all happened very quickly).
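The "play it out instead of doing it all at once" part could look roughly like the following Python sketch (the game is C#). At most a fixed number of organelles are duplicated per frame. The function name, the string-based organelles and the per-frame budget are purely illustrative.

```python
def duplicate_in_steps(organelles, per_frame):
    """Yield once per frame after duplicating at most per_frame organelles."""
    duplicated = []
    for start in range(0, len(organelles), per_frame):
        batch = organelles[start:start + per_frame]
        duplicated.extend(name + "_copy" for name in batch)
        # Hand control back to the game loop until the next frame
        yield list(duplicated)
```

A caller would advance the generator once per frame, so a cell with 5 organelles and a budget of 2 would finish duplicating after 3 frames.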

I was given a save to test multicellular performance. In it, at the lowest I got around 8-15 FPS (with the normal entity limit, and with the thread count lowered and the cloud simulation interval increased to get cleaner profiling data). The thing that jumped out at me the most was that membrane point generation took a bunch of time, and because we have the membrane cache feature, letting the game run for 30+ seconds already got my FPS up into the 30s, and I even saw 60+ FPS at a few points.

Here are some profiling screenshots (notice how the Godot engine takes 80% of the time, so even our most intensive parts of the code take up just a few percent of the processing time):

Left side shows how our microbe processing code takes the most time.

Here’s the membrane radius being the most expensive part of compound absorbing:

But here another part of the profiling results shows that detecting if a compound is useful (and a pow call) takes up a bunch of time:

So we might get a tiny bit more performance if we didn’t use the pow calculation on this line:

var fractionToTake = 1.0f - (float)Math.Pow(0.5f, delta / Constants.CLOUD_ABSORPTION_HALF_LIFE);

Another thing to try might be to limit cells to absorb and emit compounds only 30 times per second.
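Two observations about that formula, sketched in Python rather than C#. First, the `pow` result only depends on `delta`, so it can be reused whenever `delta` repeats (which assumes a fixed-rate update). Second, because the formula is an exponential decay, absorbing half as often with twice the delta removes the same fraction overall. The half-life value below is a placeholder, not the real constant, and `CachedFraction` is a hypothetical helper.

```python
import math

CLOUD_ABSORPTION_HALF_LIFE = 0.1  # placeholder, not the real constant's value


def fraction_to_take(delta, half_life=CLOUD_ABSORPTION_HALF_LIFE):
    # Same formula as the C# line above
    return 1.0 - math.pow(0.5, delta / half_life)


class CachedFraction:
    """Recomputes the fraction only when delta differs from the last call."""

    def __init__(self, half_life=CLOUD_ABSORPTION_HALF_LIFE):
        self.half_life = half_life
        self.pow_calls = 0
        self._last_delta = None
        self._last_fraction = 0.0

    def get(self, delta):
        if delta != self._last_delta:
            self._last_delta = delta
            self._last_fraction = fraction_to_take(delta, self.half_life)
            self.pow_calls += 1
        return self._last_fraction


def remaining_after(deltas):
    # Fraction of a cloud left after absorbing once per listed delta
    remaining = 1.0
    for delta in deltas:
        remaining *= 1.0 - fraction_to_take(delta)
    return remaining
```

Two absorptions at 1/60 s leave the same fraction as one at 1/30 s, up to floating point error. This only covers the fraction itself. Any clamping by cloud contents or cell storage space could still make the batched result differ slightly.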


Here’s the reproduction part expanded. Growing organelles is taking a surprisingly long time (well, I guess it is pretty sensible, as the game needs to loop through a ton of organelles to check if they are growing):

Reproduction updates are already limited to 20 times per second, but perhaps an approach where the previously growing organelle is stored would improve performance. I opened an issue to track work on this:
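One way to read "store the previously growing organelle" is to remember the index of the last unfinished organelle and resume from there, instead of scanning the whole list on every update. This is a hypothetical Python sketch (the game is C#). It assumes organelles finish growing in list order and never become unfinished again, which may not hold in the real reproduction code.

```python
class GrowthTracker:
    """Remembers where the last unfinished organelle was found."""

    def __init__(self, progress):
        self.progress = progress  # growth progress per organelle, 0.0 to 1.0
        self._next = 0
        self.checks = 0  # counts organelle checks, for comparison only

    def next_growing(self):
        # Resume from the stored index instead of starting at 0 each time
        while self._next < len(self.progress):
            self.checks += 1
            if self.progress[self._next] < 1.0:
                return self._next
            self._next += 1
        return None
```

With this, repeated updates while the same organelle is growing cost a single check each, regardless of how many finished organelles precede it.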


Regarding the new engulfment mechanic, it probably should also have a max rate it progresses at (especially as it looks like it’s pretty expensive to update the shader parameters for all of the organelles):

Opened an issue:


And here’s the last screenshot:

What surprised me a bit is that playing a sound effect takes so long, so we probably should have some kind of distance based sound effect cooldown for non-player cells.
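As a starting point for discussion, a distance-based cooldown could be as simple as the following Python sketch (the game is C#). Each effect gets a cooldown that grows with the emitter's distance from the player. `SoundLimiter`, the per-effect keying and the cooldown numbers are all assumptions, not existing game code.

```python
class SoundLimiter:
    """Rate limits sound effects, with longer cooldowns for distant emitters."""

    def __init__(self, base_cooldown, extra_per_unit):
        self.base_cooldown = base_cooldown
        self.extra_per_unit = extra_per_unit
        self._next_allowed = {}  # effect name -> earliest time it may play again

    def should_play(self, effect, distance, now):
        if now < self._next_allowed.get(effect, 0.0):
            return False
        cooldown = self.base_cooldown + distance * self.extra_per_unit
        self._next_allowed[effect] = now + cooldown
        return True
```

The player's own cell would bypass the limiter entirely, so only non-player sounds get thinned out.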


Again, I tested disabling organelle rendering, and it seems to give maybe double the performance initially when the game is very laggy, but after that the gain is much smaller.

Disabled graphics:

And enabled:


So there don’t seem to be that many easy performance gains, though one also pretty radical idea I got (on top of the organelle rendering one, “Investigate if cell (organelle) graphics can be rendered using MultiMesh”, issue #3709 in Revolutionary-Games/Thrive on GitHub) was: what if we limited cells to process only 20 to 30 times per second? That way most of these expensive things would happen less often, but we could still keep the physics process happening the way it currently is to hopefully keep the gameplay feel the same.
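A minimal sketch of that kind of throttling, in Python rather than the game's C#. The expensive per-cell logic runs at a reduced rate but receives the accumulated delta, so time-based effects still add up to the same total. `ThrottledProcess` is a hypothetical helper, and physics would keep running every tick outside of it.

```python
class ThrottledProcess:
    """Runs a callback at most rate_hz times per second with accumulated delta."""

    def __init__(self, rate_hz, callback):
        self.interval = 1.0 / rate_hz
        self.callback = callback
        self._accumulated = 0.0

    def update(self, delta):
        # Called every frame; only forwards to the callback when enough
        # time has built up
        self._accumulated += delta
        if self._accumulated < self.interval:
            return False
        self.callback(self._accumulated)
        self._accumulated = 0.0
        return True
```

Passing the accumulated time instead of a fixed step keeps the simulated time correct even when frame times are uneven.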


It just occurred to me: have we ever thought of reducing the physics FPS to lessen the CPU load? We’re making a game that arguably does not require highly accurate and fast physics interactions most of the time, so I’m quite certain this could boost the performance a bit while not massively affecting gameplay.

As of right now, the value is set to 60 times per second, which is the default. I’m thinking we can lower this to 50 TPS (or maybe even 30 if we’re feeling adventurous). The downside is that a lower physics FPS may result in some stuttering, which fortunately can be counteracted with the physics interpolation that Godot comes with built in.

More reading: Physics Interpolation — Godot Engine (stable) documentation in English.
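The general idea behind interpolation, independent of how Godot implements it, is to render each object somewhere between its two most recent physics states. A small Python sketch follows. The function names are made up for illustration and are not Godot API.

```python
def interpolate(previous, current, fraction):
    # Linear blend between two positions given as equal-length tuples
    return tuple(p + (c - p) * fraction for p, c in zip(previous, current))


def render_position(previous, current, time_since_tick, physics_hz):
    # How far we are between the last physics tick and the next one, 0 to 1
    fraction = min(1.0, max(0.0, time_since_tick * physics_hz))
    return interpolate(previous, current, fraction)
```

The trade-off is that what gets drawn lags the simulation by up to one physics tick, which at 30 ticks per second is about 33 ms.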

One big drawback is that as our player movement is tied to physics, reducing the physics simulation rate will directly lower the responsiveness of the game to player input.

Way back with Leviathan I actually had the game set to simulate only 20 updates per second and smoothly interpolate between the simulated updates (this actually used the same code for locally generated updates as for interpolating data received over the network). I couldn’t really tell that anything was wrong and the game was perfectly fine for me, but Oliver and many other people complained about a really laggy feeling that made the game even unplayable for them. After that I fixed the problem by making the game simulate as many updates per second as it could (though there was still a fixed maximum physics rate, which I set to 60 or maybe 75, can’t remember exactly).

Something we could do if we implement our custom logic is to make it so that physics simulation starts just when we have sent Godot data to be rendered, so we could probably entirely run the physics “for free” by running them while rendering is happening as the physics would probably be ready by the time the game has rendered a frame and is ready to simulate the next update.

Assuming this is due to the too low update rate, input delays might not even be noticeable at 30+ times per second update rate.

I think I’ll open a test PR sometime in the future.

I guess that might be the case, after all 30 Hz is 50% more than 20 Hz update rate…
You’ll definitely want someone to test who found 20 updates per second unacceptably laggy.

I’m currently working on a native code module for Thrive that includes an integration to the Jolt physics engine. As preliminary work I did a specific benchmark scene to validate that it is a good idea.

Here’s how that looks:

And here are the test results (note that the scene setup / rendering performance has been a bit problematic so these initial results are with just 64 microbe placeholder physics bodies at once. I plan on trying bigger tests next week):

Jolt single convex shape per microbe (UPDATE: may actually be the spheres case):

Physics time: 0.006744191 Physics FPS limit: 148.2758, FPS: 1
Physics time: 0.004203023 Physics FPS limit: 237.924, FPS: 30
Physics time: 0.002538978 Physics FPS limit: 393.8593, FPS: 30
Physics time: 0.001526676 Physics FPS limit: 655.018, FPS: 308
Physics time: 0.0006721547 Physics FPS limit: 1487.753, FPS: 361
Physics time: 0.0003296497 Physics FPS limit: 3033.523, FPS: 361
Physics time: 0.0001993665 Physics FPS limit: 5015.888, FPS: 360
Physics time: 0.0001796981 Physics FPS limit: 5564.891, FPS: 360
Physics time: 0.0001652893 Physics FPS limit: 6049.998, FPS: 360
Physics time: 0.0001341461 Physics FPS limit: 7454.556, FPS: 360
Physics time: 0.0001477947 Physics FPS limit: 6766.141, FPS: 360

Jolt combined shape from spheres:

Physics time: 0.0008753056 Physics FPS limit: 1142.458, FPS: 1
Physics time: 0.001244984 Physics FPS limit: 803.2229, FPS: 75
Physics time: 0.001221391 Physics FPS limit: 818.7385, FPS: 75
Physics time: 0.001119852 Physics FPS limit: 892.9752, FPS: 329
Physics time: 0.001057841 Physics FPS limit: 945.3221, FPS: 329
Physics time: 0.00107415 Physics FPS limit: 930.9684, FPS: 316
Physics time: 0.000895411 Physics FPS limit: 1116.806, FPS: 313
Physics time: 0.0009229564 Physics FPS limit: 1083.475, FPS: 313
Physics time: 0.0008093275 Physics FPS limit: 1235.594, FPS: 326
Physics time: 0.0006260973 Physics FPS limit: 1597.196, FPS: 326
Physics time: 0.0005499413 Physics FPS limit: 1818.376, FPS: 355
Physics time: 0.0005800998 Physics FPS limit: 1723.841, FPS: 355
Physics time: 0.0004281088 Physics FPS limit: 2335.855, FPS: 360
Physics time: 0.0005050105 Physics FPS limit: 1980.157, FPS: 360

Godot physics (Bullet) convex shape:

Physics time: 0.003198 Physics FPS limit: 312.6954, FPS: 1
Physics time: 0.004069 Physics FPS limit: 245.7606, FPS: 121
Physics time: 0.004069 Physics FPS limit: 245.7606, FPS: 121
Physics time: 0.003704 Physics FPS limit: 269.9784, FPS: 328
Physics time: 0.003704 Physics FPS limit: 269.9784, FPS: 328
Physics time: 0.002906 Physics FPS limit: 344.1156, FPS: 343
Physics time: 0.002906 Physics FPS limit: 344.1156, FPS: 343
Physics time: 0.001898 Physics FPS limit: 526.8704, FPS: 361
Physics time: 0.001898 Physics FPS limit: 526.8704, FPS: 361
Physics time: 0.001539 Physics FPS limit: 649.7726, FPS: 360
Physics time: 0.001539 Physics FPS limit: 649.7726, FPS: 360
Physics time: 0.001116 Physics FPS limit: 896.0574, FPS: 360
Physics time: 0.001116 Physics FPS limit: 896.0574, FPS: 360
Physics time: 0.001397 Physics FPS limit: 715.8196, FPS: 323
Physics time: 0.001397 Physics FPS limit: 715.8196, FPS: 323
Physics time: 0.000698 Physics FPS limit: 1432.665, FPS: 360
Physics time: 0.000698 Physics FPS limit: 1432.665, FPS: 360
Physics time: 0.001458 Physics FPS limit: 685.871, FPS: 360
Physics time: 0.000517 Physics FPS limit: 1934.236, FPS: 360
Physics time: 0.000517 Physics FPS limit: 1934.236, FPS: 360
Physics time: 0.000629 Physics FPS limit: 1589.825, FPS: 360
Physics time: 0.000629 Physics FPS limit: 1589.825, FPS: 360
Physics time: 0.001151 Physics FPS limit: 868.8097, FPS: 360
Physics time: 0.001151 Physics FPS limit: 868.8097, FPS: 360
Physics time: 0.000678 Physics FPS limit: 1474.926, FPS: 360
Physics time: 0.000678 Physics FPS limit: 1474.926, FPS: 360
Physics time: 0.000968 Physics FPS limit: 1033.058, FPS: 361
Physics time: 0.000968 Physics FPS limit: 1033.058, FPS: 361
Physics time: 0.001221 Physics FPS limit: 819.0009, FPS: 360
Physics time: 0.001221 Physics FPS limit: 819.0009, FPS: 360
Physics time: 0.000909 Physics FPS limit: 1100.11, FPS: 360
Physics time: 0.000909 Physics FPS limit: 1100.11, FPS: 360
Physics time: 0.000792 Physics FPS limit: 1262.626, FPS: 360
Physics time: 0.000792 Physics FPS limit: 1262.626, FPS: 360
Physics time: 0.001548 Physics FPS limit: 645.9948, FPS: 360
Physics time: 0.001548 Physics FPS limit: 645.9948, FPS: 360
Physics time: 0.000693 Physics FPS limit: 1443.001, FPS: 360
Physics time: 0.000693 Physics FPS limit: 1443.001, FPS: 360
Physics time: 0.000644 Physics FPS limit: 1552.795, FPS: 360
Physics time: 0.000644 Physics FPS limit: 1552.795, FPS: 360
Physics time: 0.000998 Physics FPS limit: 1002.004, FPS: 360

Godot physics (Bullet) combined spheres (currently the approach used in the game):

Physics time: 0.011677 Physics FPS limit: 85.63844, FPS: 1
Physics time: 0.006466 Physics FPS limit: 154.6551, FPS: 29
Physics time: 0.006466 Physics FPS limit: 154.6551, FPS: 29
Physics time: 0.00437 Physics FPS limit: 228.8329, FPS: 353
Physics time: 0.000726 Physics FPS limit: 1377.411, FPS: 360
Physics time: 0.000726 Physics FPS limit: 1377.411, FPS: 360
Physics time: 0.000945 Physics FPS limit: 1058.201, FPS: 360
Physics time: 0.000945 Physics FPS limit: 1058.201, FPS: 360
Physics time: 0.000448 Physics FPS limit: 2232.143, FPS: 360
Physics time: 0.000448 Physics FPS limit: 2232.143, FPS: 360
Physics time: 0.000415 Physics FPS limit: 2409.639, FPS: 360
Physics time: 0.000415 Physics FPS limit: 2409.639, FPS: 360
Physics time: 0.000362 Physics FPS limit: 2762.431, FPS: 360
Physics time: 0.000362 Physics FPS limit: 2762.431, FPS: 360
Physics time: 0.000373 Physics FPS limit: 2680.965, FPS: 360
Physics time: 0.000373 Physics FPS limit: 2680.965, FPS: 360
Physics time: 0.000396 Physics FPS limit: 2525.253, FPS: 361
Physics time: 0.000396 Physics FPS limit: 2525.253, FPS: 361
Physics time: 0.001432 Physics FPS limit: 698.324, FPS: 360
Physics time: 0.001432 Physics FPS limit: 698.324, FPS: 360
Physics time: 0.000391 Physics FPS limit: 2557.545, FPS: 360
Physics time: 0.000391 Physics FPS limit: 2557.545, FPS: 360
Physics time: 0.000474 Physics FPS limit: 2109.705, FPS: 360
Physics time: 0.000474 Physics FPS limit: 2109.705, FPS: 360
Physics time: 0.001289 Physics FPS limit: 775.7952, FPS: 360
Physics time: 0.001289 Physics FPS limit: 775.7952, FPS: 360
Physics time: 0.000606 Physics FPS limit: 1650.165, FPS: 360
Physics time: 0.000606 Physics FPS limit: 1650.165, FPS: 360
Physics time: 0.000629 Physics FPS limit: 1589.825, FPS: 360
Physics time: 0.000629 Physics FPS limit: 1589.825, FPS: 360
Physics time: 0.000572 Physics FPS limit: 1748.252, FPS: 360
Physics time: 0.000572 Physics FPS limit: 1748.252, FPS: 360
Physics time: 0.000557 Physics FPS limit: 1795.332, FPS: 360
Physics time: 0.000557 Physics FPS limit: 1795.332, FPS: 360
Physics time: 0.000478 Physics FPS limit: 2092.05, FPS: 360
Physics time: 0.000478 Physics FPS limit: 2092.05, FPS: 360
Physics time: 0.000465 Physics FPS limit: 2150.538, FPS: 361
Physics time: 0.000465 Physics FPS limit: 2150.538, FPS: 361
Physics time: 0.000934 Physics FPS limit: 1070.664, FPS: 360
Physics time: 0.000934 Physics FPS limit: 1070.664, FPS: 360
Physics time: 0.00044 Physics FPS limit: 2272.727, FPS: 360
Physics time: 0.00044 Physics FPS limit: 2272.727, FPS: 360
Physics time: 0.000498 Physics FPS limit: 2008.032, FPS: 360
Physics time: 0.000498 Physics FPS limit: 2008.032, FPS: 360
Physics time: 0.005362 Physics FPS limit: 186.4976, FPS: 353
Physics time: 0.005362 Physics FPS limit: 186.4976, FPS: 353
Physics time: 0.000357 Physics FPS limit: 2801.12, FPS: 360

Jolt single thread (instead of 2) convex:

Physics time: 0.00504744 Physics FPS limit: 198.1202, FPS: 1
Physics time: 0.004344175 Physics FPS limit: 230.1933, FPS: 66
Physics time: 0.002714122 Physics FPS limit: 368.4433, FPS: 66
Physics time: 0.001554479 Physics FPS limit: 643.3024, FPS: 317
Physics time: 0.0006783324 Physics FPS limit: 1474.203, FPS: 317
Physics time: 0.0003822733 Physics FPS limit: 2615.929, FPS: 360
Physics time: 0.0002402095 Physics FPS limit: 4163.032, FPS: 360
Physics time: 0.0002077179 Physics FPS limit: 4814.221, FPS: 360
Physics time: 0.0001857197 Physics FPS limit: 5384.457, FPS: 360
Physics time: 0.0001746437 Physics FPS limit: 5725.942, FPS: 360
Physics time: 0.0001679698 Physics FPS limit: 5953.45, FPS: 360
Physics time: 0.0001844394 Physics FPS limit: 5421.835, FPS: 360
Physics time: 0.000174993 Physics FPS limit: 5714.515, FPS: 360

Here’s my quick thoughts:

Turns out that our approach of using combined spheres to make microbe collisions is faster than convex shapes in Godot when nothing is colliding (0.000362 vs 0.000698). Whereas then when shapes are colliding a bunch the convex shape is much faster (0.006466 vs 0.003704).

In Jolt the combined sphere shape is much faster when there are a ton of collisions (0.0008753056 vs 0.006744191) but then it doesn’t reach the maximum performance of the convex shape (0.0004281088 vs 0.0001341461). So it seems a bigger test / testing with microbe colonies will be necessary to pick which is overall the better approach for Thrive: combined sphere collision shape or a convex shape (this disallows holes and concave parts of microbes).

And luckily for me, Jolt is faster (even when running with a single thread, and Jolt scales up to speed up at least to 8 threads). Funnily enough it seems the single thread mode of Jolt is faster when literally all of the bodies are colliding in a big clump, as this likely prevents parallel processing.

As a summary, here’s the first frame (a ton of collisions) and then basically the top performance of each test, to give a quick overview of which approaches are good when there’s an absolute ton of collisions and which work well when collisions are rare:

Jolt single convex shape per microbe (UPDATE: may actually be the spheres case):
Physics time: 0.006744191 Physics FPS limit: 148.2758, FPS: 1
Physics time: 0.0001341461 Physics FPS limit: 7454.556, FPS: 360

Jolt combined shape from spheres:
Physics time: 0.0008753056 Physics FPS limit: 1142.458, FPS: 1
Physics time: 0.0004281088 Physics FPS limit: 2335.855, FPS: 360

Godot physics (Bullet) convex shape:
Physics time: 0.003198 Physics FPS limit: 312.6954, FPS: 1
Physics time: 0.000678 Physics FPS limit: 1474.926, FPS: 360

Godot physics (Bullet) combined spheres (currently the approach used in the game):
Physics time: 0.011677 Physics FPS limit: 85.63844, FPS: 1
Physics time: 0.000362 Physics FPS limit: 2762.431, FPS: 360

Jolt single thread (instead of 2) convex:
Physics time: 0.00504744 Physics FPS limit: 198.1202, FPS: 1
Physics time: 0.0001679698 Physics FPS limit: 5953.45, FPS: 360
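For anyone wanting to crunch these logs: the "Physics FPS limit" column is just the reciprocal of the physics time in seconds. A small Python helper for parsing a line and checking that relationship is sketched below. The line format is taken from the output above, and the helper names are made up.

```python
import re

LOG_LINE = re.compile(
    r"Physics time: ([0-9.]+) Physics FPS limit: ([0-9.]+), FPS: (\d+)"
)


def parse_line(line):
    """Return (physics_time_seconds, physics_fps_limit, fps) for one log line."""
    match = LOG_LINE.match(line)
    if match is None:
        raise ValueError("not a benchmark line: " + line)
    return float(match.group(1)), float(match.group(2)), int(match.group(3))


def physics_fps_limit(physics_time_seconds):
    # How many physics steps per second would fit if each took this long
    return 1.0 / physics_time_seconds
```

Comparing two runs then comes down to dividing their physics times, for example the first-frame time of one setup by the first-frame time of another.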


Some non-microbe findings: Godot is pretty slow at rendering a ton of multimesh parts that all need to update constantly. Individual Godot nodes have pretty good frustum culling, which gives a pretty nice FPS bonus. I need to investigate what the optimal way is to set up microbe graphics to move around based on Jolt-computed data.


Edit: I just realized I probably mixed up the sphere and convex body creation for Jolt, so mentally flip the numbers.


Even though I am not a programmer and don’t understand much about how to improve game performance, I will say that my computer struggles in the Micro-multicellular stage, likely due to all of the information it has to process. It also doesn’t help that my computer is older, not optimized for heavy games, and has no battery in it at the moment (lol)

Okay so I made a bit of an embarrassing mistake (I had a missing !) and I actually tested the opposite of what I wanted with Jolt. So the numbers for convex are actually for the sphere case and vice versa.

So turns out that sphere collision shapes are actually more efficient by about maybe 25-40% in terms of speed (when not literally all bodies are in a huge clump colliding).

Which is pretty nice as it will be a bit simpler to convert the current code to use Jolt when I don’t need to overhaul the order of operations (for convex generation the membrane shape is needed to be computed one frame earlier for collisions to be created on microbe spawn).

Also, the physics cost seems to pretty much scale less than linearly with the number of spheres each collision consists of (performance is still fine with 60 random mutation steps as compared to 25). Even with 100 mutation steps the simulation still gets around 400 physics frames per second. Though, at this point it is starting to be the case that convex collisions are more efficient. So maybe microbe colonies are the point where the cost of sphere collisions explodes and convex shapes are needed?

I’ll give that some thought as I’ll now start trying to get the microbe benchmark (which requires most of the normal microbe stage logic to be working) working with the new physics.


Update: with Godot physics using spheres, the massive 100 step test results in a physics FPS limit of around 80-95, so about a fourth of the Jolt performance. Using Godot convex bodies results in 80-105 FPS (staying on average a bit closer to the higher end). I think this close performance shows that it was not a mistake to originally design Thrive to create microbe collisions out of spheres rather than using a convex body.

So as things get bigger the gap between Jolt and Godot performance stays, and even gets bigger. I think this might give some clues as to why the multicellular performance is especially complained about as it might get almost linearly worse when you have 10 big cells glued together into one physics body.
