Calling All Hands on Deck for fixing up 0.6.4

After the huge refactor recently to switch Thrive to use an ECS architecture, there’s still a ton of things to tweak back to normal or flat out reimplement. I was hoping some time ago that we could release 0.6.4 next week, but that was way too optimistic. There are still at least 70 problems I know of that should be taken care of before the next release, as otherwise it’ll be a really rough release with a ton of new bugs and maybe even some crashes.

So I’m making this post to call out to all of the Thrive team members (and why not active community members too) to help get Thrive 0.6.4 released in the next few weeks. Basically I just need a bunch of help from people to identify issues I haven’t seen yet (this is important as getting constant reports about the same problems just slows things down), and also help in fixing some of those issues. I’d especially appreciate it if recently active people could verify that features they previously added still work as intended, as those are probably easier for their authors to verify than for me, since they know intricately how the features work.

With some help I think we can still manage to release Thrive 0.6.4 in early December, which is now my new goal. It would be pretty bad to end the year without managing a new release at all, or with a release so rough that it isn’t good to play for new players or casual returning players.

Here’s my short TODO list I’m trying to tackle before making a public test build (items marked with ? I’m on the fence about and might postpone if they start taking up a lot of time):

  • CalculateRotationMultiplier needs to be folded into physics creation and size system
  • chunk collisions get deleted on save load
  • physics background running multiple steps after logic update while render happens and combined callback data over multiple physics updates to not impact gameplay
  • for above need to have keeping collision data until told it can be cleared / need to not happen until next batch of logical updates is allowed to start
  • probably anyway need to switch to the stored physics callback approach so that lag spikes with multiple consecutive physics updates don’t cause instability
  • colonies to not crash
  • ? test that colony pilus works and also damage is received by the right cell
  • ? playable all the way to the prototypes
  • linux core dumps on exit (in glibc exit handlers), maybe caused by the static link of the distributable version?
  • need to make v2 of the native library
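
To make the "stored physics callback" idea from the list above a bit more concrete, here is a rough C++ sketch of keeping collision data around over multiple physics steps until the game logic says it can be cleared. Everything in it (`Collision`, `CollisionBuffer` and the method names) is made up for illustration and is not the actual Thrive native library API. It is also single-threaded; a real version running physics in the background would need synchronization on top.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical collision record, not the real native library type.
struct Collision {
    std::uint32_t firstBody;
    std::uint32_t secondBody;
};

// Collects collisions from any number of physics steps and keeps them
// until the game logic explicitly says they have been handled.
class CollisionBuffer {
public:
    // Called by the physics side, possibly several steps in a row.
    void Record(std::uint32_t first, std::uint32_t second) {
        assert(first != second);  // a body cannot collide with itself
        pending.push_back({first, second});
    }

    // Called by game logic: everything gathered since the last Clear().
    const std::vector<Collision>& Pending() const { return pending; }

    // Called by game logic once the data has been processed, before the
    // next batch of physics steps is allowed to start.
    void Clear() { pending.clear(); }

private:
    std::vector<Collision> pending;
};
```

The point of the design is that a lag spike causing several consecutive physics steps only makes the buffer longer, it doesn't lose or overwrite collisions before a logic update has seen them.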

So those are the items I’m trying to tackle myself ASAP. Here’s my latest PR I’ll merge in a bit that has a bunch of fixes:

And here’s the full list of problems I know of that I think should be solved before the next release:

TODO list
  • switch jolt to a custom job system
  • run physics in the background while Godot is rendering the frame, implement starting a physics run at the end of an update and ensuring the run is finished by the next update, for above probably also should synchronously run 2-4 physics updates to catch up if behind more than 1 update’s worth of time (to ensure that 20 FPS still runs at normal speed)
  • custom executor for C++ implementation that allows the above and the physics engine to share threads
  • source package script should include git submodule info / their checked out commit (maybe in revision.txt?)
  • source code bundling should include the .gitmodules info (and info on how the native modules need those to be separately downloaded to compile)
  • ensure that NDEBUG is defined when building in distribution mode
  • check that native library unload happens after the current scene is destroyed (currently NativeInterop.Shutdown gets called before disposing the microbe stage)
  • option to disable manifold reduction for microbes to get all collisions for accurate pilus detection
  • the metrics panel needs to be hooked into the external physics system to have the physics numbers appear there
  • CellType photographing
  • cell body plan editor (using billboarding with photographing for cell graphics). need to reimplement multicellular body plan editor visuals, could probably use the photostudio (with a new feature for higher res images if the game resolution is over 1080p) and draw those on quads to get the visuals easily scaled and with more performance
  • Microbe.Multicellular
  • ModLoader.ModInterface.TriggerOnDamageReceived (in Health.cs)
  • fix the weird workaround and investigate the corner cases in microbe colonies (issue #2504 in Revolutionary-Games/Thrive on GitHub)
  • need to test all prototypes still function correctly
  • cilia pull upgrade reimplementation
  • pilus damage doesn’t seem to apply (it does in some cases, tweak this)
  • unbinding (g_perform_unbinding)
  • Fossilization dialog (probably just depends on photostudio)
  • all the notimplementedexceptions in the gameplay code
  • test: reimplement projectiles going through microbes of the player species as well as the player’s own cell colony
  • freebuilding having editor immediately available doesn’t work
  • Calls to FilterCollisions need to be wrapped in an exception catch at the native code boundary
  • TODO: disable dealing damage to a pilus
  • re-add smooth camera player follow, seems pretty hard to get right currently without causing jitter
  • growing microbe colonies should be able to report new entity weight to the spawn system
  • // TODO: reporting the player position to all systems on game load
  • when engulfed microbe mode needs to be forced to normal (i.e. the cell that gets engulfed can’t stay in engulf mode or any other special mode)
  • engulfed cell needs to set absorb speed to -1 to stop it from absorbing
  • engulfed cells need to set vent threshold to float.MaxValue. engulf component needs to disable / enable compound venter in the engulfed target
  • multicellular reproduction should copy some fields like notices from microbe event callbacks
  • open issue: system to unlock engulfables if their HostileEngulfer or attached to component becomes invalid to avoid objects being stuck in bad state
  • open issue (/investigate, I removed part of this): cells accidentally always stop digestion at 50% (permanent debuff), investigate. reference Discord
  • TODO: make sure that reset organelle layout (and growth) properly update capacity values in the compound storage
  • reimplementing organelle render priority
  • microbe colonies should have code against having dead entity references in the colony to clear them in case some non-death system (like despawn) kills the entity
  • engulfing a colony member needs to properly move that entity to the right state
  • check: SAVE LOAD NEEDS: OrganelleContainer.OnOrganellesChanged to be called
  • make sure control.SlowedBySlime gets updated by some system
  • engulfing has to set the engulfed body as physics.BodyDisabled and reverse that on eject
  • player being engulfed should prevent the editor button from being clickable
  • should deleting an entity that has AttachedToEntity components pointing to it delete those entities as well?
  • MicrobeEventCallbacks.OnUnbindEnabled is never triggered
  • check that duplicated cells get resources properly. Seems like they only get 0.1 out of 0.5 for some reason instead of almost half like before?
  • ATP bar no longer stays properly full instead it flickers near the top, probably our cutoff for when to show the bar full is not good enough?
  • cheat to slow down time to easily test the above animation
  • rotating the camera now causes the cell position to change
  • being engulfed allows entering the editor (need to disable the button)
  • engulf progress indicator now can go over 100% (seems to end at 200%, likely caused by me fixing the cells being killed at 50% digested), and no longer says “devoured” on the health
  • player being engulfed to death causes the death sound to play twice
  • rotation speed needs some kind of better non-linear function to really get it to be good again (and allow really big cells to get very slow)
  • seems to be fixed: cell corpse chunks seem to dissolve immediately
  • TODO implement vent compounds on fade
  • adjust rock densities to allow microbes to again slowly push them
  • fix the rock chunks requiring scaling up by 90x to make the physics object scale roughly even match the visual one
  • physics rocks models seem to be rotated differently than the graphics ones
  • exiting a game while debug labels are active and starting a new game crashes the game with: at DefaultEcs.Entity.get_IsAlive () [0x0000e] in <71cea9262b2d4f47b57b59fbd6ae9b26>:0 at DebugOverlays.UpdateLabelColour (DefaultEcs.Entity entity, Godot.Label label) [0x00001]
  • toxins don’t seem to damage cells (easily test with benchmark)
  • enable O3 for the Thrive native distribution builds / release
  • write in our license the used native libs (probably there already) and mention how the thrive native libraries are compiled with clang with love
  • debug draw lines don’t work correctly on windows (there’s a screenshot showing the problem) hopefully the problem is just a small bug in the debug line drawer
  • opening auto-evo exploring tool complains about not being able to load Microbe.tscn
  • delete unused component: AttachedChildren
  • open issue: find the root cause for Immediately ejecting engulfable that has no animation properties which happens in the benchmark a bunch
  • endosomes are not displayed (maybe due to render order?, it kind of looks like they are there but just underneath all the microbe visuals), they are properly attached to the scene and seemingly with all of their graphical properties but they don’t display at all for some unknown reason
  • test: make sure microbe engulfing size is updated for colony members
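
For the item in the list above about synchronously running 2-4 physics updates to catch up when behind, this is roughly the kind of logic I mean, as a small C++ sketch. The names (`StepPlan`, `PlanPhysicsSteps`) and the idea of dropping the backlog when the cap is hit are assumptions for illustration, not existing Thrive code.

```cpp
#include <algorithm>
#include <cassert>

// Result of planning one update: how many fixed steps to run right now
// and how much unsimulated time carries over to the next update.
struct StepPlan {
    int steps;
    double leftover;
};

// Hypothetical helper: decide how many fixed-size physics steps to run
// for this update, capped at maxSteps.
StepPlan PlanPhysicsSteps(double accumulated, double frameDelta,
                          double stepSeconds, int maxSteps) {
    assert(stepSeconds > 0.0 && maxSteps > 0);
    assert(accumulated >= 0.0 && frameDelta >= 0.0);

    const double total = accumulated + frameDelta;
    const int wanted = static_cast<int>(total / stepSeconds);
    const int steps = std::min(wanted, maxSteps);

    double leftover = total - steps * stepSeconds;

    // When the cap is hit, drop the backlog instead of carrying it over,
    // otherwise a slow machine would fall further behind every update.
    if (wanted > maxSteps)
        leftover = 0.0;

    return {steps, leftover};
}
```

As an example, if the physics step were 1/60 of a second (an assumption here), a frame at 20 FPS covers 0.05 seconds, which is about three steps, so a cap in the 2-4 range is what keeps 20 FPS running at normal speed.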

And here’s issues I know of that I think can be saved for the next release:

Issues for later
  • MAC VERSION
  • CI running for clang tidy and format
  • add a specific native_checks CI job to check the native lib is formatted right and doesn’t have any errors
  • need to have thread affinity for specific CCX for physics threads
  • need to write a styleguide for the C++ code, use clang-format and clang-tidy to automatically check things
  • investigate if a total memory pool or custom allocator with alignment requirements would be good to use
  • can CI executor builds just clone with depth 100 and also all the submodules like that?
  • need to add checks for minimum physics shape sizes / need instructions for developers to always run the physics library in debug mode when developing
  • CI check for native library version hash compute from all the source files
  • use JPH_TRACK_BROADPHASE_STATS to tweak broadphases
  • preprocessing step for going from Godot .shape to Jolt specific format for faster loading performance. need a pre-export step to bake godot collision shapes into efficient form for Jolt. pre-bake step for Godot → Jolt collision shape conversion CreateShapeFromGodotResource
  • need a way for the managed executor to ask all native threads to help with managed tasks until are all done
  • rename reachTargetInSeconds as it probably doesn’t match / investigate why it matches
  • one more native module as Godot extension (thrive_godote) that allows the compound clouds to be simulated faster and uploaded to the GPU. Hopefully this is not too tightly linked to the C# side to allow still good read and write access from C#
  • making the physics debug draw renderer as a godot extension for performance reasons
  • setup instructions mention that downloading the repo as zip doesn’t work (unless Github has suddenly started including the git LFS files in that kind of download) https://github.com/Revolutionary-Games/Thrive/blob/master/doc/setup_instructions.md
  • implement Floatingchunk.ConvesPhysicsMesh again with the new koshiko models
  • can probably delete the IEntity (once no longer used in the prototypes)
  • delete the EntityReference and related classes as unnecessary (once when prototypes don’t use it as well)
  • if can be done without much more memory copies: Microbes should on init start to calculate their membrane in the background to give a bit of extra time before world attach when it needs to be done
  • need to write an overall ECS architecture document and mention about the dirty flags for various things
  • Must be set to true when component data is modified and systems should re-check the data. Some systems will not react to component field changes without this. So it is very important to check if that field or equivalent exists in a component
  • find if there are any potential stackalloc places to speed things up
  • Refactor OrganelleContainer.Organelles if at all possible to reduce the number of objects / data that needs constant processing during gameplay
  • a tool that can verify / generate threaded system run info based on attributes about read and write to components
  • for above should also have an attribute to specify requiring godot / main thread
  • reduce stale check time to 60 days, but keep the close after 30 more days
  • a spin lock library (a hybrid lock that resorts to a lock after a while) or file for the native side to ensure mutexes performance impact is lighter?
  • maybe rate limited systems will need a separate UpdateIfTime method that only even calls Update if it is time to run it
  • moving what makes sense out of the microbe AI system to reduce the code amount
  • overall architecture document that can also document the ECS approach
  • delete PhysicsHelpers
  • delete the helpers for ICellproperties, CellPropertiesHelpers
  • ECS architecture document needs to have info on when to use EntityCommandRecorder (when adding or removing components during a system run)
  • moving fast noise lite to the C++ module
  • reimplement engulfer digest update rate limiting if that is problematic for performance
  • open issue: create a separate injectisome icon instead of using the toxin vacuole icon as the upgrade icon
  • fix Divide to use membrane radius rather than the organelle positions when calculating division
  • open issue: does editor storage space stat need fixing now with specialized vacuole upgrade existing?
  • comment on the game settings remarks that the setters are private exactly so that the value objects are not reassigned as that breaks all observers on them
  • open issue for making the pause menu into an autoload to ensure the thriveopedia and other content doesn’t have to be reinitialized on each scene change
  • reimplement usage of SurvivalStatistics or remove it entirely
  • Godot debug menu “spawn entities with collision shapes” needs to be hooked up to the new system and should be unselectable when not running a debug version of the native library
  • dotnet 7 runtime: TODO: add this when we can to reduce hyperthreading resource use while waiting: X86Base.Pause (System.Runtime.Intrinsics.X86, documented on Microsoft Learn)
  • after release: a full threading pass on all the systems that might benefit from it and the tool to generate parallel task tracks to fix the performance of all the systems together to be much faster
  • verify that this is no longer the case:
    // TODO: it seems that at low framerate (below 20 or so) cells get a speed boost for some reason
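
As a rough idea of what the hybrid spin lock item in the list above could look like on the native side, here is a minimal C++ sketch: it spins on `try_lock` for a bounded number of attempts and then falls back to a normal blocking lock. The class name `HybridLock` and the default spin limit are made up for illustration, and whether this actually beats a plain `std::mutex` would need to be measured.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical hybrid lock: spin briefly hoping the lock frees up soon,
// then resort to a regular blocking lock.
class HybridLock {
public:
    explicit HybridLock(int maxSpins = 1000) : spinLimit(maxSpins) {
        assert(maxSpins >= 0);
    }

    void lock() {
        for (int i = 0; i < spinLimit; ++i) {
            if (mutex.try_lock())
                return;

            // Let other threads run instead of burning the core.
            std::this_thread::yield();
        }

        // Spinning did not help, wait the normal way.
        mutex.lock();
    }

    bool try_lock() { return mutex.try_lock(); }

    void unlock() { mutex.unlock(); }

private:
    std::mutex mutex;
    int spinLimit;
};
```

Because it has `lock`, `try_lock` and `unlock`, it works with `std::lock_guard<HybridLock>` like a normal mutex, so swapping it in and out for benchmarking should be easy.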

So please feel free to give the latest Thrive code a try and either report any still unknown issues in this thread or coordinate who wants to fix what.

Edit: there’s now a project on Github to collect all the known things


I’ll see what I can do from a coding perspective. As I recall, you advised just re-cloning the repo post ECS refactor, right? I haven’t touched the codebase since late summer.

For the testing side, I think it would help if we had someone who isn’t our lead dev in charge of collecting bug reports and combining duplicates. I expect we’ll get an enthusiastic response from the players for playtesting, which could be a curse as much as a blessing if nobody has time to comb through all the reports.


Is there somewhere we have a compiled list of all the specific crashes we have encountered or is it just open github issues?

That was a separate thing related to repo history rewrite. If you have recloned after the history refactor (which was during the summer) then you are good to go. You’ll know if you are in trouble if git complains about divergent histories or no common commits found for a branch.

The reason I’ve recently been telling people to reclone is that cleaning up after the native module compilation is easiest to explain as just deleting the repo (I was too lazy to list all the build folder names to delete). And at least one person had local commits on their master or something similar that caused a different repo state than what should be on the master branch; again, the easiest fix to explain is to just reclone.

I put them as collapsible sections in my post as they are pretty long.

I created a new Github project where all issues related to this should be put, so that it’s quick to see at a glance what is known:

I added a table view as alternative to the cards, but that seems to have the drawback that it is hard to show which items should be done now and which should be done later.

If anyone has spare time, taking all of the bullet points in my post and dumping them in the Todo and TODO after release columns would be nice.
