Custom archive format for saving

hhyyrylainen · October 2, 2024, 11:58am

I once again had to tweak what gets saved in Thrive, which means working with the JSON conversions. I think I’ve found most of the pitfalls and ways to avoid them, but that has taken years of working with the saving system. In other words, the JSON save format is extremely difficult to work with. I’ve noticed that no one else really wants to touch any feature that requires complex save data changes or writing upgrade steps for saves. And while I can get almost anything done with the saving system now, I’m not a huge fan of how difficult it is to use.

So I’ve been thinking for a while that it would be better to switch Thrive saves to use a custom archive format rather than JSON. This isn’t a huge priority right now but I’d like to do this the next time we break save compatibility (and definitely before 1.0).

Here are a few random thoughts about this:

It’ll require writing more manual load and save methods, but those will be much clearer than the JSON system, which relies on really unintuitive attributes and converter types that need to be added. So we’d have slightly more manually written save / load code, but it would be much easier to read and write, which I think is a huge benefit.
The save upgrade system could be scrapped. Instead, each object in a save would have a version tag, and each loader method would be responsible for loading older formats of its own type (or throwing an error to indicate that the save is too old to load). This should be massively easier to write than the current JSON save upgrade steps.
The save info data could still be kept as JSON so that other software can easily inspect basic information from Thrive saves.
A binary format should be much more space efficient (though I guess as we gz compress our saves the total result wouldn’t be that different in on-disk size).
A decision should be made on whether the custom archive format should have the structure written into the file. For example, writing a string would first write a marker saying that a string follows, and then its length. Alternatively, to save space, we could follow Godot’s example and just write the data, leaving the reader responsible for knowing which deserialize method to call. That makes it harder to debug a save method and a load method that are inconsistent with each other, but that is also very hard to debug with the current JSON system.
A custom serialization format could also solve the issue where references between objects often have to be unnecessarily written to the JSON, which then costs a lot of extra processing time on load. I haven’t come up with any reasonable way to fix this with the current JSON system.
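
To make the version tag and "structure in the file" points above more concrete, here is a rough sketch of how it could look. It is written in Python only to keep it short (the real thing would be C# in Thrive), and every name in it (`ArchiveWriter`, `ArchiveReader`, `Species`, `TAG_STRING`, the field layout) is made up for illustration, not a proposed design. The `tagged` flag switches between writing a type marker before each string and writing just the data. I kept the length prefix in both modes here, as the reader needs some way to know where a string ends.

```python
import io
import struct

TAG_STRING = 1  # hypothetical type marker, only written in "tagged" mode


class ArchiveWriter:
    def __init__(self, tagged=False):
        self.buffer = io.BytesIO()
        self.tagged = tagged

    def write_uint16(self, value):
        self.buffer.write(struct.pack("<H", value))

    def write_uint32(self, value):
        self.buffer.write(struct.pack("<I", value))

    def write_float(self, value):
        self.buffer.write(struct.pack("<f", value))

    def write_string(self, value):
        data = value.encode("utf-8")
        if self.tagged:
            # Self-describing variant: say that a string follows
            self.buffer.write(struct.pack("<B", TAG_STRING))
        self.write_uint32(len(data))
        self.buffer.write(data)

    def to_bytes(self):
        return self.buffer.getvalue()


class ArchiveReader:
    def __init__(self, data, tagged=False):
        self.buffer = io.BytesIO(data)
        self.tagged = tagged

    def _read_exact(self, size):
        raw = self.buffer.read(size)
        if len(raw) != size:
            raise ValueError("unexpected end of archive")
        return raw

    def _unpack(self, fmt):
        return struct.unpack(fmt, self._read_exact(struct.calcsize(fmt)))[0]

    def read_uint16(self):
        return self._unpack("<H")

    def read_uint32(self):
        return self._unpack("<I")

    def read_float(self):
        return self._unpack("<f")

    def read_string(self):
        if self.tagged and self._unpack("<B") != TAG_STRING:
            # A mismatched save / load pair is caught right here
            raise ValueError("expected a string in the archive")
        return self._read_exact(self.read_uint32()).decode("utf-8")


class Species:
    VERSION = 2           # bump whenever the written fields change
    OLDEST_SUPPORTED = 1  # anything older is rejected as too old

    def __init__(self, name, population, aggression=0.0):
        self.name = name
        self.population = population
        self.aggression = aggression

    def save(self, writer):
        writer.write_uint16(self.VERSION)  # per-object version tag
        writer.write_string(self.name)
        writer.write_uint32(self.population)
        writer.write_float(self.aggression)  # field added in version 2

    @classmethod
    def load(cls, reader):
        version = reader.read_uint16()
        if not cls.OLDEST_SUPPORTED <= version <= cls.VERSION:
            raise ValueError(f"unsupported Species version {version}")
        name = reader.read_string()
        population = reader.read_uint32()
        # Version 1 objects did not store aggression, so use a default
        aggression = reader.read_float() if version >= 2 else 0.0
        return cls(name, population, aggression)
```

The idea is that `Species.load` is the only place that needs to know about old `Species` layouts, so there is no separate upgrade step. In tagged mode a load method that tries to read a string where something else was written fails immediately, whereas in untagged mode it would just read garbage, which is the debugging trade-off mentioned above.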

I think that’s all the thoughts I have for now. In summary, I really think we need to move away from the JSON save format: it is abused a ton to support everything we have in Thrive, which makes it very hard to update and to extend for new use cases as they come up while working on Thrive.