I have been working on the world generation system for a while now, and I would like to discuss my findings so far along with a proposed solution:
Defining the Problem:
Thrive will need a world generation system that can support as realistic a simulation as possible, not only for the earlier stages where the focus is more on the biological evolution of organisms, but also for the later stages where the evolution of realistic societies will be the main focus. Many of the features that the earlier stages require from the world generator will continue to be useful for the later stages, such as the basic terrain generation and dynamic climate system. However, I posit that the later stages will need to generate much more than just world terrain on demand.
The earlier stages are able to simulate the world at a low level of detail (LOD) without seeming to compromise too much on the quality of the simulation. For example, the microbial stage currently runs its simulation on a small number of patches representing vast areas of the world, and this seems to be a good fit for it. This allows the simulation to run on the scale of the entire planet without having to use thousands of patches, which would mean sacrificing the amount of detail each patch could be simulated at. It also reduces complexity for the player, as they aren't overwhelmed by a huge patch map when all they need is a basic overview of the world.
However, for the later stages it will become increasingly important to be able to simulate the world at finer resolutions, all without making the computation time grow exponentially. At the very least, this will have to be done for the terrain generation part of the world generator, as the world's terrain would be far too large a data set to generate at the highest LOD initially. The higher LOD of the world's terrain would be generated dynamically, as it is needed. This is fairly normal for these types of large scale procedurally generated worlds. It is basically required, as the alternative would be to generate gigabytes' worth of terrain data just to get a barely decent level of detail on a world similar to Earth's size. For example, the available GEBCO global terrain elevation data, at just a 450m grid resolution, is about 8GB. (https://www.gebco.net/data_and_products/gridded_bathymetry_data/)
We would need better resolution data than that for the terrain to look good in at least the multicellular to society stages, as those are the stages that would be closest to the terrain and so require high resolutions to look any good.
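As a rough back-of-the-envelope check on those numbers, here is a small Python sketch. It assumes a plain latitude/longitude grid and 2 bytes per elevation sample; both are assumptions made for illustration, as is the function name. A 15 arc-second cell is roughly 450m at the equator, and a 1 arc-second cell is roughly 30m.

```python
def global_grid_bytes(arcsec_per_cell: float, bytes_per_sample: int = 2) -> int:
    """Uncompressed size of a global lat/lon raster with the given cell size.

    Assumption: one fixed-size sample per cell (2 bytes by default).
    """
    cols = round(360 * 3600 / arcsec_per_cell)  # cells around the equator
    rows = round(180 * 3600 / arcsec_per_cell)  # cells from pole to pole
    return cols * rows * bytes_per_sample


# Compare a ~450m grid with a ~30m grid.
for arcsec in (15, 1):
    size_gb = global_grid_bytes(arcsec) / 1e9
    print(f"{arcsec} arc-second grid: about {size_gb:,.1f} GB")
```

Since the cell count grows with the square of the resolution, going from about 450m to about 30m multiplies the data size by 225, which is why the finer levels would have to be generated on demand rather than up front.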
This is still pretty typical though, and games have been doing terrain LOD for quite some time. However, I posit that much more than just the world's terrain will need to be generated at high LOD to allow for the realistic simulation that we want.
Besides the terrain data, climate data will also need to be generated at high LOD so that the climate can correctly interact with the high LOD terrain. If the climate data were to stay at low resolution relative to the terrain, there could be ugly discontinuities and artifacts visible at the earlier stages when exploring regions with complex, high resolution terrain and low resolution climate data that cannot match the terrain's detail.
Another feature needed is the ability to partition the biome patches at high LOD. At the start of the game, in the microbial stage, the world would only be generated and simulated at a low LOD so that it can be simulated at the global level. Once the game progresses to requiring higher LOD for the world, the biome patches will also need to be adjusted to stay consistent with the low LOD versions while adding in the high LOD elements, making them more complex and detailed. If the biome patches were just to stay at low LOD, the individual patches would remain huge and monotonous due to the scale of the world. And if the biome patches did not conform to the higher resolution terrain and climate data, they would become a worse match to the actual terrain as the LOD increases.
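One way to get finer detail that stays consistent with the coarser version is to define each fine cell as its parent cell's value plus a deterministic offset, with the four sibling offsets summing to zero so that the children always average back to the parent. The sketch below only illustrates that idea and is not the prototype's actual method; the function names, the string seed format, and the `roughness` parameter are all hypothetical.

```python
import random


def child_offsets(seed: int, level: int, px: int, py: int, amplitude: float):
    """Four deterministic offsets for the children of parent cell (px, py).

    The offsets sum to zero, so refining never changes the parent's average.
    """
    rng = random.Random(f"{seed}:{level}:{px}:{py}")  # same inputs, same numbers
    a, b, c = (rng.uniform(-amplitude, amplitude) for _ in range(3))
    return (a + b, a - b, -a + c, -a - c)


def height(seed: int, level: int, x: int, y: int,
           base: float = 0.0, roughness: float = 0.5) -> float:
    """Value of cell (x, y) at the given LOD level, computed on demand.

    Level 0 is a single cell covering the whole (toy) world.
    """
    if level == 0:
        return base
    parent = height(seed, level - 1, x // 2, y // 2, base, roughness)
    offsets = child_offsets(seed, level, x // 2, y // 2, roughness ** level)
    return parent + offsets[(y % 2) * 2 + (x % 2)]


# The four level-3 children of level-2 cell (1, 2) average back to that cell.
coarse = height(7, 2, 1, 2)
fine = [height(7, 3, 2 * 1 + dx, 2 * 2 + dy) for dy in (0, 1) for dx in (0, 1)]
print(abs(sum(fine) / 4 - coarse) < 1e-9)
```

Nothing is stored here: any cell at any level can be recomputed from the seed alone, and a region viewed at a higher LOD still agrees with what the low LOD simulation saw. A real terrain or climate generator would use something smoother than independent random offsets.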
All of this already requires a robust world generator that is able to generate extremely high detail regions that must remain consistent with previous lower levels of detail so that the world's development remains consistent across the early stages. But, there is even more that the game will have to support in order for the later stages' society evolution to be done realistically at these large scales.
According to a random Google search, there are many more than 10,000 cities in the world right now. There are also many more smaller urban areas, such as towns or villages (many more than 100,000). There are also about 200 different countries on Earth, and this is after a large amount of consolidation over the centuries. It seems unlikely that all of the disparate communities and civilizations that should exist across a realistically sized world could be simulated at a reasonable LOD consistently across the globe. And it would be completely infeasible to have all of the cities or urban centers simulated at a high LOD due to their sheer number.
These numerous urban centers would have to develop naturally over time, from nothing to villages, towns, and sprawling metropolises. This seems to be a very similar situation to the world's terrain generation, only now it is dealing with data (cities, etc.) that would be extremely dynamic over the course of the simulation. This requires a system that can create detailed parts of a society as they are needed, whether that means creating new cities from scratch on demand or just increasing the LOD allocated to a region that was previously only created and simulated at a low LOD. Only the regions close to the player (or otherwise relevant) will need to be generated and simulated at a high LOD, and the rest of the world could either be simulated at a lower LOD or just be generated on demand.
There are many more considerations that a dynamically variable LOD system for society generation and simulation would require, but I think this is sufficient for now. I might make a new post about the society simulation as it relates to world generation at some point. The main point is that societies cannot be generated and simulated at a consistent LOD across the world without limiting either the scale or the depth to far less than real (Earth's) scale.
This could be solved easily by reducing scale to something like Spore scale worlds, but I doubt I need to explain why that would be problematic for Thrive...
GIS:
After looking at all of these different requirements that the game would need for detailed simulation of a realistic scale world, I propose that developing a Geographic Information System for Thrive would help solve many of these problems in a neat and logical package. A Geographic Information System (GIS) is a system that can store, manage, analyze, edit, output, and visualize geographic data (source: Wikipedia). This is typically implemented as some kind of spatial database that stores all of the data that is associated with the world. What is relevant for us is that we need a system that will manage all of the data that relates to the world, from terrain data to locations and population numbers of different organisms or societies. This differs from a typical GIS in that our world would have to be generated dynamically, but it would otherwise be very similar in requirements to a normal GIS. This is essentially taking the standard dynamically generated world terrain of procedural planet generation methods to its logical conclusion. Merging the dynamic generation of a game's planet generator with the power of a GIS should make the design of the game much easier to deal with by neatly modularizing parts of the game's codebase. I doubt that the idea of having the features of a dynamically generated GIS in a game is unique, but calling the world generator a GIS makes it much clearer exactly what features it will need to support.
I will be calling the system Dynamically Generated GIS (DGGIS) for now, but I am not sure if that is the best way to label it.
To learn more about Geographic Information Systems, this is a nice short introduction: Introducing GIS (QGIS Documentation)
ArcGIS by Esri is another large, widely used GIS, so there is plenty of information out there if you want to see how it is used. Also, check out ArcGIS CityEngine (Procedural 3D City Generator | 3D City Design for Urban Environments), which integrates with the ArcGIS system to enable procedural city generation. This is related to what we would want Thrive+DGGIS to be able to do, only focused more on real world applications.
Thrive's Dynamically Generated GIS (DGGIS):
Now, to define exactly what it is that I am proposing in detail:
DGGIS is essentially a combination of a World Generator with a Geographic Information System (GIS). Its purpose is to support most GIS features, but with the ability to do so on an internal set of data that can be generated on demand. Some aspects of the geographic data will be persistently stored, while others will be generated dynamically as the client (Thrive's simulation or renderer) needs them. Some of this data can also be stored persistently but at a lower resolution than it was modified at, in order to save space when full resolution persistence is not needed.
The system will mainly be composed of various data planes storing different types of information. These data planes will be split into chunks representing different geographic areas of the world, thus subdividing it into a manageable amount of data. These chunks will be organized in a hierarchy such that low resolution (low LOD) chunks can be used to represent large areas for faster simulation purposes or to represent large areas when viewing and manipulating the world at large scales.
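To make the split between generated and persisted data more concrete, here is a minimal Python sketch of a single data plane. Everything in it is hypothetical (the `DataPlane` class, the `(level, x, y)` chunk key, and the `keep_every` parameter); it is only meant to show the general shape, not a proposed API.

```python
from typing import Callable, Dict, List, Tuple

ChunkKey = Tuple[int, int, int]  # (lod_level, x, y): a hypothetical addressing scheme
Chunk = List[List[float]]        # a small square raster of samples


class DataPlane:
    """One layer of world data whose chunks are generated only when requested."""

    def __init__(self, generate: Callable[[ChunkKey], Chunk]) -> None:
        self._generate = generate
        self._persistent: Dict[ChunkKey, Chunk] = {}  # modified data, kept for good
        self._cache: Dict[ChunkKey, Chunk] = {}       # regenerable, safe to drop

    def get(self, key: ChunkKey) -> Chunk:
        if key in self._persistent:
            return self._persistent[key]
        if key not in self._cache:
            self._cache[key] = self._generate(key)
        return self._cache[key]

    def store(self, key: ChunkKey, data: Chunk, keep_every: int = 1) -> None:
        """Persist a modified chunk, keeping only every n-th sample to save space."""
        self._persistent[key] = [row[::keep_every] for row in data[::keep_every]]
        self._cache.pop(key, None)

    def evict_cache(self) -> None:
        self._cache.clear()


def flat_terrain(key: ChunkKey) -> Chunk:
    """Stand-in generator: a 4x4 chunk filled with the LOD level number."""
    level, _x, _y = key
    return [[float(level)] * 4 for _ in range(4)]


plane = DataPlane(flat_terrain)
chunk = plane.get((2, 1, 3))                  # generated on first request
chunk[0][0] = 9.0                             # the client modifies it
plane.store((2, 1, 3), chunk, keep_every=2)   # persisted at half resolution
print(plane.get((2, 1, 3)))
```

The persisted copy here is a 2x2 raster while a freshly generated chunk is 4x4, so a real system would need to either upsample stored data or record its resolution alongside it.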
One potential implementation is to use a quadtree or similar data structure to split the data planes into ¼ size chunks per LOD level. This would result in a simple to implement version of the GIS, and could efficiently support queries to specific regions. This method is what I am already using for the basic prototype world generator, and is what many other simple world generators use. More advanced methods can be researched in the future. This is only one part of the GIS though, as we will discuss later in the partitioning section.
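For anyone unfamiliar with quadtrees, the following Python sketch shows the basic addressing and a region query. It treats the world as a flat unit square, which is a simplifying assumption (a real planet needs some projection onto the sphere), and the `ChunkId` and `chunks_in_region` names are made up for this example rather than taken from the prototype.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ChunkId:
    level: int  # 0 is the single root chunk covering the whole world
    x: int
    y: int

    def children(self) -> List["ChunkId"]:
        """The four quarter-size chunks one LOD level down."""
        return [ChunkId(self.level + 1, 2 * self.x + dx, 2 * self.y + dy)
                for dy in (0, 1) for dx in (0, 1)]

    def parent(self) -> Optional["ChunkId"]:
        if self.level == 0:
            return None
        return ChunkId(self.level - 1, self.x // 2, self.y // 2)

    def bounds(self) -> Tuple[float, float, float, float]:
        """(min_x, min_y, max_x, max_y) in unit-square world coordinates."""
        size = 1.0 / (1 << self.level)
        return (self.x * size, self.y * size,
                (self.x + 1) * size, (self.y + 1) * size)


def chunks_in_region(region, level: int, root: ChunkId = ChunkId(0, 0, 0)):
    """All chunks at `level` overlapping region = (min_x, min_y, max_x, max_y)."""
    x0, y0, x1, y1 = region
    found, stack = [], [root]
    while stack:
        chunk = stack.pop()
        cx0, cy0, cx1, cy1 = chunk.bounds()
        if cx1 <= x0 or cx0 >= x1 or cy1 <= y0 or cy0 >= y1:
            continue  # no overlap: skip this whole subtree
        if chunk.level == level:
            found.append(chunk)
        else:
            stack.extend(chunk.children())
    return found


# Only the bottom-left quarter of the world is needed at LOD level 2.
for chunk in sorted(chunks_in_region((0.0, 0.0, 0.5, 0.5), level=2),
                    key=lambda c: (c.y, c.x)):
    print(chunk)
```

The query prunes whole subtrees that do not overlap the requested region, so the cost depends on the size of the region and not on the size of the world.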
A major distinction between different forms of data that is stored in a GIS is vector vs raster data. Vector data is data that is represented as points, lines, or polygons while raster data is stored as a grid aligned set of data points. This is basically the same as vector vs raster formatted images (like SVG vs PNG). It is likely that DGGIS will need to be a hybrid between a vector and raster database. The choice between the two will be based on the specific case of how each data plane would be best represented.
The functional difference is that vector data might need to be rasterized before being used in some cases. For example, if the terrain data was stored in a vector format, it would need to be rasterized before being sent to the GPU as the terrain mesh. (Don't confuse GPU rasterization, which is done on the final terrain mesh given to the GPU to render, with GIS rasterization, which converts the vector formatted heightmap data into a discrete mesh of vertices.)
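As a small illustration of GIS-style rasterization, the Python sketch below converts a vector polygon (for example a lake outline or a border) into a boolean grid by testing each cell center with the standard even-odd ray crossing rule. It is a generic textbook approach, not anything specific to the prototype, and the function names and the square grid are assumptions for the example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(px: float, py: float, polygon: Sequence[Point]) -> bool:
    """Even-odd rule: count polygon edges crossed by a ray going right from the point."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        if (y0 > py) != (y1 > py):  # this edge spans the ray's height
            x_cross = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
            if px < x_cross:
                inside = not inside
    return inside


def rasterize(polygon: Sequence[Point], width: int, height: int,
              cell_size: float = 1.0) -> List[List[bool]]:
    """Sample the polygon at the center of every cell of a width x height grid."""
    return [[point_in_polygon((col + 0.5) * cell_size,
                              (row + 0.5) * cell_size, polygon)
             for col in range(width)]
            for row in range(height)]


# A 2x2 square "lake" in the middle of a 4x4 grid.
lake = [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0), (1.0, 3.0)]
for row in rasterize(lake, 4, 4):
    print("".join("#" if cell else "." for cell in row))
```

The same vector shape can be rasterized at any `cell_size`, which is one reason vector storage can be attractive for data that has to be shown at several LODs.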
If you are wondering exactly what the raster would be, in a typical GIS a raster would be a square(ish) grid. This is not a requirement though; it is completely possible to have the raster be a hex grid or triangle grid. Google's S2 Geometry library uses a "square" grid (https://s2geometry.io/) while Uber's H3 uses a "hex" grid (H3: Uber's Hexagonal Hierarchical Spatial Index | Uber Blog), and others are definitely possible. Note that these are actually hierarchical spatial indexes, and the actual data in those specific applications can be stored in a different raster format or just use vector format, but that isn't too important here. This distinction shouldn't matter much for us because we will be generating all of the data ourselves, so we can align the data regularly and consistently.
If you have any comments about the vector vs raster format, you might want to look at the post on World Tiling/Grid first. I think that any discussion about using a grid (whether square, hexagonal, free form, or other) should be had there, in order to keep the discussions more organized. If the discussion is specifically about the grid in relation to the GIS design, and not about the game design aspect of it, then it should be done here.
One important aspect of the system would be a partitioning system. This would be the system that generates the biome patches by looking at the world's terrain and climate data, and partitioning it into distinct patches. It would also manage other regions that need to be partitioned such as civilization borders. The partitioner would have to manage all of the spatial connections between adjacent patches and communicate this to the client so that Thrive can run simulations using this information. These partitions would generally form graphs where the nodes represent each region (such as a biome patch, or a society's borders) along with the ability to get properties of that region from queries to it (such as population numbers, or region boundaries).
The partitioning system would need to be able to resize patches when different conditions change. For example, if the global temperature rises, a desert biome patch might expand. Or, if two societies are at war with each other, one might capture a city, causing the society boundaries to be repartitioned so that the city is within the correct society's borders, and the other society's border would shrink accordingly.
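A very simple version of such a partitioner can be sketched in Python as a flood fill over a raster of biome labels, followed by a pass that records which patches touch each other. This assumes the biome of every cell has already been decided from the terrain and climate data; the function name and the tiny 3x3 world are purely illustrative.

```python
from collections import deque


def partition(biomes):
    """Group connected same-biome cells into patches.

    Returns (labels, patch_biomes, edges): a patch id per cell, the biome of
    each patch, and the set of (id_a, id_b) pairs of adjacent patches.
    """
    rows, cols = len(biomes), len(biomes[0])
    labels = [[-1] * cols for _ in range(rows)]
    patch_biomes = []
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] != -1:
                continue
            patch_id = len(patch_biomes)
            patch_biomes.append(biomes[r][c])
            labels[r][c] = patch_id
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of one patch
                cr, cc = queue.popleft()
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and labels[nr][nc] == -1
                            and biomes[nr][nc] == biomes[r][c]):
                        labels[nr][nc] = patch_id
                        queue.append((nr, nc))
    edges = set()
    for r in range(rows):
        for c in range(cols):
            for nr, nc in ((r + 1, c), (r, c + 1)):
                if nr < rows and nc < cols and labels[r][c] != labels[nr][nc]:
                    a, b = labels[r][c], labels[nr][nc]
                    edges.add((min(a, b), max(a, b)))
    return labels, patch_biomes, edges


world = [["ocean", "ocean", "desert"],
         ["ocean", "forest", "desert"],
         ["ocean", "forest", "forest"]]
labels, patch_biomes, edges = partition(world)
print(patch_biomes, sorted(edges))

world[1][1] = "desert"  # the climate warms and the desert expands
labels, patch_biomes, edges = partition(world)
desert_id = patch_biomes.index("desert")
print(sum(row.count(desert_id) for row in labels))
```

Recomputing everything from scratch is the crudest form of repartitioning, and it does not guarantee that a patch keeps the same id afterwards. A real partitioner would need stable patch identities so the simulation can keep tracking populations across a resize.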
For the microbial stage, the partitioning system is one of the most important aspects of the entire system. The microbial stage can basically just have DGGIS generate the world at a low LOD, generate the corresponding biome patches (partition the world), and then give the biome patches to the game. The system will also have to support whatever simple queries the game needs to make on those regions, such as when the simulation wants to know which organisms are currently associated with a patch. It might also want to know simple things like the light level or oxygen level of a patch. This should not be overly complicated to design, and if done right could make expansion of the system into the later stages easier to do.
I would also propose that the entire world generation system (DGGIS) be created as a separate project from Thrive. It would still be developed specifically for Thrive, it would just be done in a nicely separated manner. Thrive could then define an interface along with a specification and list of features so that the system could be designed in a clean and modular way. Designing the project like this would have numerous benefits. A big one is that it could increase confidence in the system eventually being realized. With the specification separated from the implementation, if one implementation of the project does not work out or the developers working on it disappear, the system should be documented and defined well enough that someone else could pick it up. Completely separate versions of the GIS could also be swapped in later as long as the interface hasn't changed, which could be fantastic for Thrive's moddability. It could also enable more complex configurations or sets of different world generators for use in Thrive.
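To illustrate what separating the interface from the implementation could look like, here is a deliberately tiny Python sketch. None of these names (`PatchInfo`, `WorldSource`, `TwoPatchWorld`) are a proposed specification, and the Thrive side would of course not be written in Python; the point is only that the game code depends on the interface, so any implementation that satisfies it can be swapped in.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


@dataclass
class PatchInfo:
    """Hypothetical per-patch data that a microbial stage might query."""
    biome: str
    light_level: float
    oxygen_level: float
    species: List[str] = field(default_factory=list)
    neighbours: List[int] = field(default_factory=list)


class WorldSource(Protocol):
    """The interface the game would program against (an assumption, not a spec)."""

    def generate_patches(self, seed: int) -> Dict[int, PatchInfo]: ...


class TwoPatchWorld:
    """A stand-in implementation; a real generator would be far more involved."""

    def generate_patches(self, seed: int) -> Dict[int, PatchInfo]:
        return {
            0: PatchInfo("ocean surface", light_level=1.0, oxygen_level=0.2,
                         neighbours=[1]),
            1: PatchInfo("ocean floor", light_level=0.0, oxygen_level=0.05,
                         neighbours=[0]),
        }


def sunlit_patches(world: WorldSource, seed: int) -> List[int]:
    """Game-side code: it only knows the interface, not which generator is behind it."""
    patches = world.generate_patches(seed)
    return [patch_id for patch_id, info in patches.items() if info.light_level > 0.0]


print(sunlit_patches(TwoPatchWorld(), seed=42))
```

A simple demo or test harness could call the same interface without touching the Thrive codebase at all.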
DGGIS could thus be developed in parallel with Thrive initially. This would allow time for the interface design to stabilize, avoiding annoying refactors in the main Thrive codebase that would use it, or getting stuck with badly designed interfaces. It would also allow for quicker iteration of the system and more experimentation. Simple demos or tests could be created that use DGGIS to test out different features without having to deal with the large Thrive codebase. The main downside of this method is that it would likely take longer for the system to get to the point where it seems stable and developed enough to be integrated into the main project. This could be mitigated somewhat by designing a simple, cut down version to use just for the first couple of stages, in order to provide a better transition to the full version of DGGIS. This all depends on when the world generation system is needed, and what the pace of development of DGGIS would end up being.
Designing DGGIS as a cleanly separable project would also open up the possibility of the world generation system being used in more projects and games than just Thrive. This is similar to the Story Engine discussed here earlier: (Story Engine). Having more parts of Thrive available for use in other projects could attract more developers to contribute, or it could just make it easier overall to get involved with developing Thrive if its components are well-defined, with smaller codebases to understand. I personally like this ability to reuse the system in other projects because I would like to be able to make some nice demos that show off what the world generation system is capable of without having to heavily modify the Thrive code to do so, even if the features would be intended for use in Thrive later.
TL;DR:
Because of the sheer scale of Thrive's realistic simulation and the requirements of generating detail at high resolutions, Thrive will need a system that can manage world scale amounts of data in a coherent and intelligent way. Creating a Dynamically Generated Geographic Information System (DGGIS) is my proposed solution to this. It would abstract all of the complexity of managing world data into a single system, and it relates well to how real world geographic data is actually managed: in geographic information systems like ArcGIS or QGIS.
Now, if you have read all of this and think that this system is far too complex or will likely never be done: don't worry! Just like Thrive's organisms and societies, DGGIS can start out simple and evolve over time, gaining more advanced features as it develops. At its core it isn't really doing anything very novel at all, and my very basic prototype world generator that I have been working on could already be called an extremely primitive GIS. The main point of this thread is to discuss whether this overall idea could work, and exactly how this should be implemented if so. How should the interface be structured? What are all of the features you think the world generator/DGGIS should support? And of course, is this the right system/abstraction to use for Thrive? If this idea can be refined into a specific set of specifications and goals, then this should make it far easier to design both the world generation system on its own as well as Thrive as a whole.
Also, if something doesn't make sense about my explanation, please tell me. I had a far longer document that I had to cut down and splice into this because it was getting too long and perhaps off track, so there is a chance that some part of the context could be missing. I split off the entire discussion about grid systems as well, because I don't want that to clutter up this thread as I foresee it could.