CPA (Compounds, Population Dynamics and Auto Evo)

So I think the AI thread got a little off topic, my bad. I decided to start looking into Outcome of hunting attempts that @tjwhale discusses below and started this thread on CPA to discuss it. If this belongs somewhere else just let me know and I’ll move it, but the CPA threads I’ve looked at seem pretty old now.

So I’ve made an all in one Lotka-Volterra style model that should simulate predator-prey interactions for any number of species. The write up is a bit long (sorry) and has equations and figures so I thought it easier to upload it as a pdf. Find it here.

It’s a good starting point I hope, but needs refinement. For example, is it too simple or not simple enough? If there aren’t too many organisms, should we do a deterministic model or should there be some stochasticity? So read it over and let me know what y’all think.

This is great! I love it when people bring the equations, it makes me feel like we’re really getting somewhere.

I agree with pretty much everything you have said. The population number for each species is derived from the compounds part of the model (I’ll explain that later) but yes when we have it this is a great start for working out how predation goes.

The goal of all this is to get to the point where we can say “a microbe has these organelles, how many of the other species’ compounds does it steal?” Which is very challenging. Where @Seregon and I had got to is that the final formula for c_xy should look something like this (it’s more about compound transfer than population changes but it basically works the same):

c_xy = P(encounter) * P(Not running away) * P(winning the fight) * E_e * E_d

where P(X) is the probability of X happening and E_e and E_d are eating and digestion efficiency exactly as you suggest (this is basically more detail for d_xy in your model which, in the usual L-K is just based on population densities).
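To make the formula concrete, here is a minimal sketch of how that product might be evaluated. The function name and the example numbers are made up for illustration; nothing here is an agreed implementation.

```python
def compound_transfer_coefficient(p_encounter, p_not_running_away, p_winning,
                                  eating_eff, digestion_eff):
    """c_xy as the product of the three probabilities and the two efficiencies."""
    factors = (p_encounter, p_not_running_away, p_winning,
               eating_eff, digestion_eff)
    for factor in factors:
        # Probabilities and efficiencies are all assumed to lie in [0, 1].
        if not 0.0 <= factor <= 1.0:
            raise ValueError("every factor should lie in [0, 1]")
    c_xy = 1.0
    for factor in factors:
        c_xy *= factor
    return c_xy


# Hypothetical numbers, purely for illustration.
c_xy = compound_transfer_coefficient(0.5, 0.4, 0.5, 0.8, 0.5)
print(c_xy)
```

Because it is a plain product, any single factor near zero (for example prey that almost always escapes) drives the whole transfer towards zero.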

I guess P(encounter) should work in the usual Lotka-Volterra way (proportional to the density of those microbes in the patch).

P(Not running away) is a little more complex. I have been thinking we should assume when a member of species A fights a member of species B we assume they are at full power (both members uninjured and full of energy) however maybe we need some stochastics in there to represent a wolf occasionally being able to kill a bison if they are old or wounded. If you assume they are both always full power then a wolf will never be able to kill a bison. What do you think about this?

Anyway P(Not running away) needs to be based on the speed of both microbes as well as the ai (especially of the microbe running away) (though we could ignore ai dependence but that is a shame, especially if we have explicit instincts we can query for behaviour). Moreover we need to know if they have any agents which might work to slow the other one down and also how effective that is (presumably when you are running away agents you excrete pretty much definitely hit the thing chasing you but if you are chasing it’s hard to spray agent on the thing running away).

Finally P(winning the fight) is even more complex still. Do you just assign each species a “fight strength” parameter to govern this or do we try and go deeper and say “if they have the right agents then that increases their power against this species specifically?” What do you think?

I was thinking we could simulate (just put an example species A in a petri dish with an example of species B and see what happens) however the others convinced me this is a bad idea because it severely limits the number of species we can have per patch. A formula is super cheap computationally, whereas running a simulation 1000 times per species combination costs something like 1000n^2m runs, where n is the species per patch and m is the number of patches, so I think simulation is not feasible.

Also what about stealth? Where would that fit, maybe into P(encounter)? What if the predator has stealth, does it then go in P(winning the fight)? Remember that, much like with the ai, it’s best if we can build a system which will work for everything up to the society stage because otherwise, if we make it too microbe specific, we are going to have to come back here again and build a lot more when it comes to the next few stages. Maybe that’s ok but I think abstract is best.

Finally when you talk about scavengers that’s what the bacteria are for (you can also be a microbe species scavenger). They take excess compounds and break them down into basic compounds (and they do other stuff too).

Okay, just to be clear. The model I’ve proposed is meant to be run in Biomes the player is not present in. I don’t think it will work in a biome the player is in because it is a mass action deterministic model. Adding the player randomness and the limited amount of actions we can have on screen there is no way we can claim mass action models will come close to being realistic. Hell, even in other biomes I’m not sure if there are enough organisms to rely on mass action and we might need to use a different model. Though this is computationally the least expensive so we should start with it.

Anyway what this means is that the probabilities you’ve described are more implicit than explicit. They wouldn’t go in the c_xy parameter, they would be a part of the d_yx parameter and end up being condensed to a single number. So say organism A is good at beating organism B and has high P(encounter) and B sucks at running away, all this would mean is a high d_BA, i.e. a high death of B per A per unit time. So the wolf bison example we would have a low d_yx. Yeah a fully grown healthy wolf might not always kill a bison, but some bison are young, or sick or too old to fight so sometimes it happens. This means a low amount of bison are killed per wolf per unit time. The fact that it happens infrequently though has no bearing on c_xy. This only comes into play when it does happen. i.e. there is no compound transfer when a wolf doesn’t kill a bison so c_xy is meaningless in those cases.

If you want the overall compound transfer over a certain period of time it would be something akin to:

C_T of x * population of x * c_yx * d_xy * time of simulation

So if organism B kills and eats A then the equation means

compound total per A * population of A * compound uptake efficiency of B for organism A * death of organisms A per B per unit time * total time we’ve run the biome. The final units of this I believe simplify to concentration of compound. And the answer should be how many of A’s compounds did we transfer to B over the entire biome simulation. If you want to get more specific, for example B cannot uptake compound 1 of A unless it has organelle 1, then there would just be a specific c_xy_1:

c_xy_1 = E_E * E_D_1, where E_D_1 is B’s ability to digest compound 1 and it would equal
1 - (1 - E_O_1) ^ (n_O_1) , where E_O_1 is organelle 1’s efficiency and n_O_1 is the number of organelle 1 present in B.
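A small sketch of the two expressions above, in case it helps to see them with numbers plugged in. The function and parameter names are hypothetical and the values are invented; the code only restates the formulas as written.

```python
def digestion_efficiency(organelle_efficiency, organelle_count):
    """E_D_1 = 1 - (1 - E_O_1) ** n_O_1."""
    return 1.0 - (1.0 - organelle_efficiency) ** organelle_count


def total_transfer(compound_per_prey, prey_population, uptake_efficiency,
                   deaths_per_predator_per_time, sim_time):
    """Product of the five terms in the overall compound transfer expression."""
    return (compound_per_prey * prey_population * uptake_efficiency
            * deaths_per_predator_per_time * sim_time)


# Hypothetical values: three copies of organelle 1, each 50% efficient.
e_d_1 = digestion_efficiency(0.5, 3)
c_xy_1 = 0.8 * e_d_1  # E_E assumed to be 0.8 for this example
print(e_d_1, c_xy_1)
print(total_transfer(2.0, 1000, c_xy_1, 0.01, 10.0))
```

One property of the organelle formula worth noticing is the diminishing return: with 50% efficient organelles, one copy gives 0.5, two give 0.75 and three give 0.875.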

Now, how to incorporate the explicit probabilities implicitly into d_xy is another story. I think it would depend on what AI model we end up with and what info we have on each species. But since this model is mass action and since it is in another biome away from the player it might be best to just rank microbe features, give them each a score and determine a d_xy from some sliding scale. Like you said running simulations would be helpful but way too slow. This is where we really need to be clever in how we score a species. And at the moment I have no clue. Maybe we run several simulations now, get some predictive equations from that and refine them, and upload those predictive equations to the game.

If you and @Seregon were looking for models we can run in the player biome I would say we need some other model. We can still have a c_xy per animal per organ but now the probabilities are explicit and the use of the compounds is explicit. The computationally expensive approach is to just keep track of everyone in the player’s biome and have them all run on whatever AI we have. When one eats another one we calculate its c_xy and move on. I doubt this is feasible. We need some compromise between explicitly following everything and the implicit method above. There is something called the tau leaping method that I think can strike a good balance. I’ll look more into it and see if we can use it. I think it can make use of the instincts AI we’ve got as well as the memory stuff @TheCreator wrote about to get probabilities. I.e. fight and flee scores are compared and a P(not running away) is generated from that. If P(encounter) depends on stealth then there should probably be some AI stealth instinct we can borrow from. P(winning the fight) is tougher. Maybe we first assign a score. Then as time goes on we simulate a fight every once in a while and update the score accordingly?
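For anyone who hasn’t met tau leaping, here is a toy sketch of the idea on a two-species predator-prey system: instead of simulating every single event, each event type fires a Poisson-distributed number of times per leap of length tau. The event set, the rate names and all numbers are assumptions for illustration, and the rule that one kill makes one new predator is a deliberate oversimplification, not a proposal.

```python
import math
import random


def poisson(mean, rng):
    """Draw a Poisson sample (Knuth's method; normal approximation if mean is large)."""
    if mean <= 0.0:
        return 0
    if mean > 500.0:
        return max(0, round(rng.gauss(mean, math.sqrt(mean))))
    threshold = math.exp(-mean)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return count
        count += 1


def tau_leap_step(prey, predators, birth, predation, death, tau, rng):
    """Fire each event type a Poisson number of times over one leap of length tau."""
    births = poisson(birth * prey * tau, rng)
    # Clamp so a leap can never remove more individuals than exist.
    kills = min(prey, poisson(predation * prey * predators * tau, rng))
    deaths = min(predators, poisson(death * predators * tau, rng))
    # Toy assumption: every kill produces exactly one new predator.
    return prey + births - kills, predators + kills - deaths


rng = random.Random(42)
prey, predators = 50, 10
for _ in range(100):
    prey, predators = tau_leap_step(prey, predators, birth=1.0,
                                    predation=0.01, death=0.5, tau=0.05, rng=rng)
print(prey, predators)
```

The appeal is that the cost per step depends on the number of event types, not on the number of individuals, while the outcome still varies from run to run.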

Finally, how do we handle pack hunters? Simulating fights and getting probabilities have so far all been 1 v 1. Yeah a wolf won’t kill a healthy bison, but wolves will.

I’m really glad you’re getting into this stuff, you’re obviously a smart guy and I think it’s going to be great having you on the team.

Alongside that I want to say that this is not a new model, we have been talking about this stuff for quite a while now. If you are interested you can check out this thread

or have a look at what I wrote about the CPA model in the gdd channel on slack (about 3 posts up). We have already decided on a L-K type model for predation alongside a chemical model (both of which are based on random collisions between particles, interestingly) for the processes inside the microbes. I’ll try to describe the chemical part of the system now. So yes there is some stuff that needs to be fleshed out and formalised however this is not a new problem. Once you’ve understood where we are at then I’ll happily listen to what you have to say about improvements and upgrades to the system (or even large changes) as you obviously have the expertise to help.

So yes as you say this is a separate system from the “swimming around” which the player does in the single player. When we talk about patches there is some debate about size but we could be talking about 100,000 sq km and all the microbes that live in an area that big (to keep the numbers more manageable we may be talking about a representative sample of a smaller size). However the CPA model is designed for populations in the billions, and as such the swimming around will have no impact on it.

So this model will run in the background of the patch the player is playing in. The player’s experience will then be slaved to the CPA model for that patch. So if species A is dominant in the model then the player will meet a lot of species A as it swims around. As you evolve your microbe in the editor you will need to think both about what you want as the player and what will help your species (you will get some info on the plight of your species in the editor). That’s the main reason why it’s important to model the swimming around as closely as possible with the CPA system because, ideally, you want the best choice for your species to be the best choice for you as the player. So that’s how the two parts will be related. This relationship will be the same for all the creature stages up to society where things really change.

The species is basically considered as a giant sack of compounds and the sack has two sub-divisions or pools. One is compounds locked and one is compounds free. So at the start all the compounds are in the free pool and then they are transferred to the locked pool proportionally to how many there are. This represents the microbes locking compounds away in their cytoskeletons and cell membranes etc. There is also a flow from the compounds locked pool into the environment representing the death of the species. So for a one compound system

d(Compounds Free)/dt = + (Compounds in from the environment) - (growth rate)* (Compounds Free)

d(Compounds Locked)/dt = + (growth rate)* (Compounds Free) - (death rate)* (Compounds Locked)

Now obviously things are more complicated than this, there are many compounds all being converted from one type to another and locked away in proportion to the “blueprint” of the species (number of organelles of different types).

The population number is then derived from the (compounds locked)/(mass of a single member).
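To make the one-compound case concrete, here is a rough sketch that steps the two pool equations forward with a simple Euler scheme and derives the population number from the locked pool. The function names, the numbers and the choice of Euler stepping are assumptions for illustration only, not how CPA is actually implemented.

```python
def step_pools(free, locked, inflow, growth_rate, death_rate, dt):
    """Advance both pools by one Euler step of size dt."""
    d_free = inflow - growth_rate * free
    d_locked = growth_rate * free - death_rate * locked
    return free + d_free * dt, locked + d_locked * dt


def population(locked, mass_per_member):
    """Population number = compounds locked / mass of a single member."""
    return locked / mass_per_member


# Hypothetical values: everything starts in the free pool.
free, locked = 50.0, 0.0
for _ in range(2000):
    free, locked = step_pools(free, locked, inflow=10.0,
                              growth_rate=0.5, death_rate=0.1, dt=0.1)

print(round(free, 3), round(locked, 3), round(population(locked, 0.001)))
```

With a constant inflow the pools settle where the flows balance, i.e. free = inflow / growth rate and locked = inflow / death rate, so in this toy setup the population is ultimately limited by how fast compounds come in and how fast the species dies.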

The problem then becomes using this population number and the blueprint of the species in an L-K style model to build a food web for transferring compounds from one species (both pools) to another or to the environment. And that’s what I was suggesting we work on when I posed the problem to you the other day.

So yes I misunderstood some of your notation, apologies, I mixed up d_xy and c_xy. What we want finally is, as you say, “the overall compound transfer over a certain period of time”. Once we have that we are pretty much set as we can plug it into the other half of the model and we are done.

I’m not exactly sure what you mean when you say mass-action (do you mean it’s based on the chance of randomly moving individuals bumping into each other?) or implicit vs explicit.

For pack hunters I would suggest we consider them to just be larger blobs. So they are less likely to encounter other species but whenever they do (as prey or predators) they get a boost to their “fight power”, however we end up determining that.

Does this all make sense? Please ask if you have any questions about it.

Instead of mass action I should have probably been using bulk reaction. So what I meant is that there are so many things interacting that we can use ‘averages’. So for example when modeling chemical reactions (mass action kinetics) you assume that billions of molecules are interacting, so instead of modeling individual interactions you model the rates at which reactions happen and get a deterministic model. On the other hand, if you only have a handful of chemicals and are trying to model their reactions then you can’t use mass action kinetics, you use some agent based model and the end results aren’t deterministic. And you’re totally right, if in the player patch we will have huge populations then an L-K type model is ideal. I just wasn’t sure if we would have that big of a patch. Pretty impressive.

For explicit and implicit I mean do we do the actual probability calculations, roll the dice, see the outcome and model specific interactions, or do we just look at the general trend and say, yeah this is what happens on average so we’ll just go with a fixed number that represents a rate of occurrence.

Questions n’ stuff:

  1. The current model compounds are conserved and finite in each biome? No creating carbon out of thin air, right?

  2. So the population model in the first post needs to be modified to include: what compounds am I getting, how much, from where, and how much did I lose trying to get it? Then instead of just having c_xy we have c_xy_glucose, c_xy_fat, etc., as well as some terms describing loss rate of compounds. If this is right then I think the c_xy calculations proposed are pretty close, they just need some expansion.

  3. This is where I get tripped up though, I don’t understand why population number = compounds locked/ mass of single member if we can have a population dynamics model based on compound intake and output already. What benefit does calculating population number this way give? Is it kind of like having a carrying capacity based on physical resources? Why is compounds free not part of the equation?

Or am I thinking about it all wrong? Do we not keep track of population change over time? Rather we keep track of compound change over time and then when someone asks what’s the population of A we look at the compounds and somehow extract A from them?

  4. What’s not close is how to get a proper d_yx term from all the probabilities you wrote down. This I think will need a lot of input. Also, should they not be thought of as rates rather than probabilities if we are working with large populations?

Anyways, thanks for clearing up some stuff for me. It’s starting to make more sense… maybe

Yes, as you say we are modelling everything as a continuous process. Much like a continuum limit in fluid dynamics we are assuming there are so many interactions occurring that the process is basically a smooth flow. It’s interesting because what happens inside the cell (chemical reactions) and what happens between cells (predator - prey) become basically the same type of process governed by similar equations, which I think is quite neat.

In terms of rates of reactions I think that’s just the (probability of something happening per interaction) * (number of interactions in time dt) right? So once we know the probabilities we are basically there.

  1. Yeah we are very keen on mass conservation. Energy, of course, is not conserved and will be inputted as sunlight (and Hydrogen-Sulfide from volcanic vents) but then everything else will be driven by those inputs. For that I went through carefully and balanced all the chemical reactions (as best I could) to make sure that they would conserve the individual elements. (Also I think it’s kind of cool if the game introduces people to the idea of a metabolic pathway). Also in each patch there’ll be an accounting function which will, at the end of every timestep, go through and count the amount of each compound and then add or subtract some from the environment to make sure it is conserved. The reason for this is persistent rounding errors could throw things off so we need to artificially correct everything.

  2 and 3. Exactly as you say. We are only interested in keeping track of who has what compounds over time. So there is X carbon in this patch and it is either in the environment or it is in one of the species. That’s all we really care about. For predator-prey models we need a population number and so, at that point, we derive it from the amount of compounds a species has locked away but other than that we don’t mind how many of a species there are, we only care about what compounds they are hoarding (a one tonne elephant or tonne of ants is basically the same thing). (The other time pop-number matters is when it comes to absorption. So absorbing background compounds is proportional to the surface area of a species and for that you need a population number and the “blueprint”.)

So we only need a single transfer term, “you get 1/100 of species B’s compounds per timestep” and then those compounds are just transferred. Then you move them into your locked pool in proportion to how you are built but the transfer between species just gives you 1/100 of their CO2 and 1/100 of their protein and 1/100 of everything.

  4. Yes the equations governing predation are what I want to be working on, how do we work out the rate of compound flow from species A to B due to predation? It’s pretty difficult which is why we haven’t made much progress on it.
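As a rough illustration of the single transfer term and the end-of-timestep accounting check described above, here is a small sketch. The dictionary layout, the function names and the numbers are all hypothetical; it only shows the shape of the bookkeeping.

```python
def transfer_fraction(prey_pool, predator_pool, fraction):
    """Move the same fraction of every compound from the prey to the predator."""
    for compound, amount in prey_pool.items():
        moved = amount * fraction
        prey_pool[compound] = amount - moved
        predator_pool[compound] = predator_pool.get(compound, 0.0) + moved


def rebalance(environment, species_pools, patch_totals):
    """Add or subtract from the environment so each compound total is conserved."""
    for compound, target in patch_totals.items():
        in_species = sum(pool.get(compound, 0.0) for pool in species_pools)
        drift = target - (environment.get(compound, 0.0) + in_species)
        environment[compound] = environment.get(compound, 0.0) + drift


# Hypothetical patch: species B takes 1/100 of species A's compounds.
species_a = {"co2": 100.0, "protein": 50.0}
species_b = {"co2": 10.0}
transfer_fraction(species_a, species_b, 0.01)

# The environment has drifted slightly from rounding; the check corrects it.
environment = {"co2": 889.9}
rebalance(environment, [species_a, species_b], {"co2": 1000.0, "protein": 50.0})
print(species_a, species_b, environment)
```

The transfer itself conserves mass exactly in principle, so the rebalance step only has to absorb whatever small drift accumulates over many timesteps.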

No problem. Seregon kindly spent a lot of time bringing me up to speed when I first joined and so I’m happy to do it, also you’re a useful person to have so it’s worth it. Moreover all this isn’t clearly written down somewhere (which would make all this much easier). What we’re trying to build here is pretty insane, it’s a giant chemical + L-K type predation model with custom predation determined by a first person microbe game. That’s why it’s so hard to describe because it’s so big (and also I think it’s extremely cool). The good thing is we are basically building this once and then it will work for all stages (with maybe alterations to the predation flows) and so we are front loading a lot of the work (as also we are trying to do with the ai).

Edit: Sorry I don’t know why I keep saying L-K, I mean L-V for Lotka-Volterra.

This thread covers auto-evo things, so I’ll post my misplaced comment from before here:

"I’m not a programmer, but I want to mention the fact that all discussion here is entirely focused on the influences of natural selection and mutation, which are only parts of the whole system of evolution (especially with the small population sizes present at the start of the microbe stage)-- one of the most important being genetic drift. This is something very important in small populations, but would require a very complex gene model to implement naturally, which is really unnecessary. Fortunately, plenty of mathematical formulas exist that can efficiently calculate the effects of genetic drift. As population sizes become larger, the effect becomes basically negligible and can be ignored in calculations.

Gene flow isn’t really something important for now at least, so it can probably be ignored.

A huge thing to consider is horizontal gene transfer in bacteria. Most of you probably know about this already, and it has a very significant impact on bacterial population genetics. Horizontal gene transfer does occur in other organisms, but not to an extent that I think it would really need to be modeled.

With genetic drift, the one requirement is sexual reproduction, which isn’t in the game yet. Is this feature planned to be developed soon? It’s obviously an important mechanic for evolution and of course necessary for the majority of multi-cellular organisms."

Hey @Poisonchocolate I’m not a biologist, you’ll probably need an answer from @Seregon.

However a couple of points, one of the main simplifying assumptions is that all the members of a species are genetically identical. This means there’s no genetic drift. In terms of the model there’s not a huge difference between having 1 species with two subgroups with slightly different genes which drift back and forth and having two separate species. So any drift is accounted for by the relative success of the two, very similar, species competing with each other.

Second auto-evo doesn’t keep track of where you got an adaptation from. In reality you could have a random genetic mutation or you could have horizontal gene transfer or something else. But in the editor you’re just allowed to make whatever changes you like. It’s not important where you got this from, it’s just important to make sure the change isn’t too drastic.

Hope this is helpful, as I say, I’d like to learn more about it myself and I hope someone who knows more can weigh in.

@tjwhale - I think you’re confusing genetic drift with gene flow, though otherwise you’re pretty much correct.

Gene flow happens between populations, it’s what keeps them the ‘same’. If gene flow is reduced (by distance or a physical barrier) it can’t keep up with mutation and drift, and the populations may form separate sub-species. Your first paragraph covers gene flow within a patch - i.e.: we won’t have any - we may have gene flow between patches be the factor controlling whether sub-populations become new species or not… or we might do it randomly if that’s too difficult.

Genetic drift is the tendency for the ‘average’ genetic makeup of a population to drift randomly. This is most noticeable in small populations because the gain or loss of a few mutations more greatly affects the average genome, and rare alleles can easily be lost if not all individuals have offspring.
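To put a number on “most noticeable in small populations”, one standard textbook result for an idealised diploid Wright-Fisher population is that expected heterozygosity decays as H_t = H_0 * (1 - 1/(2N))^t. The sketch below just evaluates that formula; the function name and the population sizes are made up, and it is an illustration of drift in general, not a suggestion for how CPA should model it.

```python
def expected_heterozygosity(h0, population_size, generations):
    """H_t = H_0 * (1 - 1 / (2N)) ** t for an idealised Wright-Fisher population."""
    return h0 * (1.0 - 1.0 / (2.0 * population_size)) ** generations


# Hypothetical population sizes, all starting from H_0 = 0.5.
for n in (10, 1000, 1_000_000):
    print(n, expected_heterozygosity(0.5, n, 100))
```

For the smallest population almost all variation is gone after 100 generations, while for the largest it is essentially untouched, which is the sense in which drift can be ignored at large population sizes.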

We’ve considered having plasmids (a form of horizontal gene transfer) be pickups in game which you can use to gain ‘parts’. I’m not sure how horizontal gene transfer (HGT) would affect CPA, though its greatest effect would be on toxin effects and resistances (there’s a lot of interesting research on how these spread through a bacterial population, and HGT is a major factor). HGT is unlikely to tweak a microbe in the way we allow the editor to do, it’s more likely to encode a specific protein, either a toxin, some sort of toxin interference, or some sort of enzyme.

(note that I’m still away, I’ll likely only be able to reply on weekends)

Thanks for the nice answer, I think I understand.

Depending on how toxins get encoded this could be included in quite a nice way, like when someone uses a toxin near you you get a knowledge of what it is and can then evolve your organelles away from being susceptible to it. Something like that. That would be pretty cool.