# CPA Algorithm

@Seregon it would be great if you could look at this some time. I understand if you are busy.

I’ll try to detail the CPA (Compounds, Population Dynamics and Auto-Evo) algorithm here in pseudocode. It’s quite complex. The notation f(a,b) means some unknown function of a and b which produces a real number. If anyone has any questions I’ll very happily answer them as best I can. This is only the current state, so it can be changed before implementation. I drew a helpful picture.

The world is divided into many patches which have a certain number of slots for species. Inside these patches are compounds. The compounds can either be unallocated in the environment or they can be in a species. A species is a blueprint plus a pair of bins which contain compounds. The pair of bins are free compounds and locked compounds. Locked compounds represent compounds which are used in the construction of the species (for example calcium in bones) and are not available for processes; free compounds can be used freely. The population number of a species can be deduced from the amount of locked compounds. A blueprint is all the information on the makeup of the species, for example number and placement of organelles, processes the species can perform, types of agent the species can produce, the growth rate of that species etc (there is some analogy with genetic data). A process can convert some combination of compounds into some other combination of compounds.

For each Patch:
.For each Species:
…compute the population number for that species, = locked compounds / dry weight of a single individual
…compute the surface area of the species, = population number * surface area of a single individual
…absorb compounds from the environment, = f(concentration gradient, surface area, specialised organelles)
…run this species’ processes (respiration, photosynthesis etc)
…compute predation relations with all other species in the environment, = f(blueprint of this species, blueprint of other species, number of this species, number of other species) [1]
…growth: transfer some compounds from the free bin into the locked bin, = f(growth rate, amount of free compounds, blueprint) [2]
…non-predation death: transfer some compounds from the locked bin into the environment, = f(death rate, amount of locked compounds, blueprint)

.Auto-Evo:
…select a species
…make 5 clones of the patch and in each one alter the selected species’ blueprint in a different way
…run all 5 simulations forward in time and see in which version the selected species has the highest population number
…select this adaptation and disregard the rest

.cleanup
…if a species’ population number has fallen below a certain threshold delete it
…for each compound sum the amount in the patch and add the excess to, or subtract it from, the environment (what to do if there is 0 in the environment and you need to take some away?)
…if there are free slots in the patch and the most successful species has more than a threshold amount (say 10%) of the compounds in that patch split it in two and evolve both a little in a random direction [4]
…if a patch is empty then give it a high priority for migration next turn
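The per-species part of the loop above can be sketched in Python. This is only a rough illustration, not the planned implementation: it tracks a single compound as a plain number, and `Species`, `step_species`, `absorb_coeff` and the linear rate formulas are all placeholder assumptions standing in for the unknown f(...) functions.

```python
from dataclasses import dataclass


@dataclass
class Species:
    dry_weight: float           # locked compound units in one individual (assumed scalar)
    area_per_individual: float  # surface area of one individual
    growth_rate: float          # fraction of the free bin locked per step (placeholder f)
    death_rate: float           # fraction of the locked bin released per step (placeholder f)
    free: float = 0.0
    locked: float = 0.0

    def population(self) -> float:
        # population number = locked compounds / dry weight of a single individual
        return self.locked / self.dry_weight

    def surface_area(self) -> float:
        # species surface area = population number * surface area of one individual
        return self.population() * self.area_per_individual


def step_species(species: Species, environment: float, absorb_coeff: float) -> float:
    """Run one absorb / grow / die step and return the new environment amount."""
    # absorb: stand-in for f(concentration gradient, surface area, organelles)
    absorbed = min(environment, absorb_coeff * species.surface_area() * environment)
    environment -= absorbed
    species.free += absorbed
    # growth: move compounds from the free bin into the locked bin
    grown = species.growth_rate * species.free
    species.free -= grown
    species.locked += grown
    # non-predation death: move compounds from the locked bin into the environment
    died = species.death_rate * species.locked
    species.locked -= died
    return environment + died
```

Every transfer only moves compounds between the environment, the free bin and the locked bin, so the patch total stays constant apart from floating point error.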

Compute patch to patch relationships:
.select two patches, one target and one source (weighted by the relationships between the patches, such as ocean currents)
.if there is a slot in the target (see below) create 5 clones of the target patch
.in each clone (n) move some percentage of the nth most successful species from the source patch to the target patch and run the simulation forwards in time
.select the species which thrived best in the target patch and move some of it there, disregard the rest [3]
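Both the Auto-Evo step and the migration step follow the same pattern: clone the patch several times, change each clone in a different way, run each forward and keep the best. A minimal sketch of that pattern, assuming the patch is any deep-copyable object and that the caller supplies the alterations and a `simulate` function returning a score such as the selected species’ population number (all names here are hypothetical):

```python
import copy
from typing import Callable, List


def pick_best_variant(
    patch: dict,
    variants: List[Callable[[dict], None]],
    simulate: Callable[[dict], float],
) -> int:
    """Apply each variant to its own clone of the patch, run it forward,
    and return the index of the variant with the highest score."""
    scores = []
    for alter in variants:
        clone = copy.deepcopy(patch)  # the real patch is never modified
        alter(clone)
        scores.append(simulate(clone))
    return max(range(len(scores)), key=scores.__getitem__)


def toy_simulate(patch: dict) -> float:
    # hypothetical stand-in for a forward run: population peaks at speed 2.0
    return 100.0 - (patch["speed"] - 2.0) ** 2


patch = {"speed": 1.0}
deltas = [-0.5, 0.0, 0.5, 1.0, 1.5]
variants = [lambda p, d=d: p.update(speed=p["speed"] + d) for d in deltas]
best = pick_best_variant(patch, variants, toy_simulate)
```

Only the winning alteration would then be applied to the real patch; the clones are thrown away.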

The compounds and species the player comes into contact with are derived from the model for their patch.

Analysis:

This system is quite stable. The largest population number a species can have is when it contains the entire patch’s compounds in its locked bin, and it is therefore finite. The largest growth rate a species can experience is when it contains the entire patch’s compounds in its free bin, and it is therefore finite. The amount of compounds globally is a conserved quantity (though it could be slaved to a planetary climate model which would change these amounts over time, for example).

The amount of species is bounded by each patch having only a finite number of slots and there being a finite number of patches. One species cannot completely dominate a patch as it will be divided (though this needs more discussion). What happens when two species in the same patch are evolved to have the same blueprint? Are they merged? Maybe there will be enough parameters that this will never happen.

[1] there hasn’t been much discussion of the predation equations. The question to be answered is “how much of species a can species b eat per unit time?” The result should be between -1 and 1 (and should be symmetrical). This will be a complex formula based on agents produced, speed of the cells, flocking behaviour etc. Also spillage needs to be considered. What will happen is when a eats b some of b’s compounds will be transferred to a and some will be lost in the environment. We can start with a simple formula and slowly make it more complex. It is highly desirable for the model to be as accurate as possible to the player’s experience of playing the game.

[2] when a species grows the compounds should be locked away in the correct proportion according to the blueprint. So maybe you need 3x more protein than fat, for example.
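As a sketch of what [2] could look like, assuming the blueprint provides a hypothetical `recipe` giving the amount of each compound locked per new individual, growth can be limited by the scarcest compound so that the proportions are always respected:

```python
def lock_for_growth(free: dict, locked: dict, recipe: dict, growth_rate: float) -> float:
    """Move compounds from the free bin to the locked bin in the ratio given
    by `recipe` (compound -> amount per new individual).
    Returns the number of new individuals (a real number)."""
    # how many individuals the scarcest compound could pay for
    affordable = min(free.get(c, 0.0) / need for c, need in recipe.items())
    new_individuals = growth_rate * affordable
    for compound, need in recipe.items():
        amount = new_individuals * need
        free[compound] = free.get(compound, 0.0) - amount
        locked[compound] = locked.get(compound, 0.0) + amount
    return new_individuals


# 3x more protein than fat per individual, as in the example above
free = {"protein": 30.0, "fat": 20.0}
locked = {}
born = lock_for_growth(free, locked, {"protein": 3.0, "fat": 1.0}, growth_rate=0.5)
```

Here protein is the limiting compound, so the leftover fat simply stays in the free bin.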

[3] how does this fit with balancing the amount of compounds in the patches? when a species migrates is there a compound transfer or just a blueprint transfer?

[4] we haven’t talked about this that much but it is important to prevent one species taking over too much.


TL;DR I think you did a very good job at laying out what the CPA system is supposed to do (with a few minor suggestions from me). All we have to do now is make a thread for each point on your list and figure out how exactly it will work.

This took quite a bit of time to respond to, but it was a giant wall of text.

I just want a clarification about this. So locked compounds are unusable by the cell for processes, but they are partially transferred to the predator when the cell is eaten. Free compounds, on the other hand, can be used for processes, and they are also partially transferred to the predator when the cell is eaten. Right?

What is the benefit of storing the amount of locked compounds per species vs the population number? It is my understanding that these two are directly related with proportionality constant equal to the number of compounds locked per cell. I feel that using population numbers will be most intuitive to use in predation equations and others, but I might be completely off on this.
Edit: never mind, I got it after reading deeper into the post.

By “for each Patch” you mean the 8 patches neighboring the one the player is in. Right?! I think actually doing the calculations for the hundreds of patches that there are would be too computationally expensive and completely not worth it. We might decide to add an even greater level of abstraction to the farther away patches in the future, but for right now I think it would be best to limit it to calculating only the patches directly adjacent to the player. Also, what exactly is a patch? Are they square regions that we arbitrarily defined, or are they like biomes, with irregular shapes and sizes? If it’s the latter, I think it might be a problem to have the same number of species in a 2 by 2 patch around a breadcrumb vs a 1000 by 1000 patch around hydrothermal vents. On the other hand, having 30 species in each patch might actually be the way to go since we need a few species to occupy the primary producer trophic level, a few to occupy the primary consumers level, and so on, or else we won’t have a stable ecosystem.

What do you mean by concentration gradient? Isn’t it going to be different at every pixel? I think a better way to do this would be to look at the total amount of free compounds floating in the patch (not relative to anything) and multiply it by surface area and some absorption constant.

I’m guessing that this will just transfer locked compounds from one species to another. We should probably not think about this much until we actually have agents and a combat mechanic.

Is this individual cell growth or population growth?

Nice!

I don’t really understand what this is doing.

1. I think it would be best to not do anything about predation equations since we currently don’t really have a predation system
2. I think it should be a compound transfer (say 10% of the species migrated from one patch to another, taking 10% of all compounds with it) because if it is just blueprint, how will the species survive and how will the CPA system run for that species since it depends on amount of compounds locked.

Yes to basically everything you are saying!

First thing to say is population number is a real variable, so you can have any number of microbes in your species (even like 100.5). So growth means population growth rather than cell growth. It’s one of the assumptions of this system that all members of your species are identical (all adults with the same genetic makeup).

Yes, locked compounds are considered built into something in the organism (like into the cell wall or into the bones) and so can’t be processed by an organelle / organ. The reason to have locked and free compounds separate is that simply absorbing some compounds through the cell wall is not equal to population growth.

Yes population number and locked compounds are proportional to each other, so you only need to keep track of one of them, the other can be derived. However we are already tracking compounds, so why not only track compounds? The other option is to say “deduct 3 protein from the cell each time the population number increases by 1 and then each time the population decreases by 1 put 3 protein back into the environment,” which is basically the same as locked compounds.

I think we should determine how many patches you can have by working out how computationally intensive all this is. It may not be too bad because firstly this is all real number mathematics (which will be very fast), secondly this can be multithreaded (because the patches are in parallel) and thirdly because this doesn’t need to run in real time. We want the “patches screen” (whatever that will look like) to evolve slowly in time but it can be quite slow. Another thing is that there is a strategy element involved in spreading your species, so you can have some of your microbes in multiple patches at once. So it would be nice to have a large strategy map you could slowly conquer. Moreover we had discussed (but it was only discussed) that there would be a tree of life and if your species died out you could play as any descendant species of your original one (because in all patches except the one you are playing in your species could be evolved by auto-evo). So maybe you spread from the coast to an ocean vent and the ocean vent microbes adapt and become a new species, but you are still cousins and if you are wiped out on the coast you could play at the vent. This could make some nice gameplay I think.

In terms of exactly what a patch is that is not decided. Either it’s something like a teaspoon size “sample,” so it’s as if you were a scientist and took a vial of liquid from different parts of the ocean and counted all the life in it. They are discrete samples representing a continuous landscape of life. Or patches are giant areas and the ocean is cut into patches so that every point of the ocean is in some patch. The only real difference is scaling of the numbers in the system (10 microbes per square meter is 10,000 microbes per 1000 square meters) so it doesn’t matter too much. The former is maybe more elegant (and as you say having them the same size might be helpful intuitively).

In this context concentration gradient means exactly what you are thinking, it’s basically partial pressure. So if you’re 10% CO2 on the inside of your cell and the ocean is 20% CO2 then you absorb some of it. There can be compounds which don’t work like this, for example with protein you can scoop it up without spilling yours using active transport over the cell wall. So as you say it is fine to say you can just get some small percentage (proportional to your surface area) each moment.
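A rough sketch of the two absorption modes just described, assuming a simple linear law; `permeability` is a hypothetical constant and the `active_transport` flag stands in for whatever the blueprint says about specialised organelles:

```python
def absorb(
    inside_conc: float,
    outside_conc: float,
    surface_area: float,
    permeability: float,
    active_transport: bool = False,
) -> float:
    """Amount absorbed this step. Positive means uptake from the environment,
    negative means the compound leaks out of the cell."""
    if active_transport:
        # one-way uptake: scoop up a share of what is outside, never lose any
        return permeability * surface_area * outside_conc
    # passive exchange follows the concentration gradient in either direction
    return permeability * surface_area * (outside_conc - inside_conc)


# 10% CO2 inside the cell, 20% in the ocean: some CO2 is absorbed
co2_uptake = absorb(0.10, 0.20, surface_area=5.0, permeability=0.1)
```

With the numbers swapped (20% inside, 10% outside) the passive version returns a negative amount, which is the spilling that active transport avoids.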

Predation is when one microbe eats another. So when it does this the locked and free compounds of the prey are both lost and the predator gains some of both stores while some are lost to the environment. Like when a lion eats a gazelle, it gets some of its locked compounds (muscle), some of its free compounds (blood), and some of both is lost to the environment.
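The transfer could be sketched like this for a single compound. `fraction_eaten` and `efficiency` are hypothetical parameters that the eventual predation equation would have to supply, and putting the gains into the predator’s free bin is my assumption, not something decided above:

```python
def apply_predation(prey: dict, predator: dict, fraction_eaten: float, efficiency: float) -> float:
    """Remove a fraction of both prey bins, give part of it to the predator,
    and return the amount spilled into the environment."""
    taken = 0.0
    for bin_name in ("free", "locked"):
        amount = fraction_eaten * prey[bin_name]
        prey[bin_name] -= amount
        taken += amount
    gained = efficiency * taken
    predator["free"] += gained  # assumption: eaten compounds arrive as free compounds
    return taken - gained


gazelle = {"free": 10.0, "locked": 40.0}
lion = {"free": 0.0, "locked": 100.0}
spilled = apply_predation(gazelle, lion, fraction_eaten=0.1, efficiency=0.6)
```

The caller would add `spilled` to the patch environment so that nothing is created or destroyed.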

That part of the cleanup section of the code is just checking for computer errors. So if the amount of CO2 is over calculated by 0.00001% each time a calculation is done then over time it will drift upwards (especially because so many calculations are being done) so the cleanup phase sums the amount of CO2 in the patch and corrects the value if it has gone wrong. That way we know that we aren’t getting drift errors over time.
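A minimal sketch of that check for one compound, assuming we store the total the patch is supposed to hold. Clamping the environment at zero and reporting the leftover is just one possible answer to the “what if there is 0 in the environment” question from the OP:

```python
from typing import List, Tuple


def correct_drift(
    species_totals: List[float], environment: float, expected_total: float
) -> Tuple[float, float]:
    """Return (corrected environment amount, drift that could not be removed)."""
    actual_total = sum(species_totals) + environment
    drift = actual_total - expected_total  # positive when the patch has gained compounds
    corrected = environment - drift
    if corrected < 0.0:
        # the environment cannot go negative; report what is still unaccounted for
        return 0.0, -corrected
    return corrected, 0.0
```

Any unresolved remainder would have to be taken from the species’ bins or carried over to the next turn.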

I agree that it’s not worth working on the predation equations until the combat in the game is finished. We want the equations to be as accurate as possible and so we need to model well. A placeholder could be: you get 1 “power” for having agents and 1 power for having pili. If you are more powerful than your prey you get 0.1 of the prey each turn (proportional to both population numbers). That would be enough to have the system separate out prey and predators. We could then work on getting the equation better over time. (It would also be fine to build this with predation off).
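One way to read that placeholder, as a sketch: “proportional to both population numbers” is interpreted here as a mass-action encounter term capped at 0.1 of the prey per turn. The `encounter_rate` constant is entirely hypothetical.

```python
def power(has_agents: bool, has_pili: bool) -> int:
    # 1 "power" for having agents and 1 for having pili
    return int(has_agents) + int(has_pili)


def prey_eaten(
    predator_power: int,
    prey_power: int,
    predator_pop: float,
    prey_pop: float,
    encounter_rate: float = 0.001,
) -> float:
    """Number of prey eaten this turn under the placeholder rule."""
    if predator_power <= prey_power:
        return 0.0  # not more powerful than the prey, so no predation
    encounters = encounter_rate * predator_pop * prey_pop
    return min(0.1 * prey_pop, encounters)
```

Because the weaker species always gets 0, this already separates predators from prey, which is all the placeholder needs to do.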

The only issue with moving compounds when you move a species to a new patch is whether some patches will, over time, lose all their compounds and become wastelands. Somehow it would be nice if they could be conserved. Maybe when a species is moved only the blueprint is moved and they are given a starter amount of compounds (both locked and free) from the new patch’s environment. That would conserve things.
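A sketch of that blueprint-only option for one compound. The starter amounts are hypothetical, and scaling them down when the target environment is too poor is my assumption:

```python
from typing import Tuple


def settle_migrants(
    target_environment: float, starter_free: float, starter_locked: float
) -> Tuple[float, float, float]:
    """Take the newcomers' starter compounds out of the target patch's environment.
    Returns (new environment, free bin, locked bin)."""
    wanted = starter_free + starter_locked
    if wanted <= 0.0:
        return target_environment, 0.0, 0.0
    # never take more than the environment actually holds
    scale = min(1.0, target_environment / wanted)
    free = starter_free * scale
    locked = starter_locked * scale
    return target_environment - free - locked, free, locked
```

The source patch is untouched and the target patch’s total is unchanged, so no patch can be drained by migration alone.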

Quick post before I sleep:

Let’s not worry about the heuristics that go into computing the predation function right now. Those depend heavily on game mechanics, and the model design should work independently of what in particular they are.

Instead, one place to start is with figuring out how each species wants to feed itself.

As for concentration gradients, a quick note: the environment could be modelled as a system that has a certain variability in compound quantities, without actually having to simulate the clouds for microbes to swim between. For example, look at the results of percolation theory. I would elaborate but I have a busy day tomorrow.

Yep, that’s what we decided.

Could you elaborate? Also, we’re still going to be simulating the compound clouds around the player, right?

I think that at least for the cell stage we should separate the patches into equal rectangular regions (say, 5 by 5 player screens).

I personally think that 5 by 5 would be far too small. I think around 15 by 15 would be more appropriate. Also, perhaps a rectangular shape would feel a bit unnatural, especially if all are equal in size.

I completely messed that up. I meant to write 50 by 50 player screens. But now that I think about it, even that might be a little too small; after all, we are housing thousands of organisms from 30 different species in that area. So maybe around 100 by 100?

You’re probably right. It would look weird, and plus, what about people with different monitors, will their patches be bigger? So we should choose a defined size and do 100 by 100 squares of that size. I’ve been meaning to write a post about why we would even need to break everything down into smaller squares for a while now, so hopefully I’ll do it in the next few days or so.

As I say patch size is not super important for the CPA, doubling the size of the patch simply means doubling the number of compounds in the patch. It doesn’t really change much else.

The way the systems are connected (this is where we got to last time we had this discussion) is that basically there is no feedback from the swimming around into the CPA. So if you die while swimming around it means nothing to your species overall. The way you influence the CPA is simply through the editor (as that’s the only freedom the AI organisms have).

Going the other way though the patch you experience while swimming around is slaved to the CPA model. So if your patch has 25% species A then 25% of the microbes you meet while swimming around will be species A. It would be best (and this is probably an unreachable ideal) if your experience of swimming around is so true to the CPA that you can intuitively upgrade your cell (like “I’m too slow to catch that prey I want to catch so more speed” etc) and then your species will do better as a result. So when you are better adapted so is your species. Getting towards this ideal requires good modelling. The processes modelling (like photosynthesis and protein synthesis etc) is the same at both the cell level and the species level. It’s mainly the predation equation where we need to get this right.
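Slaving the spawns to the CPA could be as simple as a weighted random choice over the patch’s population numbers. A sketch, where `species_shares` and `spawn_species` are hypothetical helper names:

```python
import random
from typing import Dict, List


def species_shares(populations: Dict[str, float]) -> Dict[str, float]:
    """Fraction of the patch's microbes belonging to each species."""
    total = sum(populations.values())
    return {name: pop / total for name, pop in populations.items()}


def spawn_species(populations: Dict[str, float], count: int, rng: random.Random) -> List[str]:
    """Pick which species the next `count` spawned microbes belong to,
    weighted by the CPA population numbers for the player's patch."""
    names = list(populations)
    weights = [populations[name] for name in names]
    return rng.choices(names, weights=weights, k=count)


shares = species_shares({"A": 25.0, "B": 75.0})
encounters = spawn_species({"A": 25.0, "B": 75.0}, 20, random.Random(0))
```

Over many spawns about 25% of the microbes the player meets would then be species A, matching the model.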

However I am hopeful we can get a good link between the two systems.

@TheCreator: you’re misunderstanding the relationship between the patch and the gameplay area. The patch is really big. The gameplay area is by necessity really small, despite being functionally infinite in all directions, because the scale of the game is just so tiny.

When I talk about modeling variability of compound availability in the patches, I’m saying absolutely nothing about what the gameplay area (eg, compound clouds) is like. Maybe we can try arguing over the distinction further in slack, it’s an important one but we get stuck in arguing this particular point too often already.

If a patch is a glass of water, then the gameplay area is where you watch a couple molecules of water bounce off each other. Patches are bulk systems.

One important thing, mathematically, to figuring out how we’ll compute the movements of compounds is the idea of ‘intra-patch variability’. We want to be able to model a patch as a bulk system, but we can’t model it as a uniform system, because a lot of species find their niches in exploiting the particulars of various distributions (for example, predators that feed off the young and weak). How should we do it? Pick a distribution that would work well enough and give us a lot of analytical flexibility? Use small arrays for histograms? I’d prefer the former, but so far through my reading I’m not sure if there’s an easy answer. For example, the normal distribution, which is otherwise generally very useful, isn’t too analytically friendly, and exponential distributions, or, say, fat-tailed distributions, have uses that are more specific…

I guess it would be good to figure out what kinds of variability we need to model (without explicitly, say, mapping them out). There are distributions in organism size, and other related stats, which shouldn’t be that hard, but there are also distributions in resources over space, and that’s the tricky bit. For example, given a resource distributed as random points (say, oases, or small islands), what’s the probability distribution of nearest-neighbour distance? That’s just an example of the kind of question we need to answer to be able to turn a complex system into a much more simple analytical equation.

Now, I don’t want to stop you guys from arguing over points that we’ve covered in long discussions before, but I’d like to be able to make some progress over where we left the model a few months ago (we’ve been engaging in intellectual navel-gazing on this front ever since imo).

@tjwhale I know you’re not wont to be pushy, but I think the discussion started the right way but needs to be focused on where we go from here, and not how we got to where we are, for anything to be accomplished here.

I’m not sure about intra-patch variability, I think it might make more sense to divide the patch into several. So in your example the desert would be one patch and the oasis would be another. I think trying to model them in the same one is going to be difficult.

The reason is this: consider a single encounter between two members of different species (A and B). Either there is a 10% chance this is in an oasis and a 90% chance this is in desert (or whatever), and over 1,000,000 interactions these probabilities will disappear and the patch will just be some interpolation between desert and oasis. It will be hard for auto-evo to know how to specialise a species into the desert or oasis as the signals will be mixed.

Or species A only hangs out in the oasis, and therefore it’s like modelling a sub-patch, so if you meet A you know you are in the oasis. So really there are two patches, one desert and one oasis and A is not in the desert patch.

I’m also not sure that it’s wise to have differentiation between the members of a species. I think every one should be considered as genetically identical adults. Modelling the combat in the microbe stage alone is going to be very complex, and doing it in the creature stages is already going to be nigh on impossible. And above I was saying how important that modelling is, so that the player intuitively feels how to upgrade their species so they do better in the CPA, which is huge if we can achieve it.

This does exclude the possibility of, for example, wolves that can kill young bison but not adult ones. In this setup wolves would be completely unable to kill bison as all would start as adults. This also means the model won’t distinguish the advantages and disadvantages of having an egg vs mammalian gestation. How important do you guys think these things are for the model?

Obviously in the running around it is important to let the player choose these things. When modelling them if you hunt species A and it’s a youth for 10% of its life then 10% of the time you meet a youth rather than an adult and so this is functionally equivalent to a species which is ~95% as strong as one which is all adults. Is the only issue the wolves vs bison one above or are there others?
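The ~95% figure is what a time-weighted average gives if a youth is assumed to be half as strong as an adult (the 0.5 is my assumption to make the numbers match, it isn’t stated above):

```python
from typing import List, Tuple


def effective_strength(stages: List[Tuple[float, float]]) -> float:
    """Average strength of a species whose members are met at random.
    `stages` is a list of (fraction of life spent in stage, relative strength)."""
    total_time = sum(fraction for fraction, _ in stages)
    return sum(fraction * strength for fraction, strength in stages) / total_time


# 10% of life as a youth at half strength, 90% as a full-strength adult
species_a = effective_strength([(0.1, 0.5), (0.9, 1.0)])
```

This is exactly the averaging that loses the wolves vs bison case: a predator facing the averaged species never meets the weak 10%, only a uniformly slightly weaker adult.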

@moopli - I agree that we need to move the conversation forward, and realise that some of what I’m about to say has been discussed before, but I’m hoping that I can clarify a few points which should make further decisions easier. I’ll try and discuss new stuff at the end of the post/next post.

I largely agree with the summary in the OP. We’ve discussed before whether we should calculate population from compounds or vice versa, I’m in favour of the reverse because it allows us to represent over- or under-nourished populations, but as per the post above, do we really need (and can we use) that much detail? The only obvious use is as a measure of fitness - if all (or the average) individuals are under their blueprint weight, then its species is struggling. We have alternative ways of measuring fitness though, such as birth/death rate. It could also help model forms of predation which aren’t absolute (a kills b, eats b, b’s population goes down by x per y population of a), such as parasitism (a feeds on b, b’s average weight goes down by x% per y population of a), though again, averaged over a large enough population there’s not much difference between the approaches.

Regarding patch shape, I discussed this on slack some time ago. Having regularly shaped (square or hexagonal grid) patches simplifies some things, but both misrepresents the actual shape of biomes*, and is a waste of resources where some biomes are large and uniform, and others much smaller (forcing a high resolution). What we’ll most likely use is an irregularly shaped map, with each patch representing a subregion of a biome (i.e.: each patch is uniform, each biome may contain many patches). A rough algorithm for this is to use a particle swarm, with each particle avoiding others, and more strongly avoiding biome boundaries, and then using voronoi to create a patch around each particle. e.g.: http://www.wblut.com/blog/wp-content/2008/04/voronoi-2696.png (note that this image appears to be a nested voronoi, which is cool, and gives me all sorts of other ideas, but what I’m suggesting here is just a plain voronoi). This doesn’t work great for the oases in the desert scenario (as oases are small, discrete biomes, we’d need a patch for each one, which is ok, but the voronoi structure of the desert around it would be wasteful), which can be solved either by tweaking the patch generation algorithm, or by handling intra-patch variation (will get onto this later).

• clarification: the game won’t have defined biomes, I use biome as a shorthand for a contiguous area with a relatively uniform set of environmental conditions.
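To make the Voronoi step concrete, here is a sketch that skips the particle swarm entirely and just assigns every cell of a coarse grid to its nearest seed point, which is the discrete equivalent of a plain Voronoi diagram. The grid resolution and the seed positions are placeholder assumptions:

```python
from typing import List, Tuple


def assign_patches(width: int, height: int, seeds: List[Tuple[float, float]]) -> List[List[int]]:
    """Return a grid where each cell holds the index of its nearest seed,
    i.e. the patch that cell belongs to."""
    grid = []
    for y in range(height):
        row = []
        for x in range(width):
            nearest = min(
                range(len(seeds)),
                key=lambda i: (seeds[i][0] - x) ** 2 + (seeds[i][1] - y) ** 2,
            )
            row.append(nearest)
        grid.append(row)
    return grid


# two seeds on a 4 x 2 map split it into a left patch and a right patch
patch_map = assign_patches(4, 2, [(0, 0), (3, 1)])
```

In the scheme described above the seeds would be the final particle positions, so patches come out smaller where particles are packed against a biome boundary.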

As for the question of migration, and the resulting compound flow between biomes, I think it’s necessary for the migrants to take their compounds with them (which should be a relatively small % of either patch’s contents). I also think it’s necessary for some compounds to be mobile between patches (anything atmospheric or dissolved in a common waterbody, not so much anything contained in the soil, unless we model rivers to the degree necessary to move soil nutrients around), which should prevent depletion or build up of most, but not all compounds. In the remaining cases, would it be unrealistic if migration to more favourable neighbouring patches caused a depletion of nutrients (i.e.: proteins etc, mostly contained either in living things or soil detritus) effectively turning a patch into a wasteland?

One part of the OP I don’t agree with (along with some of its implications), is the idea of a patch having species ‘slots’. This derives from a very old and outdated concept, where each patch would have a number of small/medium/large plant/herbivore/carnivore slots (in order to force a food chain, which we no longer need to do thanks to population dynamics). The idea has probably persisted because we often refer to the 30 species limit per patch (which is a computational, rather than design limitation, as species interactions scale as the square of the number of species, we can probably have a few dozen species per patch, but not many more). What this specifically doesn’t mean is that a patch is underfilled if it has less than 30 species - i.e.: I don’t think migration should be more likely, or speciation possible, if a patch has less than N species. Migration should be a mostly random occurrence, controlled by whether the migrating species can succeed in the target patch (either by checking this before allowing migration, or migrating regardless and then seeing if it survives long enough to become established). In effect the habitability of a patch should control the number of species present, not an arbitrary target. We may need to set a hard limit of 30 species to prevent any patch becoming too large, but that’s all it should be used for.

For speciation, I don’t think it’s realistic for this to happen within a patch, as no real population arbitrarily divides itself in two and then evolves differently. Unless there’s a geographical separation, or the subspecies are already too distinct to interbreed, crossbreeding will keep the two populations genetically mixed. I think the separate mutation of populations in different patches is enough to model speciation, we shouldn’t also be doing it within a patch.

Regarding differentiation within a species (beyond simple size), I’ve always wanted to treat different life stages as separate populations, so that for a species you have, e.g.:

• a population of eggs, that has no interaction with its environment (other than being a food source), is added to by reproduction, and removed from by hatching
• a population of helpless infants, fed (parasitically) from the adults, added to by hatching, removed from by maturation, very easy to catch/kill
• a population of pre-reproductive juveniles, that are able to feed for themselves, somewhat more capable of defending themselves
• a population of reproductive adults, that produce eggs, and feed infants
• a population of post-reproductive adults, which may be more frail, but may also help feed infants
This significantly increases the number of ‘species’ CPA has to deal with, so it may not be computationally feasible, but if it is we have meaningful life stages, and meaningful choices about a species’ life history.
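As a sketch of how the stage populations listed above would feed into each other, here is one turn of a simple pipeline. The stage names, the per-stage `maturation` fractions, `birth_rate` and `elder_death_rate` are all hypothetical, and feeding, predation and deaths in the earlier stages are left out:

```python
from typing import Dict

STAGES = ["egg", "infant", "juvenile", "adult", "elder"]


def advance_stages(
    pops: Dict[str, float],
    maturation: Dict[str, float],
    birth_rate: float,
    elder_death_rate: float,
) -> Dict[str, float]:
    """Move a fraction of each stage into the next one for a single turn."""
    new = dict(pops)
    # reproduction: adults add to the egg population
    new["egg"] += birth_rate * pops["adult"]
    # hatching and maturation: each stage loses a fraction to the next stage
    for src, dst in zip(STAGES, STAGES[1:]):
        moved = maturation[src] * pops[src]
        new[src] -= moved
        new[dst] += moved
    # only the final stage dies of old age in this sketch
    new["elder"] -= elder_death_rate * pops["elder"]
    return new


pops = {"egg": 100.0, "infant": 50.0, "juvenile": 40.0, "adult": 30.0, "elder": 10.0}
rates = {"egg": 0.1, "infant": 0.1, "juvenile": 0.1, "adult": 0.1}
after = advance_stages(pops, rates, birth_rate=2.0, elder_death_rate=0.2)
```

Each stage would still need its own entry in the predation and feeding calculations, which is where the extra computational cost comes from.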

A few other minor points, then I’ll get onto some new/more productive stuff:

• re: [4] in OP, I don’t think there’s any need to prevent over-dominance, in the same way (discussed long ago) that we don’t need to prevent over-stability in a species (e.g.: sharks and crocs, which have barely changed in millions of years). If it happens organically, there are realistic equivalents (not least of which: man, and the player may need a similar level of dominance to become civilised), we just need to avoid the model producing this situation (and many other non-fun situations) too often.
• re: the patch size during gameplay, this should be infinite. At least in the cell stage you shouldn’t be able to move between patches within your lifetime, this doesn’t really change until you get migrating birds/large mammals. One exception is gameplay on a patch boundary (e.g.: intertidal zone), though we can have these be separate patches, where the boundary is distinct enough to offer a new niche/biome.

Intra-patch variability is one of the major unresolved issues with CPA, especially as it results in possibly the most marked difference between gameplay and CPA simulations - in game you never interact with the ‘average’ environment, you either seek out or avoid high/low concentrations of a particular environmental variable (EV) (whether that be light/heat/acidity, or a particular compound). Averaging out the impact of variable concentrations on CPA would work if the position of individual organisms was uniformly random, and the distribution of each EV underlying the spatial positions of members of a population also uniform (or normal) random. The problem is two-fold, 1: as above individuals react to these EVs, either seeking out or avoiding them, so that the relationship between the population and each EV is not independent, 2: likewise, EVs aren’t independent, it’s likely to be warmer where it’s lighter, oxygen concentrations will be higher where there’s light, CO2 concentrations are likely to be higher where oxygen concentrations are lower, and some EVs will have concentrations dependent on two or more other EVs, or not simply proportional (e.g.: a compound which is rapidly produced from two others, by process bacteria, would be most concentrated on the interface between a high concentration area of the two compounds, but would prevent those compounds from ever being in high concentrations together). Because of this, we cannot simply average according to overall concentration, or by trying to calculate the proportion of time spent, by what proportion of the population, in high and low concentrations of each compound, because those times aren’t independent.
Example:
If 50% of a patch is high in O2, and 50% of that patch is high in glucose, time averaging would suggest you’re spending 25% of the time unable to respire (o2 + glucose -> co2 + water) because you have access to neither, 25% with o2 but no glucose, 25% with glucose but no o2, and only 25% with both available. Most likely, o2 and glucose concentrations will be related, as they come primarily from the same source (photosynthesisers), and you’d end up with roughly 50% of the patch suitable for respiration. Moreover, that population, if it is an obligate aerobe, will spend as much time as physically possible in that area, leading both to an association between that species’ presence and those conditions (which would affect other species’ interactions with it, e.g.: largely preventing predation of/by an obligate anaerobe), and a nearly 100% time spent able to respire, even though the averaging method suggests it’d be 25%.
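The gap between the two estimates in this example can be written down directly. In this sketch the `correlation` knob, and the linear blend between the independent case and the fully overlapping case, are assumptions made purely for illustration:

```python
def suitable_fraction(p_o2: float, p_glucose: float, correlation: float) -> float:
    """Fraction of the patch where both O2 and glucose are high.
    correlation = 0 treats the two as independent,
    correlation = 1 means the high regions overlap as much as possible."""
    independent = p_o2 * p_glucose
    fully_overlapping = min(p_o2, p_glucose)
    return independent + correlation * (fully_overlapping - independent)


naive = suitable_fraction(0.5, 0.5, correlation=0.0)    # independent averaging
related = suitable_fraction(0.5, 0.5, correlation=1.0)  # same source for both
```

This covers only the area. The time an obligate aerobe actually spends able to respire would be higher still, since it seeks the suitable region out.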
Something similar happens when 10% of a desert is oases, where there is a discrete association between several EV’s, species, etc. In that case this could very easily be handled with separate patches, but there will be many more fuzzy/continuous examples where splitting into patches isn’t an option, and a solution to the latter would also potentially solve the former.
The question then is whether we can deal with this? There’s no chance of mapping out all these relationships beforehand, especially as many of them will be changed by the species present. I can see ways of detecting/calculating many of them, but the amount of complexity this would add to CPA is a little intimidating. It does have one important benefit though - it would take a lot of input from species’ behaviour: what they avoid, what they seek out, where they like to rest/eat/congregate, what conditions they find acceptable for hunting, to some extent even the extent to which they behave as a herd. That information is something we’ve previously had difficulty fitting into CPA, and leaving it out would create another disconnect between gameplay and simulation.

Finally, the predation equations. We’ve been procrastinating on these for nearly two years (since the original population dynamics thread on the old forums), at first because a lot of detail is needed to make them work in an interesting way, and a lot of work to keep everything balanced, more recently because of the decision to slave gameplay to CPA, and the resulting need to have the feedback into gameplay be both meaningful (i.e.: fun) and consistent/intuitive (i.e.: if you identify a weakness in gameplay, it should be relatively easy to pick a mutation that will help). Note that the above applies to all of the population dynamics equations, not just predation, but most of the others (e.g.: compound absorption, excretion, basic metabolism, reproduction, mortality) are conceptually simpler because they only take into account a single species and its environment.

…at this point I’ve run out of time to write much more. Reading this thread, and writing the above, has made me realise just how much more work is needed before we can even write the CPA system implementation, I had thought that most of what needed to be done was fleshing out the data: what compounds will we have, what organelles, what processes, what kind of mutations/adaptations, what kind of inter-species interactions, etc. I’ll try to read up and reply on the other relevant threads on this forum, and perhaps start hashing out some predation equations in a new thread soon.

There is one particular case that cannot be handled purely through population numbers, and that is the problem of organisms needing specific nutrients. For example, herds of horses might converge on a salt lick because they need those electrolytes.

Conveniently enough, the example of a salt lick leads me to addressing the rest of your post:

You’ve gotten right to the core of what I was pointing out above, thank you – we can’t treat an ecosystem like a uniform solution, with perfect, immediate mixing that lets us take local equations and apply them on global averages.

What I suggest is using a matrix of “convergence”, which would essentially be a measure of how much the distribution of thing X matches the distribution of thing Y. If we can figure out rules for modifying these convergences, we will never need to plot out the distributions of X, Y, Z or anything else. Some basic rules might be “X depending positively on Y means that X will act to increase its convergence with Y”, “Y depending negatively on X means that Y will act to minimize convergence with X”, etc. Properly constructed, such a measure of convergence could be used directly in the predatory equation, for example, to scale the number of interactions between predator and prey.
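A rough sketch of how those rules might look in code, to make the idea easier to argue about. Everything here is an assumption for illustration: convergence is stored as one number in [-1, 1] per pair, `update_convergence` nudges it towards +1 or -1 depending on the sign of the dependence, and `interaction_scale` maps it to a multiplier on the number of predator-prey interactions. None of these names or formulas are settled.

```python
def update_convergence(convergence, dependence, rate=0.1):
    """Nudge the convergence of X with Y according to how X depends on Y.

    convergence -- current value in [-1, 1]
    dependence  -- positive if X benefits from Y, negative if X is harmed
                   by Y, zero if indifferent (hypothetical scale, -1..1)
    rate        -- how quickly distributions rearrange (assumed constant)
    """
    if dependence > 0:
        target = 1.0      # X acts to increase convergence with Y
    elif dependence < 0:
        target = -1.0     # X acts to minimise convergence with Y
    else:
        target = convergence
    return convergence + rate * abs(dependence) * (target - convergence)

def interaction_scale(convergence):
    """Multiplier for encounter counts (assumed linear mapping).

    -1 -> 0 (never in the same place), 0 -> 1 (same as the uniformly
    mixed average), +1 -> 2 (always in the same place).
    """
    return 1.0 + convergence

# Predator seeks prey; convergence drifts upwards from 0.
c = 0.0
for _ in range(3):
    c = update_convergence(c, dependence=1.0)
print(c, interaction_scale(c))
```

In a real system the prey would be pushing the same entry the other way, so each entry would presumably settle where the two pressures balance.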

I’ll keep thinking about this, I suspect it might be key. One of you will probably have some useful insight so I’ll stop here for now.

I’m not yet convinced this is true.

To take the salt lick example what real difference does it make? The averaged equations will be able to determine that the horses are competing over the salt in the environment. When predators hunt the horses those which are fastest will survive better than those who are slow, the salt lick makes no difference there. Can you explain how, on average, the spatial location of the salt lick changes things?

What do you mean specifically when you say ‘population numbers’? I get that discrete elements (salt licks, oases, carcasses, etc.) cannot necessarily be handled in the same way as continuous (dis-)associations between compounds, but I feel like you’re saying something more with this sentence and I’m not grasping it.

Your solution of a convergence matrix is similar to what I had in mind, though I hadn’t thought of implementing it quite so cleanly. I do think though, that we can extend it to cover discrete elements, though we’ll lose the neatness of a single matrix. I’m also not sure a matrix can encapsulate more complex relationships (A is present only when both B and C are, or A is present when B XOR C are, but not both), though at the same time I’m not sure we need to (or can justify that much added complexity). Whatever solution we come up with, regardless of how it works internally, should probably produce relatively simple output (given X distribution of other compounds, compound A has distribution Y, though I don’t think that’s quite the function form we’re after), and hide any internal complexity, otherwise that complexity will spill over into all dependent systems (i.e.: the rest of CPA).

@tjwhale - the difference it makes is in being able to represent some of the scenarios you see during gameplay. Salt licks may not be the best example, so I’ll use watering holes. Watering holes are the only source of drinking water during the dry season, and every animal is forced to drink there, making it a hotspot for predators, especially those specifically adapted (e.g.: crocs). If we average out the presence of water you’ll end up with too little everywhere (when actually there is enough, if you know where to find it), resulting in ubiquitous low level dehydration (when it should be patchy and extreme), animals uniformly spread over the arid landscape (when they should be concentrated around, and between, watering holes), and predator-prey collisions being very rare (when they should be frequent, though concentrated). All of that has knock-on effects for auto-evo: predators don’t specialise to hunt in/near watering holes; prey aren’t adapted to be alert to ambush; all animals adapt to tolerate minor dehydration, instead of adapting to find watering holes and store enough water to move between them; and predators become adapted to making the most of rare encounters with weakened prey (i.e.: long, stamina draining chases), instead of riskier ambushes on healthy prey. All of that then feeds back to gameplay, where the AI species appear poorly adapted (both physically and behaviourally) for the environment they’re in.

As for how we deal with discrete heterogeneities (I think continuous ones are relatively well dealt with by moopli’s convergence matrix), we should be able to work up from a number, or density, of discrete objects (so x watering holes per square kilometer, or y cell carcasses per square micrometer in the cell stage). Each object contains a number of compounds, which we could average over that area. Instead, we can calculate an encounter rate between objects and organisms, based on each species’ detection ability, and behavioural attraction, for that object. This calculation could vary with its own convergence matrix for relations between discrete objects and continuous environment variables (e.g.: no watering holes where drainage is high), and vice versa (oases increase (or rather, concentrate) the local soil moisture content). Once we have that encounter rate, we have something that’s starting to look like a predation (or other species interaction) equation, and we can calculate the time spent by species drinking/salt licking/etc, and from that the proportion of predations that happen during such activities, which skews the evolution of both predator and prey to adapt to that situation. Exactly how that can all be done I’m not sure, because I currently see information flowing both up and down-stream in too many places to make for a nice implementation, but it looks doable.
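As a starting point for discussion, here is one possible shape for that encounter rate. It is only a sketch under my own assumptions: organisms search a 2D area at a constant speed and detect objects within a fixed radius, attraction is a plain multiplier, and time spent at objects saturates in the same way as handling time in Holling's disc equation. All function and parameter names are hypothetical.

```python
def object_encounter_rate(object_density, search_speed, detection_radius,
                          attraction=1.0):
    """Objects encountered per unit time by one individual.

    object_density   -- objects per unit area (e.g. watering holes per km^2)
    search_speed     -- distance covered per unit time
    detection_radius -- how far away the species can detect the object
    attraction       -- behavioural multiplier (assumed: 0 ignores the
                        object, 1 neutral, >1 actively seeks it out)
    """
    # A searcher moving through a 2D landscape sweeps an area of
    # 2 * radius * speed per unit time.
    area_swept = 2.0 * detection_radius * search_speed
    return object_density * area_swept * attraction

def fraction_of_time_at_objects(encounter_rate, time_per_visit):
    """Share of an individual's time spent at objects (e.g. drinking)."""
    busy = encounter_rate * time_per_visit
    return busy / (1.0 + busy)

rate = object_encounter_rate(object_density=0.5, search_speed=4.0,
                             detection_radius=0.25)
print(rate)                                    # 1.0 encounters per hour
print(fraction_of_time_at_objects(rate, 1.0))  # 0.5
```

The fraction of time at objects is the kind of number that could then weight how many predations happen while drinking, which is the link back to the predation equations.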

One big implementation concern I have is whether we end up having to calculate population level changes for each possible situation (near a watering hole, distant from one, in high light levels, in low light/high co2 areas, etc.), or whether we can combine information further up the process (i.e: before it enters population dynamics) and achieve the same result, as otherwise we may have a computational issue.

Well there is going to have to be an “ambush” strategy where a slow creature can kill a fast one (for spiders and flies, for example) so I imagine a crocodile would use that. Moreover presumably the main defence against ambush will be “alertness” which will need to be a species parameter (costs energy and is boosted by certain organs). So you could have crocs vs gazelles in the averaged situation.

I don’t quite understand the “convergence matrix”, does it contain a single number as to how closely correlated two distributions are? So 1 means everywhere there is water there is a fruit tree and -1 means there are no fruit trees where there is water? Surely you still need to know at least something about one of the distributions? Even if you know fruit trees and water are 100% correlated you need to know whether water is evenly spread or clumpy, at least?

How would this variability be fitted with compound absorption / uptake? So I am a species of gazelle and I want to drink some water. Do I do some sort of check to see how well my distribution overlaps with that of the water? How is my distribution stored and compared to that of the water? Do I have some kind of internal parameter which says “stay near the water x% of the time”? Do I have such a parameter for every compound / resource? What if they conflict, so I am told to stay near water and fruit trees but no fruit trees are near water?

Edit: Also on the issue of slots I mean only a cap on the number of species per patch for computational reasons.

Edit 2: I’m also concerned about non-uniqueness. So in this image both set-ups have 50% overlap between the colours. But in the first they all overlap in the same box, in the second there is no place where all 3 overlap.

http://imgur.com/z9aQqfe
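In case the image doesn't load, the same point can be checked with sets. This is a small sketch with made-up box numbers: each colour covers two boxes, every pair of colours shares exactly half its area, yet the three-way overlap differs between the two layouts.

```python
from itertools import combinations

def pairwise_overlaps(regions):
    """Overlap of each pair, as a fraction of the first region's area."""
    return {
        (a, b): len(regions[a] & regions[b]) / len(regions[a])
        for a, b in combinations(sorted(regions), 2)
    }

def triple_overlap(regions):
    """Boxes covered by every region at once."""
    return set.intersection(*regions.values())

# Layout 1: all three colours share box 0.
shared_box = {"A": {0, 1}, "B": {0, 2}, "C": {0, 3}}
# Layout 2: colours overlap in a ring, never all three together.
ring = {"A": {0, 1}, "B": {1, 2}, "C": {2, 0}}

print(pairwise_overlaps(shared_box))  # every pair: 0.5
print(pairwise_overlaps(ring))        # every pair: 0.5
print(triple_overlap(shared_box))     # {0}
print(triple_overlap(ring))           # set()
```

So a table of pairwise numbers alone cannot tell these two layouts apart.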

The issue is that in the above example, when the environment is averaged, the need for (and effectiveness of) an ambush strategy isn’t represented by the model, so no species (the croc) will ever evolve to use it. The need might be obvious in gameplay, and to the player, but in the abstracted world experienced by the AI auto-evo, there would be no evidence of animals congregating in one area, typically to drink (and therefore distracted, even if also alert).

I’m also not entirely clear how the convergence matrix would be applied, but the data it contains (pretty much as you describe it) should be enough. I think it’s not so much saying ‘given distribution x of compound a, distribution of compound b will be y’, but ‘you can have compounds a and b together, but to get c you need to spend a proportion of your time away from both’, which leads to a behavioural choice about which is most important (or more likely, which balance is optimal).

I’m actually not sure if it matters how uniform or ‘clumpy’ the individual distributions are, I feel like it should make a difference, but other than the extreme example of discrete objects, I’m not sure it does? Would compounds which have very strong dis-/associations with other compounds effectively be more clumped, and vice versa, thereby making the uniform-clumpy measure redundant?

I don’t think we need to calculate actual distributions at any point (or at least, that’s the ideal). I think what you need is your own association/convergence matrix that represents what proportion of an individual’s time is spent in each environment (able to absorb each compound, where that compound’s concentration is an ‘environmental variable’). This would be based partly on behavioural choices (‘I avoid dense vegetation/shade because it’s harder to see predators’), and partly on physical characteristics (‘I’m not often found in low oxygen areas because I’d suffocate and die there’). Combining the data from these two (is this a simple matrix multiplication?) gives you the actual time spent in each environment, which will differ from both the average environmental state and the species’ preference, such that some aspects of the preference will need to be exaggerated to get the desired outcome (if water is scarce you’ll need a very high preference for spending time there to get enough). If the parameters conflict, you’ll need to split your time between the two, e.g.: drinking in the waterhole to get water, then heading back to the relative safety of the wasteland to avoid predation.
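One guess at what "combining" could mean, purely as a sketch: weight how common each environment is by how strongly the species prefers it, then renormalise so the time fractions sum to one. This is an element-wise product rather than a true matrix multiplication, and the function name and the example numbers are my own assumptions.

```python
def time_budget(availability, preference):
    """Fraction of time spent in each environment.

    availability -- share of the patch taken up by each environment
    preference   -- relative preference weight for each environment
                    (hypothetical scale, any non-negative numbers)
    """
    weights = [a * p for a, p in zip(availability, preference)]
    total = sum(weights)
    if total == 0:
        raise ValueError("species has nowhere it is willing to be")
    return [w / total for w in weights]

# Environments: [waterhole, wasteland]; water covers 10% of the patch.
availability = [0.1, 0.9]

# No preference: time spent simply follows the environment.
print(time_budget(availability, [1.0, 1.0]))
# Spending half the time at scarce water needs a 9x preference for it.
print(time_budget(availability, [9.0, 1.0]))
```

The second call shows the exaggeration effect described above: the scarcer the environment, the larger the preference needed to spend a given share of time there.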

I realise this is a little hand-wavy at this point, I’m not yet sure how it would work (maybe moopli has a better idea, as he suggested convergence matrices). I really only have an intuitive feeling that we can produce the necessary data to drive this, and process it in such a way as to add meaningfully to the current CPA framework, but that’s how all these things start. The only two possible setbacks I see are computational, and whether we actually need this added complexity. We can’t answer the first without a more explicit system, and for the second, I feel that this would be one way of adding significant depth to the behavioural choices made for each species.

re: edit 1 - that’s fine, I’m just uncomfortable with using slots to say anything other than ‘we can’t afford to simulate any more species here’.

re: edit 2 - I can’t see the image, but I’m assuming you have three EV’s: A, B, C, each with a specified 50% overlap with each of the others, and two distributions a little like this:

this is kind of what I was getting at when I asked whether a convergence matrix could represent more complicated relationships. I’ll let moopli answer that (as I’m only guessing at what a convergence matrix actually is), but I suspect not. I’m also not entirely sure whether or not it matters…

note - I’m having issues accessing anything on imgur (my ISP appears to be censoring it), can you host images anywhere else (the public folder of your dropbox works well for this, alternatively use the upload option on the forum itself)?

You have the diagram exact! Very nice.

Ok how about this, split the patch up into n areas. For each area define the amount of compounds in that area as a column of a matrix. Have each species have a vector which represents how much time they spend in each area. When predation happens take a dot product of the two vectors to determine how much overlap there is. Here’s a diagram, the top line is computing the total compounds available to you depending on where you spend your time, the bottom line is computing your predation overlap with a predator that spends all its time in sub area 1.
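The same calculation written out as a small sketch, in case the diagram is hard to read. The area count, compound names and all numbers are made up for illustration. Rows of the matrix are compounds and columns are areas, so multiplying by a species' time vector gives the compounds available to it, and the dot product of two time vectors gives the predation overlap.

```python
def compounds_available(compounds_by_area, time_in_area):
    """Matrix-vector product: one total per compound (row)."""
    return [
        sum(amount * t for amount, t in zip(row, time_in_area))
        for row in compounds_by_area
    ]

def predation_overlap(time_a, time_b):
    """Dot product of two species' time-in-area vectors."""
    return sum(x * y for x, y in zip(time_a, time_b))

# 2 compounds (rows) x 3 areas (columns); illustrative amounts.
compounds_by_area = [
    [8.0, 1.0, 1.0],   # water: mostly in area 1
    [0.0, 5.0, 5.0],   # fruit: only in areas 2 and 3
]

gazelle = [0.5, 0.25, 0.25]   # fraction of time in each area
croc = [1.0, 0.0, 0.0]        # spends all its time in area 1

print(compounds_available(compounds_by_area, gazelle))  # [4.5, 2.5]
print(predation_overlap(gazelle, croc))                 # 0.5
```

A gazelle that never visits area 1 would have zero overlap with the croc but would also get much less water, which is the trade-off the averaged model cannot see.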

I guess this leads to the question “how do we compute the overlap between compounds?” Are we going to set this in advance or let it be procedural?

(Also I think that’s the biggest dot product symbol of all time, Thrive project, setting new records)

That’s a little more like it, but you have an arbitrary number of areas (where n is equal to the number of EVs). I wonder if you could achieve the same with just the convergence matrix, so that rather than saying ‘area 1 has most of the water, but also some of compounds B and C’, and then splitting your time to get the compound balance you want, you’re saying something more like ‘water is associated with compounds B and C by degrees X and Y’, and then noticing that you get so much of B and C when gathering water, and adjusting the amount of time you spend gathering each accordingly.

There’s very little difference between the two approaches, the math you show basically wouldn’t change, but it removes the idea of an arbitrary number of areas within a patch.

As for how we calculate overlap, I think we have to do it procedurally. If we know that most of the oxygen in a patch comes from photosynthesisers, and that they either spend most of their time in the light, or that they only photosynthesise when in light, then we know that most of the oxygen must be in areas that are light. If the turnover of oxygen is low (i.e.: there is much more of it in the environment than regularly gets cycled through organisms) then it will be more diffuse, and more present in areas that aren’t producing it, as it has had time to diffuse there. We can do similar calculations if we know that most environmental protein comes from cell carcasses, and that most of these are produced from species X under predation from species Y, and that these two species most often collide in conditions Z, etc. Obviously there’s a fair bit of work to do there, and there’s a risk of some circular calculations.

I’m still not sure if we also need to know the individual distribution of each EV (independent of other EVs), most likely expressed as a histogram of densities vs area of the patch at that density (or an equation for that histogram), or how we would use that data.

The number of rows of the matrix is the number of EVs, but the number of areas can be unrelated. It’s an N x M matrix multiplying an M vector, where N is the number of EVs and M is the number of areas.

Good point, for some reason I assumed we might be multiplying multiple of these matrices, but that probably wouldn’t apply. It still leaves us with an arbitrary number of areas, which I’m not sure how we’d determine?