Advanced topics#
Executing multiple replicates in parallel using concurrent.futures
#
You have many options when it comes to running many replicates of a model. For example, you could write a single Python script plus a shell script that your local cluster understands, sending your simulation to hundreds of compute nodes.
This section introduces concurrent.futures
as a method to execute
many replicates in separate Python processes using a single script.
Fully worked-out examples of varying complexity can be found here:
A common set of idioms used by these examples are:
The function to run the simulation typically does not return the instance of
fwdpy11.DiploidPopulation
. Such a return requires pickling, which is very expensive. If you need to store these population, write them to files rather than returning them.If you do return something to the main “collector” process, keep it simple!
Random number seeds are set in
__main__
usingnumpy.random.randint()
after callingnumpy.random.seed()
using a user-provided seed.
How long to run the simulation#
Deciding how long to run a simulation depends on whether or not a “burn in” phase is required to get the model to statistical equilibrium. For example, if you want to get the steady-state properties of a model with many selected mutations and linkage, you will need to burn in. If you want to know how the short term dynamics of the neutral site-frequency spectrum are affected by suddenly introducing strongly-beneficial mutations, then you may not need to burn in.
So, then, how long to burn in? Clearly, the answer should be “the least that you have to”! As always, there are a few different things to consider:
You want the distribution of final outcomes to be “correct”. Theory often gives us expressions for expectations and sometimes for variances of things we care about, like the number of mutations in a sample. However, comparing distributions will often require a statistical analysis comparing the output of independent simulations. For example, for a neutral model, the distribution of the number of mutations under an infinitely-many sites model for a small sample should match
msprime
very well. For a steady-state model of recurrent hitch-hiking [KHL89], results from small samples should match coalescent simulations such as [KS16]. Clearly, all of the different methods should match analytical results where possible.You probably want all of your trees to be fully coalesced to a single common ancestor. The time back to a final ancestor has a very long tail. In models with recombination, it is not uncommon to have a tree or three with multiple roots at the end of a simulation.
For the first point, a burn-in of 10N
generations or so
is usually sufficient. Before we simulated with tree sequence
recording, we simulated entire “genomes” (neutral mutations and all),
which was rather slow. (Here “we” is the field in general.) We then
(hopefully!) compared our results to something like msprime
. The
agreement was really good, so one answer is “about 10N
. I’m being
vague about N
here–typically, it would be equal to the starting effective
population size of your population (e.g. in the absence of selection).
With tree sequence recording, we run into the “uncoalesced marginal
trees” issue mentioned above. We could just simulate much longer
to get rid of this problem, but it is simply easier to start
with a tree sequence from msprime
.
The “Eyre-Walker” model of complex traits#
[EW10] describes a model relating mutations with fitness effect \(S = |2Ns|\) to \(z\), their effect on a phenotype/trait according to \(z = \delta S^\tau(1+\epsilon)\). Here, \(\epsilon\) is a draw from a Gaussian distribution with mean zero, \(\delta\) is \(1\) or \(-1\) with equal probability, and \(\tau\) “tunes” the correlation between fitness effect and effect on the trait.
Implementing this model is quite straightforward, as the trait values do not affect the dynamics of the model. For this flavor of a “pure pleiotropy” model, the technical details reduce to a standard population genetic model where we tack on the trait values at the end.
Let’s set up a model where \(S\) is exponentially distributed with mean -20
.
We’ll run a small population for a few generations. This model will be nowhere
near equilibrium, but we’re just using it as an example:
import fwdpy11
import numpy as np
N = 1000
sregions = [fwdpy11.ExpS(beg=0.0, end=1.0, weight=1.0, mean=-20, scaling=2 * N)]
recregions = [fwdpy11.PoissonInterval(beg=0.0, end=0.1, mean=1e-3)]
gvalue = fwdpy11.Multiplicative(scaling=2.0)
pdict = {
"nregions": [],
"sregions": sregions,
"recregions": recregions,
"rates": (0.0, 1e-3, None),
"gvalue": gvalue,
"prune_selected": False,
"demography": fwdpy11.ForwardDemesGraph.tubes([N], burnin=150, burnin_is_exact=True),
"simlen": 150,
}
params = fwdpy11.ModelParams(**pdict)
pop = fwdpy11.DiploidPopulation(N, 1.0)
rng = fwdpy11.GSLrng(54321)
fwdpy11.evolvets(rng, pop, params, 100)
print(len(pop.tables.mutations))
16
Define a function relating \(S\) to \(z\):
def getz(S, tau, sigma):
if np.random.uniform() < 0.5:
delta = 1
else:
delta = -1
epsilon = np.random.normal(loc=0, scale=sigma, size=1)[0]
return delta * np.power(np.abs(S), tau) * (1.0 + epsilon)
Apply the function and look at the results:
np.random.seed(101010)
zvals = {}
for i, m in enumerate(pop.tables.mutations):
zvals[m.key] = getz(2 * N * pop.mutations[m.key].s, 0.5, 1.0)
for k, v in zvals.items():
print(pop.mutations[k].s, v)
-0.009545936425350269 12.57306047991845
-0.010349104565670205 -2.2118130398591935
-0.011602139413926871 -7.084718822750244
-0.0032279509968761435 -1.737328013826212
-0.015449207258832993 -5.131892545676068
-0.0077168400201644664 4.512341872600712
-0.0022531736886723442 -4.096721449329499
-0.006044053890735736 -8.652700112679995
-0.0023465570676289957 0.6320311886584215
-0.0007746986635068075 2.2164750099025436
-0.021502039265014553 9.597464398344142
-0.009716042965590406 -7.443653191909565
-0.0008560999958330585 0.26396822983492146
-0.01597600966131924 4.994348870871331
-0.003548690338568886 1.3665737289445181
-0.005819804232148941 8.384444628613952
Traverse the tree sequence to get individual phenotypes under a strictly additive model:
phenotypes = np.zeros(pop.N)
node_to_individual = {}
for i, j in enumerate(pop.diploid_metadata):
assert j.nodes[0] not in node_to_individual
assert j.nodes[1] not in node_to_individual
node_to_individual[j.nodes[0]] = i
node_to_individual[j.nodes[1]] = i
ti = fwdpy11.TreeIterator(pop.tables, pop.alive_nodes, update_samples=True)
for t in ti:
for m in t.mutations():
for n in t.samples_below(m.node):
phenotypes[node_to_individual[n]] += zvals[m.key]
The trait value distribution is:
np.unique(phenotypes, return_counts=True)
(array([-17.30540023, -15.73741894, -11.18144027, -10.39002813,
-8.82204684, -8.65270011, -7.44365319, -7.08471882,
-5.13189255, -4.09672145, -2.57237695, -2.21181304,
-1.73732801, 0. , 0.26396823, 0.63203119,
1.26406238, 1.36657373, 1.63054196, 2.21647501,
4.51234187, 4.99434887, 5.62638006, 5.8789156 ,
8.38444463, 11.81393941, 12.57306048]),
array([ 1, 1, 1, 1, 2, 11, 22, 39, 2, 17, 1, 16, 28,
799, 6, 12, 1, 13, 1, 7, 13, 1, 1, 1, 1, 1,
1]))
The mean trait value and the genetic variance are:
phenotypes.mean()
phenotypes.var()
np.float64(5.8498620701229225)
For our final trick, let’s store them in the individual metadata:
md = np.array(pop.diploid_metadata, copy=False)
md["g"][:] = phenotypes
for i, j in zip(md["g"][:5], phenotypes[:5]):
print(i, j)
-1.737328013826212 -1.737328013826212
0.0 0.0
0.0 0.0
0.0 0.0
-7.084718822750244 -7.084718822750244
The trick is that the numpy
array is a non-owning array, meaning
that it is a simple “view” of the underlying C++
data. Further,
it happens to be read/write, and thus we can modify it. (Try not
to abuse this. More often than not, you’ll just break stuff.)
I feel that some comments about this model are warranted:
The model is a simplification of earlier work by Keightley and Hill ([KH88], [KH90]), which should be cited alongside [EW10].
[KH88] and [KH90]) show that this model predicts heritability (the genetic variance) is linear-ish with
N
, with the details depending somewhat on the model parameters. The biological reasonableness of that prediction is dubious. [JB05] discuss this point in some detail, and give other relevant references.Depending somewhat on the parameters of the
getz
function, one will eventually generate an intermediate-frequency variant with a massive \(z\), meaning that it would explain a considerable amount of the genic variance for the trait (high \(2pqz^2\)). Such outcomes are contrary to the results of human GWAS.It becomes tempting to start doing arbitrary things when applying this model to simulation output. For example, only treating mutations in certain frequency ranges as affecting trait values. However, doing so means the predictions made by such procedures are not natural outcomes of evolutionary models. Consider only applying the
getz
function to mutations with frequency \(< x\) in order to say something about “rare alleles”. This treatment of the data actually changes the model: mutations that are common now (at the end of the simulation) were rare at some point in the past, and had frequencies \(< x\). Therefore, the model is one where mutations suddenly stop affecting the trait once they hit some critical frequency. Presumably, the same variants would affect the trait again should they drift to a frequency below \(x\) if the simulation is run a bit longer.
Writing genetic value classes in Python#
Version 0.9.0
added the ability to write custom genetic value types in
Python. Such classes are useful for prototyping new models
and/or generating example models for teaching/illustrating concepts.
New in version 0.9.0.
Concepts#
The “dimensionality” of a model#
fwdpy11
supports both univariate and multivariate trait simulations.
Thus, a genetic value implicitly has a dimensionality. For the
case of a standard population-genetic simulation where mutations
directly affect fitness, that dimensionality is 1
. Simply put,
the genetic value is fitness.
For a simulation of mutations affecting n
traits, the dimensionality
is n
.
The dimensionality matters because we need to define a single genetic value
to populate the fwdpy11.DiploidMetadata.g
field for an offspring.
Doing so is simple for a one-dimensional trait. For a simulation with
pleiotropy, we have a decision to make. Should fwdpy11.DiploidMetadata.g
store the Euclidean distance from the optima? Or should it store the
individual’s value for some “focal” trait? The API
described here is
flexible enough to accommodate both.
The dimensionality of your genetic value to fitness map and your genetic value object must match. For the trait examples discussed above, this requirement should make good sense. For the case of modeling correlated effect sizes of mutations in different demes (see here), we still require this although it makes less sense. (Such simulations are limited to either the standard population genetic case or the case of selection on a single trait. But relaxing this requirement for some models but not others becomes problematic.)
The current version of this section only discusses one-dimensional genetic values.
Todo
Update for multi-dimensional genetic values once we get documentation in place for simulations of Gaussian stabilizing selection with pleiotropy.
Supporting time-dependent models#
Genetic value methods may have dynamic states that depend on some
state of the population. For example, something may change once
pop.generation > x
.
To support such time-dependent states, classes have a method with the following signature:
def update(self, pop: fwdpy11.DiploidPopulation) -> None:
pass
A type with no time-dependent state can in fact use the function shown
above. A concrete example is shown below,
in an example class called PyGSSRandomOptimum
. That example does not
use any information from pop
to update its state.
Performance#
It will be hard to get “awesome” performance with Python genetic
value classes. The culprit are abstract methods that get called
once per offspring. That’s simply a lot of Python/C++ boundary
crossing. The update
function described above is called only
once per generation, making it much less of a performance killer.
We’ve tried hard to make the data exchange from C++ to your Python
code as efficient as possible. We accomplished this by carefully
considering how we implement the data
argument that gets passed
in. Benchmarking showed that carefully designed proxies to the
underlying C++ data greatly improved run times.
Importantly, you do not have to write every component in Python!
If you are interested in additive effects models but a new “noise”
model, then use fwdpy11.Additive
along with your custom
type.
fwdpy11
provides the following helper functions to improve efficiency:
- fwdpy11.strict_additive_effects()#
This function computes the sum of effect sizes in a diploid genome, ignoring dominance. The return value is an offset from zero, so add 1.0 to convert to a fitness if needed.
- Parameters:
pop (fwdpy11.DiploidPopulation) – The population
metadata (fwdpy11.DiploidMetadata) – The offspring metadata
- Returns:
Sum of effect sizes
- Return type:
- fwdpy11.additive_effects()#
This function computes the sum of effect sizes in a diploid genome, accounting for dominance. The return value is an offset from zero, so add 1.0 to convert to a fitness if needed.
- Parameters:
pop (fwdpy11.DiploidPopulation) – The population
metadata (fwdpy11.DiploidMetadata) – The offspring metadata
- Returns:
Sum of effect sizes (accounting for heterozygous effects)
- Return type:
The first function is equivalent to the following Python code:
def strict_additive_effects(
pop: fwdpy11.DiploidPopulation,
metadata: fwdpy11.DiploidMetadata,
) -> float:
g = 0.0
dip = pop.diploids[metadata.label]
for i in [dip.first, dip.second]:
for k in pop.haploid_genomes[i].smutations:
g += pop.mutations[k].s
return g
The second function uses C++ code from fwdpp
to do the calculation
accounting for dominance effects.
Technical issues#
This section discusses some important technicalities.
Abstract base classes#
In object oriented lingo, the fwdpy11
base classes here are abstract
base classes or ABCs
. This means that they simply describe a required
interface. When you inherit from an ABC, you must fill in the details
of that interface.
Initializing the super class#
When writing object-oriented code, it is important that a derived
class properly initialize the base class. In Python, base
classes are often called superclasses
and derived classes are
subclasses
.
The idiomatic approach in Python 3
is:
class Base(object):
def __init__(self):
self.b = "I am a Base"
class Derived(Base):
def __init__(self):
self.d = "I am a Derived"
super(Derived, self).__init__()
d = Derived()
print(d.b)
I am a Base
Failure to do that super(...)
business would result in an
AttributeError
when trying to print d.b
.
However, the classes that you are writing are not straightforward
Python classes! While super(...)
may work, it is not
guaranteed. To quote the pybind11
documentation:
Note that a direct init constructor should be called, and super() should not be used. For simple cases of linear inheritance, super() may work, but once you begin mixing Python and C++ multiple inheritance, things will fall apart due to differences between Python’s MRO and C++’s mechanisms.
The source of this quote is the pybind11
manual section on
Classes.
The examples that follow take their advice.
attrs
#
The attrs package provides a library
of convenient class decorators that significantly reduce the amount
of “boiler plate” code needed to write Python
classes. Some of the examples below will use this library. You must
not use slots=True
! The superclasses
already use the slots mechanism,
and it seems difficult for a derived class to also use slots and properly
initialize the base.
Other than this caveat regarding slots, we enthusiastically recommend
attrs
.
Custom decorators#
Speaking of decorators, we provide several that make writing genetic value types a bit easier. You will see concrete examples using them later.
- @fwdpy11.custom_genetic_value_decorators.genetic_value_is_trait_default_clone(ndim=1)#
This decorator is a class that adds the ability for custom types to be passed to classes derived from
fwdpy11.DiploidGeneticValue
.Because this decorator is a class, you need the
()
to apply it.This decorator is required because of the very different object models of C++ and Python. See [here](pybind/pybind11#1049) for some of the gory details.
- @fwdpy11.custom_genetic_value_decorators.genetic_value_noise_default_clone#
This is the analog to
fwdpy11.custom_genetic_value_decorators.genetic_value_is_trait_default_clone
for noise objects inheriting fromfwdpy11.GeneticValueNoise
.Unlike its analog, this decorator is a function and not a class.
- @fwdpy11.custom_genetic_value_decorators.default_update#
This decorator adds the following function to
cls
, which is a no-op function:def update(pop: fwdpy11.DiploidPopulation): pass
Note
In the future, we will experiment with adding a C++ implementation of this no-op.
Genetic value to fitness maps#
Here, one derives a new class from fwdpy11.GeneticValueIsTrait
.
The basic form of such a class is:
import fwdpy11
import fwdpy11.custom_genetic_value_decorators
@fwdpy11.custom_genetic_value_decorators.genetic_value_is_trait_default_clone()
class MyGeneticValueToFitness(fwdpy11.GeneticValueIsTrait):
def __init__(self):
# Do NOT call super()!
fwdpy11.GeneticValueIsTrait.__init__(self)
def __call__(self, data: fwdpy11.DiploidGeneticValueToFitnessData) -> float:
pass
def update(pop: fwdpy11.DiploidPopulation) -> None:
pass
If the object does not need to manage an internal state that changes over time, we can simplify a bit with another decorator:
import fwdpy11
import fwdpy11.custom_genetic_value_decorators
@fwdpy11.custom_genetic_value_decorators.default_update
@fwdpy11.custom_genetic_value_decorators.genetic_value_is_trait_default_clone()
class MyGeneticValueToFitness(fwdpy11.GeneticValueIsTrait):
def __init__(self):
# Do NOT call super()!
fwdpy11.GeneticValueIsTrait.__init__(self)
def __call__(self, data: fwdpy11.DiploidGeneticValueToFitnessData) -> float:
pass
Two complete examples come from the test suite. The first example reimplements
fwdpy11.GaussianStabilizingSelection
in Python for a single trait.
The second example implements a model
where the optimum changes to a new value each generation. (Note that
the second model requires that numpy.random.seed()
be called elsewhere
so that results are reproducible.)
import math
import attr
import numpy as np
import fwdpy11
import fwdpy11.custom_genetic_value_decorators
from fwdpy11._types import ForwardDemesGraph
@fwdpy11.custom_genetic_value_decorators.default_update
@fwdpy11.custom_genetic_value_decorators.genetic_value_is_trait_default_clone()
@attr.s()
class PyGSS(fwdpy11.GeneticValueIsTrait):
opt = attr.ib()
VS = attr.ib()
def __attrs_post_init__(self):
fwdpy11.GeneticValueIsTrait.__init__(self)
def __call__(self, data: fwdpy11.DiploidGeneticValueToFitnessData) -> float:
return math.e ** (
-((data.offspring_metadata.g + data.offspring_metadata.e - self.opt) ** 2)
/ (2 * self.VS)
)
def validate_timings(self, deme: int, demography: ForwardDemesGraph) -> None:
pass
@fwdpy11.custom_genetic_value_decorators.genetic_value_is_trait_default_clone()
@attr.s()
class PyGSSRandomOptimum(fwdpy11.GeneticValueIsTrait):
opt = attr.ib()
VS = attr.ib()
def __attrs_post_init__(self):
fwdpy11.GeneticValueIsTrait.__init__(self)
self.optima = []
def __call__(self, data: fwdpy11.DiploidGeneticValueToFitnessData) -> float:
return math.e ** (
-((data.offspring_metadata.g + data.offspring_metadata.e - self.opt) ** 2)
/ (2 * self.VS)
)
def update(self, pop: fwdpy11.DiploidPopulation) -> None:
self.opt = np.random.normal(
0.0,
1.0,
1,
)[0]
self.optima.append((pop.generation, self.opt))
def validate_timings(self, deme: int, demography: ForwardDemesGraph) -> None:
pass
The data type#
The type passed into the __call__
function is:
- class fwdpy11.DiploidGeneticValueToFitnessData#
New in version 0.9.0.
This class supports the buffer protocol, which exposes the genetic values array. The most efficient access will be via a
memoryview
.Instances of this class have the following attributes:
- offspring_metadata#
An instance of
fwdpy11.DiploidMetadata
, giving you access to those fields that have been assigned to the offspring. (This is a copy of the metadata from the C++ side.)
- offspring_metadata_index#
A 64 bit integer that gives the location (index) of
offspring_metadata
infwdpy11.DiploidPopulation.diploid_metadata
. This index is useful in the event of mass migrations via copies, which can cause a mismatch betweenfwdpy11.DiploidMetadata.label
and this value.
- parent1_metadata#
The first parent’s metadata (
fwdpy11.DiploidMetadata
).
- parent2_metadata#
The second parent’s metadata (
fwdpy11.DiploidMetadata
).
Genetic value “noise”#
A custom “noise” class inherits from fwdpy11.GeneticValueNoise
.
A minimal implementation has the following form:
@fwdpy11.custom_genetic_value_decorators.genetic_value_noise_default_clone
class MyGeneticValueNoise(fwdpy11.GeneticValueNoise):
def __call__(self, data: fwdpy11.DiploidGeneticValueNoiseData) -> float:
pass
def update(pop: fwdpy11.DiploidPopulation) -> None:
pass
If your model can allow a no-op update
function, you may apply
the decorator
fwdpy11.custom_genetic_value_decorators.genetic_value_noise_default_clone
.
A simple example from the test suite implements Gaussian noise with mean zero
and standard deviation 0.1
:
import fwdpy11
import fwdpy11.custom_genetic_value_decorators
@fwdpy11.custom_genetic_value_decorators.default_update
@fwdpy11.custom_genetic_value_decorators.genetic_value_noise_default_clone
class PyNoise(fwdpy11.GeneticValueNoise):
def __init__(self):
fwdpy11.GeneticValueNoise.__init__(self)
def __call__(self, data: fwdpy11.DiploidGeneticValueNoiseData) -> float:
return fwdpy11.gsl_ran_gaussian_ziggurat(data.rng, 0.1)
See fwdpy11.gsl_ran_gaussian_ziggurat()
for details on that function.
The data type#
- class fwdpy11.DiploidGeneticValueNoiseData#
New in version 0.9.0.
Instances of this class have the following attributes:
- rng#
The simulation’s random number generation, an instance of
fwdpy11.GSLrng
- offspring_metadata#
An instance of
fwdpy11.DiploidMetadata
, giving you access to those fields that have been assigned to the offspring. (This is a copy of the metadata from the C++ side.)
- offspring_metadata_index#
A 64 bit integer that gives the location (index) of
offspring_metadata
infwdpy11.DiploidPopulation.diploid_metadata
. This index is useful in the event of mass migrations via copies, which can cause a mismatch betweenfwdpy11.DiploidMetadata.label
and this value.
- parent1_metadata#
The first parent’s metadata (
fwdpy11.DiploidMetadata
).
- parent2_metadata#
The first parent’s metadata (
fwdpy11.DiploidMetadata
).
New genetic value models#
The abstract base class#
The ABC type is fwdpy11.PyDiploidGeneticValue
:
- class fwdpy11.PyDiploidGeneticValue#
- __init__()#
- Parameters:
ndim (int) –
genetic_value_to_fitness (fwdpy11.GeneticValueIsTrait or None) –
noise (fwdpy11.GeneticValueNoise or None) –
- abstract calculate_gvalue()#
- Parameters:
data (fwdpy11.PyDiploidGeneticValueData) – Input data
- Returns:
The value to be stored in the offspring’s
fwdpy11.DiploidMetadata.g
- Return type:
- genetic_value_to_fitness()#
- Parameters:
data (fwdpy11.DiploidGeneticValueToFitnessData) – Input data
- Returns:
fitness
- Return type:
Note
This function does not need to be defined by derived classes most of the time. The default behavior is to apply the instance of
fwdpy11.GeneticValueToFitnessMap
stored by the instance offwdpy11.DiploidGeneticValue
(orfwdpy11.PyDiploidGeneticValue
). Defining this function in a derived class skips calling held instance in favor of the derived class implementation. In general, one only needs to derive this class for models where either individual genetic values depend on genotypes of the rest of the population. See here.
- update()#
- Parameters:
pop (fwdpy11.DiploidPopulation) – The population being simulated
- Return type:
None
Note
A default implementation can be defined using the decorator
fwdpy11.custom_genetic_value_decorators.genetic_value_noise_default_clone
.
Outline of user-defined classes#
The form of a custom genetic value type inherits from the ABC described above and would look something like this:
class MyGeneticValueObject(fwdpy11.PyDiploidGeneticValue):
def __init__(self, *args):
# Do not call super()!
fwdpy11.PyDiploidGeneticValue.__init__(...)
def calculate_gvalue(self, data: fwdpy11.PyDiploidGeneticValueData) -> float:
pass
def update(self, pop: fwdpy11.DiploidPopulation) -> None:
pass
Method requirements#
fwdpy11.PyDiploidGeneticValue
does provide a default update
function that updates the genetic-value-to-fitness-map and noise objects.
If your model can allow a no-op update
function, you may apply
the decorator
fwdpy11.custom_genetic_value_decorators.genetic_value_noise_default_clone
.
The genetic_value_to_fitness
function must completely replace the
functionality of an instance of fwdpy11.GeneticValueIsTrait
.
See here for details.
The calculate_gvalue
function is more complex because it has more
responsibilities:
It must return a value that will populate the genetic value field of the offspring’s metadata (
fwdpy11.DiploidMetadata.g
).It also needs to populate an array of
float
that represents the genetic values for all trait dimensions. See here. For the common case of a one-dimensional genetic value, you populate element0
of this array with value that you will return from this function. The “genetic values array” is accessible via the Python buffer protocol. The next section shows a concrete example.
Note
The steps described in 2
are populating
fwdpy11.DiploidGeneticValue.genetic_values
. In the
future, we may provide read-write access to that field, which
would constitute an API change. However, the path of least
resistance in the short term was to implement the current approach.
Example: strictly additive effects on a trait#
Here is an example of a strict additive model of a trait,
taken from the test suite. You’ll notice that we are accessing
arrays on the C++
using Python memoryview
objects.
This is a very efficient method, and much faster
than creating a temporary, non-owning numpy.ndarray
.
import fwdpy11
import fwdpy11.custom_genetic_value_decorators
@fwdpy11.custom_genetic_value_decorators.default_update
class PyAdditive(fwdpy11.PyDiploidGeneticValue):
def __init__(self, gvalue_to_fitness=None, noise=None):
fwdpy11.PyDiploidGeneticValue.__init__(self, 1, gvalue_to_fitness, noise)
def calculate_gvalue(self, data: fwdpy11.PyDiploidGeneticValueData) -> float:
s = fwdpy11.strict_additive_effects(data.pop, data.offspring_metadata)
memoryview(data)[0] = s
return s
If you want to account for the heterozygous effect of mutations, then the
most explicit approach is probably to use collections.Counter
:
def calculate_gvalue(self, data):
s = 0.0
c = Counter()
pop = data.pop
dip = data.diploids[data.offspring_metadata.label]
for g in [dip.first, dip.second]:
for k in pop.haploid_genomes[g].smutations:
m = pop.mutations[k]
c.update([(m.pos, m.s, m.h)])
for i, j in c.items():
if j == 1: # Aa
s += i[1] * i[2]
else: # aa
# This is a hard-coded "scaling"
# of 2
s += 2.0 * i[1]
# Use a memory view to update
# the genetic values array
memoryview(data)[0] = s
return s
For additive cases like this, though, you’ll get better performance
with fwdpy11.additive_effects()
.
The data types#
- class fwdpy11.PyDiploidGeneticValueData#
New in version 0.9.0.
This class supports the buffer protocol, which exposes the
genetic values array. The most efficient access will
be via a memoryview
.
Instances of this class have the following attributes:
- rng#
The simulation’s random number generation, an instance of
fwdpy11.GSLrng
- pop#
The population, an instance of
fwdpy11.DiploidPopulation
- offspring_metadata#
The offspring’s
fwdpy11.DiploidMetadata
- parent1_metadata#
The first parent’s
fwdpy11.DiploidMetadata
- parent2_metadata#
The second parent’s
fwdpy11.DiploidMetadata
- offspring_metadata_index#
A 64 bit integer that gives the location (index) of
offspring_metadata
infwdpy11.DiploidPopulation.diploid_metadata
. This index is useful in the event of mass migrations via copies, which can cause a mismatch betweenfwdpy11.DiploidMetadata.label
and this value.
Subtleties of starting/finishing tree sequences using msprime
#
Some care is required starting a simulation from a tree sequence (see here) or finishing a simulation with one (see here).
If simulating with discrete recombination positions, it may be important to realize that the current stable releases of msprime
model continuous recombination breakpoints.
A future msprime
release will support discretized positions.
When simulating with genome lengths longer than unity (1.0), determining the correct recombination rate for msprime
needs additional work.
The examples shown here and here show the following method for determining the msprime
recombination rate:
recombination_rate = rho / Ne / 4
That code implies a genome length of 1.0.
A more general example, for a genome of length L
would look like:
recombination_rate = rho / Ne / 4 / L