Technical overview of genetic value calculations#
A mock overview in Python#
This section illustrates the technical details of genetic value objects
by presenting a mock up of their API
in Python. To help document
key concepts, we make use of Python 3’s type hinting features from
typing
and decorators from abc
:
A class inheriting from
abc.ABC
is an abstract base class. It is not possible to create instances of ABCs. Rather, you must subclass them.A function decorated with
abc.abstractmethod()
is a part of an ABC’s interface that must be defined by a subclass. In other words, to be a validMockDiploidGeneticValue
, subclasses must define the abstract methods. In C++ terminology, these are “pure virtual” functions.A function decorated with
typing.final()
must not be redefined in a subclass. Doing so is considered an error, although Python will allow it.In Python, any undecorated functions are viewed as default implementations of the class’ behavior. A subclass may or may not redefine them as needed.
These concepts do not map perfectly from Python to C++, but it is close enough for now. Simply put, C++ is much stricter than Python.
The following code block shows a Python mock-up corresponding to
fwdpy11.DiploidGeneticValue
. (The next section shows the actual
definition of fwdpy11.DiploidGeneticValue
.)
This class is an abstract base class. The initialization semantics are:
The object is initialized with the dimensionality of the model, a genetic value to fitness map, and a noise function.
If the genetic value to fitness map is
None
, then the genetic value is assumed to be fitness itself (a “standard population genetic” simulation). Thus, we supplyfwdpy11.GeneticValueIsFitness
.If the noise object is
None
, then the model has no random effects and we supplyfwdpy11.NoNoise
.
The remaining elements of the class are its public interface, consisting of a mixture of final functions, abstract methods, and default implementations. Each function takes a single argument, most of which are declared as some sort of mocked data type. These data types hold data that are relevant to carrying out our calculations. Below, the C++ versions get defined and the Python versions are listed at the end of this section.
import abc
import typing
import attr
import numpy as np
import fwdpy11
@attr.s(auto_attribs=True)
class MockDiploidGeneticValue(abc.ABC):
ndim: int
genetic_value_to_fitness_map: typing.Union[None, fwdpy11.GeneticValueToFitnessMap]
noise_function: typing.Union[None, fwdpy11.GeneticValueNoise]
def __attrs_post_init__(self):
self.gvalues = np.zeros(self.ndim)
if self.genetic_value_to_fitness_map is None:
self.genetic_value_to_fitness_map = fwdpy11.GeneticValueIsFitness(self.ndim)
if self.noise_function is None:
self.noise_function = fwdpy11.NoNoise()
@typing.final
def __call__(self, data: MockDiploidGeneticValueData) -> None:
# Set the genetic value metadata for offspring
data.offspring_metadata.g = self.calculate_gvalue(data)
# Set the random effects metadata for offspring
data.offspring_metadata.e = self.noise(MockDiploidGeneticValueNoiseData(data))
# Set the fitness metadata for offspring
data.offspring_metadata.w = self.genetic_value_to_fitness(
MockDiploidGeneticValueToFitnessData(data)
)
@abc.abstractmethod
def calculate_gvalue(self, data: MockDiploidGeneticValueData) -> float:
"""
This function must populate self.gvalues
and return a single value for the offspring's DiploidMetadata.g
"""
raise NotImplementedError("calculate_gvalue not implemented")
def genetic_value_to_fitness(
self, data: MockDiploidGeneticValueToFitnessData
) -> float:
return self.genetic_value_to_fitness_map(data)
def noise(self, data: MockDiploidGeneticValueNoiseData) -> float:
return self.noise_function(data)
@abc.abstractmethod
def update(self, pop: fwdpy11.DiploidPopulation) -> None:
raise NotImplementedError("update not implemented")
To make things complete, a Python analogue of fwdpy11.GeneticValueToFitnessMap
would look like this:
@attr.s(auto_attribs=True)
class MockGeneticValueToFitnessMap(abc.ABC):
ndim: int
@abc.abstractmethod
def __call__(self, data: MockDiploidGeneticValueToFitnessData) -> float:
raise NotImplementedError("__call__ not implemented")
@abc.abstractmethod
def update(self, pop: fwdpy11.DiploidPopulation) -> None:
raise NotImplementedError("update not implemented")
Similarly, this is how we’d implement fwdpy11.GeneticValueNoise
in Python:
class MockGeneticValueNoise(abc.ABC):
@abc.abstractmethod
def __call__(self, data: MockDiploidGeneticValueNoiseData) -> float:
raise NotImplementedError("__call__ not implemented")
@abc.abstractmethod
def update(self, pop: fwdpy11.DiploidPopulation) -> None:
raise NotImplementedError("update not implemented")
Behavior during a simulation#
During a simulation, the following operations happen:
gv.update(pop)
gv.genetic_value_to_fitness_map.update(pop)
gv.noise_function.update(pop)
#offspring_metadata is a list of fwdpy11.DiploidMetadata
: for o in offspring_metadata:
#Execute the calculations for each offspring
: gv(MockDiploidGeneticValueData(o, pop))
The definition of our classes and the way that update
functions are applied
implies that all three objects are updated independently from one another.
This independence covers a large number of use cases. For example, a moving
trait optimum (fwdpy11.GaussianStabilizingSelection
) only needs acces to
fwdpy11.DiploidPopulation.generation
to know if it needs to update its
internal state.
For some models, this independent updating doesn’t do the job. For example, if
fitness depends on an individual’s genetic value in comparison to some or all
other genetic values in the population (as may be the case for models of
social interactions), then we don’t have an easy way to
implement that using a class derived from fwdpy11.GeneticValueIsTrait
.
To handle these more complex models, subclasses may simply override the
functions genetic_value_to_fitness
and/or noise
as needed.
“Snowdrift” models of social interactions are one example where this method
is necessary (see here for a C++ implementation
and here for a Python version).
The data objects#
The member functions of our mock classes take different argument types. These types hold things like the offspring metadata, parental metadata, etc., that may be needed to calculate return values.
For a genetic value object written in Python:
MockDiploidGeneticValueData
isfwdpy11.PyDiploidGeneticValueData
.MockDiploidGeneticValueToFitnessData
isfwdpy11.DiploidGeneticValueToFitnessData
.MockDiploidGeneticValueNoiseData
isfwdpy11.DiploidGeneticValueNoiseData
.
The data types closely match those used by the C++ API, which are
described in the next section. The difference is that
fwdpy11.PyDiploidGeneticValueData
supports the buffer protocol
in order to have write access to fwdpy11.DiploidGeneticValue.genetic_values
.
For the other two classes, their Python versions are exactly like their C++
back ends.
See here for details on writing Python genetic value classes.
The C++ side#
This section shows the exact C++ implementations behind fwdpy11
ABCs
.
For those users implementing custom models, subclassing these types is
the way to maximize performance. Concrete examples of subclasses are shown
below.
Note
The C++ classes described here have functions that interact
with Python. A future version of fwdpy11
will likely
remove these functions from the C++ side and move them to
the Python-only side in order to promote thread safety.
The C++ back-end for fwdpy11.DiploidGeneticValue
is found in the header file
<fwdpy11/genetic_values/DiploidGeneticValue.hpp>
:
#ifndef FWDPY11_DIPLOID_GENETIC_VALUE_HPP__
#define FWDPY11_DIPLOID_GENETIC_VALUE_HPP__
#include <cstdint>
#include <vector>
#include <fwdpy11/rng.hpp>
#include <fwdpy11/types/DiploidPopulation.hpp>
#include <fwdpy11/genetic_value_to_fitness/GeneticValueToFitnessMap.hpp>
#include <fwdpy11/genetic_value_to_fitness/GeneticValueIsFitness.hpp>
#include <fwdpy11/genetic_value_noise/NoNoise.hpp>
#include <fwdpy11/genetic_value_data/genetic_value_data.hpp>
namespace fwdpy11
{
class DiploidGeneticValue
/// API class
///
/// Things to note:
/// Any deme/geography-specific details must be handled by the derived class.
{
private:
template <typename Base, typename Default, typename... Args>
std::shared_ptr<Base>
process_input(const Base* input, Args... args)
{
if (input == nullptr)
{
return std::shared_ptr<Base>(new Default{args...});
}
return input->clone();
}
public:
std::size_t total_dim;
std::vector<double> gvalues;
// Even though these are stored as shared_ptr,
// this class is non-copyable because its state
// may change over time via the various update
// functions.
std::shared_ptr<GeneticValueToFitnessMap> gv2w;
std::shared_ptr<GeneticValueNoise> noise_fxn;
DiploidGeneticValue(std::size_t ndim, const GeneticValueToFitnessMap* gv2w_,
const GeneticValueNoise* noise)
: total_dim(ndim), gvalues(total_dim, 0.),
gv2w{process_input<GeneticValueToFitnessMap, GeneticValueIsFitness,
std::size_t>(gv2w_, ndim)},
noise_fxn{process_input<GeneticValueNoise, NoNoise>(noise)}
{
}
// The type is move-only
virtual ~DiploidGeneticValue() = default;
DiploidGeneticValue(const DiploidGeneticValue&) = delete;
DiploidGeneticValue(DiploidGeneticValue&&) = default;
DiploidGeneticValue& operator=(const DiploidGeneticValue&) = delete;
DiploidGeneticValue& operator=(DiploidGeneticValue&&) = default;
virtual double calculate_gvalue(const DiploidGeneticValueData data) = 0;
virtual void update(const DiploidPopulation& pop) = 0;
// To be called from w/in a simulation
inline void
operator()(DiploidGeneticValueData data)
{
data.offspring_metadata.get().g = calculate_gvalue(data);
data.offspring_metadata.get().e = noise(DiploidGeneticValueNoiseData(data));
data.offspring_metadata.get().w = genetic_value_to_fitness(
DiploidGeneticValueToFitnessData(data, gvalues));
}
virtual double
genetic_value_to_fitness(const DiploidGeneticValueToFitnessData data)
{
return gv2w->operator()(data);
}
virtual double
noise(const DiploidGeneticValueNoiseData data) const
{
return noise_fxn->operator()(data);
}
};
} //namespace fwdpy11
#endif
Some notes:
Only functions marked
virtual
may be overridden by a subclass.A function with the following form is a “pure virtual function”, which is the C++ analogue of
abc.abstractmethod()
:virtual void member_function() = 0;
Attempting to allocate an instance of a subclass that has not defined all pure virtual functions will fail to compile.
Classes like
fwdpy11.Additive
take instances offwdpy11.GeneticValueIsTrait
instead offwdpy11.GeneticValueIsFitness
. The reason is that most use cases requiring a non-None
argument here are models of traits. For other cases, the default (fwdpy11.GeneticValueIsFitness
) suffices. However, the C++ API allows you to subclass the base class however you wish.
The interface class for genetic value to fitness maps is <fwdpy11/headers/fwdpy11/genetic_value_to_fitness/GeneticValueToFitnessMap.hpp>
:
#ifndef FWDPY11_GENETIC_VALUE_TO_FITNESS_MAP_HPP__
#define FWDPY11_GENETIC_VALUE_TO_FITNESS_MAP_HPP__
#include <memory>
#include <fwdpy11/types/DiploidPopulation.hpp>
#include <fwdpy11/genetic_values/default_update.hpp>
#include <fwdpp/util/named_type.hpp>
#include <fwdpy11/genetic_value_data/genetic_value_data.hpp>
namespace fwdpy11
{
struct genetic_value_maps_to_fitness
{
};
using maps_to_fitness
= fwdpp::strong_types::named_type<bool, genetic_value_maps_to_fitness>;
struct GeneticValueToFitnessMap
{
std::size_t total_dim;
bool isfitness;
explicit GeneticValueToFitnessMap(std::size_t ndim, const maps_to_fitness& m)
: total_dim{ndim}, isfitness{m.get()}
{
}
virtual ~GeneticValueToFitnessMap() = default;
GeneticValueToFitnessMap(const GeneticValueToFitnessMap&) = delete;
GeneticValueToFitnessMap(GeneticValueToFitnessMap&&) = default;
GeneticValueToFitnessMap& operator=(const GeneticValueToFitnessMap&) = delete;
GeneticValueToFitnessMap& operator=(GeneticValueToFitnessMap&&) = default;
virtual double
operator()(const DiploidGeneticValueToFitnessData /*data*/) const = 0;
virtual void update(const DiploidPopulation& /*pop*/) = 0;
virtual std::shared_ptr<GeneticValueToFitnessMap> clone() const = 0;
};
} //namespace fwdpy11
#endif
The C++ definition of fwdpy11.GeneticValueIsTrait
inherits from the above, and is another interface class (because it does not define any of the pure virtual functions):
#ifndef FWDPY11_GENETIC_VALUE_IS_TRAIT
#define FWDPY11_GENETIC_VALUE_IS_TRAIT
#include "GeneticValueToFitnessMap.hpp"
namespace fwdpy11
{
struct GeneticValueIsTrait : public GeneticValueToFitnessMap
/// Another ABC. Effectively a type trait
{
explicit GeneticValueIsTrait(std::size_t ndim)
: GeneticValueToFitnessMap(ndim, maps_to_fitness(false))
{
}
virtual ~GeneticValueIsTrait() = default;
GeneticValueIsTrait(const GeneticValueIsTrait&) = delete;
GeneticValueIsTrait(GeneticValueIsTrait&&) = default;
GeneticValueIsTrait& operator=(const GeneticValueIsTrait&) = delete;
GeneticValueIsTrait& operator=(GeneticValueIsTrait&&) = default;
};
} // namespace fwdpy11
#endif
The three data types MockDiploidGeneticValueData
, MockDiploidGeneticValueNoiseData
,
and MockDiploidGeneticValueToFitnessData
map to the following three classes,
respectively, which hold references to objects from the simulation:
#ifndef FWDPY11_GENETIC_DATA_HPP__
#define FWDPY11_GENETIC_DATA_HPP__
#include <fwdpy11/rng.hpp>
#include <fwdpy11/types/DiploidPopulation.hpp>
namespace fwdpy11
{
struct DiploidGeneticValueData
{
std::reference_wrapper<const fwdpy11::GSLrng_t> rng;
std::reference_wrapper<const fwdpy11::DiploidPopulation> pop;
std::reference_wrapper<const fwdpy11::DiploidMetadata> parent1_metadata,
parent2_metadata;
std::size_t metadata_index;
std::reference_wrapper<fwdpy11::DiploidMetadata> offspring_metadata;
DiploidGeneticValueData(const fwdpy11::GSLrng_t& rng_,
const fwdpy11::DiploidPopulation& pop_,
const fwdpy11::DiploidMetadata& parent1_metadata_,
const fwdpy11::DiploidMetadata& parent2_metadata_,
std::size_t metadata_index_,
fwdpy11::DiploidMetadata& offspring_metadata_)
: rng(rng_), pop(pop_), parent1_metadata(parent1_metadata_),
parent2_metadata(parent2_metadata_), metadata_index(metadata_index_),
offspring_metadata(offspring_metadata_)
{
}
};
// NOTE: the next two structs may be collapsible into one?
struct DiploidGeneticValueNoiseData
{
std::reference_wrapper<const fwdpy11::GSLrng_t> rng;
std::reference_wrapper<const fwdpy11::DiploidMetadata> parent1_metadata,
parent2_metadata, offspring_metadata;
std::size_t metadata_index;
explicit DiploidGeneticValueNoiseData(const DiploidGeneticValueData& data)
: rng(data.rng), parent1_metadata(data.parent1_metadata),
parent2_metadata(data.parent2_metadata),
offspring_metadata(data.offspring_metadata),
metadata_index(data.metadata_index)
{
}
};
struct DiploidGeneticValueToFitnessData
{
std::reference_wrapper<const fwdpy11::GSLrng_t> rng;
std::reference_wrapper<const fwdpy11::DiploidMetadata> parent1_metadata,
parent2_metadata, offspring_metadata;
std::reference_wrapper<const std::vector<double>> gvalues;
std::size_t metadata_index;
explicit DiploidGeneticValueToFitnessData(const DiploidGeneticValueData& data,
const std::vector<double>& gvalues_)
: rng(data.rng), parent1_metadata(data.parent1_metadata),
parent2_metadata(data.parent2_metadata),
offspring_metadata(data.offspring_metadata),
gvalues(gvalues_),
metadata_index(data.metadata_index)
{
}
};
}
#endif
Example: strict additive effects#
This example comes from the test suite:
#include <algorithm> //for std::max
#include <pybind11/pybind11.h>
#include <fwdpy11/genetic_values/DiploidGeneticValue.hpp>
#include <fwdpy11/genetic_values/default_update.hpp>
struct additive : public fwdpy11::DiploidGeneticValue
{
additive() : fwdpy11::DiploidGeneticValue{1, nullptr, nullptr}
{
}
double
calculate_gvalue(const fwdpy11::DiploidGeneticValueData data) override
{
const auto& pop = data.pop.get();
const auto diploid_index = data.offspring_metadata.get().label;
double sum = 0;
for (auto m : pop.haploid_genomes[pop.diploids[diploid_index].first].smutations)
{
sum += pop.mutations[m].s;
}
for (auto m : pop.haploid_genomes[pop.diploids[diploid_index].second].smutations)
{
sum += pop.mutations[m].s;
}
gvalues[0] = std::max(0.0, 1.0 + sum);
return gvalues[0];
}
pybind11::object
pickle() const
{
return pybind11::bytes("custom_additive");
}
DEFAULT_DIPLOID_POP_UPDATE();
};
//Standard pybind11 stuff goes here
PYBIND11_MODULE(custom_additive, m)
{
pybind11::object imported_custom_additive_base_class_type
= pybind11::module::import("fwdpy11").attr("DiploidGeneticValue");
pybind11::class_<additive, fwdpy11::DiploidGeneticValue>(m, "additive")
.def(pybind11::init<>())
.def(pybind11::pickle([](const additive& a) { return a.pickle(); },
[](pybind11::object o) {
pybind11::bytes b(o);
auto s = b.cast<std::string>();
if (s != "custom_additive")
{
throw std::runtime_error(
"invalid object state");
}
return additive();
}))
.def("validate_timings", [](const additive&, int, pybind11::object) {});
}
Example: a snowdrift model#
This example is based on [DHK04] and comes from the fwdpy11
test suite.
Todo
Describe in words the implementation details.
The low-level details in C++ are:
/* Implement a stateful fitness model.
* We define a new C++ type that will be
* wrapped as a fwdpy11.DiploidGeneticValue
* object.
*/
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>
#include <fwdpy11/types/DiploidPopulation.hpp>
#include <fwdpp/fitness_models.hpp>
#include <fwdpy11/genetic_values/DiploidGeneticValue.hpp>
namespace py = pybind11;
struct snowdrift : public fwdpy11::DiploidGeneticValue
/* This is the low-level implementation of our
* stateful fitness object. It records the
* model parameters and holds a
* vector to track individual phenotypes.
*
* Here, we publicly inherit from fwdpy11::DiploidGeneticValue,
* which is defined in the header included above. It is
* an abstract class in C++ terms, and is reflected
* as a Python Abstract Base Class (ABC) called
* fwdpy11.DiploidGeneticValue.
*
* The phenotypes get updated each generation during
* the simulation.
*
* The phenotypes will be the simple additive model,
* calculated using fwdpp's machinery.
*/
{
const double b1, b2, c1, c2, slope, sig0;
// This is our stateful data,
// which is a record of the
// additive genetic values of all
// diploids
std::vector<double> phenotypes;
const fwdpp::additive_diploid additive;
// This constructor is exposed to Python
snowdrift(double b1_, double b2_, double c1_, double c2_, double slope, double p0)
: fwdpy11::DiploidGeneticValue{1, nullptr, nullptr}, b1(b1_), b2(b2_), c1(c1_),
c2(c2_), slope(slope), sig0(1. / slope * std::log(p0 / (1. - p0))),
phenotypes(), additive{fwdpp::trait{2.0}}
{
if (p0 == 0 || p0 == 1.)
{
throw std::invalid_argument("p0 must be 0 < p0 < 1");
}
}
double
calculate_gvalue(const fwdpy11::DiploidGeneticValueData data) override
// The call operator must return the genetic value of an individual
{
gvalues[0] = phenotypes[data.metadata_index];
return gvalues[0];
}
double
genetic_value_to_fitness(
const fwdpy11::DiploidGeneticValueToFitnessData data) override
// This function converts genetic value to fitness.
{
double zself = data.offspring_metadata.get().g;
auto N = phenotypes.size();
auto other = gsl_rng_uniform_int(data.rng.get().get(), N);
while (other == data.offspring_metadata.get().label)
{
other = gsl_rng_uniform_int(data.rng.get().get(), N);
}
double zpair = zself + phenotypes[other];
double a
= 1. + b1 * zpair + b2 * zpair * zpair - c1 * zself - c2 * zself * zself;
return std::max(a, 0.0);
}
void
update(const fwdpy11::DiploidPopulation &pop) override
// A stateful fitness model needs updating.
{
phenotypes.clear();
for (auto &md : pop.diploid_metadata)
{
// A diploid tracks its index via
// fwdpy11::DiploidMetadata::label
auto g = additive(pop.diploids[md.label], pop.haploid_genomes,
pop.mutations);
phenotypes.push_back(1. / (1. + std::exp(-slope * (g + sig0))));
}
}
};
PYBIND11_MODULE(ll_snowdrift, m)
{
m.doc() = "Example of custom stateful fitness model.";
// We need to import the Python version of our base class:
pybind11::object imported_snowdrift_base_class_type
= pybind11::module::import("fwdpy11").attr("DiploidGeneticValue");
// Create a Python class based on our new type
py::class_<snowdrift, fwdpy11::DiploidGeneticValue>(m, "_ll_DiploidSnowdrift")
.def(py::init<double, double, double, double, double, double>(), py::arg("b1"),
py::arg("b2"), py::arg("c1"), py::arg("c2"), py::arg("slope"),
py::arg("p0"))
.def_readwrite("phenotypes", &snowdrift::phenotypes)
.def("validate_timings", [](const snowdrift &, int, pybind11::object) {});
}
The user-facing Python class is implemented using attrs
, which
is handy because we don’t have to write C++ code to pickle/unpickle:
import attr
import fwdpy11.class_decorators
import ll_snowdrift
@fwdpy11.class_decorators.attr_class_to_from_dict
@attr.s()
class DiploidSnowdrift(ll_snowdrift._ll_DiploidSnowdrift):
"""
This is the user-facing Python representation of
a simple snowdrift model.
"""
b1 = attr.ib()
b2 = attr.ib()
c1 = attr.ib()
c2 = attr.ib()
slope = attr.ib()
p0 = attr.ib()
def __attrs_post_init__(self):
# Need to initialize the C++ base class
super(DiploidSnowdrift, self).__init__(
self.b1, self.b2, self.c1, self.c2, self.slope, self.p0
)
def __getstate__(self):
# asdict is provided by the 2nd class
# decorator (reading from bottom to top)
# We pickle this class as a tuple.
# Element 0 contains the kwargs + values
# needed for __init__ and we send along
# the current phenytpes from the base class
return (self.asdict(), self.phenotypes)
def __setstate__(self, t):
# update the representation of the Python/attrs
# class
self.__dict__.update(t[0])
# Failure to properly init the C++ base class
# will lead to segmentation faults.
super(DiploidSnowdrift, self).__init__(**t[0])
# Reset the base class attribute
self.phenotypes = t[1]