Technical overview of genetic value calculations#

A mock overview in Python#

This section illustrates the technical details of genetic value objects by presenting a mock up of their API in Python. To help document key concepts, we make use of Python 3’s type hinting features from typing and decorators from abc:

  • A class inheriting from abc.ABC is an abstract base class. It is not possible to create instances of ABCs. Rather, you must subclass them.

  • A function decorated with abc.abstractmethod() is a part of an ABC’s interface that must be defined by a subclass. In other words, to be a valid MockDiploidGeneticValue, subclasses must define the abstract methods. In C++ terminology, these are “pure virtual” functions.

  • A function decorated with typing.final() must not be redefined in a subclass. Doing so is considered an error, although Python will allow it.

  • In Python, any undecorated functions are viewed as default implementations of the class’ behavior. A subclass may or may not redefine them as needed.

These concepts do not map perfectly from Python to C++, but it is close enough for now. Simply put, C++ is much stricter than Python.

The following code block shows a Python mock-up corresponding to fwdpy11.DiploidGeneticValue. (The next section shows the actual definition of fwdpy11.DiploidGeneticValue.)

This class is an abstract base class. The initialization semantics are:

  • The object is initialized with the dimensionality of the model, a genetic value to fitness map, and a noise function.

  • If the genetic value to fitness map is None, then the genetic value is assumed to be fitness itself (a “standard population genetic” simulation). Thus, we supply fwdpy11.GeneticValueIsFitness.

  • If the noise object is None, then the model has no random effects and we supply fwdpy11.NoNoise.

The remaining elements of the class are its public interface, consisting of a mixture of final functions, abstract methods, and default implementations. Each function takes a single argument, most of which are declared as some sort of mocked data type. These data types hold data that are relevant to carrying out our calculations. Below, the C++ versions get defined and the Python versions are listed at the end of this section.

import abc
import typing
import attr
import numpy as np
import fwdpy11


@attr.s(auto_attribs=True)
class MockDiploidGeneticValue(abc.ABC):
    ndim: int
    genetic_value_to_fitness_map: typing.Union[None, fwdpy11.GeneticValueToFitnessMap]
    noise_function: typing.Union[None, fwdpy11.GeneticValueNoise]

    def __attrs_post_init__(self):
        self.gvalues = np.zeros(self.ndim)
        if self.genetic_value_to_fitness_map is None:
            self.genetic_value_to_fitness_map = fwdpy11.GeneticValueIsFitness(self.ndim)
        if self.noise_function is None:
            self.noise_function = fwdpy11.NoNoise()

    @typing.final
    def __call__(self, data: MockDiploidGeneticValueData) -> None:
        # Set the genetic value metadata for offspring
        data.offspring_metadata.g = self.calculate_gvalue(data)
        # Set the random effects metadata for offspring
        data.offspring_metadata.e = self.noise(MockDiploidGeneticValueNoiseData(data))
        # Set the fitness metadata for offspring
        data.offspring_metadata.w = self.genetic_value_to_fitness(
            MockDiploidGeneticValueToFitnessData(data)
        )

    @abc.abstractmethod
    def calculate_gvalue(self, data: MockDiploidGeneticValueData) -> float:
        """
        This function must populate self.gvalues
        and return a single value for the offspring's DiploidMetadata.g
        """
        raise NotImplementedError("calculate_gvalue not implemented")

    def genetic_value_to_fitness(
        self, data: MockDiploidGeneticValueToFitnessData
    ) -> float:
        return self.genetic_value_to_fitness_map(data)

    def noise(self, data: MockDiploidGeneticValueNoiseData) -> float:
        return self.noise_function(data)

    @abc.abstractmethod
    def update(self, pop: fwdpy11.DiploidPopulation) -> None:
        raise NotImplementedError("update not implemented")

To make things complete, a Python analogue of fwdpy11.GeneticValueToFitnessMap would look like this:

@attr.s(auto_attribs=True)
class MockGeneticValueToFitnessMap(abc.ABC):
    ndim: int

    @abc.abstractmethod
    def __call__(self, data: MockDiploidGeneticValueToFitnessData) -> float:
        raise NotImplementedError("__call__ not implemented")

    @abc.abstractmethod
    def update(self, pop: fwdpy11.DiploidPopulation) -> None:
        raise NotImplementedError("update not implemented")

Similarly, this is how we’d implement fwdpy11.GeneticValueNoise in Python:

class MockGeneticValueNoise(abc.ABC):
    @abc.abstractmethod
    def __call__(self, data: MockDiploidGeneticValueNoiseData) -> float:
        raise NotImplementedError("__call__ not implemented")

    @abc.abstractmethod
    def update(self, pop: fwdpy11.DiploidPopulation) -> None:
        raise NotImplementedError("update not implemented")

Behavior during a simulation#

During a simulation, the following operations happen:

gv.update(pop)
gv.genetic_value_to_fitness_map.update(pop)
gv.noise_function.update(pop)

#offspring_metadata is a list of fwdpy11.DiploidMetadata

: for o in offspring_metadata:

#Execute the calculations for each offspring

: gv(MockDiploidGeneticValueData(o, pop))

The definition of our classes and the way that update functions are applied implies that all three objects are updated independently from one another. This independence covers a large number of use cases. For example, a moving trait optimum (fwdpy11.GaussianStabilizingSelection) only needs acces to fwdpy11.DiploidPopulation.generation to know if it needs to update its internal state.

For some models, this independent updating doesn’t do the job. For example, if fitness depends on an individual’s genetic value in comparison to some or all other genetic values in the population (as may be the case for models of social interactions), then we don’t have an easy way to implement that using a class derived from fwdpy11.GeneticValueIsTrait.

To handle these more complex models, subclasses may simply override the functions genetic_value_to_fitness and/or noise as needed. “Snowdrift” models of social interactions are one example where this method is necessary (see here for a C++ implementation and here for a Python version).

The data objects#

The member functions of our mock classes take different argument types. These types hold things like the offspring metadata, parental metadata, etc., that may be needed to calculate return values.

For a genetic value object written in Python:

The data types closely match those used by the C++ API, which are described in the next section. The difference is that fwdpy11.PyDiploidGeneticValueData supports the buffer protocol in order to have write access to fwdpy11.DiploidGeneticValue.genetic_values. For the other two classes, their Python versions are exactly like their C++ back ends.

See here for details on writing Python genetic value classes.

The C++ side#

This section shows the exact C++ implementations behind fwdpy11 ABCs. For those users implementing custom models, subclassing these types is the way to maximize performance. Concrete examples of subclasses are shown below.

Note

The C++ classes described here have functions that interact with Python. A future version of fwdpy11 will likely remove these functions from the C++ side and move them to the Python-only side in order to promote thread safety.

The C++ back-end for fwdpy11.DiploidGeneticValue is found in the header file <fwdpy11/genetic_values/DiploidGeneticValue.hpp>:

#ifndef FWDPY11_DIPLOID_GENETIC_VALUE_HPP__
#define FWDPY11_DIPLOID_GENETIC_VALUE_HPP__

#include <cstdint>
#include <vector>
#include <fwdpy11/rng.hpp>
#include <fwdpy11/types/DiploidPopulation.hpp>
#include <fwdpy11/genetic_value_to_fitness/GeneticValueToFitnessMap.hpp>
#include <fwdpy11/genetic_value_to_fitness/GeneticValueIsFitness.hpp>
#include <fwdpy11/genetic_value_noise/NoNoise.hpp>
#include <fwdpy11/genetic_value_data/genetic_value_data.hpp>

namespace fwdpy11
{
    class DiploidGeneticValue
    /// API class
    ///
    /// Things to note:
    /// Any deme/geography-specific details must be handled by the derived class.
    {
      private:
        template <typename Base, typename Default, typename... Args>
        std::shared_ptr<Base>
        process_input(const Base* input, Args... args)
        {
            if (input == nullptr)
                {
                    return std::shared_ptr<Base>(new Default{args...});
                }
            return input->clone();
        }

      public:
        std::size_t total_dim;
        std::vector<double> gvalues;
        // Even though these are stored as shared_ptr,
        // this class is non-copyable because its state
        // may change over time via the various update
        // functions.
        std::shared_ptr<GeneticValueToFitnessMap> gv2w;
        std::shared_ptr<GeneticValueNoise> noise_fxn;

        DiploidGeneticValue(std::size_t ndim, const GeneticValueToFitnessMap* gv2w_,
                            const GeneticValueNoise* noise)
            : total_dim(ndim), gvalues(total_dim, 0.),
              gv2w{process_input<GeneticValueToFitnessMap, GeneticValueIsFitness,
                                 std::size_t>(gv2w_, ndim)},
              noise_fxn{process_input<GeneticValueNoise, NoNoise>(noise)}
        {
        }

        // The type is move-only
        virtual ~DiploidGeneticValue() = default;
        DiploidGeneticValue(const DiploidGeneticValue&) = delete;
        DiploidGeneticValue(DiploidGeneticValue&&) = default;
        DiploidGeneticValue& operator=(const DiploidGeneticValue&) = delete;
        DiploidGeneticValue& operator=(DiploidGeneticValue&&) = default;

        virtual double calculate_gvalue(const DiploidGeneticValueData data) = 0;

        virtual void update(const DiploidPopulation& pop) = 0;

        // To be called from w/in a simulation
        inline void
        operator()(DiploidGeneticValueData data)
        {
            data.offspring_metadata.get().g = calculate_gvalue(data);
            data.offspring_metadata.get().e = noise(DiploidGeneticValueNoiseData(data));
            data.offspring_metadata.get().w = genetic_value_to_fitness(
                DiploidGeneticValueToFitnessData(data, gvalues));
        }

        virtual double
        genetic_value_to_fitness(const DiploidGeneticValueToFitnessData data)
        {
            return gv2w->operator()(data);
        }

        virtual double
        noise(const DiploidGeneticValueNoiseData data) const
        {
            return noise_fxn->operator()(data);
        }
    };
} //namespace fwdpy11

#endif

Some notes:

  • Only functions marked virtual may be overridden by a subclass.

  • A function with the following form is a “pure virtual function”, which is the C++ analogue of abc.abstractmethod():

    virtual void member_function() = 0;
    

    Attempting to allocate an instance of a subclass that has not defined all pure virtual functions will fail to compile.

  • Classes like fwdpy11.Additive take instances of fwdpy11.GeneticValueIsTrait instead of fwdpy11.GeneticValueIsFitness. The reason is that most use cases requiring a non-None argument here are models of traits. For other cases, the default (fwdpy11.GeneticValueIsFitness) suffices. However, the C++ API allows you to subclass the base class however you wish.

The interface class for genetic value to fitness maps is <fwdpy11/headers/fwdpy11/genetic_value_to_fitness/GeneticValueToFitnessMap.hpp>:

#ifndef FWDPY11_GENETIC_VALUE_TO_FITNESS_MAP_HPP__
#define FWDPY11_GENETIC_VALUE_TO_FITNESS_MAP_HPP__

#include <memory>
#include <fwdpy11/types/DiploidPopulation.hpp>
#include <fwdpy11/genetic_values/default_update.hpp>
#include <fwdpp/util/named_type.hpp>
#include <fwdpy11/genetic_value_data/genetic_value_data.hpp>

namespace fwdpy11
{
    struct genetic_value_maps_to_fitness
    {
    };

    using maps_to_fitness
        = fwdpp::strong_types::named_type<bool, genetic_value_maps_to_fitness>;

    struct GeneticValueToFitnessMap
    {
        std::size_t total_dim;
        bool isfitness;
        explicit GeneticValueToFitnessMap(std::size_t ndim, const maps_to_fitness& m)
            : total_dim{ndim}, isfitness{m.get()}
        {
        }

        virtual ~GeneticValueToFitnessMap() = default;
        GeneticValueToFitnessMap(const GeneticValueToFitnessMap&) = delete;
        GeneticValueToFitnessMap(GeneticValueToFitnessMap&&) = default;
        GeneticValueToFitnessMap& operator=(const GeneticValueToFitnessMap&) = delete;
        GeneticValueToFitnessMap& operator=(GeneticValueToFitnessMap&&) = default;

        virtual double
        operator()(const DiploidGeneticValueToFitnessData /*data*/) const = 0;
        virtual void update(const DiploidPopulation& /*pop*/) = 0;
        virtual std::shared_ptr<GeneticValueToFitnessMap> clone() const = 0;
    };
} //namespace fwdpy11

#endif

The C++ definition of fwdpy11.GeneticValueIsTrait inherits from the above, and is another interface class (because it does not define any of the pure virtual functions):

#ifndef FWDPY11_GENETIC_VALUE_IS_TRAIT
#define FWDPY11_GENETIC_VALUE_IS_TRAIT

#include "GeneticValueToFitnessMap.hpp"

namespace fwdpy11
{
    struct GeneticValueIsTrait : public GeneticValueToFitnessMap
    /// Another ABC.  Effectively a type trait
    {
        explicit GeneticValueIsTrait(std::size_t ndim)
            : GeneticValueToFitnessMap(ndim, maps_to_fitness(false))
        {
        }
        virtual ~GeneticValueIsTrait() = default;
        GeneticValueIsTrait(const GeneticValueIsTrait&) = delete;
        GeneticValueIsTrait(GeneticValueIsTrait&&) = default;
        GeneticValueIsTrait& operator=(const GeneticValueIsTrait&) = delete;
        GeneticValueIsTrait& operator=(GeneticValueIsTrait&&) = default;
    };
} // namespace fwdpy11

#endif

The three data types MockDiploidGeneticValueData, MockDiploidGeneticValueNoiseData, and MockDiploidGeneticValueToFitnessData map to the following three classes, respectively, which hold references to objects from the simulation:

#ifndef FWDPY11_GENETIC_DATA_HPP__
#define FWDPY11_GENETIC_DATA_HPP__

#include <fwdpy11/rng.hpp>
#include <fwdpy11/types/DiploidPopulation.hpp>

namespace fwdpy11
{
    struct DiploidGeneticValueData
    {
        std::reference_wrapper<const fwdpy11::GSLrng_t> rng;
        std::reference_wrapper<const fwdpy11::DiploidPopulation> pop;
        std::reference_wrapper<const fwdpy11::DiploidMetadata> parent1_metadata,
            parent2_metadata;
        std::size_t metadata_index;
        std::reference_wrapper<fwdpy11::DiploidMetadata> offspring_metadata;

        DiploidGeneticValueData(const fwdpy11::GSLrng_t& rng_,
                                const fwdpy11::DiploidPopulation& pop_,
                                const fwdpy11::DiploidMetadata& parent1_metadata_,
                                const fwdpy11::DiploidMetadata& parent2_metadata_,
                                std::size_t metadata_index_,
                                fwdpy11::DiploidMetadata& offspring_metadata_)
            : rng(rng_), pop(pop_), parent1_metadata(parent1_metadata_),
              parent2_metadata(parent2_metadata_), metadata_index(metadata_index_),
              offspring_metadata(offspring_metadata_)
        {
        }
    };

    // NOTE: the next two structs may be collapsible into one?

    struct DiploidGeneticValueNoiseData
    {
        std::reference_wrapper<const fwdpy11::GSLrng_t> rng;
        std::reference_wrapper<const fwdpy11::DiploidMetadata> parent1_metadata,
            parent2_metadata, offspring_metadata;
        std::size_t metadata_index;

        explicit DiploidGeneticValueNoiseData(const DiploidGeneticValueData& data)
            : rng(data.rng), parent1_metadata(data.parent1_metadata),
              parent2_metadata(data.parent2_metadata),
              offspring_metadata(data.offspring_metadata),
              metadata_index(data.metadata_index)
        {
        }
    };

    struct DiploidGeneticValueToFitnessData
    {
        std::reference_wrapper<const fwdpy11::GSLrng_t> rng;
        std::reference_wrapper<const fwdpy11::DiploidMetadata> parent1_metadata,
            parent2_metadata, offspring_metadata;
        std::reference_wrapper<const std::vector<double>> gvalues;
        std::size_t metadata_index;

        explicit DiploidGeneticValueToFitnessData(const DiploidGeneticValueData& data,
                                                  const std::vector<double>& gvalues_)
            : rng(data.rng), parent1_metadata(data.parent1_metadata),
              parent2_metadata(data.parent2_metadata),
              offspring_metadata(data.offspring_metadata),
              gvalues(gvalues_),
              metadata_index(data.metadata_index)
        {
        }
    };
}

#endif

Example: strict additive effects#

This example comes from the test suite:

#include <algorithm> //for std::max
#include <pybind11/pybind11.h>
#include <fwdpy11/genetic_values/DiploidGeneticValue.hpp>
#include <fwdpy11/genetic_values/default_update.hpp>

struct additive : public fwdpy11::DiploidGeneticValue
{
    additive() : fwdpy11::DiploidGeneticValue{1, nullptr, nullptr}
    {
    }

    double
    calculate_gvalue(const fwdpy11::DiploidGeneticValueData data) override
    {
        const auto& pop = data.pop.get();
        const auto diploid_index = data.offspring_metadata.get().label;
        double sum = 0;
        for (auto m : pop.haploid_genomes[pop.diploids[diploid_index].first].smutations)
            {
                sum += pop.mutations[m].s;
            }
        for (auto m : pop.haploid_genomes[pop.diploids[diploid_index].second].smutations)
            {
                sum += pop.mutations[m].s;
            }
        gvalues[0] = std::max(0.0, 1.0 + sum);
        return gvalues[0];
    }

    pybind11::object
    pickle() const
    {
        return pybind11::bytes("custom_additive");
    }
    DEFAULT_DIPLOID_POP_UPDATE();
};

//Standard pybind11 stuff goes here
PYBIND11_MODULE(custom_additive, m)
{
    pybind11::object imported_custom_additive_base_class_type
        = pybind11::module::import("fwdpy11").attr("DiploidGeneticValue");
    pybind11::class_<additive, fwdpy11::DiploidGeneticValue>(m, "additive")
        .def(pybind11::init<>())
        .def(pybind11::pickle([](const additive& a) { return a.pickle(); },
                              [](pybind11::object o) {
                                  pybind11::bytes b(o);
                                  auto s = b.cast<std::string>();
                                  if (s != "custom_additive")
                                      {
                                          throw std::runtime_error(
                                              "invalid object state");
                                      }
                                  return additive();
                              }));
}

Example: a snowdrift model#

This example is based on [DHK04] and comes from the fwdpy11 test suite.

Todo

Describe in words the implementation details.

The low-level details in C++ are:

/* Implement a stateful fitness model.
 * We define a new C++ type that will be
 * wrapped as a fwdpy11.DiploidGeneticValue
 * object.
 */
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>
#include <fwdpy11/types/DiploidPopulation.hpp>
#include <fwdpp/fitness_models.hpp>
#include <fwdpy11/genetic_values/DiploidGeneticValue.hpp>

namespace py = pybind11;

struct snowdrift : public fwdpy11::DiploidGeneticValue
/* This is the low-level implementation of our
 * stateful fitness object.  It records the
 * model parameters and holds a
 * vector to track individual phenotypes.
 *
 * Here, we publicly inherit from fwdpy11::DiploidGeneticValue,
 * which is defined in the header included above.  It is
 * an abstract class in C++ terms, and is reflected
 * as a Python Abstract Base Class (ABC) called
 * fwdpy11.DiploidGeneticValue.
 *
 * The phenotypes get updated each generation during
 * the simulation.
 *
 * The phenotypes will be the simple additive model,
 * calculated using fwdpp's machinery.
 */
{
    const double b1, b2, c1, c2, slope, sig0;
    // This is our stateful data,
    // which is a record of the
    // additive genetic values of all
    // diploids
    std::vector<double> phenotypes;
    const fwdpp::additive_diploid additive;

    // This constructor is exposed to Python
    snowdrift(double b1_, double b2_, double c1_, double c2_, double slope, double p0)
        : fwdpy11::DiploidGeneticValue{1, nullptr, nullptr}, b1(b1_), b2(b2_), c1(c1_),
          c2(c2_), slope(slope), sig0(1. / slope * std::log(p0 / (1. - p0))),
          phenotypes(), additive{fwdpp::trait{2.0}}
    {
        if (p0 == 0 || p0 == 1.)
            {
                throw std::invalid_argument("p0 must be 0 < p0 < 1");
            }
    }

    double
    calculate_gvalue(const fwdpy11::DiploidGeneticValueData data) override
    // The call operator must return the genetic value of an individual
    {
        gvalues[0] = phenotypes[data.metadata_index];
        return gvalues[0];
    }

    double
    genetic_value_to_fitness(
        const fwdpy11::DiploidGeneticValueToFitnessData data) override
    // This function converts genetic value to fitness.
    {
        double zself = data.offspring_metadata.get().g;
        auto N = phenotypes.size();
        auto other = gsl_rng_uniform_int(data.rng.get().get(), N);
        while (other == data.offspring_metadata.get().label)
            {
                other = gsl_rng_uniform_int(data.rng.get().get(), N);
            }
        double zpair = zself + phenotypes[other];
        double a
            = 1. + b1 * zpair + b2 * zpair * zpair - c1 * zself - c2 * zself * zself;
        return std::max(a, 0.0);
    }

    void
    update(const fwdpy11::DiploidPopulation &pop) override
    // A stateful fitness model needs updating.
    {
        phenotypes.clear();
        for (auto &md : pop.diploid_metadata)
            {
                // A diploid tracks its index via
                // fwdpy11::DiploidMetadata::label
                auto g = additive(pop.diploids[md.label], pop.haploid_genomes,
                                  pop.mutations);
                phenotypes.push_back(1. / (1. + std::exp(-slope * (g + sig0))));
            }
    }
};

PYBIND11_MODULE(ll_snowdrift, m)
{
    m.doc() = "Example of custom stateful fitness model.";

    // We need to import the Python version of our base class:
    pybind11::object imported_snowdrift_base_class_type
        = pybind11::module::import("fwdpy11").attr("DiploidGeneticValue");

    // Create a Python class based on our new type
    py::class_<snowdrift, fwdpy11::DiploidGeneticValue>(m, "_ll_DiploidSnowdrift")
        .def(py::init<double, double, double, double, double, double>(), py::arg("b1"),
             py::arg("b2"), py::arg("c1"), py::arg("c2"), py::arg("slope"),
             py::arg("p0"))
        .def_readwrite("phenotypes", &snowdrift::phenotypes);
}

The user-facing Python class is implemented using attrs, which is handy because we don’t have to write C++ code to pickle/unpickle:

import attr

import fwdpy11.class_decorators
import ll_snowdrift


@fwdpy11.class_decorators.attr_class_to_from_dict
@attr.s()
class DiploidSnowdrift(ll_snowdrift._ll_DiploidSnowdrift):
    """
    This is the user-facing Python representation of
    a simple snowdrift model.
    """

    b1 = attr.ib()
    b2 = attr.ib()
    c1 = attr.ib()
    c2 = attr.ib()
    slope = attr.ib()
    p0 = attr.ib()

    def __attrs_post_init__(self):
        # Need to initialize the C++ base class
        super(DiploidSnowdrift, self).__init__(
            self.b1, self.b2, self.c1, self.c2, self.slope, self.p0
        )

    def __getstate__(self):
        # asdict is provided by the 2nd class
        # decorator (reading from bottom to top)
        # We pickle this class as a tuple.
        # Element 0 contains the kwargs + values
        # needed for __init__ and we send along
        # the current phenytpes from the base class
        return (self.asdict(), self.phenotypes)

    def __setstate__(self, t):
        # update the representation of the Python/attrs
        # class
        self.__dict__.update(t[0])
        # Failure to properly init the C++ base class
        # will lead to segmentation faults.
        super(DiploidSnowdrift, self).__init__(**t[0])
        # Reset the base class attribute
        self.phenotypes = t[1]