libsequence  1.9.5
Classes and functions related to simulating data under coalescent models

Classes

struct  Sequence::coalsim::segment
 A portion of a recombining chromosome. More...
 
struct  Sequence::coalsim::chromosome
 A chromosome is a container of segments. More...
 
struct  Sequence::coalsim::node
 A point on a marginal tree at which a coalescent event occurs. More...
 
struct  Sequence::coalsim::marginal
 The genealogy of a portion of a chromosome on which no recombination has occurred. More...
 
class  Sequence::coalsim::newick_stream_marginal_tree
 Class that provides a typecast-on-output of a marginal tree to a newick tree Example use: More...
 
class  Sequence::SimData
 Data from coalescent simulations. More...
 
class  Sequence::SimParams
 Parameters for Hudson's simulation program. More...
 

Typedefs

typedef std::pair< std::vector< double >, std::vector< std::string > > Sequence::coalsim::gamete_storage_type
 an object to store simulated gametes An object of this type will tend to exist in the calling environment of your program. If you are simulating a sample of n chromosomes, you would initialize the object as follows: More...
 
typedef std::list< marginalSequence::coalsim::arg
 Ancestral Recombination Graph. More...
 

Functions

template<typename uniform_generator , typename uniform01_generator , typename exponential_generator , typename poisson_generator >
Sequence::SimData Sequence::coalsim::neutral_sample (uniform_generator &uni, uniform01_generator &uni01, exponential_generator &expo, poisson_generator &poiss, const double &theta, const double &rho, const int &nsites, const int &nsam, std::vector< chromosome > *sample, arg *sample_history, unsigned *max_chromosomes=NULL, const unsigned &max_chromosomes_inc=0)
 A simple function to generate samples under a neutral equilibrium model. More...
 
bool Sequence::coalsim::isseg (chromosome::const_iterator seg, const unsigned &nsegs, const int &pos, unsigned *offset)
 ask if a chromosome beginning at seg and containing nsegs contains a segment containing the position pos More...
 
int Sequence::coalsim::coalesce (const double &time, const int &ttl_nsam, const int &current_nsam, const int &c1, const int &c2, const int &nsites, int *nlinks, std::vector< chromosome > *sample, arg *sample_history)
 Common ancestor routine for coalescent simulation. Merges chromosome segments and updates marginal trees. More...
 
int Sequence::coalsim::sample_length (const std::vector< std::pair< int, int > > &fragments)
 When simulating partially linked regions, return the total length of sample material that we are simulating. More...
 
int Sequence::coalsim::total_length (const std::vector< std::pair< int, int > > &fragments)
 When simulating partially linked regions, return the total length of the region. More...
 
void Sequence::coalsim::calculate_scales (const std::vector< std::pair< int, int > > &fragments, std::vector< std::pair< double, double > > *sample_scale, std::vector< std::pair< double, double > > *mutation_scale)
 This is a helper function that rescales physical distance in base pairs to continuous distance on the interval 0,1. More...
 
void Sequence::coalsim::rescale_mutation_positions (Sequence::SimData *d, const std::vector< std::pair< double, double > > &sample_scale, const std::vector< std::pair< double, double > > &mutation_scale)
 Rescales the positions of the mutations in d from the scale given in sample_scale to that given in mutation_scale. More...
 
void Sequence::coalsim::rescale_arg (arg *sample_history, const std::vector< std::pair< int, int > > &fragments)
 Rescales the beginnings of marginal trees in an ancestral recombination graph from a genetic scale to a physical scale.
 
double Sequence::coalsim::integrate_genetic_map (const std::vector< chromosome > &sample, const int &current_nsam, const std::vector< double > &genetic_map, std::vector< double > *reclens)
 When simulating non-uniform recombination rates, the probability of recombination at each point in the simulation needs to be obtained by integrating over the genetic map and the current sample configuration. This function does that. More...
 
std::vector< chromosomeSequence::coalsim::init_sample (const std::vector< int > &pop_config, const int &nsites)
 A simple function to initialize a sample of chromosomes. More...
 
marginal Sequence::coalsim::init_marginal (const int &nsam)
 Simple function to initialize a marginal tree. More...
 
std::pair< int, int > Sequence::coalsim::pick_uniform_spot (const double &random_01, const int &nlinks, std::vector< chromosome >::const_iterator sample_begin, const unsigned &current_nsam)
 Pick a crossover point for the model where recombination rates are constant across a recion. Picks a positions uniformly amongst all chromosomes at which a recombination event will occur. More...
 
int Sequence::coalsim::crossover (const int &current_nsam, const int &chromo, const int &pos, std::vector< chromosome > *sample, arg *sample_history)
 Recombination function. More...
 
double Sequence::coalsim::total_time (const marginal::const_iterator beg, const int &nsam)
 Calculate total time on a marginal tree. More...
 
int Sequence::coalsim::pick_branch (marginal::const_iterator beg, const int &nsam, const double &rtime)
 pick a random branch of a marginal tree More...
 
std::vector< int > Sequence::coalsim::get_all_descendants (marginal::const_iterator beg, const int &nsam, const int &branch)
 Find all the descendants of a branch on a marginal tree. More...
 
bool Sequence::coalsim::is_descendant (marginal::const_iterator beg, const int &ind, const int &branch)
 Ask if a tip of a tree is a descendant of a particular branch. More...
 
double Sequence::coalsim::total_time_on_arg (const Sequence::coalsim::arg &sample_history, const int &total_number_of_sites)
 Returns the total time on an ancestral recombination graph. More...
 

Variables

MAX_SEG_T Sequence::coalsim::MAX_SEGSITES
 controls allocation of simulated gametes You must define this in namespace Sequence in your program. A value of 200 works well.
 
MAX_SEG_T Sequence::coalsim::MAX_SEGS_INC
 controls (re)allocation of simulated gametes You must define this in namespace Sequence in your program. A value of 100 works well
 

Detailed Description

Typedef Documentation

◆ arg

typedef std::list<marginal> Sequence::coalsim::arg

Ancestral Recombination Graph.

An arg is an "ancestral recombination graph", which is a linked list of marginal histories.

Note
The implementation of the crossover function ensures that the marginal trees are sorted in ascending order determined by marginal::beg

Definition at line 217 of file SimTypes.hpp.

◆ gamete_storage_type

typedef std::pair< std::vector<double>, std::vector<std::string> > Sequence::coalsim::gamete_storage_type

an object to store simulated gametes An object of this type will tend to exist in the calling environment of your program. If you are simulating a sample of n chromosomes, you would initialize the object as follows:

gamete_storage_type gamete_bucket( std::vector<double>(MAX_SEGSITES,0.),
std::vector< std::string >(n,std::string(MAX_SEGSITES,'0')) );

Definition at line 42 of file Mutation.hpp.

Function Documentation

◆ calculate_scales()

void Sequence::coalsim::calculate_scales ( const std::vector< std::pair< int, int > > &  fragments,
std::vector< std::pair< double, double > > *  sample_scale,
std::vector< std::pair< double, double > > *  mutation_scale 
)

This is a helper function that rescales physical distance in base pairs to continuous distance on the interval 0,1.

Parameters
fragmentsA vector of pairs, representing physical distance in bp. For each pair, the first element is the distance to the next fragment, and the second element is the length of the fragment. For example, two 1kb fragments separated by 10kb would be represented by the pairs (0,1000) (10000,1000).
sample_scaleThis vector will be filled with values representing the positions of the fragments on the continuous interval, without any space betwen them. This is because we will actually do the simulation using a non-uniform genetic map to represent the high recombination rates between fragments
mutation_scaleThis is a direct mapping of the data contained in fragments to the continuous scale, and can be used to rescale the positions of mutations

Definition at line 64 of file CoalescentFragmentsRescaling.cc.

◆ coalesce()

int Sequence::coalsim::coalesce ( const double &  time,
const int &  ttl_nsam,
const int &  current_nsam,
const int &  c1,
const int &  c2,
const int &  nsites,
int *  nlinks,
std::vector< chromosome > *  sample,
arg sample_history 
)

Common ancestor routine for coalescent simulation. Merges chromosome segments and updates marginal trees.

Common ancestor routine for coalescent simulation. This routine performs the merging of two lineages by a coalescent event. Such merges usually require two sorts of operations. The first is an update to the segments contained in a chromosome, and the second is an update of the nodes on a marginal tree.

Parameters
timethe time at which the coalecent event is occuring
ttl_nsamthe total sample size being simulated
current_nsamthe current sample size in the simulation
c1the array index of the first chromosome involved in the coalescent event
c2the array index of the second chromosome involved in the coalescent event
nsitesthe total mutational length of the region begin simulated. In the language of Hudson (1983), this is the number of infinitely-many-alleles loci in the simulation.
nlinksa pointer to the number of "links" currently in the simulation. A link is the region between two sites, such that a chromosome currently with k sites has k-1 links
samplea pointer to the vector of chromosomes which makes up the sample
sample_historya pointer to the ancestral recombination graph
Returns
the decrease in current_nsam due to the coalescent event. Usually, the return value is 1. Sometimes, however, it is two, when the two chromosomes being merged have no ancestral material on the same marginal tree.

Definition at line 55 of file CoalescentCoalesce.cc.

◆ crossover()

int Sequence::coalsim::crossover ( const int &  current_nsam,
const int &  chromo,
const int &  pos,
std::vector< chromosome > *  sample,
arg sample_history 
)

Recombination function.

Parameters
current_nsamthe current sample size in the simulation
chromothe chromosome on which the crossover event is to occur
posthe crossover event happens between sites pos and pos+1 (0<= pos < nsites)
samplethe sample of chromosomes being simulated
sample_historythe genealogy of the sample
Returns
the number of links lost due to the crossover event
Note
as the type arg is based on std::list, and insertions into lists are done in constant time, this routine keeps the ancestral recombination graph sorted

Definition at line 84 of file CoalescentRecombination.cc.

◆ get_all_descendants()

std::vector< int > Sequence::coalsim::get_all_descendants ( marginal::const_iterator  beg,
const int &  nsam,
const int &  branch 
)

Find all the descendants of a branch on a marginal tree.

Parameters
begA pointer to the beginning of a marginal tree, i.e. the return value of marginal::begin()
nsamthe total sample size simulated
branchthe index of the branch of the tree whose descendants you want.
Note
branch must be <= 2*nsam-2, which is checked by assert

Definition at line 81 of file CoalescentTreeOperations.cc.

◆ init_marginal()

marginal Sequence::coalsim::init_marginal ( const int &  nsam)

Simple function to initialize a marginal tree.

Parameters
nsamthe total sample size (i.e. summed over all populations) that you want to simulate

Definition at line 62 of file CoalescentInitialize.cc.

◆ init_sample()

std::vector< chromosome > Sequence::coalsim::init_sample ( const std::vector< int > &  pop_config,
const int &  nsites 
)

A simple function to initialize a sample of chromosomes.

Parameters
pop_configFor a k-population model, this vector contains the sample size for each pop. Individuals are labeled as beloning to population 0 to k-1, in the order specified in this vector
nsitesThe number of sites at which mutations occur. For a k-site model, recombination occurs at any of the k-1 "links" between sites. Eaach chromosome is assigned a single segment starting at position 0 and ending at nsites-1.

Definition at line 31 of file CoalescentInitialize.cc.

◆ integrate_genetic_map()

double Sequence::coalsim::integrate_genetic_map ( const std::vector< chromosome > &  sample,
const int &  current_nsam,
const std::vector< double > &  genetic_map,
std::vector< double > *  reclens 
)

When simulating non-uniform recombination rates, the probability of recombination at each point in the simulation needs to be obtained by integrating over the genetic map and the current sample configuration. This function does that.

Parameters
samplethe vector containing the current state of all chromosomes in the sample
current_nsamthe current sample size in the simulation
genetic_mapa vector containing rho/"link" for each link in the sample. For the i-th base-pair in the chromosome, the "link" is the "space between" positions i and i+1. The value of genetic_map[i] is therefore 4Nr between site i and i+1 (sometimes called 4Nr/"site").
reclensa vector of doubles. This vector will be resized to current_nsam in this function, and filled with current_nsam values, each of which is the sum(genetic_map[beg],genetic_map[end-1]) for each chromosome in the sample, where beg and end are the first and last positions in each chromosome. These data are needed by the function pick_spot (Sequence/Coalescent/Recombination.hpp).
Returns
the cummulative recombination rate in the sample, which is obtained by integrating over the ancestral material in the sample and the genetic map.

Definition at line 168 of file CoalescentFragmentsRescaling.cc.

◆ is_descendant()

bool Sequence::coalsim::is_descendant ( marginal::const_iterator  beg,
const int &  ind,
const int &  branch 
)

Ask if a tip of a tree is a descendant of a particular branch.

Parameters
begA pointer to the beginning of a marginal tree, i.e. the return value of marginal::begin()
indthe index of the putative descendant node
branchthe index of the branch of the tree which may be the ancestor of ind
Note
This function does not check whether ind or branch go out of bounds, and so the programmer must ensure that both values are <= 2*nsam-2, where nsam is the total sample size simulated

Definition at line 105 of file CoalescentTreeOperations.cc.

◆ isseg()

bool Sequence::coalsim::isseg ( chromosome::const_iterator  seg,
const unsigned &  nsegs,
const int &  pos,
unsigned *  offset 
)

ask if a chromosome beginning at seg and containing nsegs contains a segment containing the position pos

Parameters
sega pointer to a segment of a chromosome (this should be the 1st segment, such as the return value of chromosome::begin())
nsegsthe number of segs in the chromosome pointed to by seg
offseta pointer to an integer. This integer is used for repeated pointer arithmetic, and should be initalized to 0 before the first call.
posa position along a chromosome. This function asks if pos is contained in the ancestral material of the chromosome whose segments begin at seg
Returns
true if a segment exists that contains the point pos
Note
only used by the function coalesce

Definition at line 30 of file CoalescentCoalesce.cc.

◆ neutral_sample()

template<typename uniform_generator , typename uniform01_generator , typename exponential_generator , typename poisson_generator >
Sequence::SimData Sequence::coalsim::neutral_sample ( uniform_generator &  uni,
uniform01_generator &  uni01,
exponential_generator &  expo,
poisson_generator &  poiss,
const double &  theta,
const double &  rho,
const int &  nsites,
const int &  nsam,
std::vector< chromosome > *  sample,
arg sample_history,
unsigned *  max_chromosomes = NULL,
const unsigned &  max_chromosomes_inc = 0 
)

A simple function to generate samples under a neutral equilibrium model.

A simple function to generate samples under a neutral equilibrium model with infinite-sites mutation and a constant recombination rate accross the region.

Parameters
unia function/object capable of returning a random double uniformly from [0,k)
uni01a function/object capable of returning a random probability uniformly from [0,1)
expoa function/object capable of returning an exponentially distributed random variable. The function must take a single double as an argument, which is the mean of the exponential distribution
poissa function/object capable of returning an poisson distributed random variable. The function must take a single double as an argument, which is the mean of the poisson distribution
theta4Nu, the coalescent-scaled mutation rate
rho4Nr, the recombination rate for the whole region
nsitesthe number of mutational sites to simulate. Recombination is equally likely between any two sites.
nsitesthe total sample size. (There is no population structure in this routine)
sampleA pointer to the sample of chromosomes you wish to simulate. This must be properly initialized, for example using the function init_sample in <Sequence/Coalescent/Initialize.hpp>
sample_historya pointer to the ancestral recombination graph. This must be initialized in the calling enviroment. In general, you can use init_marginal in <Sequence/Coalescent/Initialize.hpp>
max_chromosomesThis is a pointer to an integer in the calling environment which you can use to reserve memory in the array containing the sample of chromosomes. If the size of sample ever gets larger than this, max_chromosomes is incremented by max_chromosomes_inc
max_chromosomes_incthe amount by which to increment max_chromosomes
Note
This function does require a bit of work to use, although not much. Please see the example code that comes with the library, in particular ms–.cc

Definition at line 17 of file NeutralSample.hpp.

◆ pick_branch()

int Sequence::coalsim::pick_branch ( marginal::const_iterator  beg,
const int &  nsam,
const double &  rtime 
)

pick a random branch of a marginal tree

Parameters
begA pointer to the beginning of a marginal tree, i.e. the return value of marginal::begin()
nsamthe total sample size simulated
rtimea (preferably random) double between 0 and the total_time on the marginal tree from which beg is the iterator

Definition at line 54 of file CoalescentTreeOperations.cc.

◆ pick_uniform_spot()

std::pair< int, int > Sequence::coalsim::pick_uniform_spot ( const double &  random_01,
const int &  nlinks,
std::vector< chromosome >::const_iterator  sample_begin,
const unsigned &  current_nsam 
)

Pick a crossover point for the model where recombination rates are constant across a recion. Picks a positions uniformly amongst all chromosomes at which a recombination event will occur.

Parameters
random_01a random uniform deviate U[0,1)
nlinksthe number of links currently in the simulation
sample_beginan iterator pointing to the beginning of the sample
current_nsamthe current sample size in the simulation
Returns
a pair of integers containing the index of the recombinant chromosome (.first), and the position at which the crossover will occur (.second)

Definition at line 52 of file CoalescentRecombination.cc.

◆ rescale_mutation_positions()

void Sequence::coalsim::rescale_mutation_positions ( Sequence::SimData d,
const std::vector< std::pair< double, double > > &  sample_scale,
const std::vector< std::pair< double, double > > &  mutation_scale 
)

Rescales the positions of the mutations in d from the scale given in sample_scale to that given in mutation_scale.

Note
See documentation for calcualate_scales

Definition at line 107 of file CoalescentFragmentsRescaling.cc.

◆ sample_length()

int Sequence::coalsim::sample_length ( const std::vector< std::pair< int, int > > &  fragments)

When simulating partially linked regions, return the total length of sample material that we are simulating.

Returns
The sum of fragments[i].second for i=0 to i=fragments.size()-1

Definition at line 32 of file CoalescentFragmentsRescaling.cc.

◆ total_length()

int Sequence::coalsim::total_length ( const std::vector< std::pair< int, int > > &  fragments)

When simulating partially linked regions, return the total length of the region.

Returns
The sum of fragments[i].first + fragments[i].second for i=0 to i=fragments.size()-1

Definition at line 48 of file CoalescentFragmentsRescaling.cc.

◆ total_time()

double Sequence::coalsim::total_time ( const marginal::const_iterator  beg,
const int &  nsam 
)

Calculate total time on a marginal tree.

Parameters
begA pointer to the beginning of a marginal tree, i.e. the return value of marginal::begin()
nsamthe total sample size simulated
Returns
The total time on the tree.
Note
The scaling of time in the simulation is up to you

Definition at line 32 of file CoalescentTreeOperations.cc.

◆ total_time_on_arg()

double Sequence::coalsim::total_time_on_arg ( const Sequence::coalsim::arg sample_history,
const int &  total_number_of_sites 
)

Returns the total time on an ancestral recombination graph.

Parameters
sample_historyan ancestral recombination graph
total_number_of_sitesthe number of "sites" simulated on the ARG
Returns
The total time on an ancestral recombination graph.
Note
The time is in terms of whatever units are recorded on the nodes of the mariginals of the ARG
Exceptions
std::runtime_errorif the beginning of any marginal tree is >= total_number_of_sites

Definition at line 130 of file CoalescentTreeOperations.cc.