libsequence
1.9.5
Summary statistics and other analysis of Sequence::VariantMatrix. See Tutorial/overview.
Namespaces | |
Sequence::Recombination | |
Methods dealing with recombination. | |
Classes | |
struct | Sequence::PairwiseLDstats |
Pairwise linkage disequilibrium (LD) stats. | |
struct | Sequence::AlleleCounts |
struct | Sequence::GarudStats |
struct | Sequence::nSLiHS |
class | Sequence::PolySIM |
Analysis of coalescent simulation data. | |
class | Sequence::PolySNP |
Molecular population genetic analysis. | |
class | Sequence::PolyTableSlice< T > |
A container class for "sliding windows" along a polymorphism table. | |
class | Sequence::FST |
Analysis of population structure using $F_{ST}$. | |
Functions | |
std::pair< double, double > | Sequence::nSL (const std::size_t &core, const SimData &d, const std::unordered_map< double, double > &gmap=std::unordered_map< double, double >()) __attribute__((deprecated)) |
template<typename F > | |
void | Sequence::sstats_algo::aggregate_sites (const VariantMatrix &m, const F &f, const std::int8_t refstate) |
template<typename F > | |
void | Sequence::sstats_algo::aggregate_sites (const VariantMatrix &m, const F &f, const std::vector< std::int8_t > &refstates) |
std::vector< AlleleCounts > | Sequence::allele_counts (const AlleleCountMatrix &m) |
Count number of alleles at each site. | |
std::vector< AlleleCounts > | Sequence::non_reference_allele_counts (const AlleleCountMatrix &m, const std::int8_t refstate) |
Count number of non-reference alleles at each site. | |
std::vector< AlleleCounts > | Sequence::non_reference_allele_counts (const AlleleCountMatrix &m, const std::vector< std::int8_t > &refstates) |
Count number of non-reference alleles at each site. | |
double | Sequence::tajd (const AlleleCountMatrix &ac) |
Tajima's D. | |
double | Sequence::hprime (const AlleleCountMatrix &ac, const std::int8_t refstate) |
double | Sequence::hprime (const AlleleCountMatrix &m, const std::vector< std::int8_t > &refstates) |
double | Sequence::faywuh (const AlleleCountMatrix &ac, const std::int8_t refstate) |
Fay and Wu's H. | |
double | Sequence::faywuh (const AlleleCountMatrix &ac, const std::vector< std::int8_t > &refstates) |
Fay and Wu's H. | |
std::vector< std::int32_t > | Sequence::difference_matrix (const VariantMatrix &m) |
Calculate number of differences between all samples. | |
std::vector< std::int32_t > | Sequence::label_haplotypes (const VariantMatrix &m) |
Assign a unique label to each haplotype. | |
std::int32_t | Sequence::number_of_haplotypes (const VariantMatrix &m) |
Calculate the number of haplotypes in a sample. | |
double | Sequence::haplotype_diversity (const VariantMatrix &m) |
Calculate the haplotype diversity of a sample. | |
std::int32_t | Sequence::rmin (const VariantMatrix &m) |
std::uint32_t | Sequence::nvariable_sites (const AlleleCountMatrix &m) |
Number of polymorphic sites. | |
std::uint32_t | Sequence::nbiallelic_sites (const AlleleCountMatrix &m) |
Number of bi-allelic sites. | |
std::uint32_t | Sequence::total_number_of_mutations (const AlleleCountMatrix &m) |
Total number of mutations in the sample. | |
double | Sequence::thetah (const AlleleCountMatrix &ac, const std::int8_t refstate) |
Fay and Wu's $\hat\theta_H$. | |
double | Sequence::thetah (const AlleleCountMatrix &m, const std::vector< std::int8_t > &refstates) |
Fay and Wu's $\hat\theta_H$. | |
double | Sequence::thetal (const AlleleCountMatrix &ac, const std::int8_t refstate) |
Zeng et al.'s $\hat\theta_L$. | |
double | Sequence::thetal (const AlleleCountMatrix &m, const std::vector< std::int8_t > &refstates) |
Zeng et al.'s $\hat\theta_L$. | |
double | Sequence::thetapi (const AlleleCountMatrix &ac) |
Mean pairwise differences. | |
double | Sequence::thetaw (const AlleleCountMatrix &ac) |
Watterson's theta. | |
nSLiHS | Sequence::nsl (const VariantMatrix &m, const std::size_t core, const std::int8_t refstate) |
nSL and iHS statistics. | |
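template<typename F > void Sequence::sstats_algo::aggregate_sites (const VariantMatrix &m, const F &f, const std::int8_t refstate) [inline]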
Helper algorithm for implementing summary statistics.
Several common summary statistics are combinations of others; examples include Tajima's D and Fay and Wu's H. Taking D as an example, it is tempting to use existing functions, such as Sequence::thetapi and Sequence::thetaw, as intermediate steps. However, doing so passes over the data multiple times.
Fortunately, these statistics are simple enough that pi and Watterson's theta can be calculated in a single loop over the sites. This function helps you do that.
m | A VariantMatrix |
f | A function taking a const StateCounts & and returning nothing. |
refstate | The reference state. |
This function loops over m.nsites and passes the state counts on to the aggregator function f.
See the implementation of Sequence::tajd for an example.
Definition at line 18 of file algorithm.hpp.
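The single-pass idea described above can be illustrated without the library. The following is a minimal sketch, not the implementation in algorithm.hpp: `SiteCounts`, `visit_sites`, `PiAndS` and `pi_and_segregating_sites` are hypothetical names, and per-site state counts are assumed to be available as plain vectors with missing data already excluded.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one site's state counts: counts[j] is the
// number of samples carrying state j (missing data already excluded).
using SiteCounts = std::vector<std::int32_t>;

// Hypothetical driver in the spirit of aggregate_sites: visit each site
// once and hand its counts to the aggregator f.
template <typename F>
void
visit_sites(const std::vector<SiteCounts>& sites, const F& f)
{
    for (const auto& counts : sites)
        {
            f(counts);
        }
}

struct PiAndS
{
    double pi = 0.0;    // sum over sites of per-site heterozygosity
    std::int32_t S = 0; // number of sites with more than one state
};

// Accumulate both quantities in one loop over the sites.
PiAndS
pi_and_segregating_sites(const std::vector<SiteCounts>& sites)
{
    PiAndS out;
    visit_sites(sites, [&out](const SiteCounts& counts) {
        std::int64_t n = 0;
        std::int32_t nstates = 0;
        for (auto c : counts)
            {
                assert(c >= 0);
                n += c;
                if (c > 0) { ++nstates; }
            }
        if (n < 2) { return; } // no pairs to compare at this site
        double homozygosity = 0.0;
        for (auto c : counts)
            {
                homozygosity += static_cast<double>(c) * (c - 1);
            }
        homozygosity /= static_cast<double>(n) * static_cast<double>(n - 1);
        out.pi += 1.0 - homozygosity;
        if (nstates > 1) { ++out.S; }
    });
    return out;
}
```

Both numerators needed for a statistic such as D come out of the same traversal, which is the point of passing an aggregator rather than calling two separate functions.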
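template<typename F > void Sequence::sstats_algo::aggregate_sites (const VariantMatrix &m, const F &f, const std::vector< std::int8_t > &refstates) [inline]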
Helper algorithm for implementing summary statistics.
Several common summary statistics are combinations of others; examples include Tajima's D and Fay and Wu's H. Taking D as an example, it is tempting to use existing functions, such as Sequence::thetapi and Sequence::thetaw, as intermediate steps. However, doing so passes over the data multiple times.
Fortunately, these statistics are simple enough that pi and Watterson's theta can be calculated in a single loop over the sites. This function helps you do that.
m | A VariantMatrix |
f | A function taking a const StateCounts & and returning nothing. |
refstates | Vector of reference states |
This function loops over m.nsites and passes the state counts on to the aggregator function f.
See the implementation of Sequence::hprime for an example.
Definition at line 56 of file algorithm.hpp.
count_type Sequence::allele_counts (const AlleleCountMatrix &m)
Count number of alleles at each site.
m | An AlleleCountMatrix |
Definition at line 50 of file allele_counts.cc.
std::vector< std::int32_t > Sequence::difference_matrix (const VariantMatrix &m)
Calculate number of differences between all samples.
m | A VariantMatrix |
For $n$ samples in m, the output contains $\binom{n}{2}$ elements, one for each pair of samples. More concretely, the elements are populated according to:
Missing data do not contribute to differences between sequences. Thus, low-quality data may lead to uninformative return values.
Included via Sequence/summstats.hpp or Sequence/summstats/classics.hpp
Definition at line 15 of file haplotype_statistics.cc.
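As an illustration of a pairwise layout with one element per pair of samples, the sketch below fills the output in nested-loop order (i < j). It is a standalone approximation under stated assumptions, not the code in haplotype_statistics.cc: `Sample` and `pairwise_differences` are hypothetical names, each sample is stored as its own vector of states, and negative values are assumed to encode missing data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout: one vector of states per sample.
// Assumption: negative values encode missing data.
using Sample = std::vector<std::int8_t>;

// Number of differences for every pair (i, j) with i < j, in
// nested-loop order. Sites where either sample is missing are skipped.
std::vector<std::int32_t>
pairwise_differences(const std::vector<Sample>& samples)
{
    std::vector<std::int32_t> out;
    const std::size_t n = samples.size();
    if (n < 2) { return out; }
    out.reserve(n * (n - 1) / 2);
    for (std::size_t i = 0; i + 1 < n; ++i)
        {
            for (std::size_t j = i + 1; j < n; ++j)
                {
                    assert(samples[i].size() == samples[j].size());
                    std::int32_t ndiffs = 0;
                    for (std::size_t s = 0; s < samples[i].size(); ++s)
                        {
                            const auto a = samples[i][s];
                            const auto b = samples[j][s];
                            if (a >= 0 && b >= 0 && a != b) { ++ndiffs; }
                        }
                    out.push_back(ndiffs);
                }
        }
    return out;
}
```

Because missing states are skipped, a sample with many missing sites shows few differences from every other sample, which is why low-quality data can make the result uninformative.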
double Sequence::faywuh (const AlleleCountMatrix &ac, const std::int8_t refstate)
Fay and Wu's H.
ac | An AlleleCountMatrix |
refstate | The ancestral state. |
This function is included via Sequence/summstats.hpp, Sequence/summstats/classics.hpp or Sequence/summstats/thetah.hpp
See [2] for details.
double Sequence::faywuh (const AlleleCountMatrix &ac, const std::vector< std::int8_t > &refstates)
Fay and Wu's H.
ac | An AlleleCountMatrix |
refstates | The ancestral state at each site. |
This function is included via Sequence/summstats.hpp or Sequence/summstats/classics.hpp
See [2] for details.
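For orientation, H is commonly written as the difference between mean pairwise diversity and $\hat\theta_H$. The sketch below computes both terms from derived-allele counts at bi-allelic sites. It is an illustration only: `HComponents` and `fay_wu_h` are hypothetical names, a constant sample size `n` with no missing data is assumed, and nothing here is taken from the library's source.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct HComponents
{
    double pi;      // mean pairwise differences
    double theta_h; // weights each site by the squared derived count
    double h;       // pi - theta_h
};

// derived[i] is the number of copies of the derived allele at
// bi-allelic site i, in a sample of n chromosomes (assumed constant
// across sites, no missing data).
HComponents
fay_wu_h(const std::vector<std::int32_t>& derived, const std::int32_t n)
{
    assert(n > 1);
    const double twice_npairs = static_cast<double>(n) * (n - 1);
    HComponents out{ 0.0, 0.0, 0.0 };
    for (auto d : derived)
        {
            assert(d >= 0 && d <= n);
            if (d == 0 || d == n) { continue; } // not polymorphic
            out.pi += 2.0 * d * (n - d) / twice_npairs;
            out.theta_h += 2.0 * d * d / twice_npairs;
        }
    out.h = out.pi - out.theta_h;
    return out;
}
```

High-frequency derived alleles inflate the second term, so an excess of them pushes H negative.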
double Sequence::haplotype_diversity (const VariantMatrix &m)
Calculate the haplotype diversity of a sample.
m | A VariantMatrix |
The "haplotype heterozygosity" is calculated by counting haplotype labels (see label_haplotypes).
Included via Sequence/summstats.hpp or Sequence/summstats/classics.hpp
See [1] for details.
Definition at line 134 of file haplotype_statistics.cc.
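One common way to turn haplotype labels into a heterozygosity is to count identical pairs. The sketch below shows that estimator; it is an assumption about the general technique, not a transcription of haplotype_statistics.cc. `haplotype_heterozygosity` is a hypothetical name, and samples labelled -1 are assumed to be left out of the calculation.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// labels[i] is the haplotype label of sample i; -1 marks a sample that
// could not be labelled (assumption: such samples are ignored).
// Returns 1 - sum_h c_h (c_h - 1) / (n (n - 1)), i.e. the fraction of
// sample pairs that carry different haplotypes.
double
haplotype_heterozygosity(const std::vector<std::int32_t>& labels)
{
    std::map<std::int32_t, std::int64_t> counts;
    std::int64_t n = 0;
    for (auto l : labels)
        {
            assert(l >= -1); // labels are -1 or non-negative
            if (l >= 0)
                {
                    ++counts[l];
                    ++n;
                }
        }
    if (n < 2) { return 0.0; } // no pairs to compare
    double identical_pairs = 0.0;
    for (const auto& kv : counts)
        {
            identical_pairs += static_cast<double>(kv.second) * (kv.second - 1);
        }
    return 1.0 - identical_pairs / (static_cast<double>(n) * (n - 1));
}
```

With labels {0, 1, 0, 2}, one of the six pairs is identical, so the value is 5/6.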
double Sequence::hprime (const AlleleCountMatrix &ac, const std::int8_t refstate)
The H' statistic.
ac | An AlleleCountMatrix |
refstate | How the ancestral state is encoded. |
See [10] for details.
Included via Sequence/summstats.hpp or Sequence/summstats/classics.hpp
double Sequence::hprime (const AlleleCountMatrix &m, const std::vector< std::int8_t > &refstates)
The H' statistic
m | An AlleleCountMatrix |
refstates | A vector of ancestral states, equal in length to m.sites |
See [10] for details.
Included via Sequence/summstats.hpp or Sequence/summstats/classics.hpp
std::vector< std::int32_t > Sequence::label_haplotypes (const VariantMatrix &m)
Assign a unique label to each haplotype.
m | A VariantMatrix |
If there are $k$ unique samples in m, which represents a sample of size nsam, the return value contains nsam elements whose values $i$ satisfy $0 \le i < k$ for "good" input. Here, "bad" input means that some of the samples consist entirely of missing data. In that case, they are given the label of -1.
This function is implemented via a call to difference_matrix.
Included via Sequence/summstats.hpp or Sequence/summstats/classics.hpp
Definition at line 75 of file haplotype_statistics.cc.
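The labelling rule can be sketched as "reuse the label of the first earlier sample with zero differences, otherwise take the next unused label". The code below is a simplified standalone version under stated assumptions, not the library's implementation: `Sample`, `all_missing`, `no_differences` and `label_samples` are hypothetical names, samples are stored one vector each, and negative states are assumed to mean missing data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout: one vector of states per sample.
// Assumption: negative values encode missing data.
using Sample = std::vector<std::int8_t>;

bool
all_missing(const Sample& s)
{
    for (auto x : s)
        {
            if (x >= 0) { return false; }
        }
    return true;
}

// True if a and b agree at every site where both are non-missing.
bool
no_differences(const Sample& a, const Sample& b)
{
    assert(a.size() == b.size());
    for (std::size_t s = 0; s < a.size(); ++s)
        {
            if (a[s] >= 0 && b[s] >= 0 && a[s] != b[s]) { return false; }
        }
    return true;
}

// Labels run from 0 upward in order of first appearance; samples that
// are entirely missing keep the label -1.
std::vector<std::int32_t>
label_samples(const std::vector<Sample>& samples)
{
    std::vector<std::int32_t> labels(samples.size(), -1);
    std::int32_t next_label = 0;
    for (std::size_t i = 0; i < samples.size(); ++i)
        {
            if (all_missing(samples[i])) { continue; } // keeps label -1
            for (std::size_t j = 0; j < i; ++j)
                {
                    if (labels[j] >= 0
                        && no_differences(samples[i], samples[j]))
                        {
                            labels[i] = labels[j];
                            break;
                        }
                }
            if (labels[i] < 0) { labels[i] = next_label++; }
        }
    return labels;
}
```

Note that with partially missing data "zero differences" is not transitive, so the labels can depend on sample order in this sketch.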
std::uint32_t Sequence::nbiallelic_sites (const AlleleCountMatrix &m)
Number of bi-allelic sites.
Return the number of sites with exactly two non-missing states.
m | An AlleleCountMatrix |
Definition at line 28 of file nvariablesites.cc.
count_type Sequence::non_reference_allele_counts (const AlleleCountMatrix &m, const std::int8_t refstate)
Count number of non-reference alleles at each site.
m | An AlleleCountMatrix |
refstate | The reference state for all sites. |
Definition at line 80 of file allele_counts.cc.
count_type Sequence::non_reference_allele_counts (const AlleleCountMatrix &m, const std::vector< std::int8_t > &refstates)
Count number of non-reference alleles at each site.
m | An AlleleCountMatrix |
refstates | The reference state at each site. |
Definition at line 62 of file allele_counts.cc.
nSLiHS Sequence::nsl (const VariantMatrix &m, const std::size_t core, const std::int8_t refstate)
nSL and iHS statistics
m | A VariantMatrix |
core | The index of the core site |
refstate | The value of the reference/ancestral allelic state |
See nSL_from_ms.cc for an example.
See [3] for details.
std::pair< double, double > Sequence::nSL (const std::size_t &core, const SimData &d, const std::unordered_map< double, double > &gmap=std::unordered_map< double, double >())
The nSL statistic of Ferrer-Admetlla et al. doi: 10.1093/molbev/msu077.
core | The index of the "focal/core" SNP |
d | An object of type Sequence::SimData |
gmap | The positions of every marker in d on the genetic map. If std::unordered_map<double,double>() is passed, iHS is calculated using SNP positions. |
std::int32_t Sequence::number_of_haplotypes (const VariantMatrix &m)
Calculate the number of haplotypes in a sample.
m | A VariantMatrix |
This returns the number of unique columns in m.
Included via Sequence/summstats.hpp or Sequence/summstats/classics.hpp
See [1] for details.
Definition at line 117 of file haplotype_statistics.cc.
std::uint32_t Sequence::nvariable_sites (const AlleleCountMatrix &m)
Number of polymorphic sites.
Returns the number of sites with more than one non-missing state.
m | An AlleleCountMatrix |
Definition at line 8 of file nvariablesites.cc.
std::int32_t Sequence::rmin (const VariantMatrix &m)
Hudson and Kaplan's Rmin statistic
m | A VariantMatrix |
Included via Sequence/summstats.hpp or Sequence/summstats/classics.hpp
See [5] for details.
double Sequence::tajd (const AlleleCountMatrix &ac)
Tajima's D.
ac | An AlleleCountMatrix |
See [7] for details.
Included via Sequence/summstats.hpp or Sequence/summstats/classics.hpp
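For readers who want the arithmetic spelled out, the sketch below evaluates the usual textbook form of Tajima's D from three inputs. It is a standalone illustration, not the library's code: `tajimas_d` is a hypothetical name, and the inputs (mean pairwise differences `pi`, number of segregating sites `S`, sample size `n`) are assumed to have been computed elsewhere, with no missing data.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// D = (pi - S/a1) / sqrt(e1*S + e2*S*(S-1)), with the standard
// constants a1, a2, b1, b2, c1, c2, e1, e2 that depend only on n.
// Returns NaN when the statistic is undefined.
double
tajimas_d(const double pi, const std::int32_t S, const std::int32_t n)
{
    assert(n > 1 && S >= 0);
    const double nan = std::numeric_limits<double>::quiet_NaN();
    if (S == 0) { return nan; } // no variation, D is undefined
    double a1 = 0.0, a2 = 0.0;
    for (std::int32_t i = 1; i < n; ++i)
        {
            a1 += 1.0 / i;
            a2 += 1.0 / (static_cast<double>(i) * i);
        }
    const double dn = n;
    const double b1 = (dn + 1.0) / (3.0 * (dn - 1.0));
    const double b2 = 2.0 * (dn * dn + dn + 3.0) / (9.0 * dn * (dn - 1.0));
    const double c1 = b1 - 1.0 / a1;
    const double c2 = b2 - (dn + 2.0) / (a1 * dn) + a2 / (a1 * a1);
    const double e1 = c1 / a1;
    const double e2 = c2 / (a1 * a1 + a2);
    const double variance = e1 * S + e2 * S * (S - 1.0);
    // Guard against a zero variance term (e.g. very small n).
    if (!(variance > 0.0)) { return nan; }
    return (pi - S / a1) / std::sqrt(variance);
}
```

The numerator is the difference between two estimators of theta, so an excess of rare variants (pi below S/a1) gives a negative D.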
double Sequence::thetah (const AlleleCountMatrix &ac, const std::int8_t refstate)
Fay and Wu's $\hat\theta_H$.
ac | An AlleleCountMatrix |
refstate | The ancestral state |
See [2] for details.
Definition at line 117 of file thetah_thetal.cc.
double Sequence::thetah (const AlleleCountMatrix &m, const std::vector< std::int8_t > &refstates)
Fay and Wu's $\hat\theta_H$.
m | An AlleleCountMatrix |
refstates | Vector of ancestral states. |
See [2] for details.
Definition at line 123 of file thetah_thetal.cc.
double Sequence::thetal (const AlleleCountMatrix &ac, const std::int8_t refstate)
Zeng et al.'s $\hat\theta_L$.
ac | An AlleleCountMatrix |
refstate | The ancestral state |
See [10] for details.
Definition at line 130 of file thetah_thetal.cc.
double Sequence::thetal (const AlleleCountMatrix &m, const std::vector< std::int8_t > &refstates)
Zeng et al.'s $\hat\theta_L$.
m | An AlleleCountMatrix |
refstates | Vector of ancestral states. |
See [10] for details.
Definition at line 136 of file thetah_thetal.cc.
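For reference, these estimators are usually written in terms of the unfolded site frequency spectrum. The notation below is an assumption of this note rather than the library's: $\xi_i$ is the number of sites at which the derived allele appears $i$ times in a sample of size $n$, with no missing data.

```latex
\hat\theta_\pi = \sum_{i=1}^{n-1} \frac{2\,i\,(n-i)}{n(n-1)}\,\xi_i,
\qquad
\hat\theta_H = \sum_{i=1}^{n-1} \frac{2\,i^{2}}{n(n-1)}\,\xi_i,
\qquad
\hat\theta_L = \frac{1}{n-1}\sum_{i=1}^{n-1} i\,\xi_i .
```

Term by term, $2\hat\theta_L - \hat\theta_\pi = \hat\theta_H$, so the three quantities are not independent and $\hat\theta_\pi - \hat\theta_H = 2(\hat\theta_\pi - \hat\theta_L)$.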
double Sequence::thetapi (const AlleleCountMatrix &ac)
Mean pairwise differences.
ac | An AlleleCountMatrix |
This function is included via Sequence/summstats.hpp, Sequence/summstats/classics.hpp or Sequence/summstats/thetapi.hpp
See [6] for details.
Definition at line 7 of file thetapi.cc.
double Sequence::thetaw (const AlleleCountMatrix &ac)
Watterson's theta.
ac | An AlleleCountMatrix |
See [9] for details.
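The textbook form of Watterson's estimator divides a count of segregating sites by a harmonic number. The sketch below shows only that arithmetic; `watterson_theta` is a hypothetical name, the sample size is assumed constant across sites, and whether a multi-allelic site counts once or several times is a choice this sketch does not settle.

```cpp
#include <cassert>
#include <cstdint>

// theta_W = S / a1, where a1 = sum_{i=1}^{n-1} 1/i.
// S: number of segregating sites; n: sample size (assumed constant).
double
watterson_theta(const std::int32_t S, const std::int32_t n)
{
    assert(n > 1 && S >= 0);
    double a1 = 0.0;
    for (std::int32_t i = 1; i < n; ++i)
        {
            a1 += 1.0 / i;
        }
    return S / a1;
}
```

For n = 4 the denominator is 1 + 1/2 + 1/3 = 11/6, so 11 segregating sites give an estimate of 6.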
std::uint32_t Sequence::total_number_of_mutations (const AlleleCountMatrix &m)
Total number of mutations in the sample.
Return $\sum_i x_i$, where $x_i$ is $k_i - 1$ if $k_i$, the number of states at the $i^{th}$ site, is greater than one, and zero otherwise.
m | An AlleleCountMatrix |
Definition at line 48 of file nvariablesites.cc.
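The three site-level counts documented on this page (polymorphic sites, bi-allelic sites, total mutations) differ only in how the number of states per site is used. The sketch below makes that explicit on plain per-site count vectors. `SiteCounts`, `SiteSummary` and `summarize_sites` are hypothetical names, and missing data are assumed to be excluded from the counts already.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one site's state counts: counts[j] is the
// number of samples carrying state j (missing data already excluded).
using SiteCounts = std::vector<std::int32_t>;

struct SiteSummary
{
    std::uint32_t nvariable = 0;  // sites with more than one state
    std::uint32_t nbiallelic = 0; // sites with exactly two states
    std::uint32_t nmutations = 0; // sum over sites of (states - 1)
};

SiteSummary
summarize_sites(const std::vector<SiteCounts>& sites)
{
    SiteSummary out;
    for (const auto& counts : sites)
        {
            std::uint32_t nstates = 0;
            for (auto c : counts)
                {
                    assert(c >= 0);
                    if (c > 0) { ++nstates; }
                }
            if (nstates > 1)
                {
                    ++out.nvariable;
                    out.nmutations += nstates - 1;
                }
            if (nstates == 2) { ++out.nbiallelic; }
        }
    return out;
}
```

A site with three observed states therefore counts as one variable site, zero bi-allelic sites, and two mutations.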