Example workflow ================ We'll work with pylibseq's wrapper to libsequence's SimData, which is used to process bi-allele data encoded as 0/1 = ancestral/derived, respectively .. ipython:: python from __future__ import print_function import libsequence Assigning to an object ---------------------- You may assign from a list of tuples. Each tuple is a site: (pos:genotypes) Here, there are 2 sites and a sample size of :math:`n=4` .. ipython:: python rawData1 = [(0.1,'0101'),(0.2,'1010')] #We can construct objects straight from these tuples sd = libsequence.SimData(rawData1) sd.size() sd.pos() sd.data() , you can assign from separate list of positions and haplotypes .. ipython:: python rawDataPos = [0.1,0.2] rawDataGenos = ['01','10','01','10'] sd.assign(rawDataPos,rawDataGenos) sd.numsites() sd.size() sd.pos() sd.data() Summary statistics ------------------ Let's calculate some basic summary statistics See :class:`libsequence.summstats.PolySIM` for more documentation .. ipython:: python #ms 10 1 -s 10 -I 2 5 5 0.05 rawDataPos=[0.0997, 0.2551, 0.3600, 0.4831, 0.5205, 0.5668, 0.5824, 0.6213, 0.7499, 0.9669] rawDataGenos=['0000001010', '1000000011', '1000001010', '1000000010', '1000001010', '1111010100', '1111010100', '1111110100', '1111010100', '1111010100'] sd.assign(rawDataPos,rawDataGenos) ps = libsequence.PolySIM(sd) ps.thetapi() ps.thetaw() ps.tajimasd() Filtering sites ------------------ You may also filter sites from your variation tables using :func:`libsequence.removeColumns`, which operates on objects of type :class:`libsequence.StateCounter`. Our data look like this right now: .. ipython:: python print(sd) Let's remove derived singletons: .. ipython:: python sd2 = libsequence.removeColumns(sd,lambda x : x.one != 1) print(sd2) Let's remove all singletons: .. ipython:: python sd3 = libsequence.removeColumns(sd,lambda x: x.one !=1 and x.zero != 1) print(sd3) .. note:: Yes, it is odd that the column removal function removes sites for which the lambda returns False. I'll fix that in a future release, which requires an upstream change to libsequence. Sliding windows --------------- .. ipython:: python w = libsequence.Windows(sd,window_size=0.1,step_len=0.05,starting_pos=0.,ending_pos=1.0) len(w) for i in range(len(w)): #Each window is a simData wi = w[i] pswi = libsequence.PolySIM(wi) print(pswi.thetaw()) Linkage disequilibrium ---------------------- The function ``libsequence.ld`` returns pairwise LD stats as a ``list`` of ``dict``\ s. The return value is easily coerced into a ``pandas.DataFrame``. :math:`F_{ST}` -------------- Let's pretend that our data are from two demes of sizes n/2 each. Note that most flavors of :math:`F_{ST}` are very similar to one another. See Charlesworth, B. (1998) Mol. Biol. Evol. 15(5): 538-543 for a great overview. .. ipython:: python import libsequence sd.size() f = libsequence.Fst(sd,[5,5]) #Hudson, Slatkin, and Maddison's FST: f.hsm() #Slatkin's f.slatkin() #Hudson, Boos, and Kaplan, which is also Nei's Gst: f.hbk() #Positions of snps shared b/w demes 0 and 1 f.shared(0,1) #Positions of private mutations in deme 0 and 1: f.priv(0,1) #Positions of fixed differences between demes 0 and 1: f.fixed(0,1)