Pipeline for the G2D test

Overview of the steps required to perform the “G2D” test (Nielsen et al. 2009, Genome Res. 19:838) on bidimensional site-frequency spectra.

The pipeline starts with SNP frequency data for each population. Here, read counts in pools were taken as crude estimators of allele frequencies. Because all loci (SNPs) must have the same number of sampled gene copies, loci with a depth < 40 in any pool were discarded, and loci with a depth > 40 were subsampled to a uniform sample size of N = 40.
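The filtering and subsampling step can be sketched as follows. This is a minimal Python sketch, not the pipeline's actual code: it assumes each pool is summarised as a pair (reads carrying the counted allele, total depth), and that subsampling draws N = 40 reads without replacement (a hypergeometric draw); the source does not state which subsampling scheme was used. The function name `subsample_locus` is hypothetical.

```python
import random

N = 40  # uniform sample size used in the pipeline


def subsample_locus(pools, n=N, rng=None):
    """Return the counted-allele count among n reads for each pool, or None.

    pools: list of (alt_reads, depth) pairs, one per pool (hypothetical layout).
    Assumption: reads are drawn without replacement, so each returned count
    follows a hypergeometric distribution.
    """
    rng = rng if rng is not None else random.Random()
    if any(depth < n for _, depth in pools):
        return None  # depth < n in at least one pool: the locus is discarded
    counts = []
    for alt, depth in pools:
        if not 0 <= alt <= depth:
            raise ValueError(f"invalid read counts: alt={alt}, depth={depth}")
        # 1 = read carrying the counted allele, 0 = read carrying the other allele
        reads = [1] * alt + [0] * (depth - alt)
        counts.append(sum(rng.sample(reads, n)))  # depth == n keeps every read
    return counts
```

Passing a seeded `random.Random` makes the subsampling reproducible across runs.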

The raw data were used to compute 2D-SFSs (bidimensional site-frequency spectra) for pairs of populations. A maximum-likelihood method (implemented in Fastsimcoal2.51) was then used to choose among alternative scenarios and to estimate their parameters (see Simulations).
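With N = 40 in each population, a 2D-SFS is a 41 × 41 table whose cell (i, j) counts the loci where the counted allele occurs i times in population 1 and j times in population 2. The sketch below assumes per-locus counts already subsampled to N = 40 and polarised consistently (e.g. derived-allele counts); `joint_sfs` is a hypothetical helper, and writing the table to Fastsimcoal2.51's observed-SFS input file is not shown.

```python
def joint_sfs(locus_counts, n1=40, n2=40):
    """Build an (n1 + 1) x (n2 + 1) joint SFS as nested lists.

    locus_counts: iterable of (k1, k2), the number of copies of the counted
    allele at one locus in population 1 and in population 2.
    """
    sfs = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for k1, k2 in locus_counts:
        if not (0 <= k1 <= n1 and 0 <= k2 <= n2):
            raise ValueError(f"allele counts out of range: ({k1}, {k2})")
        sfs[k1][k2] += 1  # row = population 1 count, column = population 2 count
    return sfs
```

The same function serves both levels of the analysis: applied to all loci it gives the genome-wide 2D-SFS, and applied to the loci of a single contig it gives that contig's 2D-SFS.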

One hundred new simulations were run with Fastsimcoal2.51 under the maximum-likelihood parameter set, using the same number of sequences, sequence lengths and expected number of SNPs per sequence as in the raw contig data. These simulations yielded 100 sets of genome-wide 2D-SFSs and 100 × 2883 = 288,300 contig-level 2D-SFSs. The G statistic was computed for each simulated contig, giving the “empirical” distribution of G values (see EmpiricalGvalues) conditional on the chosen demographic model. The observed genome-level and contig-level 2D-SFSs were then used to compute observed G values, which were compared against this empirical distribution to obtain their empirical P-values.
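The contig-level comparison can be sketched with a standard G (log-likelihood ratio) statistic. The exact form is an assumption: here each contig's 2D-SFS is compared against the expectation obtained by scaling the genome-wide 2D-SFS proportions to the contig's SNP count, G = 2 Σ O_ij ln(O_ij / E_ij), with empty cells contributing 0. The empirical P-value uses the common (1 + r) / (1 + n) estimator, where r is the number of simulated G values ≥ the observed one among n; the source does not say which estimator was used. Both function names are hypothetical.

```python
import math
from bisect import bisect_left


def g_statistic(contig_sfs, genome_sfs):
    """G = 2 * sum(O_ij * ln(O_ij / E_ij)) for one contig's 2D-SFS.

    Assumption: E_ij = n_contig * (genome-wide count_ij / genome-wide total),
    i.e. the genome-wide 2D-SFS provides the null expectation.
    """
    n_contig = sum(map(sum, contig_sfs))
    n_genome = sum(map(sum, genome_sfs))
    total = 0.0
    for obs_row, bg_row in zip(contig_sfs, genome_sfs):
        for obs, bg in zip(obs_row, bg_row):
            if obs == 0:
                continue  # cells with O_ij = 0 contribute 0 (0 * ln 0 := 0)
            expected = n_contig * bg / n_genome
            if expected == 0:
                return math.inf  # SNPs observed in a cell with zero expectation
            total += obs * math.log(obs / expected)
    return 2.0 * total


def empirical_p(g_obs, g_null_sorted):
    """(1 + r) / (1 + n), where r = number of simulated G values >= g_obs.

    g_null_sorted must be sorted in ascending order.
    """
    # bisect_left gives the index of the first value >= g_obs
    r = len(g_null_sorted) - bisect_left(g_null_sorted, g_obs)
    return (1 + r) / (1 + len(g_null_sorted))
```

Sorting the 288,300 simulated G values once and reusing the sorted list keeps each P-value lookup logarithmic in the size of the null distribution.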

Last modified: 22 June 2023 | Created: 16 July 2015 | Author: Ivan Scotti