Population Genetic Inference From Resequencing Data

ABSTRACT

Population-scale genomic datasets have given researchers incredible amounts of information from which to infer evolutionary histories. Concomitant with this flood of data, theoretical and methodological advances have sought to extract information from genomic sequences to infer demographic events such as population size changes and gene flow among closely related populations/species, construct recombination maps, and uncover loci underlying recent adaptation. To date, most methods make use of only one or a few summaries of the input sequences and therefore ignore potentially useful information encoded in the data. The most sophisticated of these approaches involve likelihood calculations, which require theoretical advances for each new problem and often focus on a single aspect of the data (e.g., only allele frequency information) in the interest of mathematical and computational tractability. Directly interrogating the entirety of the input sequence data in a likelihood-free manner would thus offer a fruitful alternative. Here we accomplish this by representing DNA sequence alignments as images and using a class of deep learning methods called convolutional neural networks (CNNs) to make population genetic inferences from these images. We apply CNNs to a number of evolutionary questions and find that they frequently match or exceed the accuracy of current methods. Importantly, we show that CNNs perform accurate evolutionary model selection and parameter estimation, even on problems that have not received detailed theoretical treatments. Thus, when applied to population genetic alignments, CNNs are capable of outperforming expert-derived statistical methods, and offer a new path forward in cases where no likelihood approach exists.
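The idea of treating an alignment as an image can be illustrated with a minimal sketch. The abstract does not specify an encoding, so the one below is an assumption: rows are sampled chromosomes, columns are segregating sites, and each pixel is 0 for the major allele and 1 for any other allele. The function name `alignment_to_matrix` is hypothetical and not taken from the paper.

```python
from collections import Counter


def alignment_to_matrix(sequences):
    """Encode aligned sequences as a 0/1 matrix (assumed encoding, not the paper's).

    Rows are sampled chromosomes, columns are segregating sites.
    0 marks the most common allele at a site, 1 marks any other allele.
    """
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")

    columns = []
    for site in zip(*sequences):  # iterate over alignment columns
        counts = Counter(site)
        if len(counts) < 2:
            continue  # monomorphic site: carries no information, skip it
        major = counts.most_common(1)[0][0]
        columns.append([0 if base == major else 1 for base in site])

    if not columns:
        return [[] for _ in sequences]
    # Transpose so that each row corresponds to one sampled chromosome.
    return [list(row) for row in zip(*columns)]


if __name__ == "__main__":
    toy_alignment = ["ACGT", "ACGA", "TCGA"]
    for row in alignment_to_matrix(toy_alignment):
        print(row)
```

A matrix of this shape could then be passed to a CNN as a single-channel image; the network architecture itself is outside the scope of this sketch.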

Genome-Wide SNP Discovery, Genotyping and Their Preliminary Applications for Population Genetic Inference in Spotted Sea Bass (Lateolabrax maculatus)

PLoS ONE, 2016, Vol 11 (6), pp. e0157809, DOI 10.1371/journal.pone.0157809

Cited By ~ 8

Author(s): Juan Wang, Dong-Xiu Xue, Bai-Dong Zhang, Yu-Long Li, Bing-Jian Liu, ...

Keyword(s): Population Genetic, Sea Bass, Snp Discovery, Genome Wide, Lateolabrax Maculatus, Population Genetic Inference, Genetic Inference

Population-Genetic Inference from Pooled-Sequencing Data

Genome Biology and Evolution, 2014, Vol 6 (5), pp. 1210-1218, DOI 10.1093/gbe/evu085

Cited By ~ 59

Author(s): Michael Lynch, Darius Bost, Sade Wilson, Takahiro Maruki, Scott Harrison

Keyword(s): Population Genetic, Sequencing Data, Pooled Sequencing, Population Genetic Inference, Genetic Inference

Population genetic inference using a fixed number of segregating sites: a reassessment

Genetics Research, 2007, Vol 89 (4), pp. 231-244, DOI 10.1017/s0016672307008877

Cited By ~ 9

Author(s): Sebastián E. Ramos-Onsins, Sylvain Mousset, Thomas Mitchell-Olds, Wolfgang Stephan

Keyword(s): Population Genetic, Standard Procedure, Fixed Number, Single Locus, Coalescent Theory, Evolutionary Models, Neutral Model, Population Genetic Inference, Genetic Inference, Segregating Sites

Summary

Coalescent theory is commonly used to perform population genetic inference at the nucleotide level. Here, we examine the procedure that fixes the number of segregating sites (henceforth the FS procedure). In this approach, a fixed number of segregating sites (S) is placed on a coalescent tree (independently of the total and internode lengths of the tree). Thus, although widely used, the FS procedure does not strictly follow the assumptions of coalescent theory and must be considered an approximation of (i) the standard procedure that uses a fixed population mutation parameter θ, and (ii) procedures that condition on the number of segregating sites. We study the differences in the false positive rate for nine statistics by comparing the FS procedure with procedures (i) and (ii), using several evolutionary models with single-locus and multilocus data. Our results indicate that for single-locus data the FS procedure is accurate for the equilibrium neutral model, but problems arise under the alternative models studied; furthermore, for multilocus data, the FS procedure becomes inaccurate even for the standard neutral model. Therefore, we recommend a procedure that fixes the θ value (or alternatively, procedures that condition on S and take into account the uncertainty of θ) for analysing evolutionary models with multilocus data. With single-locus data, the FS procedure should not be employed for models other than the standard neutral model.
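The contrast between the FS procedure and the fixed-θ procedure can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the tree is a standard neutral coalescent with time in units of 2N generations (so mutations arise at rate θ/2 per lineage), each mutation lands on a branch with probability proportional to its length, and all function names are hypothetical. In the FS version the number of mutations is fixed in advance regardless of tree length; in the fixed-θ version it is Poisson with mean θ/2 times the total tree length.

```python
import math
import random


def simulate_tree(n, rng):
    """Simulate a neutral coalescent tree for n samples.

    Returns a list of (branch_length, n_descendant_samples) pairs,
    one per branch (2 * (n - 1) branches in total).
    """
    active = [[1, 0.0] for _ in range(n)]  # [descendant count, start time]
    branches = []
    t = 0.0
    k = n
    while k > 1:
        t += rng.expovariate(k * (k - 1) / 2)  # waiting time with k lineages
        i, j = rng.sample(range(k), 2)         # pick the pair that coalesces
        a, b = active[i], active[j]
        branches.append((t - a[1], a[0]))
        branches.append((t - b[1], b[0]))
        for idx in sorted((i, j), reverse=True):
            active.pop(idx)
        active.append([a[0] + b[0], t])        # new ancestral lineage
        k -= 1
    return branches


def poisson(mean, rng):
    """Draw a Poisson variate (Knuth's method; fine for small means)."""
    limit = math.exp(-mean)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def place_fixed_s(branches, s, rng):
    """FS procedure: exactly s mutations, whatever the tree length."""
    weights = [length for length, _ in branches]
    chosen = rng.choices(branches, weights=weights, k=s)
    return [descendants for _, descendants in chosen]


def place_fixed_theta(branches, theta, rng):
    """Fixed-theta procedure: the mutation count depends on tree length."""
    total_length = sum(length for length, _ in branches)
    s = poisson(theta / 2 * total_length, rng)
    return place_fixed_s(branches, s, rng)


if __name__ == "__main__":
    rng = random.Random(42)
    tree = simulate_tree(10, rng)
    print("FS, S=5:      ", sorted(place_fixed_s(tree, 5, rng)))
    print("theta=3.0:    ", sorted(place_fixed_theta(tree, 3.0, rng)))
```

Each returned value is the number of sampled chromosomes carrying the derived allele at one segregating site, so repeated runs yield site frequency spectra from which test statistics could be computed under either procedure.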
