Allele frequency-free inference of close familial relationships from genotypes or low depth sequencing data

10.1101/260497 ◽

2018 ◽

Author(s):

Ryan K Waples ◽

Anders Albrechtsen ◽

Ida Moltke

Keyword(s):

Allele Frequency ◽

Allele Frequencies ◽

Model Organisms ◽

Human Populations ◽

Genotype Data ◽

Sequencing Data ◽

Diverse Range ◽

Genomic Position ◽

Familial Relationships ◽

Similar Accuracy

Abstract: Knowledge of how individuals are related is important in many areas of research, and numerous methods for inferring pairwise relatedness from genetic data have been developed. However, the majority of these methods were not developed for situations where data is limited. Specifically, most methods rely on the availability of population allele frequencies, the relative genomic position of variants, and accurate genotype data. But in studies of non-model organisms or ancient human samples, such data is not always available. Motivated by this, we present a new method for pairwise relatedness inference, which requires neither allele frequency information nor information on genomic position. Furthermore, it can be applied both to genotype data and to low-depth sequencing data where genotypes cannot be accurately called. We evaluate it using data from SNP arrays and low-depth sequencing from a range of human populations and show that it can be used to infer close familial relationships with accuracy similar to that of a widely used method that relies on population allele frequencies. Additionally, we show that our method is robust to SNP ascertainment, which is important for application to a diverse range of populations and species.

Accurate allele frequencies from ultra-low coverage pool-seq samples in evolve-and-resequence experiments

10.1101/244004 ◽

2018 ◽

Author(s):

Susanne Tilk ◽

Alan Bergland ◽

Aaron Goodman ◽

Paul Schmidt ◽

Dmitri Petrov ◽

...

Keyword(s):

Allele Frequency ◽

Model Organism ◽

Software Tool ◽

Allele Frequencies ◽

Model Organisms ◽

Sequencing Data ◽

High Coverage ◽

Next Generation Sequencing Technology ◽

Low Coverage ◽

Pooled Samples

Abstract: Evolve-and-resequence (E+R) experiments leverage next-generation sequencing technology to track the allele frequency dynamics of populations as they evolve. While previous work has shown that adaptive alleles can be detected by comparing frequency trajectories from many replicate populations, this power comes at the expense of high-coverage (>100x) sequencing of many pooled samples, which can be cost-prohibitive. Here, we show that accurate estimates of allele frequencies can be achieved with very shallow sequencing depths (<5x) via inference of known founder haplotypes in small genomic windows. This technique can be used to efficiently estimate frequencies for any number of bi-allelic SNPs in populations of any model organism founded with sequenced homozygous strains. Using both experimentally-pooled and simulated samples of Drosophila melanogaster, we show that haplotype inference can improve allele frequency accuracy by orders of magnitude for up to 50 generations of recombination, and is robust to moderate levels of missing data, as well as different selection regimes. Finally, we show that a simple linear model generated from these simulations can predict the accuracy of haplotype-derived allele frequencies in other model organisms and experimental designs. To make these results broadly accessible for use in E+R experiments, we introduce HAF-pipe, an open-source software tool for calculating haplotype-derived allele frequencies from raw sequencing data. Ultimately, by reducing sequencing costs without sacrificing accuracy, our method facilitates E+R designs with higher replication and resolution, and thereby, increased power to detect adaptive alleles.
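The idea of a haplotype-derived allele frequency can be illustrated with a minimal sketch. It assumes that founder haplotype frequencies within one genomic window have already been estimated by some other step, and that each homozygous founder's allele at each bi-allelic SNP is known. The function name and data layout below are hypothetical and are not taken from HAF-pipe.

```python
def haplotype_derived_allele_freqs(founder_alleles, haplotype_freqs):
    """Turn founder haplotype frequencies into per-SNP allele frequencies.

    founder_alleles[f][s] is 1 if homozygous founder f carries the alternate
    allele at SNP s, else 0 (hypothetical layout for one genomic window).
    haplotype_freqs[f] is the estimated frequency of founder f's haplotype
    in that window (assumed to come from a separate inference step).
    """
    if len(founder_alleles) != len(haplotype_freqs):
        raise ValueError("one haplotype frequency is needed per founder")
    total = sum(haplotype_freqs)
    if total <= 0:
        raise ValueError("haplotype frequencies must sum to a positive value")
    n_snps = len(founder_alleles[0])
    freqs = []
    for s in range(n_snps):
        # Alternate-allele frequency = total weight of founders carrying it.
        weighted = sum(h * row[s] for h, row in zip(haplotype_freqs, founder_alleles))
        freqs.append(weighted / total)
    return freqs


# Three founders, three SNPs in one window (made-up illustration values).
founders = [[1, 0, 1],
            [0, 0, 1],
            [1, 1, 0]]
window_haplotype_freqs = [0.5, 0.3, 0.2]
print([round(f, 3) for f in haplotype_derived_allele_freqs(founders, window_haplotype_freqs)])
# → [0.7, 0.2, 0.8]
```

Because every SNP in the window shares the same few haplotype frequencies, reads from the whole window contribute to each SNP's estimate, which is one plausible reading of why shallow coverage can suffice.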

A simple method to estimate the in-house limit of detection for genetic mutations with low allele frequencies in whole-exome sequencing analysis by next-generation sequencing

BMC Genomic Data ◽

10.1186/s12863-020-00956-x ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Takumi Miura ◽

Satoshi Yasuda ◽

Yoji Sato

Keyword(s):

Next Generation Sequencing ◽

Allele Frequency ◽

Somatic Mutations ◽

Limit Of Detection ◽

Allele Frequencies ◽

Genetic Mutations ◽

Sequencing Data ◽

Simple Method ◽

Whole Exome ◽

Generation Sequencing

Abstract: Background: Next-generation sequencing (NGS) has profoundly changed the approach to genetic/genomic research. Particularly, the clinical utility of NGS in detecting mutations associated with disease risk has contributed to the development of effective therapeutic strategies. Recently, comprehensive analysis of somatic genetic mutations by NGS has also been used as a new approach for controlling the quality of cell substrates for manufacturing biopharmaceuticals. However, the quality evaluation of cell substrates by NGS largely depends on the limit of detection (LOD) for rare somatic mutations. The purpose of this study was to develop a simple method for evaluating the ability of whole-exome sequencing (WES) by NGS to detect mutations with low allele frequency. To estimate the LOD of WES for low-frequency somatic mutations, we repeatedly and independently performed WES of a reference genomic DNA using the same NGS platform and assay design. LOD was defined as the allele frequency with a relative standard deviation (RSD) value of 30% and was estimated by a moving average curve of the relation between RSD and allele frequency. Results: Allele frequencies of 20 mutations in the reference material that had been pre-validated by droplet digital PCR (ddPCR) were obtained from 5, 15, 30, or 40 G base pair (Gbp) sequencing data per run. There was a significant association between the allele frequencies measured by WES and those pre-validated by ddPCR, whose p-value decreased as the sequencing data size increased. By this method, the LOD of allele frequency in WES with the sequencing data of 15 Gbp or more was estimated to be between 5 and 10%. Conclusions: For properly interpreting the WES data of somatic genetic mutations, it is necessary to have a cutoff threshold of low allele frequencies. The in-house LOD estimated by the simple method shown in this study provides a rationale for setting the cutoff.
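The LOD definition above (the allele frequency at which RSD equals 30%, read off a moving-average curve of RSD against allele frequency) can be sketched as follows. This is only an illustration under stated assumptions: the window size, the linear interpolation between smoothed points, and all function names are hypothetical choices, not details given in the abstract.

```python
from statistics import mean, stdev


def relative_sd_percent(values):
    """Relative standard deviation (sample SD / mean), in percent."""
    return 100.0 * stdev(values) / mean(values)


def estimate_lod(replicate_afs, rsd_threshold=30.0, window=3):
    """Estimate the allele frequency at which the smoothed RSD hits the threshold.

    replicate_afs: one list per mutation, holding that mutation's allele
    frequency (in %) from each repeated sequencing run.
    window: number of neighbouring mutations per moving-average point
    (an assumed smoothing choice).
    Returns None if the smoothed curve never crosses the threshold.
    """
    # One (mean AF, RSD) point per mutation, ordered by allele frequency.
    points = sorted(
        (mean(afs), relative_sd_percent(afs))
        for afs in replicate_afs
        if len(afs) >= 2 and mean(afs) > 0
    )
    # Moving average over `window` consecutive points.
    smoothed = [
        (mean(p[0] for p in points[i:i + window]),
         mean(p[1] for p in points[i:i + window]))
        for i in range(len(points) - window + 1)
    ]
    # Find where RSD drops from above the threshold to at or below it,
    # then interpolate linearly between the two smoothed points.
    for (af_lo, rsd_lo), (af_hi, rsd_hi) in zip(smoothed, smoothed[1:]):
        if rsd_lo > rsd_threshold >= rsd_hi:
            frac = (rsd_lo - rsd_threshold) / (rsd_lo - rsd_hi)
            return af_lo + frac * (af_hi - af_lo)
    return None
```

A call such as `estimate_lod(per_mutation_afs, rsd_threshold=30.0, window=3)` would return an allele frequency in the same units as the input, or `None` when every smoothed RSD already lies on one side of the threshold.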

Decimation by sea star wasting disease and rapid genetic change in a keystone species, Pisaster ochraceus

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1800285115 ◽

2018 ◽

Vol 115 (27) ◽

pp. 7069-7074 ◽

Cited By ~ 12

Author(s):

Lauren M. Schiebelhut ◽

Jonathan B. Puritz ◽

Michael N Dawson

Keyword(s):

Population Dynamics ◽

Allele Frequency ◽

Keystone Species ◽

Allele Frequencies ◽

Sea Star ◽

Sequencing Data ◽

Wasting Disease ◽

Genomic Change ◽

Study Region ◽

Pisaster Ochraceus

Standing genetic variation enables or restricts a population’s capacity to respond to changing conditions, including the extreme disturbances expected to increase in frequency and intensity with continuing anthropogenic climate change. However, we know little about how populations might respond to extreme events with rapid genetic shifts, or how population dynamics may influence and be influenced by population genomic change. We use a range-wide epizootic, sea star wasting disease, that onset in mid-2013 and caused mass mortality in Pisaster ochraceus to explore how a keystone marine species responded to an extreme perturbation. We integrated field surveys with restriction site-associated DNA sequencing data to (i) describe the population dynamics of mortality and recovery, and (ii) compare allele frequencies in mature P. ochraceus before the disease outbreak with allele frequencies in adults and new juveniles after the outbreak, to identify whether selection may have occurred. We found P. ochraceus suffered 81% mortality in the study region between 2012 and 2015, and experienced a concurrent 74-fold increase in recruitment beginning in late 2013. Comparison of pre- and postoutbreak adults revealed significant allele frequency changes at three loci, which showed consistent changes across the large majority of locations. Allele frequency shifts in juvenile P. ochraceus (spawned from premortality adults) were consistent with those seen in adult survivors. Such parallel shifts suggest detectable signals of selection and highlight the potential for persistence of this change in subsequent generations, which may influence the resilience of this keystone species to future outbreaks.

Allele frequency‐free inference of close familial relationships from genotypes or low‐depth sequencing data

Molecular Ecology ◽

10.1111/mec.14954 ◽

2019 ◽

Vol 28 (1) ◽

pp. 35-48 ◽

Cited By ~ 15

Author(s):

Ryan K. Waples ◽

Anders Albrechtsen ◽

Ida Moltke

Keyword(s):

Allele Frequency ◽

Sequencing Data ◽

Familial Relationships

Can we distinguish modes of selective interactions using linkage disequilibrium?

10.1101/2021.03.25.437004 ◽

2021 ◽

Author(s):

Aaron P Ragsdale

Keyword(s):

Linkage Disequilibrium ◽

Allele Frequency ◽

Data Interpretation ◽

Numerical Approach ◽

Interactive Effects ◽

Whole Genome Sequencing Data ◽

Human Populations ◽

Missense Mutations ◽

Sequencing Data ◽

Selective Interactions

Selected mutations interfere and interact with evolutionary processes at nearby loci, distorting allele frequency trajectories and correlations between pairs of mutations. A number of recent studies have used patterns of linkage disequilibrium (LD) between selected variants to test for selective interference and epistatic interactions, with some disagreement over interpreting observations from data. Interpretation is hindered by the relative lack of analytic or even numerical expectations for patterns of variation between pairs of loci under the combined effects of selection, dominance, epistasis, and demography. Here, I develop a numerical approach to compute the expected two-locus sampling distribution under diploid selection with arbitrary epistasis and dominance, recombination, and variable population size. I use this to explore how epistasis and dominance affect expected signed LD, including for non-steady-state demography relevant to human populations. Finally, I use whole-genome sequencing data from humans to assess how well we can differentiate modes of selective interactions in practice. I find that positive LD between missense mutations within genes is driven by strong positive allele-frequency correlations between pairs of mutations that fall within the same conserved domain, pointing to compensatory mutations or antagonistic epistasis as the prevailing mode of interaction within but not outside of conserved genic elements. The heterogeneous landscape of both mutational fitness effects and selective interactions within protein-coding genes calls for more refined inferences of the joint distribution of fitness and interactive effects, and the methods presented here should prove useful in that pursuit.

Maximum Likelihood Estimation of Biological Relatedness from Low Coverage Sequencing Data

10.1101/023374 ◽

2015 ◽

Cited By ~ 27

Author(s):

Mikhail Lipatov ◽

Komal Sanjeev ◽

Rob Patro ◽

Krishna Veeramah

Keyword(s):

Second Generation ◽

Sequence Data ◽

Likelihood Estimation ◽

Allele Frequencies ◽

Human Populations ◽

Sequencing Data ◽

Dna Sequence Data ◽

Second Generation Sequencing ◽

Low Coverage ◽

Generation Sequencing

The inference of biological relatedness from DNA sequence data has a wide array of applications, such as in the study of human disease, anthropology and ecology. One of the most common analytical frameworks for performing this inference is to genotype individuals for large numbers of independent genomewide markers and use population allele frequencies to infer the probability of identity-by-descent (IBD) given observed genotypes. Current implementations of this class of methods assume genotypes are known without error. However, with the advent of second generation sequencing data there are now an increasing number of situations where the confidence attached to a particular genotype may be poor because of low coverage. Such scenarios may lead to biased estimates of the kinship coefficient, Φ. We describe an approach that utilizes genotype likelihoods rather than a single observed best genotype to estimate Φ and demonstrate that we can accurately infer relatedness in both simulated and real second generation sequencing data from a wide variety of human populations down to at least the third degree when coverage is as low as 2x for both individuals, while other commonly used methods such as PLINK exhibit large biases in such situations. In addition the method appears to be robust when the assumed population allele frequencies are diverged from the true frequencies for realistic levels of genetic drift. This approach has been implemented in the C++ software lcmlkin.
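For readers unfamiliar with the kinship coefficient Φ, the standard relationship between Φ and the IBD-sharing proportions of two non-inbred individuals can be written as a small helper. This sketch only shows that final conversion; it is not the genotype-likelihood estimation implemented in lcmlkin, and the function name is hypothetical.

```python
def kinship_from_ibd(k0, k1, k2):
    """Kinship coefficient from the proportions of the genome at which two
    non-inbred individuals share 0, 1, or 2 alleles identical by descent."""
    if min(k0, k1, k2) < 0 or abs(k0 + k1 + k2 - 1.0) > 1e-9:
        raise ValueError("k0, k1, k2 must be non-negative and sum to 1")
    # A randomly drawn allele pair is IBD with probability 1/4 when one
    # allele is shared and 1/2 when both are shared.
    return k1 / 4.0 + k2 / 2.0


print(kinship_from_ibd(0.0, 1.0, 0.0))     # parent-offspring → 0.25
print(kinship_from_ibd(0.25, 0.5, 0.25))   # full siblings → 0.25
print(kinship_from_ibd(0.75, 0.25, 0.0))   # first cousins (third degree) → 0.0625
```

Parent-offspring and full-sibling pairs share the same Φ but differ in (k0, k1, k2), which is why methods in this area typically estimate the three IBD proportions rather than Φ alone.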

Estimation of Cry3Bb1 resistance allele frequency in field populations of western corn rootworm using a genetic marker

G3 Genes|Genome|Genetics ◽

10.1093/g3journal/jkaa013 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Alan Willse ◽

Lex Flagel ◽

Graham Head

Keyword(s):

Allele Frequency ◽

Western Corn Rootworm ◽

Allele Frequencies ◽

Chromosome 8 ◽

Corn Belt ◽

Corn Rootworm ◽

Resistance Allele ◽

Response To Selection ◽

The Us ◽

Causal Allele

Abstract Following the discovery of western corn rootworm (WCR; Diabrotica virgifera virgifera) populations resistant to the Bacillus thuringiensis (Bt) protein Cry3Bb1, resistance was genetically mapped to a single locus on WCR chromosome 8 and linked SNP markers were shown to correlate with the frequency of resistance among field-collected populations from the US Corn Belt. The purpose of this paper is to further investigate the relationship between one of these resistance-linked markers and the causal resistance locus. Using data from laboratory bioassays and field experiments, we show that one allele of the resistance-linked marker increased in frequency in response to selection, but was not perfectly linked to the causal resistance allele. By coupling the response to selection data with a genetic model of the linkage between the marker and the causal allele, we developed a model that allowed marker allele frequencies to be mapped to causal allele frequencies. We then used this model to estimate the resistance allele frequency distribution in the US Corn Belt based on collections from 40 populations. These estimates suggest that chromosome 8 Cry3Bb1 resistance allele frequency was generally low (<10%) for 65% of the landscape, though an estimated 13% of landscape has relatively high (>25%) resistance allele frequency.

Molecular and phenotypic analysis of rodent models reveals conserved and species-specific modulators of human sarcopenia

Communications Biology ◽

10.1038/s42003-021-01723-z ◽

2021 ◽

Vol 4 (1) ◽

Author(s):

Anastasiya Börsch ◽

Daniel J. Ham ◽

Nitish Mittal ◽

Lionel A. Tintignac ◽

Eugenia Migliavacca ◽

...

Keyword(s):

Muscle Mass ◽

Inflammatory Responses ◽

Molecular Data ◽

Model Organisms ◽

Sequencing Data ◽

Phenotypic Analysis ◽

Age Related ◽

Analogous Data ◽

Species Specific ◽

And Function

Abstract: Sarcopenia, the age-related loss of skeletal muscle mass and function, affects 5–13% of individuals aged over 60 years. While rodents are widely used model organisms, which aspects of sarcopenia are recapitulated in different animal models is unknown. Here we generated a time series of phenotypic measurements and RNA sequencing data in mouse gastrocnemius muscle and analyzed them alongside analogous data from rats and humans. We found that rodents recapitulate mitochondrial changes observed in human sarcopenia, while inflammatory responses are conserved at pathway but not gene level. Perturbations in the extracellular matrix are shared by rats, while mice recapitulate changes in RNA processing and autophagy. We inferred transcription regulators of early and late transcriptome changes, which could be targeted therapeutically. Our study demonstrates that phenotypic measurements, such as muscle mass, are better indicators of muscle health than chronological age and should be considered when analyzing aging-related molecular data.

The Dynamics of Gynodioecy in Plantago lanceolata L. II. Mode of Action and Frequencies of Restorer Alleles

Genetics ◽

10.1093/genetics/147.3.1317 ◽

1997 ◽

Vol 147 (3) ◽

pp. 1317-1328

Author(s):

Anita A de Haan ◽

Hans P Koelewijn ◽

Maria P J Hundscheid ◽

Jos M M Van Damme

Keyword(s):

Cytoplasmic Male Sterility ◽

Male Sterility ◽

Allele Frequency ◽

Mode Of Action ◽

Male Fertility ◽

Fertility Restoration ◽

Plantago Lanceolata ◽

Allele Frequencies ◽

Male Sterile ◽

Male Fertility Restoration

Male fertility in Plantago lanceolata is controlled by the interaction of cytoplasmic and nuclear genes. Different cytoplasmic male sterility (CMS) types can be either male sterile or hermaphrodite, depending on the presence of nuclear restorer alleles. In three CMS types of P. lanceolata (CMSI, CMSIIa, and CMSIIb) the number of loci involved in male fertility restoration was determined. In each CMS type, male fertility was restored by multiple genes with either dominant or recessive action and capable either of restoring male fertility independently or in interaction with each other (epistasis). Restorer allele frequencies for CMSI, CMSIIa and CMSIIb were determined by crossing hermaphrodites with “standard” male steriles. Segregation of male steriles vs. non-male steriles was used to estimate overall restorer allele frequency. The frequency of restorer alleles was different for the CMS types: restorer alleles for CMSI were less frequent than for CMSIIa and CMSIIb. On the basis of the frequencies of male steriles and the CMS types an “expected” restorer allele frequency could be calculated. The correlation between estimated and expected restorer allele frequency was significant.

Population-specific genome graphs improve high-throughput sequencing data analysis: A case study on the Pan-African genome

10.1101/2021.03.19.436173 ◽

2021 ◽

Author(s):

H. Serhat Tetikol ◽

Kubra Narci ◽

Deniz Turgut ◽

Gungor Budak ◽

Ozem Kalay ◽

...

Keyword(s):

High Throughput Sequencing ◽

Information Overload ◽

African Ancestry ◽

Sample Selection ◽

Variant Calling ◽

Population Diversity ◽

Human Populations ◽

Sequencing Data ◽

Bioinformatics Pipeline ◽

Graph Augmentation

Abstract: Graph-based genome reference representations have seen significant development, motivated by the inadequacy of the current human genome reference for capturing the diverse genetic information from different human populations and its inability to maintain the same level of accuracy for non-European ancestries. While there have been many efforts to develop computationally efficient graph-based bioinformatics toolkits, how to curate genomic variants and subsequently construct genome graphs remains an understudied problem that inevitably determines the effectiveness of the end-to-end bioinformatics pipeline. In this study, we discuss major obstacles encountered during graph construction and propose methods for sample selection based on population diversity, graph augmentation with structural variants and resolution of graph reference ambiguity caused by information overload. Moreover, we present the case for iteratively augmenting tailored genome graphs for targeted populations and test the proposed approach on the whole-genome samples of African ancestry. Our results show that, as more representative alternatives to linear or generic graph references, population-specific graphs can achieve significantly lower read mapping errors and increased variant calling sensitivity, and can provide the improvements of joint variant calling without the need for computationally intensive post-processing steps.