allele frequency spectrum
Recently Published Documents


TOTAL DOCUMENTS

39
(FIVE YEARS 11)

H-INDEX

13
(FIVE YEARS 2)

2021 ◽  
Author(s):  
Raquel Dias ◽  
Doug Evans ◽  
Shang-Fu Chen ◽  
Kai-Yu Chen ◽  
Leslie Chan ◽  
...  

AbstractGenotype imputation is a foundational tool for population genetics. Standard statistical imputation approaches rely on the co-location of large whole-genome sequencing-based reference panels, powerful computing environments, and potentially sensitive genetic study data. This results in computational resource and privacy-risk barriers to access to cutting-edge imputation techniques. Moreover, the accuracy of current statistical approaches is known to degrade in regions of low and complex linkage disequilibrium.Artificial neural network-based imputation approaches may overcome these limitations by encoding complex genotype relationships in easily portable inference models. Here we demonstrate an autoencoder-based approach for genotype imputation, using a large, commonly-used reference panel, and spanning the entirety of human chromosome 22. Our autoencoder-based genotype imputation strategy achieved superior imputation accuracy across the allele-frequency spectrum and across genomes of diverse ancestry, while delivering at least 4-fold faster inference run time relative to standard imputation tools.


Author(s):  
Simon H Martin ◽  
William Amos

Abstract The detection of introgression from genomic data is transforming our view of species and the origins of adaptive variation. Among the most widely used approaches to detect introgression is the so-called ABBA–BABA test or D-statistic, which identifies excess allele sharing between nonsister taxa. Part of the appeal of D is its simplicity, but this also limits its informativeness, particularly about the timing and direction of introgression. Here we present a simple extension, D frequency spectrum or DFS, in which D is partitioned according to the frequencies of derived alleles. We use simulations over a large parameter space to show how DFS carries information about various factors. In particular, recent introgression reliably leads to a peak in DFS among low-frequency derived alleles, whereas violation of model assumptions can lead to a lack of signal at low frequencies. We also reanalyze published empirical data from six different animal and plant taxa, and interpret the results in the light of our simulations, showing how DFS provides novel insights. We currently see DFS as a descriptive tool that will augment both simple and sophisticated tests for introgression, but in the future it may be usefully incorporated into probabilistic inference frameworks.


2020 ◽  
Author(s):  
Simon H. Martin ◽  
William Amos

ABSTRCTThe detection of introgression from genomic data is transforming our view of species and the origins of adaptive variation. Among the most widely used approaches to detect introgression is the so-called ABBA BABA test or D statistic, which identifies excess allele sharing between non-sister taxa. Part of the appeal of D is its simplicity, but this also limits its informativeness, particularly about the timing and direction of introgression. Here we present a simple extension, D frequency spectrum or DFS, in which D is partitioned according to the frequencies of derived alleles. We use simulations over a large parameter space to show how DFS caries information about various factors. In particular, recent introgression reliably leads to a peak in DFS among low-frequency derived alleles, whereas violation of model assumptions can lead to a lack of signal at low-frequencies. We also reanalyse published empirical data from six different animal and plant taxa, and interpret the results in the light of our simulations, showing how DFS provides novel insights. We currently see DFS as a descriptive tool that will augment both simple and sophisticated tests for introgression, but in the future it may be usefully incorporated into probabilistic inference frameworks.


Author(s):  
Bornali Bhattacharjee ◽  
Bhaswati Pandit

AbstractThe first Indian cases of COVID-19 caused by SARS-Cov-2 were reported in February 29, 2020 with a history of travel from Wuhan, China and so far above 4500 deaths have been attributed to this pandemic. The objectives of this study were to characterize Indian SARS-CoV-2 genome-wide nucleotide variations, trace ancestries using phylogenetic networks and correlate state-wise distribution of viral haplotypes with differences in mortality rates. A total of 305 whole genome sequences from 19 Indian states were downloaded from GISAID. Sequences were aligned using the ancestral Wuhan-Hu genome sequence (NC_045512.2). A total of 633 variants resulting in 388 amino acid substitutions were identified. Allele frequency spectrum, and nucleotide diversity (π) values revealed the presence of higher proportions of low frequency variants and negative Tajima’s D values across ORFs indicated the presence of population expansion. Network analysis highlighted the presence of two major clusters of viral haplotypes, namely, clade G with the S:D614G, RdRp: P323L variants and a variant of clade L [Lv] having the RdRp:A97V variant. Clade G genomes were found to be evolving more rapidly into multiple sub-clusters including clade GH and GR and were also found in higher proportions in three states with highest mortality rates namely, Gujarat, Madhya Pradesh and West Bengal.


GigaScience ◽  
2020 ◽  
Vol 9 (3) ◽  
Author(s):  
Ekaterina Noskova ◽  
Vladimir Ulyantsev ◽  
Klaus-Peter Koepfli ◽  
Stephen J O’Brien ◽  
Pavel Dobrynin

Abstract Background The demographic history of any population is imprinted in the genomes of the individuals that make up the population. One of the most popular and convenient representations of genetic information is the allele frequency spectrum (AFS), the distribution of allele frequencies in populations. The joint AFS is commonly used to reconstruct the demographic history of multiple populations, and several methods based on diffusion approximation (e.g., ∂a∂i) and ordinary differential equations (e.g., moments) have been developed and applied for demographic inference. These methods provide an opportunity to simulate AFS under a variety of researcher-specified demographic models and to estimate the best model and associated parameters using likelihood-based local optimizations. However, there are no known algorithms to perform global searches of demographic models with a given AFS. Results Here, we introduce a new method that implements a global search using a genetic algorithm for the automatic and unsupervised inference of demographic history from joint AFS data. Our method is implemented in the software GADMA (Genetic Algorithm for Demographic Model Analysis, https://github.com/ctlab/GADMA). Conclusions We demonstrate the performance of GADMA by applying it to sequence data from humans and non-model organisms and show that it is able to automatically infer a demographic model close to or even better than the one that was previously obtained manually. Moreover, GADMA is able to infer multiple demographic models at different local optima close to the global one, providing a larger set of possible scenarios to further explore demographic history.


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Ludovica Molinaro ◽  
Francesco Montinaro ◽  
Burak Yelmen ◽  
Davide Marnetto ◽  
Doron M. Behar ◽  
...  

AbstractThe presence of genomic signatures of Eurasian origin in contemporary Ethiopians has been reported by several authors and estimated to have arrived in the area from 3000 years ago. Several studies reported plausible source populations for such a signature, using haplotype based methods on modern data or single-site methods on modern or ancient data. These studies did not reach a consensus and suggested an Anatolian or Sardinia-like proxy, broadly Levantine or Neolithic Levantine as possible sources. We demonstrate, however, that the deeply divergent, autochthonous African component which accounts for ~50% of most contemporary Ethiopian genomes, affects the overall allele frequency spectrum to an extent that makes it hard to control for it and, at once, to discern between subtly different, yet important, Eurasian sources (such as Anatolian or Levant Neolithic ones). Here we re-assess pattern of allele sharing between the Eurasian component of Ethiopians (here called “NAF” for Non African) and ancient and modern proxies. Our results unveil a genomic legacy that may connect the Eurasian genetic component of contemporary Ethiopians with Sea People and with population movements that affected the Mediterranean area and the Levant after the fall of the Minoan civilization.


2019 ◽  
Vol 10 (1) ◽  
Author(s):  
Robert Fragoza ◽  
Jishnu Das ◽  
Shayne D. Wierbowski ◽  
Jin Liang ◽  
Tina N. Tran ◽  
...  

Abstract Each human genome carries tens of thousands of coding variants. The extent to which this variation is functional and the mechanisms by which they exert their influence remains largely unexplored. To address this gap, we leverage the ExAC database of 60,706 human exomes to investigate experimentally the impact of 2009 missense single nucleotide variants (SNVs) across 2185 protein-protein interactions, generating interaction profiles for 4797 SNV-interaction pairs, of which 421 SNVs segregate at > 1% allele frequency in human populations. We find that interaction-disruptive SNVs are prevalent at both rare and common allele frequencies. Furthermore, these results suggest that 10.5% of missense variants carried per individual are disruptive, a higher proportion than previously reported; this indicates that each individual’s genetic makeup may be significantly more complex than expected. Finally, we demonstrate that candidate disease-associated mutations can be identified through shared interaction perturbations between variants of interest and known disease mutations.


Sign in / Sign up

Export Citation Format

Share Document