Detecting archaic introgression without archaic reference genomes

2018 ◽  
Author(s):  
Laurits Skov ◽  
Ruoyun Hui ◽  
Asger Hobolth ◽  
Aylwyn Scally ◽  
Mikkel Heide Schierup ◽  
...  

Abstract: Human populations out of Africa have experienced at least two bouts of introgression from archaic humans, from Neanderthals and Denisovans. In Papuans there is prior evidence of both these introgressions. Here we present a new approach to detect segments of individual genomes of archaic origin without using an archaic reference genome. The approach is based on a hidden Markov model that identifies genomic regions with a high density of single nucleotide variants (SNVs) not seen in unadmixed populations. We show using simulations that this provides a powerful approach to identifying segments of archaic introgression with a small rate of false detection. Furthermore, our approach is able to accurately infer admixture proportions and divergence time of human and archaic populations. We apply the model to detect archaic introgression in 89 Papuans and show how the identified segments can be assigned to likely Neanderthal or Denisovan origin. We report more Denisovan admixture than previous studies and directly find a shift in size distribution of fragments of Neanderthal and Denisovan origin that is compatible with a difference in admixture time. Furthermore, we identify small amounts of Denisova ancestry in West Eurasians, South East Asians and South Asians.
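The hidden Markov model idea in this abstract can be illustrated with a minimal sketch. It assumes the genome is cut into fixed-size windows, that each window's count of SNVs absent from the unadmixed reference panel is Poisson-distributed, and that there are two hidden states (0 = non-archaic, 1 = archaic). The emission model, the function names and every parameter value below are illustrative assumptions, not the authors' implementation.

```python
import math

def poisson_logpmf(k, lam):
    """Log-probability of observing k events under a Poisson(lam) model."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def viterbi(counts, rates, start, trans):
    """Most likely hidden-state path for per-window SNV counts.

    counts: private-SNV count per window
    rates:  assumed Poisson rate per state (hypothetical values)
    start:  initial state probabilities
    trans:  trans[p][s] = probability of moving from state p to state s
    """
    n_states = len(rates)
    # Best log-score of any path ending in each state at the first window.
    score = [math.log(start[s]) + poisson_logpmf(counts[0], rates[s])
             for s in range(n_states)]
    back = []  # back[t][s] = best predecessor of state s at window t + 1
    for k in counts[1:]:
        prev = score
        pointers, score = [], []
        for s in range(n_states):
            best = max(range(n_states),
                       key=lambda p: prev[p] + math.log(trans[p][s]))
            pointers.append(best)
            score.append(prev[best] + math.log(trans[best][s])
                         + poisson_logpmf(k, rates[s]))
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

# Toy data: windows 4-7 carry an excess of variants unseen in the reference panel.
counts = [0, 1, 0, 0, 5, 6, 4, 7, 0, 1, 0]
rates = [0.5, 5.0]                      # assumed SNV rate per window: non-archaic, archaic
start = [0.9, 0.1]
trans = [[0.95, 0.05], [0.10, 0.90]]
path = viterbi(counts, rates, start, trans)
print(path)
```

Runs of state 1 in the decoded path mark candidate archaic segments. In a sketch like this the rates and transition probabilities would normally be fitted to the data rather than fixed by hand.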

ESC CardioMed ◽  
2018 ◽  
pp. 669-671
Author(s):  
Eric Schulze-Bahr

The human genome consists of approximately 3 billion (3 × 10^9) base pairs of DNA (around 20,000 genes), organized as 23 chromosome pairs (diploid parental set), and a small mitochondrial genome (37 genes, including 13 proteins; 16,569 base pairs) of maternal origin. Most human genetic variation is natural, that is, common or rare (minor allele frequency >0.1%), and does not cause disease. Apart from any true disease-causing (bona fide) mutation, each individual genome harbours more than 3.5 million single nucleotide variants (including >10,000 non-synonymous changes causing amino acid substitutions) and 200–300 large structural or copy number variants (insertions/deletions, up to several thousands of base pairs) that are non-disease-causing and scattered throughout coding and non-coding genomic regions.


2019 ◽  
Vol 10 (1) ◽  
Author(s):  
Robert Fragoza ◽  
Jishnu Das ◽  
Shayne D. Wierbowski ◽  
Jin Liang ◽  
Tina N. Tran ◽  
...  

Abstract: Each human genome carries tens of thousands of coding variants. The extent to which this variation is functional and the mechanisms by which it exerts its influence remain largely unexplored. To address this gap, we leverage the ExAC database of 60,706 human exomes to investigate experimentally the impact of 2009 missense single nucleotide variants (SNVs) across 2185 protein-protein interactions, generating interaction profiles for 4797 SNV-interaction pairs, of which 421 SNVs segregate at > 1% allele frequency in human populations. We find that interaction-disruptive SNVs are prevalent at both rare and common allele frequencies. Furthermore, these results suggest that 10.5% of missense variants carried per individual are disruptive, a higher proportion than previously reported; this indicates that each individual’s genetic makeup may be significantly more complex than expected. Finally, we demonstrate that candidate disease-associated mutations can be identified through shared interaction perturbations between variants of interest and known disease mutations.


2017 ◽  
Author(s):  
Matthew A. Field ◽  
Gaetan Burgio ◽  
Jalila Al Shekaili ◽  
Simon J. Foote ◽  
Matthew C. Cook ◽  
...  

Abstract: Identification of sequence variation from short-read sequence data is subject to common-yet-intermittent miscalling that occurs in a sequence-intrinsic manner. We identify that recurrent false positive single nucleotide variants are strongly present in databases of human sequence variation and demonstrate how each individual sample generates a unique set of recurrent false positive variants. These recurrent miscalls result from known difficulties aligning short-read sequence data between redundant genomic regions. We could replicate, catalogue and remove three quarters of these recurrent miscalls for any given exome with as little as ten rounds of read resampling, realignment and recalling. The removal of such misleading variants reduces the search space for identification of disease-causing variants. List of Abbreviations: SNV, single nucleotide variant; RFP, recurrent false positive; ENU, N-ethyl-N-nitrosourea.


Science ◽  
2019 ◽  
Vol 366 (6463) ◽  
pp. eaax2083 ◽  
Author(s):  
PingHsun Hsieh ◽  
Mitchell R. Vollger ◽  
Vy Dang ◽  
David Porubsky ◽  
Carl Baker ◽  
...  

Copy number variants (CNVs) are subject to stronger selective pressure than single-nucleotide variants, but their roles in archaic introgression and adaptation have not been systematically investigated. We show that stratified CNVs are significantly associated with signatures of positive selection in Melanesians and provide evidence for adaptive introgression of large CNVs at chromosomes 16p11.2 and 8p21.3 from Denisovans and Neanderthals, respectively. Using long-read sequence data, we reconstruct the structure and complex evolutionary history of these polymorphisms and show that both encode positively selected genes absent from most human populations. Our results collectively suggest that large CNVs originating in archaic hominins and introgressed into modern humans have played an important role in local population adaptation and represent an insufficiently studied source of large-scale genetic variation.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Jonathan P. A. Gardner ◽  
Catarina N. S. Silva ◽  
Craig R. Norrie ◽  
Brendon J. Dunphy

Abstract: The New Zealand green-lipped mussel aquaculture industry is largely dependent on the supply of young mussels that wash up on Ninety Mile Beach (so-called Kaitaia spat), which are collected and trucked to aquaculture farms. The locations of source populations of Kaitaia spat are unknown and this lack of knowledge represents a major problem because spat supply may be irregular. We combined genotypic (microsatellite) and phenotypic (shell geochemistry) data in a geospatial framework to determine if this new approach can help identify source populations of mussels collected from two spat-collecting and four non-spat-collecting sites further south. Genetic analyses resolved differentiated clusters (mostly three clusters), but no obvious source populations. Shell geochemistry analyses resolved six differentiated clusters, as did the combined genotypic and phenotypic data. Analyses revealed high levels of spatial and temporal variability in the geochemistry signal. Whilst we have not been able to identify the source site(s) of Kaitaia spat, our analyses indicate that geospatial testing using combined genotypic and phenotypic data is a powerful approach. Next steps should employ analyses of single nucleotide polymorphism markers with shell geochemistry and in conjunction with high resolution physical oceanographic modelling to resolve the longstanding question of the origin of Kaitaia spat.


2018 ◽  
Author(s):  
Paul Guilhamon ◽  
Mathieu Lupien

Abstract: Motivation: Single Nucleotide Variants (SNVs), including somatic point mutations and Single Nucleotide Polymorphisms (SNPs), in noncoding cis-regulatory elements (CREs) can affect gene regulation and lead to disease development (Zhou et al., 2016; Zhang et al., 2014). Others have previously developed methods to identify important clusters of somatic point mutations based on proximity (Weinhold et al., 2014) or the enrichment of inherited risk-SNPs at CREs (Ahmed et al., 2017). Here, we present SMuRF (Significantly Mutated Region Finder), a user-friendly command-line tool to identify these significantly mutated regions from user-defined genomic intervals and SNVs. Results: SMuRF identified 72 significantly mutated CREs in liver cancer, including known mutated gene promoters as well as previously unreported regions. Availability: The source code for SMuRF is open-source and freely available on GitHub (https://github.com/LupienLabOrganization/SMuRF) under the GNU GPLv3 license. SMuRF is implemented in Bash and R; it runs on any platform with Bash (≥4.1.2), R (≥3.3.0) and BEDTools (≥2.26.0). It requires the following R packages: GenomicRanges, gtools, gplots, ggplot2, data.table, psych, and dplyr. Supplementary Information: Supplementary information available at Bioinformatics.
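The abstract does not say which statistic SMuRF uses, so the following is only a generic sketch of how a "significantly mutated region" could be flagged: a one-sided binomial test asking whether a region holds more mutations than its share of the total interval length would predict. The function names, region names and counts are hypothetical and are not taken from SMuRF.

```python
import math

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed by direct summation."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))

def region_pvalues(regions):
    """One-sided enrichment p-value per region.

    regions: dict mapping region name -> (length_bp, mutation_count).
    Null model (an assumption): each mutation falls in a region with
    probability proportional to that region's length.
    """
    total_length = sum(length for length, _ in regions.values())
    total_mutations = sum(count for _, count in regions.values())
    return {
        name: binomial_tail(count, total_mutations, length / total_length)
        for name, (length, count) in regions.items()
    }

# Hypothetical intervals: (length in bp, observed SNV count).
regions = {
    "promoter_A": (500, 12),
    "enhancer_B": (1500, 9),
    "enhancer_C": (2000, 9),
}
pvals = region_pvalues(regions)
for name, p in sorted(pvals.items(), key=lambda item: item[1]):
    print(f"{name}\t{p:.3g}")
```

A real analysis would also need to correct for multiple testing across regions and to account for variation in background mutation rate, both of which this sketch leaves out.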


2021 ◽  
Author(s):  
Jason A Rothman ◽  
Theresa B Loveless ◽  
Joseph Kapcia ◽  
Eric D Adams ◽  
Joshua A Steele ◽  
...  

Abstract: Municipal wastewater provides an integrated sample of a diversity of human-associated microbes across a sewershed, including viruses. Wastewater-based epidemiology (WBE) is a promising strategy to detect pathogens and may serve as an early-warning system for disease outbreaks. Notably, WBE has garnered substantial interest during the COVID-19 pandemic to track disease burden through analyses of SARS-CoV-2 RNA. Throughout the COVID-19 outbreak, tracking SARS-CoV-2 in wastewater has been an important tool for understanding the spread of the virus. Unlike traditional sequencing of SARS-CoV-2 isolated from clinical samples, which adds testing burden to the healthcare system, in this study, metatranscriptomics was used to sequence virus directly from wastewater. Here, we present a study in which we explored RNA viral diversity through sequencing 94 wastewater influent samples across seven treatment plants (WTPs), collected August 2020 - January 2021, representing approximately 16 million people in Southern California. Enriched viral libraries identified a wide diversity of RNA viruses that differed between WTPs and over time, with detected viruses including coronaviruses, influenza A, and noroviruses. Furthermore, single nucleotide variants (SNVs) of SARS-CoV-2 were identified in wastewater and we measured proportions of overall virus and SNVs across several months. We detected several SNVs that are markers for clinically-important SARS-CoV-2 variants, along with SNVs of unknown function, prevalence, or epidemiological consequence. Our study shows the potential of WBE to detect viruses in wastewater and to track the diversity and spread of viral variants in urban and suburban locations, which may aid public health efforts to monitor disease outbreaks.
Importance: Wastewater-based epidemiology (WBE) can detect pathogens across sewersheds, which represent the collective waste of human populations. As there is a wide diversity of RNA viruses in wastewater, monitoring the presence of these viruses is useful for public health, industry, and ecological studies. Specific to public health, WBE has proven valuable during the COVID-19 pandemic to track the spread of SARS-CoV-2 without adding burden to healthcare systems. In this study, we used metatranscriptomics and RT-ddPCR to assay RNA viruses across Southern California wastewater from August 2020 - January 2021, representing approximately 16 million people from Los Angeles, Orange, and San Diego counties. We found that SARS-CoV-2 quantification in wastewater correlates well with county-wide COVID-19 case data, and that we can detect SARS-CoV-2 single nucleotide variants through sequencing. Likewise, WTPs harbored different viromes, and we detected other human pathogens such as noroviruses and adenoviruses, furthering our understanding of wastewater viral ecology.


2020 ◽  
Vol 6 (50) ◽  
pp. eabd9230
Author(s):  
Yuta Suzuki ◽  
Eugene W. Myers ◽  
Shinichi Morishita

Our understanding of centromere sequence variation across human populations is limited by its extremely long nested repeat structures called higher-order repeats that are challenging to sequence. Here, we analyzed chromosomes 11, 17, and X using long-read sequencing data for 36 individuals from diverse populations including a Han Chinese trio and 21 Japanese. We revealed substantial structural diversity with many previously unidentified variant higher-order repeats specific to individuals characterizing rapid, haplotype-specific evolution of human centromeric arrays, while frequent single-nucleotide variants are largely conserved. We found a characteristic pattern shared among prevalent variants in human and chimpanzee. Our findings pave the way for studying sequence evolution in human and primate centromeres.

