Adaptive archaic introgression of copy number variants and the discovery of previously unknown human genes

Science ◽  
2019 ◽  
Vol 366 (6463) ◽  
pp. eaax2083 ◽  
Author(s):  
PingHsun Hsieh ◽  
Mitchell R. Vollger ◽  
Vy Dang ◽  
David Porubsky ◽  
Carl Baker ◽  
...  

Copy number variants (CNVs) are subject to stronger selective pressure than single-nucleotide variants, but their roles in archaic introgression and adaptation have not been systematically investigated. We show that stratified CNVs are significantly associated with signatures of positive selection in Melanesians and provide evidence for adaptive introgression of large CNVs at chromosomes 16p11.2 and 8p21.3 from Denisovans and Neanderthals, respectively. Using long-read sequence data, we reconstruct the structure and complex evolutionary history of these polymorphisms and show that both encode positively selected genes absent from most human populations. Our results collectively suggest that large CNVs originating in archaic hominins and introgressed into modern humans have played an important role in local population adaptation and represent an insufficiently studied source of large-scale genetic variation.

2018 ◽  
Author(s):  
Whitney Whitford ◽  
Klaus Lehnert ◽  
Russell G. Snell ◽  
Jessie C. Jacobsen

Abstract Background: The popularisation and decreased cost of genome resequencing has resulted in an increased use in molecular diagnostics. While there are a number of established and high quality bioinformatic tools for identifying small genetic variants including single nucleotide variants and indels, currently there is no established standard for the detection of copy number variants (CNVs) from sequence data. The requirement for CNV detection from high throughput sequencing has resulted in the development of a large number of software packages. These tools typically utilise the sequence data characteristics: read depth, split reads, read pairs, and assembly-based techniques. However, the additional source of information from read balance, defined as relative proportion of reads of each allele at each position, has been underutilised in the existing applications. Results: We present Read Balance Validator (RBV), a bioinformatic tool which uses read balance for prioritisation and validation of putative CNVs. The software simultaneously interrogates nominated regions for the presence of deletions or multiplications, and can differentiate larger CNVs from diploid regions. Additionally, the utility of RBV to test for inheritance of CNVs is demonstrated in this report. Conclusions: RBV is a CNV validation and prioritisation bioinformatic tool for both genome and exome sequencing available as a Python package from https://github.com/whitneywhitford/RBV


2008 ◽  
Vol 123 (1-4) ◽  
pp. 224-233 ◽  
Author(s):  
N. Takahashi ◽  
Y. Satoh ◽  
M. Kodaira ◽  
H. Katayama

2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Whitney Whitford ◽  
Klaus Lehnert ◽  
Russell G. Snell ◽  
Jessie C. Jacobsen

Abstract The popularisation and decreased cost of genome resequencing has resulted in an increased use in molecular diagnostics. While there are a number of established and high quality bioinformatic tools for identifying small genetic variants including single nucleotide variants and indels, currently there is no established standard for the detection of copy number variants (CNVs) from sequence data. The requirement for CNV detection from high throughput sequencing has resulted in the development of a large number of software packages. These tools typically utilise the sequence data characteristics: read depth, split reads, read pairs, and assembly-based techniques. However, the additional source of information from read balance (defined as relative proportion of reads of each allele at each position) has been underutilised in the existing applications. Here we present Read Balance Validator (RBV), a bioinformatic tool that uses read balance for prioritisation and validation of putative CNVs. The software simultaneously interrogates nominated regions for the presence of deletions or multiplications, and can differentiate larger CNVs from diploid regions. Additionally, the utility of RBV to test for inheritance of CNVs is demonstrated in this report. RBV is a CNV validation and prioritisation bioinformatic tool for both genome and exome sequencing available as a Python package from https://github.com/whitneywhitford/RBV.
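As a rough illustration of the read balance definition in the abstract above (the relative proportion of reads of each allele at a position), the quantity can be sketched in a few lines of Python. This is not RBV's code or algorithm: the function names `read_balance` and `expected_minor_fraction` are hypothetical, and the expected-fraction rule is only the simple arithmetic that k copies of an allele out of n total copies should attract about k/n of the reads.

```python
def read_balance(allele_counts):
    """Return the proportion of reads supporting each allele at one position.

    allele_counts maps an allele (e.g. "A") to its read count.
    Hypothetical helper for illustration; not part of RBV.
    """
    total = sum(allele_counts.values())
    if total == 0:
        return {}  # no coverage at this position
    return {allele: n / total for allele, n in allele_counts.items()}


def expected_minor_fraction(copy_number, minor_copies=1):
    """Expected read fraction for an allele present on minor_copies
    of copy_number total copies (simple k/n assumption)."""
    if copy_number <= 0 or not 0 <= minor_copies <= copy_number:
        raise ValueError("need 0 <= minor_copies <= copy_number, copy_number > 0")
    return minor_copies / copy_number


# A heterozygous site in a diploid region: both alleles near 0.5.
diploid_site = read_balance({"A": 30, "G": 30})

# A heterozygous site inside a hypothetical 3-copy region: near 1/3 and 2/3.
triplicated_site = read_balance({"A": 20, "G": 40})
```

Under these assumptions, a heterozygous position in a diploid region gives a balance near 0.5, a three-copy region shifts it towards 1/3 and 2/3, and a single-copy deletion leaves no heterozygous positions at all.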


2016 ◽  
Vol 94 (suppl_5) ◽  
pp. 146-146
Author(s):  
D. M. Bickhart ◽  
L. Xu ◽  
J. L. Hutchison ◽  
J. B. Cole ◽  
D. J. Null ◽  
...  

Author(s):  
Pamela Wiener ◽  
Christelle Robert ◽  
Abulgasim Ahbara ◽  
Mazdak Salavati ◽  
Ayele Abebe ◽  
...  

Abstract Great progress has been made over recent years in the identification of selection signatures in the genomes of livestock species. This work has primarily been carried out in commercial breeds for which the dominant selection pressures are associated with artificial selection. As agriculture and food security are likely to be strongly affected by climate change, a better understanding of environment-imposed selection on agricultural species is warranted. Ethiopia is an ideal setting to investigate environmental adaptation in livestock due to its wide variation in geo-climatic characteristics and the extensive genetic and phenotypic variation of its livestock. Here, we identified over three million single nucleotide variants across 12 Ethiopian sheep populations and applied landscape genomics approaches to investigate the association between these variants and environmental variables. Our results suggest that environmental adaptation for precipitation-related variables is stronger than that related to altitude or temperature, consistent with large-scale meta-analyses of selection pressure across species. The set of genes showing association with environmental variables was enriched for genes highly expressed in human blood and nerve tissues. There was also evidence of enrichment for genes associated with high-altitude adaptation, although no strong association was identified with hypoxia-inducible-factor (HIF) genes. One of the strongest altitude-related signals was for a collagen gene, consistent with previous studies of high-altitude adaptation. Several altitude-associated genes also showed evidence of adaptation with temperature, suggesting a relationship between responses to these environmental factors. These results provide a foundation to investigate further the effects of climatic variables on small ruminant populations.


2016 ◽  
Vol 6 (1) ◽  
Author(s):  
Leandro de Araújo Lima ◽  
Ana Cecília Feio-dos-Santos ◽  
Sintia Iole Belangero ◽  
Ary Gadelha ◽  
Rodrigo Affonseca Bressan ◽  
...  

Abstract Many studies have attempted to investigate the genetic susceptibility to Attention-Deficit/Hyperactivity Disorder (ADHD), but without much success. The present study aimed to analyze both single-nucleotide and copy-number variants contributing to the genetic architecture of ADHD. We generated exome data from 30 Brazilian trios with sporadic ADHD. We also analyzed a Brazilian sample of 503 child/adolescent controls from a High Risk Cohort Study for the Development of Childhood Psychiatric Disorders, as well as previously published results of five CNV studies and one GWAS meta-analysis of ADHD involving children/adolescents. The results from the Brazilian trios showed that cases with de novo SNVs tend not to have de novo CNVs and vice versa. Although the sample size is small, we could also see that various comorbidities are more frequent in cases with only inherited variants. Moreover, using only genes expressed in brain, we constructed two “in silico” protein-protein interaction networks, one with genes from any analysis, and another with genes with hits in two analyses. Topological and functional analyses of genes in these networks uncovered genes related to synapse, cell adhesion, glutamatergic and serotoninergic pathways, both confirming findings of previous studies and capturing new genes and genetic variants in these pathways.


2020 ◽  
Author(s):  
Stephen Cristiano ◽  
David McKean ◽  
Jacob Carey ◽  
Paige Bracci ◽  
Paul Brennan ◽  
...  

Abstract Germline copy number variants (CNVs) increase risk for many diseases, yet detecting CNVs and quantifying their contribution to disease risk in large-scale studies is challenging. We developed an approach called CNPBayes to identify latent batch effects, to provide probabilistic estimates of integer copy number across the estimated batches, and to fully integrate the copy number uncertainty in the association model for disease. We demonstrate this approach in a Pancreatic Cancer Case Control study of 7,598 participants where the major sources of technical variation were not captured by study site and varied across the genome. Candidate associations aided by this approach include deletions of 8q24 near regulatory elements of the tumor oncogene MYC and of Tumor Suppressor Candidate 3 (TUSC3). This study provides a robust Bayesian inferential framework for estimating copy number and evaluating the role of copy number in heritable diseases.


2020 ◽  
Author(s):  
Paras Garg ◽  
Alejandro Martin-Trujillo ◽  
Oscar L. Rodriguez ◽  
Scott J. Gies ◽  
Bharati Jadhav ◽  
...  

ABSTRACT Variable Number Tandem Repeats (VNTRs) are composed of large tandemly repeated motifs, many of which are highly polymorphic in copy number. However, due to their large size and repetitive nature, they remain poorly studied. To investigate the regulatory potential of VNTRs, we used read-depth data from Illumina whole genome sequencing to perform association analysis between copy number of ~70,000 VNTRs (motif size ≥10 bp) with both gene expression (404 samples in 48 tissues) and DNA methylation (235 samples in peripheral blood), identifying thousands of VNTRs that are associated with local gene expression (eVNTRs) and DNA methylation levels (mVNTRs). Using large-scale replication analysis in an independent cohort we validated 73-80% of signals observed in the two discovery cohorts, providing robust evidence to support that these represent genuine associations. Further, conditional analysis indicated that many eVNTRs and mVNTRs act as QTLs independently of other local variation. We also observed strong enrichments of eVNTRs and mVNTRs for regulatory features such as enhancers and promoters. Using the Human Genome Diversity Panel, we defined sets of VNTRs that show highly divergent copy numbers among human populations, show that these are enriched for regulatory effects on gene expression and epigenetics, and preferentially associate with genes that have been linked with human phenotypes through GWAS. Our study provides strong evidence supporting functional variation at thousands of VNTRs, and defines candidate sets of VNTRs, copy number variation of which potentially plays a role in numerous human phenotypes.


2021 ◽  
Vol 12 ◽  
Author(s):  
Kuan Yao ◽  
Narjol González-Escalona ◽  
Maria Hoffmann

Plasmids play a major role in bacterial adaptation to environmental stress and often contribute to antibiotic resistance and disease virulence. Although the complete sequence of each plasmid is essential for studying plasmid biology, most antibiotic resistance and virulence plasmids in Salmonella are present only in a low copy number, making extraction and sequencing difficult. Long read sequencing technologies require higher concentrations of DNA to provide optimal results. To resolve this problem, we assessed the sufficiency of multiple displacement amplification (MDA) for replicating Salmonella plasmid DNA to a satisfactory concentration for accurate sequencing and multiplexing. Nine Salmonella enterica isolates, representing nine different serovars carrying plasmids for which sequence data are already available at NCBI, were cultured and their plasmids isolated using an alkaline lysis extraction protocol. We then used the Phi29 polymerase to perform MDA, thereby obtaining enough plasmid DNA for long read sequencing. These amplified plasmids were multiplexed and sequenced on one single molecule, real-time (SMRT) cell with the Pacific Biosciences (Pacbio) Sequel sequencer. We were able to close all Salmonella plasmids (sizes ranged from 38 to 166 Kb) with sequencing coverage from 24 to 2,582X. This protocol, consisting of plasmid isolation, MDA, and multiplex sequencing, is an effective and fast method for closing high-molecular weight and low-copy-number plasmids. This high throughput protocol reduces the time and cost of plasmid closure.


ESC CardioMed ◽  
2018 ◽  
pp. 669-671
Author(s):  
Eric Schulze-Bahr

The human genome consists of approximately 3 billion (3 × 10⁹) base pairs of DNA (around 20,000 genes), organized as 23 chromosomes (diploid parental set), and a small mitochondrial genome (37 genes, including 13 protein-coding genes; 16,589 base pairs) of maternal origin. Most human genetic variation is natural, that is, common or rare (minor allele frequency >0.1%), and does not cause disease. Apart from any true disease-causing (bona fide) mutation, each individual genome harbours more than 3.5 million single nucleotide variants (including >10,000 non-synonymous changes causing amino acid substitutions) and 200–300 large structural or copy number variants (insertions/deletions, up to several thousands of base pairs) that are non-disease-causing variations scattered throughout coding and non-coding genomic regions.

