CONGA: Copy number variation genotyping in ancient genomes and low-coverage sequencing data

2021 ◽  
Author(s):  
Arda Soylev ◽  
Sevim Seda Cokoglu ◽  
Dilek Koptekin ◽  
Can Alkan ◽  
Mehmet Somel

To date, ancient genome analyses have been largely confined to the study of single nucleotide polymorphisms (SNPs). Copy number variants (CNVs) are a major contributor to disease and to evolutionary adaptation, but identifying CNVs in ancient shotgun-sequenced genomes is hampered by (a) most published genomes having <1x coverage and (b) ancient DNA fragments typically being <80 bp long. These characteristics prevent state-of-the-art CNV detection software from being applied effectively to ancient genomes. Here we present CONGA, an algorithm tailored for genotyping deletion and duplication events in genomes with low depths of coverage. Simulations show that CONGA can genotype deletions and duplications >1 kbp with F-scores >0.77 and >0.82, respectively, at >=0.5x coverage. Further, down-sampling experiments using published ancient BAM files reveal that >1 kbp deletions could be genotyped at F-score >0.75 at >=1x coverage. Using CONGA, we analyse deletion events at 10,018 loci in 56 ancient human genomes spanning the last 50,000 years, with coverages of 0.4x-26x. We find inter-individual genetic diversity measured using deletions and SNPs to be highly correlated, suggesting that deletion frequencies broadly reflect demographic history. We also identify signatures of purifying selection on deletions, such as an excess of singletons compared to those in SNPs. CONGA paves the way for systematic studies of drift, mutation load, and adaptation in ancient and modern-day gene pools through the lens of CNVs.
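To illustrate the general idea of genotyping a known deletion from read depth at low coverage, here is a minimal sketch. It is not CONGA's actual implementation: the Poisson read-count model, the `error_rate` floor, and the function names are assumptions made for illustration only.

```python
import math


def poisson_logpmf(k, lam):
    """Log-probability of observing k reads under a Poisson with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def genotype_deletion(observed_reads, expected_diploid_reads, error_rate=0.01):
    """Pick the most likely copy number (0, 1 or 2) at a candidate deletion.

    expected_diploid_reads: read count expected at this locus if both copies
    are present (hypothetical input, e.g. derived from genome-wide depth).
    error_rate: assumed floor so that copy number 0 still tolerates a few
    mismapped reads instead of having zero likelihood.
    """
    log_likelihoods = {}
    for copies in (0, 1, 2):
        mean = max(expected_diploid_reads * copies / 2,
                   expected_diploid_reads * error_rate)
        log_likelihoods[copies] = poisson_logpmf(observed_reads, mean)
    best = max(log_likelihoods, key=log_likelihoods.get)
    return best, log_likelihoods


if __name__ == "__main__":
    for reads in (0, 10, 21):
        copies, _ = genotype_deletion(reads, expected_diploid_reads=20)
        print(f"{reads} reads observed -> copy number {copies}")
```

At very low coverage the expected read counts are small, so the likelihoods for neighbouring copy numbers overlap heavily; this is one way to see why genotyping accuracy depends on both event length and depth.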

2019 ◽  
Author(s):  
Matthieu Falque ◽  
Kamel Jebreen ◽  
Etienne Paux ◽  
Carsten Knaak ◽  
Sofiane Mezmouk ◽  
...  

Abstract: Single nucleotide polymorphisms (SNPs) are widely used for detecting quantitative trait loci or for searching for causal variants of diseases. Nevertheless, structural variations such as copy-number variants (CNVs) represent a large part of natural genetic diversity and contribute significantly to trait variation. Over the past decade, numerous methods and software tools have been developed to detect CNVs. Such approaches are based on exploiting sequencing data or SNP arrays, but they bypass a wealth of information such as genotyping data from segregating populations, produced e.g. for QTL mapping. Here we propose an original method to both detect and genetically map CNVs using mapping panels. Specifically, we exploit the apparent heterozygous state of duplicated loci: peaks in appropriately defined genome-wide allelic profiles provide highly specific signatures that identify the nature and position of the CNVs. Our original method and software can automatically detect and map up to 33 different predefined types of CNVs based on segregation data only. We validate this approach on simulated and experimental bi-parental mapping panels in two maize populations and one wheat population. Most of the events found correspond to having just one extra copy in one of the parental lines, but the corresponding allelic value can be that of either parent. We also find cases with two or more additional copies, especially in wheat, where these copies locate to homeologues. More generally, our computational tool can be used to give additional value, at no cost, to many datasets produced over the past decade from genetic mapping panels.
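The "apparent heterozygous state of duplicated loci" can be illustrated with a deliberately simplified sketch. It assumes a panel of largely homozygous lines (for example doubled haploids), genotype calls coded as `A`, `B`, `H` (heterozygous) or `-` (missing), and a hypothetical threshold; the authors' method works on genome-wide allelic profiles and distinguishes many CNV types, which this sketch does not attempt.

```python
def flag_candidate_duplications(calls_by_marker, threshold=0.5):
    """Return (marker, heterozygous fraction) for markers with excess 'H' calls.

    calls_by_marker: dict mapping a marker name to a list of genotype calls
    across the panel (hypothetical encoding: 'A', 'B', 'H', '-' for missing).
    threshold: assumed cut-off on the fraction of heterozygous calls.
    """
    flagged = []
    for marker, calls in calls_by_marker.items():
        scored = [c for c in calls if c in ("A", "B", "H")]  # drop missing
        if not scored:
            continue
        het_fraction = scored.count("H") / len(scored)
        if het_fraction >= threshold:
            flagged.append((marker, het_fraction))
    return flagged


if __name__ == "__main__":
    panel = {
        "m1": list("AABBABAB"),  # ordinary segregating marker
        "m2": list("HHHHAHHH"),  # looks heterozygous in almost every line
        "m3": list("AAB-HBBA"),  # one stray heterozygous call
    }
    print(flag_candidate_duplications(panel))
```

In a panel where true heterozygosity should be rare, a marker that appears heterozygous in most lines is a candidate for a duplicated locus whose copies carry different alleles.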


Author(s):  
Alexander Charney ◽  
Pamela Sklar

Schizophrenia and bipolar disorder are the classic psychotic disorders. Both diseases are strongly familial, but until recently they proved recalcitrant to genetic methods for identifying their etiology. There is now convincing genetic evidence that many DNA changes contribute to the risk of becoming ill. For schizophrenia, there are large contributions of rare copy number variants and common single nucleotide variants, with an overall highly polygenic genetic architecture. For bipolar disorder, the role of copy number variation appears to be much less pronounced. Specific common single nucleotide polymorphisms are associated, and there is evidence for polygenicity. Several surprises have emerged from the genetic data, which indicate that there is significantly more molecular overlap in copy number variants between autism and schizophrenia, and in common variants between schizophrenia and bipolar disorder.


Genes ◽  
2021 ◽  
Vol 12 (7) ◽  
pp. 981
Author(s):  
Jichun Xia ◽  
Dong Wang ◽  
Yuzhou Peng ◽  
Wenning Wang ◽  
Qianqian Wang ◽  
...  

The YABBY family of plant-specific transcription factors plays important regulatory roles during the development of leaves and floral organs, but their functions in Brassica species are incompletely understood. Here, we identified 79 YABBY genes from Arabidopsis thaliana and five Brassica species (B. rapa, B. nigra, B. oleracea, B. juncea, and B. napus). A phylogenetic analysis of YABBY proteins separated them into five clusters (YAB1–YAB5) with representatives from all five Brassica species, suggesting a high degree of conservation and similar functions within each subfamily. We determined the gene structure, chromosomal location, and expression patterns of the 21 BnaYAB genes identified, revealing extensive duplication events and gene loss following polyploidization. Changes in exon–intron structure during evolution may have driven differentiation in expression patterns and functions, combined with purifying selection, as evidenced by Ka/Ks values below 1. Based on transcriptome sequencing data, we selected nine genes with high expression at the flowering stage. qRT-PCR analysis further indicated that most BnaYAB family members are tissue-specific and exhibit different expression patterns in various tissues and organs of B. napus. This preliminary study of the characteristics of the YABBY gene family in the Brassica napus genome provides theoretical support and a reference for the later functional identification of the family's genes.
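The Ka/Ks criterion mentioned above can be written out as a short sketch. The function name, the `tolerance` band around 1, and the example values are hypothetical; they are not taken from the study.

```python
def classify_selection(ka, ks, tolerance=0.05):
    """Interpret a Ka/Ks ratio for one gene pair.

    ka: nonsynonymous substitutions per nonsynonymous site.
    ks: synonymous substitutions per synonymous site (must be > 0).
    tolerance: assumed band around 1 treated as approximately neutral.
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to form the ratio")
    ratio = ka / ks
    if ratio < 1 - tolerance:
        label = "purifying selection"
    elif ratio > 1 + tolerance:
        label = "positive selection"
    else:
        label = "approximately neutral"
    return ratio, label


if __name__ == "__main__":
    # Made-up substitution rates for a duplicated gene pair.
    ratio, label = classify_selection(ka=0.02, ks=0.10)
    print(f"Ka/Ks = {ratio:.2f}: {label}")
```

A ratio below 1 means that amino-acid-changing substitutions accumulate more slowly than silent ones, which is the usual reading of purifying selection.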


2019 ◽  
Author(s):  
Yue Xing ◽  
Alan R. Dabney ◽  
Xiao Li ◽  
Guosong Wang ◽  
Clare A. Gill ◽  
...  

Abstract: Copy number variants are insertions and deletions of 1 kb or larger in a genome that play an important role in phenotypic changes and human disease. Many software applications have been developed to detect copy number variants using either whole-genome sequencing or whole-exome sequencing data. However, there is poor agreement in the results from these applications. Simulated datasets containing copy number variants allow comprehensive comparisons of the operating characteristics of existing and novel copy number variant detection methods. Several software applications have been developed to simulate copy number variants and other structural variants in whole-genome sequencing data. However, none of the applications reliably simulate copy number variants in whole-exome sequencing data. We have developed and tested SECNVs (Simulator of Exome Copy Number Variants), a fast, robust and customizable software application for simulating copy number variants and whole-exome sequences from a reference genome. SECNVs is easy to install, implements a wide range of commands to customize simulations, can output multiple samples at once, and incorporates a pipeline to output rearranged genomes, short reads and BAM files in a single command. Variants generated by SECNVs are detected with high sensitivity and precision by tools commonly used to detect copy number variants. SECNVs is publicly available at https://github.com/YJulyXing/SECNVs.
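As a rough illustration of what "outputting a rearranged genome" involves, the sketch below applies deletions and tandem duplications to a reference string. It is not SECNVs code and does not use its command-line interface; the function name, the event format, and the toy sequence are assumptions.

```python
def apply_cnvs(reference, events):
    """Return a rearranged sequence after applying copy number events.

    reference: reference sequence as a string (a single haploid sequence).
    events: list of (start, end, copies) tuples, 0-based and half-open,
        non-overlapping. copies=0 deletes the segment, copies=1 leaves it
        unchanged, copies=2 adds one tandem duplicate, and so on.
    """
    pieces = []
    position = 0
    for start, end, copies in sorted(events):
        if start < position or start >= end or end > len(reference) or copies < 0:
            raise ValueError(f"invalid or overlapping event: {(start, end, copies)}")
        pieces.append(reference[position:start])      # untouched flank
        pieces.append(reference[start:end] * copies)  # segment repeated `copies` times
        position = end
    pieces.append(reference[position:])               # remaining tail
    return "".join(pieces)


if __name__ == "__main__":
    toy_reference = "AAACCCGGGTTT"
    # Delete CCC, duplicate GGG once in tandem.
    print(apply_cnvs(toy_reference, [(3, 6, 0), (6, 9, 2)]))
```

A full simulator would additionally restrict or weight events by exome target regions and then generate short reads from the rearranged sequence; neither step is shown here.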


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Michael D. Linderman ◽  
Davin Chia ◽  
Forrest Wallace ◽  
Frank A. Nothaft

Background: XHMM is a widely used tool for copy-number variant (CNV) discovery from whole exome sequencing data but can require hours to days to run for large cohorts. A more scalable implementation would reduce the need for specialized computational resources and enable increased exploration of the configuration parameter space to obtain the best possible results. Results: DECA is a horizontally scalable implementation of the XHMM algorithm using the ADAM framework and Apache Spark that incorporates novel algorithmic optimizations to eliminate unneeded computation. DECA parallelizes XHMM on both multi-core shared memory computers and large shared-nothing Spark clusters. We performed CNV discovery from the read-depth matrix in 2535 exomes in 9.3 min on a 16-core workstation (35.3× speedup vs. XHMM), 12.7 min using 10 executor cores on a Spark cluster (18.8× speedup vs. XHMM), and 9.8 min using 32 executor cores on Amazon AWS’ Elastic MapReduce. We performed CNV discovery from the original BAM files in 292 min using 640 executor cores on a Spark cluster. Conclusions: We describe DECA’s performance, our algorithmic and implementation enhancements to XHMM to obtain that performance, and our lessons learned porting a complex genome analysis application to ADAM and Spark. ADAM and Apache Spark are a performant and productive platform for implementing large-scale genome analyses, but efficiently utilizing large clusters can require algorithmic optimizations and careful attention to Spark’s configuration parameters.
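The reported speedups can be turned back into implied XHMM runtimes with simple arithmetic. The sketch assumes that each speedup is the ratio of XHMM wall-clock time to DECA wall-clock time in the same setting; the abstract does not state the baseline times, so the results are derived estimates only.

```python
def implied_baseline_minutes(deca_minutes, speedup):
    """Baseline runtime implied by a runtime and its reported speedup factor."""
    return deca_minutes * speedup


# (DECA runtime in minutes, reported speedup vs. XHMM), taken from the abstract.
reported_runs = {
    "16-core workstation": (9.3, 35.3),
    "Spark cluster, 10 executor cores": (12.7, 18.8),
}

if __name__ == "__main__":
    for setting, (minutes, speedup) in reported_runs.items():
        baseline = implied_baseline_minutes(minutes, speedup)
        print(f"{setting}: implied XHMM runtime of about {baseline:.0f} min "
              f"({baseline / 60:.1f} h)")
```

Both implied baselines are several hours, consistent with the statement that XHMM can take hours to days on large cohorts. The two figures need not match each other, since each speedup was presumably measured against XHMM on the corresponding hardware.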

