Correspondence of aCGH and long-read genome assembly for detection of copy number differences: A proof-of-concept with cichlid genomes

PLoS ONE ◽  
2021 ◽  
Vol 16 (10) ◽  
pp. e0258193
Author(s):  
Gabriel A. Preising ◽  
Joshua J. Faber-Hammond ◽  
Suzy C. P. Renn

Copy number variation is an important source of genetic variation, yet data are often lacking due to technical limitations for detection given the current genome assemblies. Our goal is to demonstrate the extent to which an array-based platform (aCGH) can identify genomic loci that are collapsed in genome assemblies that were built with short-read technology. Taking advantage of two cichlid species for which genome assemblies based on Illumina and PacBio are available, we show that inter-species aCGH log2 hybridization ratios correlate more strongly with inferred copy number differences based on PacBio-built genome assemblies than based on Illumina-built genome assemblies. With regard to inter-species copy number differences of specific genes identified by each platform, the set identified by aCGH intersects to a greater extent with the set identified by PacBio than with the set identified by Illumina. Gene function, according to Gene Ontology analysis, did not substantially differ among platforms, and platforms converged on functions associated with adaptive phenotypes. The results of the current study further demonstrate that aCGH is an effective platform for identifying copy number variable sequences, particularly those collapsed in short read genome assemblies.

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
PingHsun Hsieh ◽  
Vy Dang ◽  
Mitchell R. Vollger ◽  
Yafei Mao ◽  
Tzu-Hsueh Huang ◽  
...  

Abstract TRP channel-associated factor 1/2 (TCAF1/TCAF2) proteins antagonistically regulate the cold-sensor protein TRPM8 in multiple human tissues. Understanding their significance has been complicated because the locus spans a gap-ridden region with complex segmental duplications in GRCh38. Using long-read sequencing, we sequence-resolve the locus, annotate full-length TCAF models in primate genomes, and show substantial human-specific TCAF copy number variation. We identify two human super haplogroups, H4 and H5, and establish that TCAF duplications originated ~1.7 million years ago but diversified only in Homo sapiens by recurrent structural mutations. Conversely, in all archaic-hominin samples the fixation of a specific H4 haplotype without duplication is likely due to positive selection. Together, TCAF copy number expansion, selection signals in hominins, differential TCAF2 expression between haplogroups, and high TCAF2 and TRPM8 expression in liver and prostate in modern-day humans imply TCAF diversification among hominins, potentially in response to cold or dietary adaptations.


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Xin Shao ◽  
Ning Lv ◽  
Jie Liao ◽  
Jinbo Long ◽  
Rui Xue ◽  
...  

Abstract Background Cancer is a heterogeneous disease with many genetic variations. Several lines of evidence have shown that copy number variations (CNVs) of certain genes are involved in the development and progression of many cancers through alterations of their gene expression levels in individual or several cancer types. However, it is not clear whether this correlation is a general phenomenon across multiple cancer types. Methods In this study we applied a bioinformatics approach integrating CNV and differential gene expression mathematically across 1025 cell lines and 9159 patient samples to detect their potential relationship. Results Our results showed a close correlation between CNV and differential gene expression, and copy number displayed a positive linear influence on gene expression for the majority of genes, indicating that genetic variation had a direct effect on gene transcriptional level. Another independent dataset was used to revalidate the relationship between copy number and expression level. Further analysis shows that genes with a general positive linear influence on gene expression are clustered in certain disease-related pathways, which suggests the involvement of CNV in the pathophysiology of diseases. Conclusions This study shows the close correlation between CNV and differential gene expression, revealing the qualitative relationship between genetic variation and its downstream effect, especially for oncogenes and tumor suppressor genes. It is of critical importance to elucidate the relationship between copy number variation and gene expression for the prevention, diagnosis and treatment of cancer.


2020 ◽  
Author(s):  
Lauren Coombe ◽  
Vladimir Nikolić ◽  
Justin Chu ◽  
Inanc Birol ◽  
René L. Warren

Abstract Summary The ability to generate high-quality genome sequences is a cornerstone of modern biological research. Even with recent advancements in sequencing technologies, many genome assemblies still do not reach reference-grade quality. Here, we introduce ntJoin, a tool that leverages structural synteny between a draft assembly and reference sequence(s) to contiguate and correct the former with respect to the latter. Instead of alignments, ntJoin uses a lightweight mapping approach based on a graph data structure generated from ordered minimizer sketches. The tool can be used in a variety of applications, including improving a draft assembly with a reference-grade genome, a short-read assembly with a draft long-read assembly, and a draft assembly with an assembly from a closely related species. When scaffolding a human short-read assembly using the reference human genome or a long-read assembly, ntJoin improves the NGA50 length 23- and 13-fold, respectively, in under 13 min, using less than 11 GB of RAM. Compared to existing reference-guided assemblers, ntJoin generates highly contiguous assemblies faster and using less memory. Availability and implementation ntJoin is written in C++ and Python, and is freely available at https://github.com/bcgsc/[email protected]
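
To illustrate what an ordered minimizer sketch is, the following is a minimal Python sketch, not ntJoin's implementation. It assumes lexicographic k-mer order in place of a hash function, uses toy values for `k` and `w`, and ignores reverse complements; the function names `minimizer_sketch` and `shared_anchors` are hypothetical.

```python
def minimizer_sketch(seq, k=3, w=3):
    """Return the ordered list of (k-mer, position) minimizers of seq.

    For every window of w consecutive k-mers, the smallest k-mer is kept
    (lexicographic order stands in for a hash; ties go to the leftmost).
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    sketch = []
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        offset = min(range(w), key=lambda j: (window[j], j))
        pos = start + offset
        # Consecutive windows often pick the same k-mer; record it once.
        if not sketch or sketch[-1][1] != pos:
            sketch.append((kmers[pos], pos))
    return sketch


def shared_anchors(sketch_a, sketch_b):
    """Minimizers of sketch_a (in order) whose k-mer also occurs in sketch_b."""
    in_b = {kmer for kmer, _ in sketch_b}
    return [(kmer, pos) for kmer, pos in sketch_a if kmer in in_b]


draft = minimizer_sketch("ACGTACGTAC")
reference = minimizer_sketch("TTACGTACGG")
anchors = shared_anchors(draft, reference)
```

Because each sequence is reduced to a short ordered list of anchors, two assemblies can be compared by the order of their shared anchors rather than by base-level alignment, which is the general idea behind a lightweight mapping approach.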


2018 ◽  
Author(s):  
Thomas A. Sasani ◽  
Kelsey R. Cone ◽  
Aaron R. Quinlan ◽  
Nels C. Elde

Abstract Large DNA viruses rapidly evolve to defeat host defenses. Poxvirus adaptation can involve combinations of recombination-driven gene copy number variation and beneficial single nucleotide variants (SNVs) at the same locus, yet how these distinct mechanisms of genetic diversification might simultaneously facilitate adaptation to immune blocks is unknown. We performed experimental evolution with a vaccinia virus population harboring a SNV in a gene actively undergoing copy number amplification. Comparisons of virus genomes using the Oxford Nanopore Technologies sequencing platform allowed us to phase SNVs within large gene copy arrays for the first time, and uncovered a mechanism of adaptive SNV homogenization reminiscent of gene conversion, which is actively driven by selection. Our work reveals a new mechanism for the fluid gain of beneficial mutations in genetic regions undergoing active recombination in viruses, and illustrates the value of long read sequencing technologies for investigating complex genome dynamics in diverse biological systems.


2021 ◽  
Vol 12 ◽  
Author(s):  
Manuela Moraru ◽  
Adriana Perez-Portilla ◽  
Karima Al-Akioui Sanz ◽  
Alfonso Blazquez-Moreno ◽  
Antonio Arnaiz-Villena ◽  
...  

Fcγ receptors (FcγR), cell-surface glycoproteins that bind antigen-IgG complexes, control both humoral and cellular immune responses. The FCGR locus on chromosome 1q23.3 comprises five homologous genes encoding low-affinity FcγRII and FcγRIII, and displays functionally relevant polymorphism that impacts on human health. Recurrent events of non-allelic homologous recombination across the FCGR locus result in copy-number variation of ~82.5 kbp-long fragments known as copy-number regions (CNR). Here, we characterize a recently described deletion that we name CNR5, which results in loss of FCGR3A, FCGR3B, and FCGR2C, and generation of a recombinant FCGR3B/A gene. We show that the CNR5 recombination spot lies at the beginning of the third FCGR3 intron. Although the FCGR3B/A-encoded hybrid protein CD16B/A reaches the plasma membrane in transfected cells, its possible natural expression, predictably restricted to neutrophils, could not be demonstrated in resting or interferon γ-stimulated cells. As the CNR5-deletion was originally described in an Ecuadorian family from Llano Grande (an indigenous community in North-Eastern Quito), we characterized the FCGR genetic variation in two populations from the highlands of Ecuador. Our results reveal that CNR5-deletion is relatively frequent in Llano Grande (5 carriers out of 36 donors). Furthermore, we found a high frequency of two strong-phagocytosis variants: the FCGR3B-NA1 haplotype and the CNR1 duplication, which translates into an increased FCGR3B and FCGR2C copy-number. CNR1 duplication was particularly increased in Llano Grande, 77.8% of the studied sample carrying at least one such duplication. In contrast, an extended haplotype CD16A-176V – CD32C-ORF+2B.2 – CD32B-2B.4 including strong activating and inhibitory FcγR variants was absent in Llano Grande and found at a low frequency (8.6%) in Ecuador highlands. 
This particular distribution of FCGR polymorphism, possibly a result of selective pressures, further confirms the importance of a comprehensive, joint analysis of all genetic variations in the locus and warrants additional studies on their putative clinical impact. In conclusion, our study confirms important ethnic variation at the FCGR locus; it shows a distinctive FCGR polymorphism distribution in Ecuador highlands; provides a molecular characterization of a novel CNR5-deletion associated with CD16A and CD16B deficiency; and confirms its presence in that population.


2020 ◽  
Vol 2 (3) ◽  
Author(s):  
Cheng He ◽  
Guifang Lin ◽  
Hairong Wei ◽  
Haibao Tang ◽  
Frank F White ◽  
...  

Abstract Genome sequences provide genomic maps with a single-base resolution for exploring genetic contents. Sequencing technologies, particularly long reads, have revolutionized genome assemblies for producing highly continuous genome sequences. However, current long-read sequencing technologies generate inaccurate reads that contain many errors. Some errors are retained in assembled sequences, which are typically not completely corrected by using either long reads or more accurate short reads. The issue is common, but few tools are dedicated to computing error rates or determining error locations. In this study, we developed a novel approach, referred to as k-mer abundance difference (KAD), to compare the inferred copy number of each k-mer indicated by short reads and the observed copy number in the assembly. Simple KAD metrics make it possible to classify k-mers into categories that reflect the quality of the assembly. Specifically, the KAD method can be used to identify base errors and estimate the overall error rate. In addition, sequence insertion and deletion as well as sequence redundancy can also be detected. Collectively, KAD is valuable for quality evaluation of genome assemblies and, potentially, provides a diagnostic tool to aid in precise error correction. KAD software has been developed to facilitate public use.
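
The comparison described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the KAD software: the abstract does not give the exact formula, so the log2 ratio with a pseudocount of 1 is an assumption, as are the function names and the `depth` parameter (the read count expected for a single-copy k-mer). Reverse complements are ignored for brevity.

```python
import math
from collections import Counter


def count_kmers(seqs, k):
    """Count every k-mer across a list of sequences (forward strand only)."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def abundance_difference(read_counts, asm_counts, depth):
    """Per k-mer log2 ratio of inferred to observed copy number.

    inferred = read count / depth (copy number suggested by the reads)
    observed = occurrences in the assembly
    The +1 pseudocount (an assumption) keeps the ratio defined at zero.
    """
    scores = {}
    for kmer in set(read_counts) | set(asm_counts):
        inferred = read_counts.get(kmer, 0) / depth
        observed = asm_counts.get(kmer, 0)
        scores[kmer] = math.log2((inferred + 1) / (observed + 1))
    return scores


# Toy counts: AAA agrees, GGG looks collapsed, CCC has no read support.
reads = Counter({"AAA": 20, "GGG": 40})
assembly = Counter({"AAA": 1, "GGG": 1, "CCC": 1})
scores = abundance_difference(reads, assembly, depth=20)
```

Under this scoring, a value near zero means reads and assembly agree, a positive value suggests the k-mer is under-represented in the assembly (for example a collapsed repeat), and a negative value suggests an assembly k-mer that reads do not support (for example a base error or redundant sequence).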


BMC Genomics ◽  
2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Marc-André Lemay ◽  
Davoud Torkamaneh ◽  
Guillem Rigaill ◽  
Brian Boyle ◽  
Adrian O. Stec ◽  
...  

Genes ◽  
2020 ◽  
Vol 11 (2) ◽  
pp. 141 ◽  
Author(s):  
Feichen Shen ◽  
Jeffrey M. Kidd

Gene duplication is a major mechanism for the evolution of gene novelty, and copy-number variation makes a major contribution to inter-individual genetic diversity. However, most approaches for studying copy-number variation rely upon uniquely mapping reads to a genome reference and are unable to distinguish among duplicated sequences. Specialized approaches to interrogate specific paralogs are comparatively slow and have a high degree of computational complexity, limiting their effective application to emerging population-scale data sets. We present QuicK-mer2, a self-contained, mapping-free approach that enables the rapid construction of paralog-specific copy-number maps from short-read sequence data. This approach is based on the tabulation of unique k-mer sequences from short-read data sets, and is able to analyze a 20X coverage human genome in approximately 20 min. We applied our approach to newly released sequence data from the 1000 Genomes Project, constructed paralog-specific copy-number maps from 2457 unrelated individuals, and uncovered copy-number variation of paralogous genes. We identify nine genes where none of the analyzed samples have a copy number of two, 92 genes where the majority of samples have a copy number other than two, and describe rare copy number variation affecting multiple genes at the APOBEC3 locus.


Blood ◽  
2009 ◽  
Vol 113 (19) ◽  
pp. 4512-4520 ◽  
Author(s):  
Deborah French ◽  
Wenjian Yang ◽  
Cheng Cheng ◽  
Susana C. Raimondi ◽  
Charles G. Mullighan ◽  
...  

Abstract Methotrexate polyglutamates (MTXPGs) determine in vivo efficacy in acute lymphoblastic leukemia (ALL). MTXPG accumulation differs by leukemic subtypes, but genomic determinants of MTXPG variation in ALL remain unclear. We analyzed 3 types of whole genome variation: leukemia cell gene expression and somatic copy number variation, and inherited single nucleotide polymorphism (SNP) genotypes and determined their association with MTXPGs in leukemia cells. Seven genes (FHOD3, IMPA2, ME2, RASSF4, SLC39A6, SMAD2, and SMAD4) displayed all 3 types of genomic variation associated with MTXPGs (P < .05 for gene expression, P < .01 for copy number variation and SNPs): 6 on chromosome 18 and 1 on chromosome 10. Increased chromosome 18 (P = .002) or 10 (P = .036) copy number was associated with MTXPGs even after adjusting for ALL subtype. The expression of the top 7 genes in leukemia cells accounted for more variation in MTXPGs (46%) than did the expression of the top 7 genes in normal HapMap cell lines (20%). The top 7 inherited SNPs in patients accounted for approximately the same degree of variation (17%) in MTXPGs as did the top 7 SNP genotypes in HapMap cell lines (20%). We conclude that acquired genetic variation in leukemia cells has a stronger influence on MTXPG accumulation than inherited genetic variation.

