Integrating genomic correlation structure improves copy number variations detection

Author(s):  
Xizhi Luo ◽  
Fei Qin ◽  
Guoshuai Cai ◽  
Feifei Xiao

Abstract Motivation: Copy number variation plays important roles in human complex diseases. Detecting copy number variants (CNVs) amounts to identifying mean shifts in genetic intensities in order to locate chromosomal breakpoints, a step referred to as chromosomal segmentation. Many segmentation algorithms have been developed under the strong assumption that observations at genetic loci are independent, and they assume that each locus has an equal chance of being a breakpoint (i.e. a CNV boundary). However, this assumption is violated from a genetics perspective because genomic positions are correlated, for example through linkage disequilibrium (LD). Our study showed that the LD structure is related to the location distribution of CNVs, which indeed presents a non-random pattern on the genome. To generate more accurate CNVs, we proposed a novel algorithm, LDcnv, that models the CNV data with its biological characteristics relating to genetic dependence structure (i.e. LD). Results: We theoretically demonstrated the correlation structure of CNV data in SNP array, which further supports the necessity of integrating biological structure in statistical methods for CNV detection. Therefore, we developed LDcnv, which integrates the genomic correlation structure with a local search strategy into statistical modeling of the CNV intensities. To evaluate the performance of LDcnv, we conducted extensive simulations and analyzed large-scale HapMap datasets. We showed that LDcnv presented high accuracy, stability and robustness in CNV detection and higher precision in detecting short CNVs compared to existing methods. This new segmentation algorithm has a wide scope of potential application with data from various high-throughput technology platforms. Availability and implementation: https://github.com/FeifeiXiaoUSC/LDcnv. Supplementary information: Supplementary data are available at Bioinformatics online.
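To make the idea of "identifying a mean shift to locate a breakpoint" concrete, the following is a minimal sketch of a single change-point scan. It is not the LDcnv algorithm: it deliberately uses the independence assumption that the abstract criticizes, and the function name `best_mean_shift` and the simulated intensities are hypothetical, chosen only for illustration.

```python
import math
import random

def best_mean_shift(x):
    """Return (index, statistic) of the strongest single mean shift in x.

    For each split point k, the mean of x[:k] is compared with the mean of
    x[k:]. The statistic is the absolute mean difference weighted by the
    segment sizes; it is not scaled by the noise variance, and observations
    are treated as independent (the assumption LDcnv is said to relax).
    """
    n = len(x)
    total = sum(x)
    best_k, best_stat = None, 0.0
    left = 0.0
    for k in range(1, n):
        left += x[k - 1]
        right = total - left
        diff = left / k - right / (n - k)
        stat = abs(diff) * math.sqrt(k * (n - k) / n)
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k, best_stat

random.seed(0)
# Simulated intensities: 60 loci at baseline, then 40 loci with a shifted mean.
signal = [random.gauss(0.0, 0.3) for _ in range(60)]
signal += [random.gauss(1.0, 0.3) for _ in range(40)]
breakpoint_index, statistic = best_mean_shift(signal)
print(breakpoint_index, round(statistic, 2))
```

Every split point is a priori equally likely here, which is exactly the "equal chance to be a breakpoint" assumption described above; a correlation-aware method would replace the independent-noise statistic with one that accounts for dependence between neighbouring loci.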

2017 ◽  
Author(s):  
Zilu Zhou ◽  
Weixin Wang ◽  
Li-San Wang ◽  
Nancy Ruonan Zhang

Abstract Motivation: Copy number variations (CNVs) are gains and losses of DNA segments and have been associated with disease. Many large-scale genetic association studies are performing CNV analysis using whole exome sequencing (WES) and whole genome sequencing (WGS). In many of these studies, previous SNP-array data are available. An integrated cross-platform analysis is expected to improve resolution and accuracy, yet there is no tool for effectively combining data from sequencing and array platforms. The detection of CNVs using sequencing data alone can also be further improved by the utilization of allele-specific reads. Results: We propose a statistical framework, integrated Copy Number Variation detection algorithm (iCNV), which can be applied to multiple study designs: WES only, WGS only, SNP array only, or any combination of SNP and sequencing data. iCNV applies platform-specific normalization, utilizes allele-specific reads from sequencing and integrates matched NGS and SNP-array data by a Hidden Markov Model (HMM). We compare integrated two-platform CNV detection using iCNV to naive intersection or union of platforms and show that iCNV increases sensitivity and robustness. We also assess the accuracy of iCNV on WGS data only, and show that the utilization of allele-specific reads improves CNV detection accuracy compared to existing methods. Availability: https://github.com/zhouzilu/[email protected], [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
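As a rough illustration of how a Hidden Markov Model can combine two platforms, the sketch below runs Viterbi decoding over three copy-number states and sums the log-likelihoods of an array track and a sequencing track at each locus. This is not the iCNV implementation: the state means, noise levels, transition probability and the assumption that the two platforms are conditionally independent given the state are all hypothetical choices made for the example.

```python
import math

STATES = ("del", "normal", "dup")
# Hypothetical emission means for a log2 intensity/coverage ratio per state.
MEANS = {"del": -1.0, "normal": 0.0, "dup": 0.58}

def log_gauss(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))

def viterbi(array_lrr, seq_lrr, sd_array=0.3, sd_seq=0.3, stay=0.98):
    """Most likely state path given two aligned per-locus log2-ratio tracks.

    Emissions from the two platforms are assumed conditionally independent
    given the state, so their log-likelihoods are simply added.
    """
    switch = (1.0 - stay) / (len(STATES) - 1)
    log_trans = {(s, t): math.log(stay if s == t else switch)
                 for s in STATES for t in STATES}

    def emit(i, s):
        return (log_gauss(array_lrr[i], MEANS[s], sd_array)
                + log_gauss(seq_lrr[i], MEANS[s], sd_seq))

    score = {s: math.log(1.0 / len(STATES)) + emit(0, s) for s in STATES}
    back = []
    for i in range(1, len(array_lrr)):
        new_score, pointers = {}, {}
        for t in STATES:
            prev = max(STATES, key=lambda s: score[s] + log_trans[(s, t)])
            new_score[t] = score[prev] + log_trans[(prev, t)] + emit(i, t)
            pointers[t] = prev
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

array_track = [0.0, 0.0, -1.0, -1.0, -1.0, 0.0, 0.0]
seq_track = [0.1, -0.1, -0.9, -1.1, -1.0, 0.0, 0.1]
print(viterbi(array_track, seq_track))
```

Unlike a naive intersection or union of per-platform calls, a joint model of this kind lets weak evidence on one platform be reinforced or overruled by the other before any call is made.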


2019 ◽  
Author(s):  
Erin Zampaglione ◽  
Benyam Kinde ◽  
Emily M. Place ◽  
Daniel Navarro-Gomez ◽  
Matthew Maher ◽  
...  

ABSTRACT Purpose: Current sequencing strategies can genetically solve 55-60% of inherited retinal degeneration (IRD) cases, despite recent progress in sequencing. This can partially be attributed to elusive pathogenic variants (PVs) in known IRD genes, including copy number variations (CNVs), which we believe are a major contributor to unsolved IRD cases. Methods: Five hundred IRD patients were analyzed with targeted next generation sequencing (NGS). The NGS data were used to detect CNVs with ExomeDepth and gCNV, and the results were compared to CNV detection with a SNP array. Likely causal CNV predictions were validated by quantitative (q)PCR. Results: Likely disease-causing single nucleotide variants (SNVs) and small indels were found in 55.8% of subjects. PVs in USH2A (11.6%), RPGR (4%) and EYS (4%) were the most common. Likely causal CNVs were found in an additional 8.8% of patients. Of the three CNV detection methods, gCNV showed the highest accuracy. Approximately 30% of unsolved subjects had a single likely PV in a recessive IRD gene. Conclusions: CNV detection using NGS-based algorithms is a reliable method that greatly increases the genetic diagnostic rate of IRDs. Experimentally validating CNVs helps estimate the rate at which IRDs might be solved by a CNV plus a more elusive variant.


2020 ◽  
Author(s):  
Xizhi Luo ◽  
Fei Qin ◽  
Guoshuai Cai ◽  
Feifei Xiao

Abstract Copy number variation plays important roles in human complex diseases. Detecting copy number variants (CNVs) amounts to identifying mean shifts in genetic intensities in order to locate chromosomal breakpoints, a step referred to as chromosomal segmentation. Many segmentation algorithms have been developed under the strong assumption that observations at genetic loci are independent, and they assume that each locus has an equal chance of being a breakpoint (i.e., a CNV boundary). However, this assumption is violated from a genetics perspective because genomic positions are correlated, for example through linkage disequilibrium (LD). Our study showed that the LD structure is related to the location distribution of CNVs, which indeed presents a non-random pattern on the genome. To generate more accurate CNVs, we therefore proposed a novel algorithm, LDcnv, that models the CNV data with its biological characteristics relating to genetic correlation (i.e., LD). To evaluate the performance of LDcnv, we conducted extensive simulations and analyzed large-scale HapMap datasets. We showed that LDcnv presents high accuracy, stability and robustness in CNV detection and higher precision in detecting short CNVs compared to existing methods. We also theoretically demonstrated the correlation structure of CNV data, which further supports the necessity of integrating biological structure in statistical methods for CNV detection. This new segmentation algorithm has a wide scope of application with next-generation sequencing data analysis and single-cell sequencing analysis.
Author Summary: Copy number variants (CNVs) refer to gains or losses of DNA segments in comparison to a reference genome. CNVs have garnered extensive interest in recent years as they play an important role in susceptibility to disorders and diseases such as autism, schizophrenia and cancer [1-7]. Although innovation in modern technology is promoting discoveries related to CNVs, the methodology for CNV detection is still lagging, which limits novel discoveries regarding the role of CNVs in complex diseases. In this study, we propose a novel segmentation algorithm, LDcnv, to accurately locate the breakpoints or boundaries of CNVs in the human genome. Instead of assuming independence of the signal intensities, as traditional segmentation algorithms do, LDcnv models the correlation structure in the genome in a change-point CNV detection model, which allows for accurate and fast computation with a whole genome scan. Our study showed strong theoretical evidence of the existence of correlation structure in real CNV data, and we believe that taking this evidence into consideration will improve the power of CNV detection. Extensive simulation studies have demonstrated the advantage of the LDcnv algorithm in stability, robustness and accuracy over existing methods. We also used high-quality CNV profiles to further support the superior performance of the LDcnv algorithm over existing methods. The development of the LDcnv algorithm provides great insights for new directions in developing CNV detection tools.


2020 ◽  
Vol 36 (12) ◽  
pp. 3890-3891
Author(s):  
Linjie Wu ◽  
Han Wang ◽  
Yuchao Xia ◽  
Ruibin Xi

Abstract Motivation: Whole-genome sequencing (WGS) is widely used for copy number variation (CNV) detection. However, for most bacteria, their circular genome structure and high replication rate make reads more enriched near the replication origin. CNV detection based on read depth could be seriously influenced by such replication bias. Results: We show that the replication bias is widespread using ∼200 bacterial WGS datasets. We develop CNV-BAC (CNV-Bacteria), which can properly normalize the replication bias and other known biases in bacterial WGS data and can accurately detect CNVs. Simulation and real data analysis show that CNV-BAC achieves the best performance in CNV detection compared with available algorithms. Availability and implementation: CNV-BAC is available at https://github.com/XiDsLab/CNV-BAC. Supplementary information: Supplementary data are available at Bioinformatics online.
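The replication bias described here can be pictured with a small sketch: on a circular genome, read depth is modelled as declining with distance from the replication origin, and that trend is removed before looking for copy-number changes. The code below is not CNV-BAC; the log-linear trend, the function names and the simulated bins are simplifying assumptions made for illustration.

```python
import math

def circular_distance(pos, origin, genome_len):
    """Shortest distance between pos and origin on a circular genome."""
    d = abs(pos - origin) % genome_len
    return min(d, genome_len - d)

def normalize_replication_bias(positions, depths, origin, genome_len):
    """Remove a log-linear origin-to-terminus trend from binned read depth.

    Fits log2(depth) = a + b * distance_from_origin by ordinary least
    squares and returns the residual log2 ratios (0 means "as expected").
    """
    xs = [circular_distance(p, origin, genome_len) for p in positions]
    ys = [math.log2(d) for d in depths]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Simulated 1 Mb circular genome, 50 kb bins, depth halving from origin to terminus.
genome_len = 1_000_000
positions = list(range(0, genome_len, 50_000))
depths = [200.0 * 2 ** (-circular_distance(p, 0, genome_len) / 500_000)
          for p in positions]
depths[5] *= 2  # a duplicated bin on top of the replication trend
residuals = normalize_replication_bias(positions, depths, 0, genome_len)
print(max(range(len(residuals)), key=lambda i: residuals[i]))
```

Without the correction, bins near the origin would all look amplified relative to bins near the terminus; after it, only the bin whose depth departs from the fitted trend stands out.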


2020 ◽  
Author(s):  
Marcel Kucharik ◽  
Jaroslav Budis ◽  
Michaela Hyblova ◽  
Gabriel Minarik ◽  
Tomas Szemes

Copy number variations (CNVs) are a type of structural variant involving alterations in the number of copies of specific regions of DNA, which can either be deleted or duplicated. CNVs contribute substantially to normal population variability; however, abnormal CNVs cause numerous genetic disorders. Nowadays, several methods for CNV detection are used, from conventional cytogenetic analysis through microarray-based methods (aCGH) to next-generation sequencing (NGS). We present GenomeScreen, an NGS-based CNV detection method built on a previously described CNV detection algorithm used for non-invasive prenatal testing (NIPT). We determined the theoretical limits of its accuracy and confirmed them with an extensive in-silico study and already genotyped samples. Theoretically, at least 6M uniquely mapped reads are required to detect a CNV with a length of 100 kilobases (kb) or more with high confidence (Z-score > 7). In practice, the in-silico analysis showed that at least 8M reads are required to obtain >99% accuracy (for 100 kb deviations). We compared GenomeScreen with one of the aCGH methods currently used in diagnostic laboratories, which has a 200 kb mean resolution. GenomeScreen and aCGH both detected 59 deviations; GenomeScreen additionally detected 134 other, usually smaller, variations. Furthermore, the overall cost per sample is about 2-3x lower in the case of GenomeScreen.
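The relationship between read count, CNV length and Z-score can be approximated with a back-of-the-envelope calculation. The sketch below is not the GenomeScreen derivation: it assumes reads fall uniformly on a genome of about 3.1 billion bases, that counts in a region follow a Poisson distribution, and that the CNV is a single-copy loss or gain; the function name and defaults are hypothetical.

```python
import math

def expected_z_score(total_reads, region_bp, genome_bp=3.1e9, copy_ratio=0.5):
    """Approximate Z-score of a copy-number change under a Poisson count model.

    copy_ratio is the affected region's read count relative to normal
    (0.5 for a heterozygous deletion, 1.5 for a single-copy duplication).
    """
    expected = total_reads * region_bp / genome_bp  # reads expected at normal copy number
    deviation = abs(copy_ratio - 1.0) * expected    # shift in read count caused by the CNV
    return deviation / math.sqrt(expected)          # Poisson standard deviation is sqrt(mean)

for reads in (6e6, 8e6):
    print(int(reads), round(expected_z_score(reads, 100_000), 2))
```

Under these assumptions, 6M reads and a 100 kb single-copy change give a Z-score of roughly 7, in line with the threshold quoted in the abstract, and the score grows with the square root of the read count.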


2020 ◽  
Vol 66 (3) ◽  
pp. 463-473 ◽  
Author(s):  
Sebastián Blesa ◽  
María D Olivares ◽  
Andy S Alic ◽  
Alicia Serrano ◽  
Verónica Lendinez ◽  
...  

Abstract Background: The specific characteristics of copy number variations (CNVs) require specific methods of detection and characterization. We developed the Easy One-Step Amplification and Labeling procedure for CNV detection (EOSAL-CNV), a new method based on proportional amplification and labeling of amplicons in 1 PCR. Methods: We used tailed primers for specific amplification and a pair of labeling probes (only 1 labeled) for amplification and labeling of all amplicons in just 1 reaction. Products were loaded directly onto a capillary DNA sequencer for fragment sizing and quantification. Data obtained could be analyzed with a Microsoft Excel spreadsheet or EOSAL-CNV analysis software. We developed the protocol using the LDLR (low density lipoprotein receptor) gene including 23 samples with 8 different CNVs. After optimizing the protocol, it was used for genes in the following multiplexes: BRCA1 (BRCA1 DNA repair associated), BRCA2 (BRCA2 DNA repair associated), CHEK2 (checkpoint kinase 2), MLH1 (mutL homolog 1) plus MSH6 (mutS homolog 6), MSH2 (mutS homolog 2) plus EPCAM (epithelial cell adhesion molecule) and chromosome 17 (especially the TP53 [tumor protein 53] gene). We compared our procedure with multiplex ligation-dependent probe amplification (MLPA). Results: The simple procedure for CNV detection required 150 min, with <10 min of handwork. After analyzing >240 samples, EOSAL-CNV excluded the presence of CNVs in all controls, and in all cases, results were identical using MLPA and EOSAL-CNV. Analysis of the 17p region in tumor samples showed 100% similarity between fluorescent in situ hybridization and EOSAL-CNV. Conclusions: EOSAL-CNV allowed reliable, fast, easy detection and characterization of CNVs. It provides an alternative to targeted analysis methods such as MLPA.


Blood ◽  
2007 ◽  
Vol 110 (11) ◽  
pp. 230-230
Author(s):  
Ilaria Iacobucci ◽  
E. Ottaviani ◽  
A. Astolfi ◽  
S. Soverini ◽  
N. Testoni ◽  
...  

Abstract The Ph chromosome is the most frequent cytogenetic aberration associated with ALL and it represents the single most significant adverse prognostic marker. Despite the encouraging results achieved with imatinib, resistance develops rapidly and is quickly followed by disease progression. Some mechanisms of resistance have been widely described but the full knowledge of contributing factors driving both the disease and resistance remains to be defined. To identify, at the submicroscopic level, genetic lesions driving leukemogenesis and resistance, we have so far profiled the genomes of 18 of the 55 Ph+ ALL patients treated in our institute, at diagnosis (n=11) or at the time of haematological relapse (n=7) during therapy with imatinib or dasatinib. 250 ng of genomic DNA were processed on a 500K single nucleotide polymorphism (SNP) array according to protocols provided by the manufacturer (Affymetrix Inc., Santa Clara, CA, USA). The median SNP call rate of analysed samples was 96%. Raw signal data were analyzed by the BRLMM algorithm and copy number state was calculated with respect to a set of 48 HapMap normal individuals and a diploid reference set of samples obtained from acute leukaemia cases in remission. Regions of amplification and deletion were visualized by Integrated Genome Browser and mapped to RefSeq to identify the specific genes involved in the lesion. Our analysis identified multiple copy number alterations per case, with deletions outnumbering amplifications almost 3:1. Lesions varied from loss or gain of complete chromosome arms (trisomy 4, monosomy 7, loss of 9p, 10q, 14q, 16q and gain of 1q and 17q) to microdeletions and microduplications targeting genomic intervals. The recurring microdeletions that we detected in at least 50% of patients (both at diagnosis and at relapse) included 1p36.21 (PRAMEF), 3q29 (TFCR), 7p14.1 (AMPH), 8p23 (DEFB105A), 14q11.2 (DAD1), 16p13.11 (PDXDC1, NTAN1, RRN3), 16p11.2 (SNP) and 19p13.2 (CARM1, SMARCA4). Common microamplifications were 4q13.2 (TMPRSS11E) and 17q21.31. Some genomic alterations were identified in genes regulating B-lymphocyte differentiation, such as PAX5 (n=3), BLNK (n=1) and VPREB1 (n=6), and in genes with an established role in leukemogenesis, such as MDS, BTG1, MLLT3 and RUNX1. Furthermore, many of the deletions detected included genes encoding phosphatase proteins (e.g. PTPRD, PPP1R9B, PTPN18) and zinc-finger proteins, without any difference between diagnosis and resistance. It is noteworthy that some lesions fell in regions lacking annotated genes (loss: 2p11.2, 3p12.3, 7q11.21 and 14q32.33; gain: 8q23.3 and 13q21.1). Using a high-resolution genome-wide approach, we showed that Ph+ ALL is a more complex disease characterized by multiple genomic anomalies, which may provide new insights into the mechanisms underlying leukemogenesis and may be used as targets for existing or novel drugs. Supported by: European LeukemiaNet, COFIN 2003, Novartis Oncology Clinical Development, AIL.


2021 ◽  
Vol 8 ◽  
Author(s):  
Meiying Cai ◽  
Hailong Huang ◽  
Liangpu Xu ◽  
Na Lin

We applied a single nucleotide polymorphism (SNP) array to identify the etiology of fetal central nervous system (CNS) abnormality and explored its association with chromosomal abnormalities, copy number variations, and obstetrical outcome. A total of 535 fetuses with CNS abnormalities were analyzed using karyotype analysis and SNP array. Among the 535 fetuses with CNS abnormalities, chromosomal abnormalities were detected in 36 (6.7%), consistent with karyotype analysis. Further, an additional 41 fetuses with abnormal copy number variations (CNVs) were detected using SNP array (the detection rate of additional abnormal CNVs was 7.7%). The rate of chromosomal abnormalities, but not that of pathogenic CNVs, was significantly higher in CNS abnormalities with other ultrasound abnormalities than in isolated CNS abnormalities. The rates of chromosomal abnormalities and pathogenic CNVs in fetuses with spine malformation (50%), encephalocele (50%), subependymal cyst (20%), and microcephaly (16.7%) were higher than those with other isolated CNS abnormalities. The pregnancies for 36 cases with chromosomal abnormalities, 18 cases with pathogenic CNVs, and three cases with VUS CNVs were terminated. SNP array should be used in the prenatal diagnosis of fetuses with CNS abnormalities, which can enable better prenatal assessment and genetic counseling, and affect obstetrical outcomes.


2020 ◽  
Author(s):  
Meiying Cai ◽  
Na Lin ◽  
Liangpu Xu ◽  
Hailong Huang

Abstract Background: Some ultrasonic soft markers can be found during ultrasound examination. However, the etiology of the fetuses with ultrasonic soft markers is still unknown. This study aimed to evaluate the genetic etiology and clinical value of chromosomal abnormalities and copy number variations (CNVs) in fetuses with ultrasonic soft markers. Methods: Among 1131 fetuses, 729 had a single ultrasonic soft marker, 322 had two ultrasonic soft markers, and 80 had three or more ultrasonic soft markers. All fetuses underwent conventional karyotyping, followed by single nucleotide polymorphism (SNP) array analysis. Results: Among 1131 fetuses with ultrasonic soft markers, 46 had chromosomal abnormalities. In addition to the 46 fetuses with chromosomal abnormalities consistent with the results of the karyotyping analysis, the SNP array identified an additional 6.1% (69/1131) with abnormal CNVs. The rates of abnormal CNVs in fetuses with one ultrasonic soft marker, two ultrasonic soft markers, and three or more ultrasonic soft markers were 6.2%, 6.2%, and 5.0%, respectively. No significant difference was found in the rate of abnormal CNVs among the groups. Conclusions: Genetic abnormalities affect obstetrical outcomes. The SNP array can fully complement conventional karyotyping in fetuses with ultrasonic soft markers, improve the detection rate of chromosomal abnormalities, and affect obstetrical outcomes.


2021 ◽  
Author(s):  
Masaki Kaibori ◽  
Kazuko Sakai ◽  
Hideyuki Matsushima ◽  
Hisashi Kosaka ◽  
Kosuke Matsui ◽  
...  

Abstract Background/purpose of the study: Tumor heterogeneity based on copy number variations is associated with the evolution of cancer and its clinical grade. Clonal composition (CC) represents the number of clones based on the distribution of B-allele frequency (BAF) obtained from a genome-wide single nucleotide polymorphism (SNP) array. A higher CC number represents a high degree of heterogeneity. We hypothesized that the CC number in hepatocellular carcinoma (HCC) tissues might be associated with the clinical outcomes of patients, and we evaluated this association. Methods: Somatic mutations, the whole transcriptome, and copy number variations of 36 frozen tissue samples of operably resected HCC were analyzed by targeted deep sequencing, RNAseq and SNP array. Results: The heterogeneous tumors were classified as poly-CC (n = 26) and the homogeneous tumors as mono-CC (n = 8). The patients with poly-CC had a higher rate of early recurrence and a significantly shorter recurrence-free survival period than the mono-CC patients (7.0 vs. not reached, p = 0.0084). No differences in pathogenic non-synonymous mutations, such as TP53, were observed between the two groups when targeted deep sequencing was applied. A transcriptome analysis showed that cell cycle-related pathways were enriched in the poly-CC tumors, compared with the mono-CC tumors. Poly-CC HCC is highly proliferative and has a high risk of early recurrence. Conclusion: CC is a candidate biomarker for predicting the risk of early postoperative recurrence and warrants further investigation.

