Algorithm Implementation for CNV Discovery Using Affymetrix and Illumina SNP Array Data

Author(s):  
Laura Winchester ◽  
Jiannis Ragoussis
2017 ◽  
Author(s):  
Zilu Zhou ◽  
Weixin Wang ◽  
Li-San Wang ◽  
Nancy Ruonan Zhang

Abstract Motivation Copy number variations (CNVs) are gains and losses of DNA segments and have been associated with disease. Many large-scale genetic association studies are performing CNV analysis using whole exome sequencing (WES) and whole genome sequencing (WGS). In many of these studies, previously generated SNP-array data are also available. An integrated cross-platform analysis is expected to improve resolution and accuracy, yet there is no tool for effectively combining data from sequencing and array platforms. CNV detection from sequencing data alone can also be improved by using allele-specific reads. Results We propose a statistical framework, the integrated Copy Number Variation detection algorithm (iCNV), which can be applied to multiple study designs: WES only, WGS only, SNP array only, or any combination of SNP-array and sequencing data. iCNV applies platform-specific normalization, uses allele-specific reads from sequencing, and integrates matched NGS and SNP-array data through a Hidden Markov Model (HMM). We compare integrated two-platform CNV detection using iCNV with a naive intersection or union of the per-platform calls and show that iCNV increases sensitivity and robustness. We also assess the accuracy of iCNV on WGS data alone and show that using allele-specific reads improves CNV detection accuracy compared with existing methods. Availability https://github.com/zhouzilu/ Supplementary information Supplementary data are available at Bioinformatics online.
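The abstract above says iCNV segments copy-number states with a Hidden Markov Model. Below is a minimal, generic sketch of that idea: a three-state HMM decoded with the Viterbi algorithm over a single series of SNP-array log R ratios. It is not iCNV's model. iCNV also integrates B-allele/allele-specific signal and sequencing data. The state means, noise level `SD`, and transition probability `STAY` are illustrative assumptions, not values from the paper.

```python
import math
from typing import List, Sequence

# Illustrative parameters (assumptions, not iCNV's fitted values).
STATES = ["del", "normal", "dup"]
MEANS = [math.log2(1 / 2), 0.0, math.log2(3 / 2)]  # expected log R ratio per state
SD = 0.3      # assumed noise level of the log R ratio
STAY = 0.99   # assumed probability of staying in the same state between probes


def log_gauss(x: float, mu: float, sd: float) -> float:
    """Log density of a normal distribution N(mu, sd^2) at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mu) ** 2 / (2 * sd * sd)


def viterbi(lrr: Sequence[float]) -> List[str]:
    """Most likely copy-number state for each probe, given its log R ratio."""
    if not lrr:
        return []
    n = len(STATES)
    log_stay = math.log(STAY)
    log_move = math.log((1 - STAY) / (n - 1))
    # score[s]: best log probability of any state path ending in state s
    score = [math.log(1 / n) + log_gauss(lrr[0], MEANS[s], SD) for s in range(n)]
    backptr: List[List[int]] = []
    for x in lrr[1:]:
        new_score, ptr = [], []
        for s in range(n):
            # Best predecessor state for a path that moves into state s.
            cands = [score[p] + (log_stay if p == s else log_move) for p in range(n)]
            best = max(range(n), key=cands.__getitem__)
            new_score.append(cands[best] + log_gauss(x, MEANS[s], SD))
            ptr.append(best)
        score = new_score
        backptr.append(ptr)
    # Trace back from the best final state.
    state = max(range(n), key=score.__getitem__)
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    return [STATES[s] for s in reversed(path)]
```

For example, `viterbi([0.0] * 5 + [-1.0] * 5 + [0.0] * 5)` labels the middle five probes `"del"` and the rest `"normal"`. The high `STAY` probability penalizes each state switch, which is what stops single noisy probes from being called as CNVs.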


2019 ◽  
Vol 35 (17) ◽  
pp. 2924-2931
Author(s):  
Mark R Zucker ◽  
Lynne V Abruzzo ◽  
Carmen D Herling ◽  
Lynn L Barron ◽  
Michael J Keating ◽  
...  

Abstract Motivation Clonal heterogeneity is common in many types of cancer, including chronic lymphocytic leukemia (CLL). Previous research suggests that the presence of multiple distinct cancer clones is associated with clinical outcome. Detection of clonal heterogeneity from high-throughput data, such as sequencing or single nucleotide polymorphism (SNP) array data, is important for gaining a better understanding of cancer and may improve prediction of clinical outcome or response to treatment. Here, we present a new method, CloneSeeker, for inferring clonal heterogeneity from sequencing data, SNP array data, or both. Results We generated simulated SNP array and sequencing data and applied CloneSeeker along with two other methods. We demonstrate that CloneSeeker is more accurate than existing algorithms at determining the number of clones, the distribution of cancer cells among clones, and the mutations and/or copy number changes belonging to each clone. Next, we applied CloneSeeker to SNP array data from samples of 258 previously untreated CLL patients to gain a better understanding of the characteristics of CLL tumors and to elucidate the relationship between clonal heterogeneity and clinical outcome. We found that a significant majority of CLL patients appear to have multiple clones distinguished by copy number alterations alone. We also found that the presence of multiple clones corresponded with significantly worse survival among CLL patients. These findings may prove useful for improving the accuracy of prognosis and the design of treatment strategies. Availability and implementation Code available on R-Forge: https://r-forge.r-project.org/projects/CloneSeeker/ Supplementary information Supplementary data are available at Bioinformatics online.
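Methods that infer clones from SNP array data have to explain how a mixture of cell populations produces the observed signal. Below is a hedged sketch of that generic forward model. It is not CloneSeeker's likelihood. It computes the expected log R ratio and B-allele frequency at one heterozygous SNP for a mixture of cell populations with allele-specific copy numbers. It ignores array-specific signal compression and noise. The function name and input format are hypothetical.

```python
import math
from typing import List, Tuple


def expected_signal(clones: List[Tuple[float, int, int]]) -> Tuple[float, float]:
    """Expected (log R ratio, B-allele frequency) at one heterozygous SNP.

    clones: (cell_fraction, copies_of_A, copies_of_B) for each cell population,
    including normal cells as (fraction, 1, 1); fractions must sum to 1.
    """
    if not math.isclose(sum(f for f, _, _ in clones), 1.0):
        raise ValueError("cell fractions must sum to 1")
    a = sum(f * ca for f, ca, _ in clones)  # average copies of allele A per cell
    b = sum(f * cb for f, _, cb in clones)  # average copies of allele B per cell
    if a + b == 0:
        raise ValueError("no copies left in any cell (homozygous deletion)")
    lrr = math.log2((a + b) / 2)  # relative to a diploid reference
    baf = b / (a + b)
    return lrr, baf
```

For instance, `expected_signal([(0.4, 1, 1), (0.6, 1, 0)])` (40% normal cells, 60% of cells with a one-copy loss) gives a log R ratio of log2(0.7) and a B-allele frequency of 0.4/1.4. A fully clonal loss gives -1 and 0. Adding a further subclone, e.g. `[(0.3, 1, 1), (0.5, 1, 0), (0.2, 2, 1)]`, shows how several clones blend into one intermediate signal. Clone-inference methods must disentangle such blends.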


Author(s):  
Jie Huang ◽  
Stefano Pallotti ◽  
Qianling Zhou ◽  
Marcus Kleber ◽  
Xiaomeng Xin ◽  
...  

Abstract The identification of rare haplotypes may greatly expand our knowledge of the genetic architecture of both complex and monogenic traits. To this aim, we developed PERHAPS (Paired-End short Reads-based HAPlotyping from next-generation Sequencing data), a new and simple approach to call haplotypes directly from short-read, paired-end Next Generation Sequencing (NGS) data. To benchmark this method, we considered the classic APOE polymorphism (*1/*2/*3/*4), since it is one of the best examples of a functional polymorphism arising from the haplotype combination of two Single Nucleotide Polymorphisms (SNPs). We leveraged the large Whole Exome Sequencing (WES) and SNP-array datasets from the multi-ethnic UK BioBank (UKBB, N=48,855). By applying PERHAPS, which pieces together paired-end reads according to their FASTQ labels, we extracted the haplotypes, their frequencies, and each individual's diplotype. Concordance rates between diplotypes called directly from WES and those generated through statistical pre-phasing and imputation of SNP-array data are extremely high (>99%), both when stratifying the sample by SNP-array genotyping batch and when stratifying by self-reported ethnic group. Hardy-Weinberg Equilibrium tests and the comparison of the obtained haplotype frequencies with those available from the 1000 Genomes Project further supported the reliability of PERHAPS. Notably, we were able to determine the existence of the rare APOE*1 haplotype in two unrelated African subjects from UKBB, supporting its presence at appreciable frequency (approximately 0.5%) in the African Yoruba population. Despite some acknowledged technical shortcomings, PERHAPS is a novel and simple approach that will partly overcome the limitations of direct haplotype calling from short-read sequencing.
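The core idea described above is to join the two mates of a read pair by their shared FASTQ label and read off both SNP alleles from one DNA fragment. Below is a minimal sketch of that idea, assuming reads are already aligned and the allele each read carries at a SNP has already been extracted. The APOE isoforms are defined by rs429358 and rs7412, which lie only about 140 bp apart, so one fragment can span both. The allele table below uses the standard forward-strand definitions. The function names and the `(read_name, snp_id, allele)` input format are hypothetical, and this is not the PERHAPS code.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Forward-strand alleles at (rs429358, rs7412) defining the APOE haplotypes.
APOE = {("T", "T"): "e2", ("T", "C"): "e3", ("C", "C"): "e4", ("C", "T"): "e1"}


def fragment_name(read_name: str) -> str:
    """Strip an optional /1 or /2 mate suffix so both mates share one key."""
    return read_name[:-2] if read_name.endswith(("/1", "/2")) else read_name


def call_haplotypes(obs: Iterable[Tuple[str, str, str]]) -> Counter:
    """Count APOE haplotypes from (read_name, snp_id, allele) observations."""
    by_fragment: Dict[str, Dict[str, str]] = defaultdict(dict)
    for name, snp, allele in obs:
        # Simplification: if both mates cover the same SNP, the last one wins.
        by_fragment[fragment_name(name)][snp] = allele
    counts: Counter = Counter()
    for alleles in by_fragment.values():
        key = (alleles.get("rs429358"), alleles.get("rs7412"))
        if key in APOE:  # fragment spans both SNPs with recognised alleles
            counts[APOE[key]] += 1
    return counts
```

Fragments that cover only one of the two SNPs are skipped, because they carry no phase information. Turning the per-fragment counts into a diplotype call for each individual would need a further step, such as requiring a minimum number of supporting fragments per haplotype. That step is not shown here.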


Blood ◽  
2005 ◽  
Vol 106 (11) ◽  
pp. 1563-1563
Author(s):  
Paola E. Leone ◽  
Brian A. Walker ◽  
David Gonzalez ◽  
Matthew Jenner ◽  
Fiona M. Ross ◽  
...  

Abstract Deletions on chromosome 13 are thought to be one of the most important prognostic features in Multiple Myeloma (MM). The biology underlying this is, however, uncertain. Chromosome 13 abnormalities have conventionally been evaluated by FISH using probes for 13q14, covering the retinoblastoma gene (RB1) region. Typically, for recurrent regions of loss of heterozygosity (LOH) it is possible to map a minimally deleted region within which an important gene may be located. This should be the case with 13q−; alternatively, there may be linkage with another genetic lesion that could be contributing to the poor prognosis. Following the implementation of high-density single-nucleotide polymorphism (SNP) arrays, it is now possible to genotype the whole human genome with a mapping resolution of less than 50 kb. Thus, the SNP array approach offers an opportunity to analyze both copy number abnormalities and LOH simultaneously. The aim of this study was to determine the numerical alterations, LOH, and changes in the gene expression profile of chromosome 13 in MM, and their possible association with other genetic events. For this purpose, we analyzed 17 patients from the Myeloma IX trial with deletion of 13q14 and compared them with 22 samples without deletion, using Affymetrix 50K SNP arrays and Affymetrix U133 Plus 2 expression arrays. IGH translocations and 13q deletion were determined by FISH. The dChipSNP and WGSA programs were used to analyze the data. With respect to 13q14, there was 100% correlation between FISH and SNP array results. 16 out of 17 cases with deletion of the RB1 gene by FISH analysis showed loss of the 13q arm by SNP array, demonstrating that loss of the whole of chromosome 13 is responsible for 13q deletions in MM in >90% of cases, with only one case showing a defined region of deletion on chromosome 13 (13q14.11–13q21.2). Using gene expression arrays, we could not define a specific pattern characteristic of expression loss in genes at 13q. 
Lower RB1 expression levels were not restricted to cases with del(13). However, samples containing IGH translocations (t(11;14) and t(4;14)) without del(13) showed up to 4 times more RB1 expression, suggesting that MM evolution in cases containing IGH translocations is independent of RB1 expression. Interestingly, the hyperdiploid cases with and without del(13) expressed similar levels of RB1. We also investigated whether other key cell cycle regulatory genes were associated with del(13). In particular, 4 cases showed 9p21 LOH by SNP array without differences in gene expression levels, which suggests that LOH is not a mechanism of loss of expression of CDKN2A, CDKN2B and p14/ARF. We could not find any significant correlation between del(13) and expression of cell cycle regulatory genes, apart from low expression of the p53 gene in 8/17 samples with del(13), including 6 t(4;14) cases and 2 t(11;14) cases. Also, 2 cases without monosomy 13 (1 with t(4;14) and 1 with t(11;14)) showed low p53 expression levels. However, SNP array data did not show any deletion at 17p in 38 cases; the exception was a case with monosomy 13 and t(11;14), in which SNP array data showed loss at 17pter-17q21.2 and FISH detected p53 deletion. Further investigation of the association between p53 and del(13) is ongoing and may be useful in defining the biology of this poor-prognosis subgroup of patients.
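The abstract above relies on SNP arrays to detect LOH as well as copy number. One simple heuristic for flagging candidate LOH in a single sample is to scan SNP genotype calls, ordered by position, for long runs of homozygous calls. The sketch below does this. It is an illustrative assumption, not the dChipSNP or WGSA procedure: dedicated tools use probabilistic models, and in an unpaired tumor sample a long homozygous run can also reflect germline homozygosity. The `"AA"/"AB"/"BB"/"NC"` call labels and the `min_len` threshold are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def homozygous_runs(calls: Sequence[str], min_len: int = 5) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs (end inclusive) of runs containing at
    least min_len homozygous calls ("AA"/"BB"). No-calls ("NC") neither extend
    nor break a run; a heterozygous call ("AB") ends it."""
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    last: Optional[int] = None
    n_hom = 0
    for i, g in enumerate(calls):
        if g in ("AA", "BB"):
            if start is None:
                start = i
            last = i
            n_hom += 1
        elif g == "AB":
            if start is not None and last is not None and n_hom >= min_len:
                runs.append((start, last))
            start, last, n_hom = None, None, 0
        # Any other label (e.g. "NC") is skipped without breaking the run.
    if start is not None and last is not None and n_hom >= min_len:
        runs.append((start, last))
    return runs
```

With real data, `min_len` would have to be tuned to SNP density and population heterozygosity. A fixed count of consecutive homozygous SNPs corresponds to very different physical lengths on a 50K array than on a denser one.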


PLoS ONE ◽  
2009 ◽  
Vol 4 (6) ◽  
pp. e6057 ◽  
Author(s):  
Hanna Göransson ◽  
Karolina Edlund ◽  
Maria Rydåker ◽  
Markus Rasmussen ◽  
Johan Winquist ◽  
...  

Biostatistics ◽  
2006 ◽  
Vol 8 (2) ◽  
pp. 485-499 ◽  
Author(s):  
B. Carvalho ◽  
H. Bengtsson ◽  
T. P. Speed ◽  
R. A. Irizarry

BMC Genomics ◽  
2012 ◽  
Vol 13 (1) ◽  
pp. 326 ◽  
Author(s):  
Gaëlle Marenne ◽  
Francisco X Real ◽  
Nathaniel Rothman ◽  
Benjamin Rodríguez-Santiago ◽  
Luis Pérez-Jurado ◽  
...  

2011 ◽  
Vol 11 ◽  
pp. CIN.S8026 ◽  
Author(s):  
Hong Liu ◽  
Asher Zilberstein ◽  
Pascal Pannier ◽  
Frederic Fleche ◽  
Christopher Arendt ◽  
...  

Somatic genetic alterations are a hallmark of tumor development and progression. Although various technologies have been developed and used to identify genetic aberrations, identifying genetic translocations at the chromosomal level is still a challenging task. High-density SNP microarrays are useful for measuring DNA copy number variation (CNV) across the genome. Using SNP array data from cancer cell lines and patient samples, we evaluated the CNV and copy number breakpoints for several known fusion genes implicated in tumorigenesis. This analysis demonstrated the potential utility of SNP array data for predicting translocation-driven genetic aberrations by identifying copy number breakpoints within the target genes. Genome-wide analysis was also performed to identify genes harboring copy number breakpoints across 820 cancer cell lines. Candidate oncogenes linked to potential translocations were identified in specific cancer cell lines.
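The approach above flags genes that contain a copy number breakpoint. A minimal sketch of that step is shown below. It assumes the array data have already been segmented upstream (for example by circular binary segmentation) into `(chrom, start, end, mean log R ratio)` segments. It treats a boundary between adjacent segments as a breakpoint when their means differ by at least `min_shift`, then reports the genes whose span contains that boundary. The threshold, the tuple layouts, and the gene names in the test are hypothetical, and this is not the authors' pipeline.

```python
from typing import Dict, List, Tuple

Segment = Tuple[str, int, int, float]  # (chrom, start, end, mean log R ratio)
Gene = Tuple[str, str, int, int]       # (name, chrom, start, end)


def breakpoints(segments: List[Segment], min_shift: float = 0.3) -> List[Tuple[str, int]]:
    """Boundaries between adjacent segments on the same chromosome whose
    mean log R ratios differ by at least min_shift."""
    ordered = sorted(segments, key=lambda s: (s[0], s[1]))  # group by chromosome
    points = []
    for prev, cur in zip(ordered, ordered[1:]):
        if prev[0] == cur[0] and abs(cur[3] - prev[3]) >= min_shift:
            points.append((cur[0], cur[1]))  # breakpoint at the start of cur
    return points


def genes_with_breakpoints(
    segments: List[Segment], genes: List[Gene], min_shift: float = 0.3
) -> Dict[str, List[int]]:
    """Map each gene name to the breakpoint positions that fall inside it."""
    hits: Dict[str, List[int]] = {}
    for chrom, pos in breakpoints(segments, min_shift):
        for name, g_chrom, g_start, g_end in genes:
            if g_chrom == chrom and g_start <= pos <= g_end:
                hits.setdefault(name, []).append(pos)
    return hits
```

A breakpoint inside a gene only suggests a rearrangement. A balanced translocation does not change copy number, so it leaves no breakpoint in the array signal. This is why such an approach predicts candidate translocations rather than confirming them.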

