gcn.MOPS: Accelerating cn.MOPS with GPU

10.29007/hb5r ◽  
2019 ◽  
Author(s):  
Mohammad Alkhamis ◽  
Amirali Baniasadi

cn.MOPS is a frequently cited model-based algorithm used to quantitatively detect copy-number variations in next-generation DNA sequencing data. Previous work implemented the algorithm as an R package and achieved a considerable yet limited performance improvement by employing multi-CPU parallelism (the maximum achievable speedup was experimentally determined to be 9.24). In this paper, we propose an alternative acceleration mechanism. Using one CPU core and a GPU device, the proposed solution, gcn.MOPS, achieves a speedup factor of 159 and reduces memory usage by more than half compared to cn.MOPS running on one CPU core.

2017 ◽  
Author(s):  
Yuchao Jiang ◽  
Rujin Wang ◽  
Eugene Urrutia ◽  
Ioannis N. Anastopoulos ◽  
Katherine L. Nathanson ◽  
...  

Abstract: High-throughput DNA sequencing enables detection of copy number variations (CNVs) on the genome-wide scale with finer resolution compared to array-based methods, but suffers from biases and artifacts that lead to false discoveries and low sensitivity. We describe CODEX2, a statistical framework for full-spectrum CNV profiling that is sensitive for variants with both common and rare population frequencies and that is applicable to study designs with and without negative control samples. We demonstrate and evaluate CODEX2 on whole-exome and targeted sequencing data, where biases are the most prominent. CODEX2 outperforms existing methods and, in particular, significantly improves sensitivity for common CNVs.


2014 ◽  
Vol 13 ◽  
pp. CIN.S19519 ◽  
Author(s):  
Oscar Krijgsman ◽  
Christian Benner ◽  
Gerrit A. Meijer ◽  
Mark A. van de Wiel ◽  
Bauke Ylstra

To identify somatic focal copy number aberrations (CNAs) in cancer specimens and to distinguish them from germ-line copy number variations (CNVs), we developed the software package FocalCall. FocalCall enables user-defined size cutoffs to recognize focal aberrations and builds on established array comparative genomic hybridization segmentation and calling algorithms. To distinguish CNAs from CNVs, the algorithm uses matched patient normal signals as references or, if these are not available, a list of known CNVs in a population. Furthermore, FocalCall differentiates between homozygous and heterozygous deletions as well as between gains and amplifications and is applicable to high-resolution array and sequencing data. AVAILABILITY AND IMPLEMENTATION: FocalCall is available as an R-package from: https://github.com/OscarKrijgsman/focalCall. The R-package will be available in Bioconductor.org as of release 3.0.


Author(s):  
Kun Xie ◽  
Kang Liu ◽  
Haque A K Alvi ◽  
Yuehui Chen ◽  
Shuzhen Wang ◽  
...  

Copy number variation (CNV) is a well-known type of genomic mutation that is associated with the development of human cancers. Detecting CNVs in the human genome is a crucial step in the pipeline that runs from mutation analysis to cancer diagnosis and treatment. Next-generation sequencing (NGS) data provides an unprecedented opportunity for CNV detection at base-level resolution, and many methods have been developed for CNV detection using NGS data. However, due to the intrinsic complexity of CNV structures and of NGS data itself, accurate detection of CNVs still faces many challenges. In this paper, we present an alternative method, called KNNCNV (K-Nearest Neighbor based CNV detection), for the detection of CNVs using NGS data. Compared to current methods, KNNCNV has several distinctive features: 1) it assigns an outlier score to each genome segment based solely on its first k nearest-neighbor distances, which is not only easy to extend to other data types but also improves the power of discovering CNVs, especially the local CNVs that are likely to be masked by their surrounding regions; 2) it employs the variational Bayesian Gaussian mixture model (VBGMM) to transform these scores into a series of binary labels without a user-defined threshold. To evaluate the performance of KNNCNV, we conduct experiments on both simulated and real sequencing data and make comparisons with peer methods. The experimental results show that KNNCNV achieves a better F1-score than the other methods.
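As a rough illustration of the first feature described in this abstract, the sketch below scores each segment by its distances to its k nearest neighbours. It is not the KNNCNV implementation: the function name `knn_outlier_scores`, the use of the mean of the k distances, the one-dimensional read-depth input, and the example values are all assumptions made for illustration, and the VBGMM labelling step is not shown.

```python
def knn_outlier_scores(values, k):
    """Score each value by the mean distance to its k nearest neighbours.

    Assumption: the abstract only says the score is based on the first k
    nearest-neighbour distances; averaging them is one possible choice.
    """
    if not 0 < k < len(values):
        raise ValueError("k must be between 1 and len(values) - 1")
    scores = []
    for i, v in enumerate(values):
        # Distances from segment i to every other segment (1-D read depth).
        dists = sorted(abs(v - w) for j, w in enumerate(values) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


# Hypothetical per-segment read-depth values; index 4 mimics an amplification.
depths = [10.0, 11.0, 9.0, 10.0, 30.0, 10.0, 11.0]
scores = knn_outlier_scores(depths, k=2)

# Index of the segment with the largest outlier score.
top = max(range(len(scores)), key=scores.__getitem__)
print(top, scores[top])
```

A segment whose depth is far from all others receives a large score, while segments in a dense cluster score near zero. Turning the scores into binary labels would then be left to a separate model, which the abstract says is a VBGMM.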


PeerJ ◽  
2015 ◽  
Vol 3 ◽  
pp. e1419 ◽  
Author(s):  
Jose E. Kroll ◽  
Jihoon Kim ◽  
Lucila Ohno-Machado ◽  
Sandro J. de Souza

Motivation. Alternative splicing events (ASEs) are prevalent in the transcriptome of eukaryotic species and are known to influence many biological phenomena. The identification and quantification of these events are crucial for a better understanding of biological processes. Next-generation DNA sequencing technologies have allowed deep characterization of transcriptomes and made it possible to address these issues. ASEs analysis, however, represents a challenging task especially when many different samples need to be compared. Some popular tools for the analysis of ASEs are known to report thousands of events without annotations and/or graphical representations. A new tool for the identification and visualization of ASEs is here described, which can be used by biologists without a solid bioinformatics background. Results. A software suite named Splicing Express was created to perform ASEs analysis from transcriptome sequencing data derived from next-generation DNA sequencing platforms. Its major goal is to serve the needs of biomedical researchers who do not have bioinformatics skills. Splicing Express performs automatic annotation of transcriptome data (GTF files) using gene coordinates available from the UCSC genome browser and allows the analysis of data from all available species. The identification of ASEs is done by a known algorithm previously implemented in another tool named Splooce. As a final result, Splicing Express creates a set of HTML files composed of graphics and tables designed to describe the expression profile of ASEs among all analyzed samples. By using RNA-Seq data from the Illumina Human Body Map and the Rat Body Map, we show that Splicing Express is able to perform all tasks in a straightforward way, identifying well-known specific events. Availability and Implementation. Splicing Express is written in Perl and is suitable to run only in UNIX-like systems. More details can be found at: http://www.bioinformatics-brazil.org/splicingexpress.


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e12564
Author(s):  
Taifu Wang ◽  
Jinghua Sun ◽  
Xiuqing Zhang ◽  
Wen-Jing Wang ◽  
Qing Zhou

Background: Copy-number variants (CNVs) have been recognized as one of the major causes of genetic disorders. Reliable detection of CNVs from genome sequencing data is in strong demand for disease research. However, current software for detecting CNVs has high false-positive rates and needs further improvement. Methods: Here, we propose a novel post-processing approach for CNV prediction (CNV-P), a machine-learning framework that can efficiently remove false-positive fragments from the results of CNV detection tools. A series of CNV signals around the putative CNV fragments, such as read depth (RD), split reads (SR) and read pairs (RP), were defined as features to train a classifier. Results: The prediction results on several real biological datasets showed that our models could accurately classify the CNVs at over 90% precision and 85% recall, which greatly improves the performance of state-of-the-art algorithms. Furthermore, our results indicate that CNV-P is robust to different CNV sizes and sequencing platforms. Conclusions: Our framework for classifying high-confidence CNVs could improve both basic research and clinical diagnosis of genetic diseases.
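For readers unfamiliar with the metrics quoted in this abstract, the sketch below computes precision, recall and F1 from counts of true positives, false positives and false negatives. The function name `precision_recall_f1` and the counts are hypothetical and chosen only to make the arithmetic concrete; they are not results from CNV-P or any other paper listed here.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from confusion-matrix counts."""
    # Precision: fraction of called CNVs that are real.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: fraction of real CNVs that were called.
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for illustration only (not taken from any study).
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=15)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

A post-processing filter that removes false-positive calls raises precision by lowering `fp`, but any true CNV it discards moves from `tp` to `fn` and lowers recall, which is why both figures are reported together.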


2021 ◽  
pp. 1-8
Author(s):  
Jian-Chun He ◽  
Shao-Ying Li ◽  
Wen-Zhi He ◽  
Jia-Jia Xian ◽  
Xiao-Yan Ma ◽  
...  

At present, low-pass whole-genome sequencing (WGS) is frequently used in clinical research and in the screening of copy number variations (CNVs). However, there are still some challenges in the detection of triploids. Restriction site-associated DNA sequencing (RAD-Seq) technology is a reduced-representation genome sequencing technology developed based on next-generation sequencing. Here, we verified whether RAD-Seq could be employed to detect CNVs and triploids. In this study, genomic DNA of 11 samples was extracted employing a routine method and used to build libraries. Five cell lines of known karyotypes and 6 triploid abortion tissue samples were included for RAD-Seq testing. The triploid samples were confirmed by STR analysis and also tested by low-pass WGS. The accuracy and efficiency of detecting CNVs and triploids by RAD-Seq were then assessed, compared with low-pass WGS. In our results, RAD-Seq detected 11 out of 11 (100%) chromosomal abnormalities, including 4 deletions and 1 aneuploidy in the purchased cell lines and all triploid samples. By contrast, these triploids were missed by low-pass WGS. Furthermore, RAD-Seq showed a higher resolution and more accurate allele frequency in the detection of triploids than low-pass WGS. Our study shows that, compared with low-pass WGS, RAD-Seq has relatively higher accuracy in CNV detection at a similar cost and is capable of identifying triploids. Therefore, the application of this technique in medical genetics has a significant potential value.


2020 ◽  
Author(s):  
Getiria Onsongo ◽  
Ham Ching Lam ◽  
Matthew Bower ◽  
Bharat Thyagarajan

Abstract. Objective: Detection of small copy number variations (CNVs) in clinically relevant genes is routinely being used to aid diagnosis. We recently developed a tool, CNV-RF, capable of detecting small clinically relevant CNVs. CNV-RF was designed for small gene panels and did not scale well to large gene panels. On large gene panels, CNV-RF routinely failed due to memory limitations. When successful, it took about 2 days to complete a single analysis, making it impractical for routinely analyzing large gene panels. We need a reliable tool capable of detecting CNVs in the clinic that scales well to large gene panels. Results: We have developed Hadoop-CNV-RF, a scalable implementation of CNV-RF. Hadoop-CNV-RF is a freely available tool capable of rapidly analyzing large gene panels. It takes advantage of Hadoop, a big data framework developed to analyze large amounts of data. Preliminary results show it reduces analysis time from about 2 days to less than 4 hours and can seamlessly scale to large gene panels. Hadoop-CNV-RF has been clinically validated for targeted capture data and is currently being used in a CLIA molecular diagnostics laboratory. Its availability and usage instructions are publicly available at: https://github.com/getiria-onsongo/hadoop-cnvrf-public.


2020 ◽  
Vol 16 (7) ◽  
pp. e1008012 ◽  
Author(s):  
Xian F. Mallory ◽  
Mohammadamin Edrisi ◽  
Nicholas Navin ◽  
Luay Nakhleh
