Benchmark of tools for CNV detection from NGS panel data in a genetic diagnostics context

2019 ◽  
Author(s):  
José Marcos Moreno-Cabrera ◽  
Jesús del Valle ◽  
Elisabeth Castellanos ◽  
Lidia Feliubadaló ◽  
Marta Pineda ◽  
...  

Abstract Motivation: Although germline copy number variants (CNVs) are the genetic cause of multiple hereditary diseases, detecting them from targeted next-generation sequencing (NGS) data remains a challenge. Existing tools perform well for large CNVs but struggle with single and multi-exon alterations. The aim of this work is to evaluate CNV calling tools working on gene panel NGS data with CNVs up to single-exon resolution and their suitability as a screening step before orthogonal confirmation in genetic diagnostics strategies. Results: Five tools (DECoN, CoNVaDING, panelcn.MOPS, ExomeDepth and CODEX2) were tested against four genetic diagnostics datasets (495 samples, 231 CNVs), using the default and sensitivity-optimized parameters. Most tools were highly sensitive and specific, but the performance was dataset-dependent. In our in-house datasets, DECoN and panelcn.MOPS with optimized parameters showed enough sensitivity to be used as screening methods in genetic diagnostics. Availability: Benchmarking-optimization code is freely available at https://github.com/TranslationalBioinformaticsIGTP/CNVbenchmarkeR.

2020 ◽  
Vol 28 (12) ◽  
pp. 1645-1655 ◽  
Author(s):  
José Marcos Moreno-Cabrera ◽  
Jesús del Valle ◽  
Elisabeth Castellanos ◽  
Lidia Feliubadaló ◽  
Marta Pineda ◽  
...  

Abstract Although germline copy-number variants (CNVs) are the genetic cause of multiple hereditary diseases, detecting them from targeted next-generation sequencing (NGS) data remains a challenge. Existing tools perform well for large CNVs but struggle with single and multi-exon alterations. The aim of this work is to evaluate CNV calling tools working on gene panel NGS data and their suitability as a screening step before orthogonal confirmation in genetic diagnostics strategies. Five tools (DECoN, CoNVaDING, panelcn.MOPS, ExomeDepth, and CODEX2) were tested against four genetic diagnostics datasets (two in-house and two external) for a total of 495 samples with 231 single and multi-exon validated CNVs. The evaluation was performed using the default and sensitivity-optimized parameters. Results showed that most tools were highly sensitive and specific, but the performance was dataset-dependent. When evaluating them in our diagnostics scenario, DECoN and panelcn.MOPS detected all CNVs with the exception of one mosaic CNV missed by DECoN. However, DECoN outperformed panelcn.MOPS in specificity, achieving values greater than 0.90 when using the optimized parameters. In our in-house datasets, DECoN and panelcn.MOPS showed the highest performance for CNV screening before orthogonal confirmation. Benchmarking and optimization code is freely available at https://github.com/TranslationalBioinformaticsIGTP/CNVbenchmarkeR.
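
The sensitivity and specificity figures quoted in this abstract follow the standard confusion-matrix definitions. The sketch below is only an illustration of those definitions: the `Counts` class and the example counts are hypothetical and are not taken from the paper or from CNVbenchmarkeR.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion-matrix counts for one tool on one dataset (hypothetical values)."""
    tp: int  # validated CNVs that the tool called
    fp: int  # calls with no validated CNV behind them
    tn: int  # CNV-free regions the tool left uncalled
    fn: int  # validated CNVs that the tool missed


def sensitivity(c: Counts) -> float:
    """Fraction of validated CNVs that were detected: TP / (TP + FN)."""
    if c.tp + c.fn == 0:
        raise ValueError("no positive cases, sensitivity is undefined")
    return c.tp / (c.tp + c.fn)


def specificity(c: Counts) -> float:
    """Fraction of CNV-free regions correctly left uncalled: TN / (TN + FP)."""
    if c.tn + c.fp == 0:
        raise ValueError("no negative cases, specificity is undefined")
    return c.tn / (c.tn + c.fp)


if __name__ == "__main__":
    example = Counts(tp=9, fp=2, tn=18, fn=1)  # made-up numbers
    print(f"sensitivity={sensitivity(example):.2f}")
    print(f"specificity={specificity(example):.2f}")
```

For a screening step followed by orthogonal confirmation, sensitivity is the metric to maximize first, since a missed CNV is never re-examined while a false positive is caught at confirmation.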


Author(s):  
Anne Krogh Nøhr ◽  
Kristian Hanghøj ◽  
Genis Garcia Erill ◽  
Zilong Li ◽  
Ida Moltke ◽  
...  

Abstract Estimation of relatedness between pairs of individuals is important in many genetic research areas. When estimating relatedness, it is important to account for admixture if this is present. However, the methods that can account for admixture are all based on genotype data as input, which is a problem for low-depth next-generation sequencing (NGS) data from which genotypes are called with high uncertainty. Here we present a software tool, NGSremix, for maximum likelihood estimation of relatedness between pairs of admixed individuals from low-depth NGS data, which takes the uncertainty of the genotypes into account via genotype likelihoods. Using both simulated and real NGS data for admixed individuals with an average depth of 4x or below, we show that our method works well and clearly outperforms the commonly used state-of-the-art relatedness estimation methods (PLINK, KING, relateAdmix, and ngsRelate), all of which perform quite poorly. Hence, NGSremix is a useful new tool for estimating relatedness in admixed populations from low-depth NGS data. NGSremix is implemented in C/C++ as multi-threaded software and is freely available on GitHub at https://github.com/KHanghoj/NGSremix.


2017 ◽  
Vol 2 ◽  
pp. 35 ◽  
Author(s):  
Shazia Mahamdallie ◽  
Elise Ruark ◽  
Shawn Yost ◽  
Emma Ramsay ◽  
Imran Uddin ◽  
...  

Detection of deletions and duplications of whole exons (exon CNVs) is a key requirement of genetic testing. Accurate detection of this variant type has proved very challenging in targeted next-generation sequencing (NGS) data, particularly if only a single exon is involved. Many different NGS exon CNV calling methods have been developed over the last five years. Such methods are usually evaluated using simulated and/or in-house data due to a lack of publicly available datasets with orthogonally generated results. This hinders tool comparisons, transparency and reproducibility. To provide a community resource for assessment of exon CNV calling methods in targeted NGS data, we here present the ICR96 exon CNV validation series. The dataset includes high-quality sequencing data from a targeted NGS assay (the TruSight Cancer Panel) together with Multiplex Ligation-dependent Probe Amplification (MLPA) results for 96 independent samples. 66 samples contain at least one validated exon CNV and 30 samples have validated negative results for exon CNVs in 26 genes. The dataset includes 46 exon CNVs in BRCA1, BRCA2, TP53, MLH1, MSH2, MSH6, PMS2, EPCAM or PTEN, giving excellent representation of the cancer predisposition genes most frequently tested in clinical practice. Moreover, the validated exon CNVs include 25 single exon CNVs, the most difficult type of exon CNV to detect. The FASTQ files for the ICR96 exon CNV validation series can be accessed through the European Genome-phenome Archive (EGA) under the accession number EGAS00001002428.


2018 ◽  
Vol 16 (05) ◽  
pp. 1850018 ◽  
Author(s):  
Sanjeev Kumar ◽  
Suneeta Agarwal ◽  
Ranvijay

Genomic data now play a vital role in a number of fields, such as personalized medicine, forensics, drug discovery, sequence alignment and agriculture. With the advancements and reduction in the cost of next-generation sequencing (NGS) technology, these data are growing exponentially. NGS data are being generated more rapidly than they can be meaningfully analyzed. Thus, there is much scope for developing novel data compression algorithms that directly facilitate data analysis along with data transfer and storage. An innovative compression technique is proposed here to address the problem of transmission and storage of large NGS data. This paper presents a lossless non-reference-based FastQ file compression approach, segregating the data into three different streams and then applying appropriate and efficient compression algorithms on each. Experiments show that the proposed approach (WBFQC) outperforms other state-of-the-art approaches for compressing NGS data in terms of compression ratio (CR), and compression and decompression time. It also has random access capability over compressed genomic data. An open source FastQ compression tool is also provided here ( http://www.algorithm-skg.com/wbfqc/home.html ).
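
The idea of segregating a FastQ file into three streams can be sketched as follows. This is not the WBFQC algorithm: the abstract does not say which compressor is applied to each stream, so `zlib` is used here purely as a stand-in, and all function names are hypothetical. The sketch also assumes four-line records whose separator line is a bare `+`.

```python
import zlib


def split_fastq_streams(text: str):
    """Split four-line FASTQ records into identifier, sequence and quality streams."""
    lines = text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("expected four lines per FASTQ record")
    # Line 3 of each record (the '+' separator) is assumed bare and is dropped.
    return lines[0::4], lines[1::4], lines[3::4]


def compress_streams(text: str):
    """Compress each stream independently (zlib is a stand-in compressor)."""
    return tuple(
        zlib.compress("\n".join(stream).encode("ascii"))
        for stream in split_fastq_streams(text)
    )


def decompress_streams(blobs) -> str:
    """Rebuild the FASTQ text by interleaving the three decompressed streams."""
    ids, seqs, quals = (zlib.decompress(b).decode("ascii").split("\n") for b in blobs)
    out = []
    for ident, seq, qual in zip(ids, seqs, quals):
        out.extend([ident, seq, "+", qual])
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    fastq = "@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nFFFF\n"
    blobs = compress_streams(fastq)
    print(decompress_streams(blobs) == fastq)
```

Separating the streams helps because identifiers, bases and quality scores have very different statistics, so a compressor tuned to each can do better than one pass over the interleaved file.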


2017 ◽  
pp. 1-17 ◽  
Author(s):  
Sumit Middha ◽  
Liying Zhang ◽  
Khedoudja Nafa ◽  
Gowtham Jayakumaran ◽  
Donna Wong ◽  
...  

Purpose Microsatellite instability (MSI)/mismatch repair (MMR) status is increasingly important in the management of patients with cancer to predict response to immune checkpoint inhibitors. We determined MSI status from large-panel clinical targeted next-generation sequencing (NGS) data across various solid cancer types. Methods The MSI statuses of 12,288 advanced solid cancers consecutively sequenced with Memorial Sloan Kettering-Integrated Mutation Profiling of Actionable Cancer Targets clinical NGS assay were inferred by using MSIsensor, a program that reports the percentage of unstable microsatellites as a score. Cutoff score determination and sensitivity/specificity were based on MSI polymerase chain reaction (PCR) and MMR immunohistochemistry. Results By using an MSIsensor score ≥ 10 to define MSI high (MSI-H), 83 (8%) of 996 colorectal cancers (CRCs) and 42 (16%) of 260 uterine endometrioid cancers (UECs) were MSI-H. Validation against MSI PCR and/or MMR immunohistochemistry performed for 138 (24 MSI-H, 114 microsatellite stable [MSS]) CRCs, and 40 (15 MSI-H, 25 MSS) UECs showed a concordance of 99.4%. MSIsensor also identified 68 MSI-H/MMR-deficient (MMR-D) non-CRC/UECs. Of 9,591 non-CRC/UEC tumors with MSS MSIsensor status, 456 (4.8%) had slightly elevated scores (≥ 3 and < 10) of which 96.6% with available material were confirmed to be MSS by MSI PCR. MSI-H was also detected and confirmed in three non-CRC/UECs with low exonic mutation burden (< 20). MSIsensor correctly scored all 15 polymerase ε ultra-mutated cancers as negative for MSI. Conclusion MSI status can be reliably inferred by MSIsensor from large-panel targeted NGS data. Concurrent MSI testing by NGS is resource efficient, is potentially more sensitive for MMR-D than MSI PCR, and allows identification of MSI-H across various cancers not typically screened, as highlighted by the finding that 35% (68 of 193) of all MSI-H tumors were non-CRC/UEC.
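
The score cutoffs reported in this abstract amount to a simple threshold rule, sketched below. The function name and label strings are illustrative assumptions and are not part of MSIsensor. The last lines recompute the percentages quoted in the abstract from its own counts.

```python
def classify_msi(score: float) -> str:
    """Apply the cutoffs stated in the abstract to an MSIsensor score."""
    if score >= 10:
        return "MSI-H"
    if score >= 3:
        # Scores in [3, 10) were still MSS, but flagged as slightly elevated.
        return "MSS (slightly elevated)"
    return "MSS"


def percent(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


if __name__ == "__main__":
    for s in (0.5, 3, 9.9, 10, 42):
        print(s, classify_msi(s))
    # Counts taken from the abstract.
    print(percent(83, 996))    # MSI-H colorectal cancers
    print(percent(42, 260))    # MSI-H uterine endometrioid cancers
    print(percent(456, 9591))  # slightly elevated non-CRC/UEC tumors
    print(percent(68, 193))    # non-CRC/UEC share of all MSI-H tumors
```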


2020 ◽  
Vol 42 (11) ◽  
pp. 1311-1317
Author(s):  
Dong-Jun Lee ◽  
Taesoo Kwon ◽  
Chang-Kug Kim ◽  
Young-Joo Seol ◽  
Dong-Suk Park ◽  
...  

Abstract Background Sequence variations such as single nucleotide polymorphisms are markers for genetic diseases and breeding. Therefore, identifying sequence variations is one of the main objectives of several genome projects. Although most genome project consortiums provide standard operating procedures for sequence variation detection methods, there may be differences in the results because of human selection or error. Objective To standardize the procedure for sequence variation detection and help researchers who are not formally trained in bioinformatics, we developed the NGS_SNPAnalyzer, a desktop software and fully automated graphical pipeline. Methods The NGS_SNPAnalyzer is implemented using JavaFX (version 1.8); therefore, it is not limited to any operating system (OS). The tools employed in the NGS_SNPAnalyzer were compiled on Microsoft Windows (version 7, 10) and Ubuntu Linux (version 16.04, 17.0.4). Results The NGS_SNPAnalyzer not only includes the functionalities for variant calling and annotation but also provides quality control, mapping, and filtering details to support all procedures from next-generation sequencing (NGS) data to variant visualization. It can be executed using pre-set pipelines and options and customized via user-specified options. Additionally, the NGS_SNPAnalyzer provides a user-friendly graphical interface and can be installed on any OS that supports Java. Conclusions Although there are several pipelines and visualization tools available for NGS data analysis, we developed the NGS_SNPAnalyzer to provide the user with an easy-to-use interface. The benchmark test results indicate that the NGS_SNPAnalyzer achieves better performance than other open source tools.


2019 ◽  
Author(s):  
Tingting Gong ◽  
Vanessa M Hayes ◽  
Eva KF Chan

Abstract Somatic structural variants (SVs) play a significant role in cancer development and evolution, but are notoriously more difficult to detect than small variants from short-read next-generation sequencing (NGS) data. This is due to a combination of challenges attributed to the purity of tumour samples, tumour heterogeneity, limitations of short-read information from NGS, and sequence alignment ambiguities. In spite of active development of SV detection tools (callers) over the past few years, each method has inherent advantages and limitations. In this review, we highlight some of the important factors affecting somatic SV detection and compare the performance of eight commonly used SV callers. In particular, we focus on the extent of change in sensitivity and precision for detecting different SV types and size ranges from samples with differing variant allele frequencies and sequencing depths of coverage. We highlight the reasons why some SV callers perform well in some settings but not others, allowing our evaluation findings to be extended beyond the eight SV callers examined in this paper. As the importance of large structural variants becomes increasingly recognised in cancer genomics, this paper provides a timely review of some of the most impactful factors influencing somatic SV detection and guidance on selecting an appropriate SV caller.

