Reliable Pan-Cancer Microsatellite Instability Assessment by Using Targeted Next-Generation Sequencing Data

Purpose Microsatellite instability (MSI)/mismatch repair (MMR) status is increasingly important in the management of patients with cancer to predict response to immune checkpoint inhibitors. We determined MSI status from large-panel clinical targeted next-generation sequencing (NGS) data across various solid cancer types. Methods The MSI statuses of 12,288 advanced solid cancers consecutively sequenced with Memorial Sloan Kettering-Integrated Mutation Profiling of Actionable Cancer Targets clinical NGS assay were inferred by using MSIsensor, a program that reports the percentage of unstable microsatellites as a score. Cutoff score determination and sensitivity/specificity were based on MSI polymerase chain reaction (PCR) and MMR immunohistochemistry. Results By using an MSIsensor score ≥ 10 to define MSI high (MSI-H), 83 (8%) of 996 colorectal cancers (CRCs) and 42 (16%) of 260 uterine endometrioid cancers (UECs) were MSI-H. Validation against MSI PCR and/or MMR immunohistochemistry performed for 138 (24 MSI-H, 114 microsatellite stable [MSS]) CRCs, and 40 (15 MSI-H, 25 MSS) UECs showed a concordance of 99.4%. MSIsensor also identified 68 MSI-H/MMR-deficient (MMR-D) non-CRC/UECs. Of 9,591 non-CRC/UEC tumors with MSS MSIsensor status, 456 (4.8%) had slightly elevated scores (≥ 3 and < 10) of which 96.6% with available material were confirmed to be MSS by MSI PCR. MSI-H was also detected and confirmed in three non-CRC/UECs with low exonic mutation burden (< 20). MSIsensor correctly scored all 15 polymerase ε ultra-mutated cancers as negative for MSI. Conclusion MSI status can be reliably inferred by MSIsensor from large-panel targeted NGS data. Concurrent MSI testing by NGS is resource efficient, is potentially more sensitive for MMR-D than MSI PCR, and allows identification of MSI-H across various cancers not typically screened, as highlighted by the finding that 35% (68 of 193) of all MSI-H tumors were non-CRC/UEC.
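The cutoffs reported in this abstract can be illustrated with a short sketch. It assumes only what the abstract states: the score is the percentage of unstable microsatellites, a score of 10 or more defines MSI-H, and scores from 3 up to (but not including) 10 are "slightly elevated" yet still MSS. The function names `percent_unstable` and `classify_msi` are hypothetical and are not part of MSIsensor.

```python
def percent_unstable(unstable_sites: int, total_sites: int) -> float:
    """Percentage of unstable microsatellite sites (hypothetical helper)."""
    if total_sites <= 0:
        raise ValueError("total_sites must be positive")
    if not 0 <= unstable_sites <= total_sites:
        raise ValueError("unstable_sites must be between 0 and total_sites")
    return 100.0 * unstable_sites / total_sites


def classify_msi(score: float,
                 high_cutoff: float = 10.0,
                 elevated_cutoff: float = 3.0) -> str:
    """Apply the cutoffs quoted in the abstract to a percentage score."""
    if not 0.0 <= score <= 100.0:
        raise ValueError("score is a percentage and must lie in [0, 100]")
    if score >= high_cutoff:
        return "MSI-H"
    if score >= elevated_cutoff:
        # The abstract treats this band as MSS, flagged for possible PCR review.
        return "MSS (slightly elevated)"
    return "MSS"


if __name__ == "__main__":
    for unstable, total in [(0, 50), (2, 50), (5, 50), (20, 50)]:
        score = percent_unstable(unstable, total)
        print(f"{unstable}/{total} unstable -> {score:.1f} -> {classify_msi(score)}")
```

The cutoffs are keyword arguments because the abstract derived them against MSI PCR and MMR immunohistochemistry for one specific assay; another panel would presumably need its own calibration.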


Detect and visualize tumor microsatellite instability status from next-generation sequencing data by simulating PCR techniques.

Journal of Clinical Oncology ◽

10.1200/jco.2019.37.15_suppl.e13052 ◽

2019 ◽

Vol 37 (15_suppl) ◽

pp. e13052-e13052

Author(s):

Shifu Chen ◽

Hongyue Qu ◽

Bo Yang ◽

Tanxiao Huang ◽

Xiaoni Zhang ◽

...

Keyword(s):

Next Generation Sequencing ◽

Microsatellite Instability ◽

Multiplex PCR ◽

Microsatellite Locus ◽

PCR Primers ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sample Amount ◽

NGS Data ◽

Generation Sequencing

e13052 Background: Although multiplex PCR is still the gold-standard technology for detecting microsatellite instability (MSI), it has some disadvantages. First, its results are not quantitative, and their interpretation can vary between operators. Second, PCR requires additional sample material, which may reduce the amount available for next-generation sequencing (NGS). We therefore developed a method to detect MSI status from NGS data. Methods: We developed a tool called VisualMSI, which simulates PCR behavior to detect the instability level of microsatellite loci. For each microsatellite locus, a pair of PCR primers is simulated by extracting the sequence from the reference genome. The read pairs covering the locus are merged first, and the simulated PCR primers are then mapped to the merged sequences to evaluate the insert length. For paired tumor/normal samples, VisualMSI evaluates the earth mover's distance (EMD) between the insert length distributions of tumor and normal. MSI status is determined by the EMD value and is also visualized on an HTML page for manual validation. As a comparison, MSI was also evaluated using a multiplex PCR comprising 5 loci (NR27, NR21, NR24, BAT25, and BAT26). Results: To evaluate the concordance of VisualMSI results and PCR-based results, we enrolled a group of 92 patients (39 lung cancers, 33 colorectal cancers, and 20 others). For each patient, a tumor tissue sample and a blood sample were collected. White blood cells from the blood samples were sequenced as the normal control. For each patient, the 5 major MSI loci (BAT-25, BAT-26, NR-21, NR-24, and NR-27) were evaluated by VisualMSI from NGS data and by PCR. MSI status was categorized as MSI-H or MSI-L for each MSI locus. Of the 460 MSI loci in total, the VisualMSI and PCR results were concordant for 425 (92.39%) and discordant for the remaining 35 (7.61%). The sensitivity is 89% (95% CI) and the specificity is 95% (95% CI), indicating that the two methods were highly consistent. Conclusions: We developed a tool for detecting and visualizing MSI status from NGS data. The data show that the results of VisualMSI and PCR are highly concordant. The tool is now open-sourced at: https://github.com/OpenGene/VisualMSI.
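The EMD comparison described in the Methods can be sketched as follows. This is a minimal illustration, not VisualMSI's implementation: it assumes the insert lengths at one locus are already available as integer lists for tumor and normal, and it uses the standard one-dimensional result that the EMD between two normalized histograms on unit-spaced bins equals the sum of absolute differences of their cumulative distributions. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def length_distribution(lengths: Iterable[int]) -> Dict[int, float]:
    """Normalize observed insert lengths into a probability distribution."""
    counts = Counter(lengths)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one insert length is required")
    return {length: n / total for length, n in counts.items()}


def earth_movers_distance(tumor_lengths: Iterable[int],
                          normal_lengths: Iterable[int]) -> float:
    """1-D EMD between two insert-length distributions (unit: base pairs)."""
    p = length_distribution(tumor_lengths)
    q = length_distribution(normal_lengths)
    lo = min(min(p), min(q))
    hi = max(max(p), max(q))
    emd = 0.0
    carried = 0.0  # probability mass still to be moved to longer lengths
    for length in range(lo, hi + 1):
        carried += p.get(length, 0.0) - q.get(length, 0.0)
        emd += abs(carried)
    return emd


if __name__ == "__main__":
    normal = [25] * 90 + [24] * 10
    stable_tumor = [25] * 88 + [24] * 12
    unstable_tumor = [25] * 40 + [22] * 35 + [21] * 25
    print("stable  :", round(earth_movers_distance(stable_tumor, normal), 3))
    print("unstable:", round(earth_movers_distance(unstable_tumor, normal), 3))
```

A larger EMD means the tumor's insert lengths have shifted further from the matched normal. The threshold at which a locus is called unstable is not given in the abstract, so none is assumed here.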


NGSremix: A software tool for estimating pairwise relatedness between admixed individuals from next-generation sequencing data

G3 Genes|Genome|Genetics ◽

10.1093/g3journal/jkab174 ◽

2021 ◽

Author(s):

Anne Krogh Nøhr ◽

Kristian Hanghøj ◽

Genis Garcia Erill ◽

Zilong Li ◽

Ida Moltke ◽

...

Keyword(s):

Next Generation Sequencing ◽

Genetic Research ◽

Likelihood Estimation ◽

Software Tool ◽

Estimation Methods ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

NGS Data ◽

Generation Sequencing

Abstract Estimation of relatedness between pairs of individuals is important in many genetic research areas. When estimating relatedness, it is important to account for admixture if this is present. However, the methods that can account for admixture are all based on genotype data as input, which is a problem for low-depth next-generation sequencing (NGS) data from which genotypes are called with high uncertainty. Here we present a software tool, NGSremix, for maximum likelihood estimation of relatedness between pairs of admixed individuals from low-depth NGS data, which takes the uncertainty of the genotypes into account via genotype likelihoods. Using both simulated and real NGS data for admixed individuals with an average depth of 4x or below, we show that our method works well and clearly outperforms the commonly used state-of-the-art relatedness estimation methods PLINK, KING, relateAdmix, and ngsRelate, all of which perform quite poorly. Hence, NGSremix is a useful new tool for estimating relatedness in admixed populations from low-depth NGS data. NGSremix is implemented in C/C++ as multi-threaded software and is freely available on GitHub: https://github.com/KHanghoj/NGSremix.


Mutation Burden and I Index for Detection of Microsatellite Instability in Colorectal Cancer by Targeted Next-Generation Sequencing

Journal of Molecular Diagnostics ◽

10.1016/j.jmoldx.2018.09.005 ◽

2019 ◽

Vol 21 (2) ◽

pp. 241-250 ◽

Cited By ~ 8

Author(s):

Jeong E. Kim ◽

Sung-Min Chun ◽

Yong S. Hong ◽

Kyu-pyo Kim ◽

Sun Y. Kim ◽

...

Keyword(s):

Colorectal Cancer ◽

Next Generation Sequencing ◽

Microsatellite Instability ◽

Next Generation ◽

Targeted Next Generation Sequencing ◽

Mutation Burden ◽

Generation Sequencing


Detection of Mismatch Repair Deficiency and Microsatellite Instability in Colorectal Adenocarcinoma by Targeted Next-Generation Sequencing

Journal of Molecular Diagnostics ◽

10.1016/j.jmoldx.2016.07.010 ◽

2017 ◽

Vol 19 (1) ◽

pp. 84-91 ◽

Cited By ~ 66

Author(s):

Jonathan A. Nowak ◽

Matthew B. Yurgelun ◽

Jacqueline L. Bruce ◽

Vanesa Rojas-Rudilla ◽

Dimity L. Hall ◽

...

Keyword(s):

Next Generation Sequencing ◽

Microsatellite Instability ◽

Mismatch Repair ◽

Colorectal Adenocarcinoma ◽

Next Generation ◽

Mismatch Repair Deficiency ◽

Targeted Next Generation Sequencing ◽

Repair Deficiency ◽

Generation Sequencing


WBFQC: A new approach for compressing next-generation sequencing data splitting into homogeneous streams

Journal of Bioinformatics and Computational Biology ◽

10.1142/s021972001850018x ◽

2018 ◽

Vol 16 (05) ◽

pp. 1850018 ◽

Cited By ~ 1

Author(s):

Sanjeev Kumar ◽

Suneeta Agarwal ◽

Ranvijay

Keyword(s):

Next Generation Sequencing ◽

Genomic Data ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Compression Technique ◽

Compression Algorithms ◽

NGS Data ◽

And Storage ◽

Generation Sequencing

Genomic data now play a vital role in a number of fields, such as personalized medicine, forensics, drug discovery, sequence alignment, and agriculture. With advances in next-generation sequencing (NGS) technology and reductions in its cost, these data are growing exponentially. NGS data are being generated more rapidly than they can be meaningfully analyzed. Thus, there is much scope for developing novel data compression algorithms to facilitate data analysis as well as data transfer and storage. An innovative compression technique is proposed here to address the problem of transmitting and storing large NGS data. This paper presents a lossless, non-reference-based FastQ file compression approach that segregates the data into three different streams and then applies an appropriate and efficient compression algorithm to each. Experiments show that the proposed approach (WBFQC) outperforms other state-of-the-art approaches for compressing NGS data in terms of compression ratio (CR) and compression and decompression time. It also offers random access over compressed genomic data. An open source FastQ compression tool is also provided here ( http://www.algorithm-skg.com/wbfqc/home.html ).
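The idea of splitting a FastQ file into homogeneous streams before compression can be sketched as below. This is not the WBFQC algorithm: the abstract does not say which three streams are used or which compressor is applied to each. The sketch assumes the streams are read identifiers, base sequences, and quality strings, uses `zlib` from the Python standard library as a stand-in for all three, and assumes each record's separator line is a bare `+`. All function names are hypothetical.

```python
import zlib
from typing import List, Tuple


def split_fastq_streams(fastq_text: str) -> Tuple[List[str], List[str], List[str]]:
    """Split 4-line FastQ records into identifier, sequence and quality streams."""
    lines = fastq_text.strip().split("\n")
    if len(lines) % 4 != 0:
        raise ValueError("FastQ input must contain 4 lines per record")
    # Assumption: line 3 of every record is a bare '+', so it carries no data.
    if any(sep != "+" for sep in lines[2::4]):
        raise ValueError("this sketch only supports bare '+' separator lines")
    return lines[0::4], lines[1::4], lines[3::4]


def compress_streams(fastq_text: str) -> Tuple[bytes, bytes, bytes]:
    """Compress each stream on its own (zlib is a placeholder compressor)."""
    ids, seqs, quals = split_fastq_streams(fastq_text)
    return tuple(
        zlib.compress("\n".join(stream).encode("ascii"), 9)
        for stream in (ids, seqs, quals)
    )


def decompress_streams(blobs: Tuple[bytes, bytes, bytes]) -> str:
    """Rebuild the FastQ text by interleaving the three decompressed streams."""
    ids, seqs, quals = (zlib.decompress(b).decode("ascii").split("\n") for b in blobs)
    records: List[str] = []
    for read_id, seq, qual in zip(ids, seqs, quals):
        records.extend([read_id, seq, "+", qual])
    return "\n".join(records) + "\n"


if __name__ == "__main__":
    sample = (
        "@read1\nACGTACGTAC\n+\nIIIIIHHHGG\n"
        "@read2\nACGTTCGTAC\n+\nIIIIHHHGGF\n"
    )
    blobs = compress_streams(sample)
    assert decompress_streams(blobs) == sample  # lossless round trip
    print([len(b) for b in blobs])
```

Separating the streams lets each one be modeled by a compressor suited to its alphabet and redundancy (a four-letter base alphabet, highly repetitive identifiers, skewed quality values), which is the general motivation for stream-splitting approaches.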


NGS_SNPAnalyzer: a desktop software supporting genome projects by identifying and visualizing sequence variations from next-generation sequencing data

Genes & Genomics ◽

10.1007/s13258-020-00997-7 ◽

2020 ◽

Vol 42 (11) ◽

pp. 1311-1317

Author(s):

Dong-Jun Lee ◽

Taesoo Kwon ◽

Chang-Kug Kim ◽

Young-Joo Seol ◽

Dong-Suk Park ◽

...

Keyword(s):

Next Generation Sequencing ◽

Sequence Variation ◽

Detection Methods ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Sequence Variations ◽

NGS Data ◽

Generation Sequencing ◽

Genome Projects

Abstract Background Sequence variations such as single nucleotide polymorphisms are markers for genetic diseases and breeding. Therefore, identifying sequence variations is one of the main objectives of several genome projects. Although most genome project consortiums provide standard operating procedures for sequence variation detection methods, there may be differences in the results because of human selection or error. Objective To standardize the procedure for sequence variation detection and help researchers who are not formally trained in bioinformatics, we developed the NGS_SNPAnalyzer, a desktop application and fully automated graphical pipeline. Methods The NGS_SNPAnalyzer is implemented using JavaFX (version 1.8); therefore, it is not limited to any operating system (OS). The tools employed in the NGS_SNPAnalyzer were compiled on Microsoft Windows (version 7, 10) and Ubuntu Linux (version 16.04, 17.0.4). Results The NGS_SNPAnalyzer not only includes the functionalities for variant calling and annotation but also provides quality control, mapping, and filtering details to support all procedures from next-generation sequencing (NGS) data to variant visualization. It can be executed using pre-set pipelines and options and customized via user-specified options. Additionally, the NGS_SNPAnalyzer provides a user-friendly graphical interface and can be installed on any OS that supports Java. Conclusions Although there are several pipelines and visualization tools available for NGS data analysis, we developed the NGS_SNPAnalyzer to provide the user with an easy-to-use interface. The benchmark test results indicate that the NGS_SNPAnalyzer achieves better performance than other open source tools.


DeviCNV: detection and visualization of exon-level copy number variants in targeted next-generation sequencing data

BMC Bioinformatics ◽

10.1186/s12859-018-2409-6 ◽

2018 ◽

Vol 19 (1) ◽

Cited By ~ 2

Author(s):

Yeeok Kang ◽

Seong-Hyeuk Nam ◽

Kyung Sun Park ◽

Yoonjung Kim ◽

Jong-Won Kim ◽

...

Keyword(s):

Next Generation Sequencing ◽

Copy Number ◽

Copy Number Variants ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Targeted Next Generation Sequencing ◽

Exon Level ◽

Generation Sequencing


OpenContami: a web-based application for detecting microbial contaminants in next-generation sequencing data

Bioinformatics ◽

10.1093/bioinformatics/btab101 ◽

2021 ◽

Author(s):

Sung-Joon Park ◽

Kenta Nakai

Keyword(s):

Next Generation Sequencing ◽

Cell Biology ◽

Supplementary Information ◽

Next Generation Sequencing Data ◽

Easy Access ◽

Next Generation ◽

Web Based ◽

NGS Data ◽

The Impact ◽

Generation Sequencing

Abstract Summary Microorganisms infect and contaminate eukaryotic cells during the course of biological experiments. Because microbes influence host cell biology and may therefore lead to erroneous conclusions, a computational platform that facilitates decontamination is indispensable. Recent studies show that next-generation sequencing (NGS) data can be used to identify the presence of exogenous microbial species. Previously, we proposed an algorithm to improve detection of microbes in NGS data. Here, we developed an online application, OpenContami, which allows researchers easy access to the algorithm via interactive web-based interfaces. We have designed the application by incorporating a database comprising analytical results from a large-scale public dataset and data uploaded by users. The database serves as a reference for assessing user data and provides a list of genera detected from negative blank controls as a ‘blacklist’, which is useful for studying human infectious diseases. OpenContami offers a comprehensive overview of exogenous species in NGS datasets; as such, it will increase our understanding of the impact of microbial contamination on biological and pathological traits. Availability and implementation OpenContami is freely available at: https://openlooper.hgc.jp/opencontami/. Supplementary information Supplementary data are available at Bioinformatics online.


Detection of somatic structural variants from short-read next-generation sequencing data

10.1101/840751 ◽

2019 ◽

Author(s):

Tingting Gong ◽

Vanessa M Hayes ◽

Eva KF Chan

Keyword(s):

Next Generation Sequencing ◽

Cancer Genomics ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Structural Variants ◽

Sequencing Data ◽

Short Read ◽

Factors Affecting ◽

NGS Data ◽

Generation Sequencing

Abstract Somatic structural variants (SVs) play a significant role in cancer development and evolution, but are notoriously more difficult to detect than small variants from short-read next-generation sequencing (NGS) data. This is due to a combination of challenges attributed to the purity of tumour samples, tumour heterogeneity, limitations of short-read information from NGS, and sequence alignment ambiguities. In spite of active development of SV detection tools (callers) over the past few years, each method has inherent advantages and limitations. In this review, we highlight some of the important factors affecting somatic SV detection and compare the performance of eight commonly used SV callers. In particular, we focus on the extent of change in sensitivity and precision for detecting different SV types and size ranges from samples with differing variant allele frequencies and sequencing depths of coverage. We highlight the reasons why some SV callers perform well in some settings but not others, allowing our evaluation findings to be extended beyond the eight SV callers examined in this paper. As the importance of large structural variants becomes increasingly recognised in cancer genomics, this paper provides a timely review of some of the most impactful factors influencing somatic SV detection and guidance on selecting an appropriate SV caller.


VisVariant: A java program to visualise genetic variants in next-generation sequencing data

10.1101/2021.02.12.431037 ◽

2021 ◽

Author(s):

King Wai Lau ◽

Michelle Kleeman ◽

Caroline Reuter ◽

Attila Lorincz

Keyword(s):

Next Generation Sequencing ◽

Genetic Variants ◽

Variant Calling ◽

Next Generation Sequencing Data ◽

Sequence Information ◽

Next Generation ◽

Sequencing Data ◽

Java Program ◽

NGS Data ◽

Generation Sequencing

Abstract Summary Extremely large datasets are impossible or very difficult for humans to comprehend by standard mental approaches. Intuitive visualization of genetic variants in genomic sequencing data could help in the review and confirmation process of variants called by automated variant calling programs. To help facilitate interpretation of genetic variant next-generation sequencing (NGS) data we developed VisVariant, a customizable visualization tool that creates a figure showing the overlapping sequence information of thousands of individual reads including the variant and flanking regions. Availability and implementation Detailed information on how to download, install and run VisVariant together with an example is available on our GitHub website [https://github.com/hugging-biorxiv/visvariant].
