Uncovering missed indels by leveraging unmapped reads

2018 ◽  
Author(s):  
Mohammad Shabbir Hasan ◽  
Xiaowei Wu ◽  
Liqing Zhang

Abstract: In current practice, Next Generation Sequencing (NGS) applications start with mapping/aligning short reads to the reference genome, with the aim of identifying genetic mutations. While most short reads can be mapped to the reference genome accurately by existing alignment tools, a significant number remain unmapped and are excluded from downstream analyses, potentially discarding important biological information hidden in the unmapped reads. This paper describes Genesis-indel, a computational pipeline that explores the unmapped reads to identify novel indels that were initially missed in the alignment procedure. Genesis-indel is applied to the unmapped reads of 30 breast cancer patients from TCGA. Results show that the unmapped reads are conserved between the two subtypes of breast cancer investigated in this study and might contribute to the divergence between the subtypes. Genesis-indel is able to leverage the unmapped reads to identify 72,997 small to large novel high-quality indels not previously found in the original alignments; among them, 16,141 have not been annotated in the widely used mutation database. Statistical analysis shows that these new indels mostly altered oncogenes and tumor suppressor genes. Functional annotation further reveals that these indels are strongly correlated with cancer pathways and can have high to moderate impact on protein functions. Additionally, these indels overlap with genes that are missed by the indels from the originally mapped reads and that contribute to tumorigenesis in multiple carcinomas.
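As a rough illustration of the starting point of such a pipeline, the sketch below separates unmapped reads from SAM-format alignment records by testing the 0x4 bit of the FLAG field, which the SAM format defines as "segment unmapped". This is not Genesis-indel's code; the function name and the toy records are hypothetical.

```python
UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def split_by_mapping(sam_lines):
    """Split SAM alignment lines into (mapped, unmapped) read-name lists."""
    mapped, unmapped = [], []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        (unmapped if flag & UNMAPPED_FLAG else mapped).append(name)
    return mapped, unmapped


# Toy records (hypothetical), each with the 11 mandatory SAM columns.
toy_sam = [
    "@HD\tVN:1.6\tSO:unsorted",
    "read1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGACCAA\tIIIIIIII",
    "read3\t16\tchr1\t250\t60\t8M\t*\t0\t0\tGGCATCGA\tIIIIIIII",
]
mapped_names, unmapped_names = split_by_mapping(toy_sam)
print(mapped_names, unmapped_names)
```

In a real workflow the unmapped reads would then be passed to a further alignment or assembly step; that part is outside the scope of this sketch.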

2015 ◽  
Author(s):  
Farzana Rahman ◽  
Mehedi Hassan ◽  
Alona Kryshchenko ◽  
Inna Dubchak ◽  
Tatiana V Tatarinova ◽  
...  

In the last decade a number of algorithms and associated software were developed to align next generation sequencing (NGS) reads to relevant reference genomes. The results of these programs may vary significantly, especially when the NGS reads contain mutations not found in the reference genome. Yet there is no standard way to compare these programs and assess their biological relevance. We propose a benchmark to assess the accuracy of short-read mapping based on the pre-computed global alignment of closely related genome sequences. In this paper we outline the method and also present a short report of an experiment performed on five popular alignment tools.
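One way to picture such a benchmark: reads are drawn from known positions in one genome, a pre-computed global alignment gives the expected coordinate of each read in the related genome, and a mapper is scored by how often its reported position matches the expected one. The sketch below is a minimal illustration under those assumptions; the scoring rule, the tolerance parameter, and all names and data are hypothetical and not taken from the paper.

```python
def mapping_accuracy(expected_pos, reported_pos, tolerance=0):
    """Fraction of reads whose reported position lies within `tolerance`
    bases of the position expected from the global alignment.

    expected_pos: dict read_name -> expected coordinate in the target genome
    reported_pos: dict read_name -> coordinate reported by the mapper,
                  or None if the read was left unmapped
    """
    if not expected_pos:
        raise ValueError("no reads to score")
    correct = 0
    for name, expected in expected_pos.items():
        reported = reported_pos.get(name)
        if reported is not None and abs(reported - expected) <= tolerance:
            correct += 1
    return correct / len(expected_pos)


# Hypothetical toy data: four reads scored for two mappers.
expected = {"r1": 100, "r2": 250, "r3": 400, "r4": 975}
mapper_a = {"r1": 100, "r2": 250, "r3": 402, "r4": None}
mapper_b = {"r1": 100, "r2": 251, "r3": 400, "r4": 975}

accuracy_a = mapping_accuracy(expected, mapper_a)
accuracy_b = mapping_accuracy(expected, mapper_b, tolerance=2)
print(accuracy_a, accuracy_b)
```

Counting unmapped reads as incorrect is a design choice made here so that a mapper cannot improve its score by discarding difficult reads.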


2018 ◽  
Author(s):  
Veronika N. Laine ◽  
Toni I. Gossmann ◽  
Kees van Oers ◽  
Marcel E. Visser ◽  
Martien A.M. Groenen

Abstract. Background: A widely used approach in next-generation sequencing projects is the alignment of reads to a reference genome. A significant percentage of reads, however, frequently remain unmapped despite improvements in the methods and hardware, which have enhanced the efficiency and accuracy of alignments. Usually unmapped reads are discarded from the analysis process, but significant biological information and insights can be uncovered from this data. We explored the unmapped DNA (normal and bisulfite treated) and RNA sequence reads of the great tit (Parus major) reference genome individual. From the unmapped reads we generated de novo assemblies. The generated sequence contigs were then aligned to the NCBI non-redundant nucleotide database using BLAST, identifying the closest known matching sequence. Results: Many of the aligned contigs showed sequence similarity to sequences from different bird species and genes that were absent in the great tit reference assembly. Furthermore, there were also contigs that represented known P. major pathogenic species. Most interesting were several species of blood parasites such as Plasmodium and Trypanosoma. Conclusions: Our analyses revealed that meaningful biological information can be found when further exploring unmapped reads. It is possible to discover sequences that are either absent or misassembled in the reference genome and sequences that indicate infection or sample contamination. In this study we also propose strategies to aid the capture and interpretation of this information from unmapped reads.


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Chih-Yang Wang ◽  
Yung-Chieh Chang ◽  
Yao-Lung Kuo ◽  
Kuo-Ting Lee ◽  
Pai-Sheng Chen ◽  
...  

Abstract: Breast cancer is the most common cancer in women, and some patients develop recurrence after standard therapy. Effective predictors are urgently needed to detect recurrence earlier. The activation of Hedgehog signaling in breast cancer is correlated with poor prognosis. PTCH1 is an essential membrane receptor of Hedgehog. However, there are few reports about mutations in Hedgehog genes in breast cancer. We conducted a comprehensive study via an experimental and bioinformatics approach to detect mutated genes in breast cancer. Twenty-two breast cancer patients who developed recurrence within 24 months postoperatively were enrolled with 22 control cancer patients. Targeted deep sequencing was performed to assess the mutations among individuals with breast cancer using a panel of 143 cancer-associated genes. Bioinformatics and public databases were used to predict the protein functions of the mutated genes. Mutations were identified in 44 breast cancer specimens, and the most frequently mutated genes were BRCA2, APC, ATM, BRCA1, NF1, TET2, TSC1, TSC2, NOTCH1, MSH2, PTCH1, TP53, PIK3CA, FBXW7, and RB1. Mutation of these genes was correlated with protein phosphorylation and autophosphorylation, such as peptidyl-tyrosine and protein kinase C phosphorylation. Among these highly mutated genes, mutations of PTCH1 were associated with poor prognosis and increased recurrence of breast cancer, especially mutations in exons 22 and 23. The public sequencing data from the COSMIC database were exploited to predict the functions of the mutations. Our findings suggest that mutation of PTCH1 is correlated with early recurrence of breast cancer patients and will become a powerful predictor for recurrence of breast cancer.


2021 ◽  
Vol 22 (13) ◽  
pp. 6993
Author(s):  
Desiree Loreth ◽  
Moritz Schuette ◽  
Jenny Zinke ◽  
Malte Mohme ◽  
Andras Piffko ◽  
...  

Up to 40% of advanced lung, melanoma and breast cancer patients suffer from brain metastases (BM), with increasing incidence. Here, we assessed whether circulating tumor cells (CTCs) in peripheral blood can serve as a disease surrogate, focusing on CD44 and CD74 expression as prognostic markers for BM. We show that a size-based microfluidic approach in combination with a semi-automated cell recognition system is well suited for CTC detection in BM patients and allows further characterization of tumor cells potentially derived from BM. CTCs were found in 50% (7/14) of breast cancer, 50% (9/18) of non-small cell lung cancer (NSCLC) and 36% (4/11) of melanoma patients. The next-generation sequencing (NGS) analysis of nine single CTCs from one breast cancer patient revealed three different CNV profile groups as well as a resistance-causing ERS1 mutation. CD44 and CD74 were expressed on most CTCs and their expression was strongly correlated, whereas matched breast cancer BM tissues expressed CD44 and CD74 much less frequently (negative in 46% and 54%, respectively). Thus, plasticity of CD44 and CD74 expression during trafficking of CTCs in the circulation might be the result of adaptation strategies.


2015 ◽  
Author(s):  
Farzana Rahman ◽  
Mehedi Hassan ◽  
Alona Martin Kryshchenko ◽  
Inna Dubchak ◽  
Nikolai Nickolai Alexandrov ◽  
...  

In the last decade a number of algorithms and associated software were developed to align next generation sequencing (NGS) reads to relevant reference genomes. The results of these programs may vary significantly, especially when the NGS reads contain mutations not found in the reference genome. Yet there is no standard way to compare these programs and assess their biological relevance. We propose a benchmark to assess the accuracy of short-read mapping based on the precomputed global alignment of closely related genome sequences. In this paper we outline the method and also present a short report of an experiment performed on five popular alignment tools.


Cancers ◽  
2020 ◽  
Vol 12 (5) ◽  
pp. 1286 ◽  
Author(s):  
Concetta Santonocito ◽  
Roberta Rizza ◽  
Ida Paris ◽  
Laura De Marchis ◽  
Carmela Paolillo ◽  
...  

Carriers of pathogenic variants (PVs) in BRCA1 or BRCA2 have an elevated lifetime risk of developing breast cancer (BC) and/or ovarian cancer (OC). The prevalence of BRCA1 and BRCA2 germline alterations is extremely variable among different ethnic groups. In particular, the rate of variants in Italian BC and/or OC families is rather controversial and ranges from 8% to 37%, according to different reports. Using In Vitro Diagnostic (IVD) next generation sequencing (NGS)-based pipelines, we routinely screened thousands of patients with either sporadic cancer or a family history of cancer. By NGS, we identified new PVs and some variants of uncertain significance (VUS), which were also evaluated in silico using dedicated tools. We report in detail data regarding BRCA1/2 variants identified in 517 out of 2351 BC and OC patients. The aim of this study was to report the incidence and spectrum of BRCA1/2 variants observed in BC and/or OC patients tested at Policlinico Gemelli Foundation Hospital, who originate mainly from Central and Southern Italy. This study provides an overview of the variant frequency in these geographic areas of Italy and provides data that could be used in the clinical management of patients.


2017 ◽  
Author(s):  
Inna Y. Gong ◽  
Natalie S. Fox ◽  
Paul C. Boutros

Abstract. Background: Biomarkers are a key component of precision medicine. However, full clinical integration of biomarkers has been met with challenges, partly attributed to analytical difficulties. It has been shown that biomarker reproducibility is susceptible to data preprocessing approaches. Here, we systematically evaluated machine-learning ensembles of preprocessing methods as a general strategy to improve biomarker performance for prediction of survival from early breast cancer. Results: We risk-stratified breast cancer patients into either low-risk or high-risk groups based on four published hypoxia signatures (Buffa, Winter, Hu, and Sorensen), using 24 different preprocessing approaches for microarray normalization. The 24 binary risk profiles determined for each hypoxia signature were combined using a random forest to evaluate the efficacy of a preprocessing ensemble classifier. We demonstrate that the best way of merging preprocessing methods varies from signature to signature, and that there is likely no ‘best’ preprocessing pipeline that is universal across datasets, highlighting the need to evaluate ensembles of preprocessing algorithms. Further, we developed novel signatures for each preprocessing method and the risk classifications from each were incorporated in a meta-random forest model. Interestingly, the classifications of these biomarkers and their ensemble show striking consistency, demonstrating that similar intrinsic biological information is being faithfully represented. As such, these classification patterns further confirm that there is a subset of patients whose prognosis is consistently challenging to predict. Conclusions: Performance of different prognostic signatures varies with preprocessing method. A simple classifier by unanimous voting of classifications is a reliable way of improving on single preprocessing methods. Future signatures will likely require integration of intrinsic and extrinsic clinico-pathological variables to better predict disease-related outcomes.
Abbreviations: AUC, area under the receiver operating characteristic curve; GCRMA, GeneChip Robust Multi-array Average; HG-U133A, Affymetrix Human Genome U133A; HG-U133 Plus 2.0, Affymetrix Human Genome Plus 2.0; HR, hazard ratio; MAS5, MicroArray Suite 5.0; MBEI, Model-based Expression Index; NSCLC, non-small cell lung cancer; RF, random forest; ROC, receiver operator characteristic; RMA, Robust Multi-array Average.
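The unanimous-voting idea mentioned in the conclusions can be sketched as follows: each preprocessing method yields a binary risk call per patient, and a call is accepted only when all methods agree. How the authors treat disagreements is not stated in the abstract; returning None for split votes is an assumption made here, and all names and data are hypothetical.

```python
def unanimous_vote(calls):
    """Return the shared label if every preprocessing method agrees,
    otherwise None (assumed handling of split votes)."""
    if not calls:
        raise ValueError("at least one call is required")
    first = calls[0]
    return first if all(c == first for c in calls) else None


# Hypothetical risk calls per patient from three preprocessing methods.
patient_calls = {
    "patient_1": ["high", "high", "high"],
    "patient_2": ["low", "high", "low"],
    "patient_3": ["low", "low", "low"],
}
consensus = {pid: unanimous_vote(calls) for pid, calls in patient_calls.items()}
print(consensus)
```

Patients left without a consensus call would correspond, loosely, to the subset whose prognosis is hard to predict consistently across preprocessing methods.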


2021 ◽  
Vol 11 (8) ◽  
pp. 816
Author(s):  
Paola Fuso ◽  
Mariantonietta Di Salvatore ◽  
Concetta Santonocito ◽  
Donatella Guarino ◽  
Chiara Autilio ◽  
...  

Background: The aim of this study is to identify miRNAs able to predict outcomes in breast cancer patients after neoadjuvant chemotherapy (NAC). Patients and methods: We retrospectively analyzed 24 patients receiving NAC and not reaching pathologic complete response (pCR). miRNAs were analyzed using an Illumina Next-Generation-Sequencing (NGS) system. Results: Event-free survival (EFS) and overall survival (OS) were significantly higher in patients with up-regulation of let-7a-5p (EFS p = 0.006; OS p = 0.0001), miR-100-5p (EFS p = 0.01; OS p = 0.03), miR-101-3p (EFS p = 0.05; OS p = 0.01), and miR-199a-3p (EFS p = 0.02; OS p = 0.01) in post-NAC samples, independently of breast cancer subtype. In multivariate analysis, only let-7a-5p was significantly associated with EFS (p = 0.009) and OS (p = 0.0008). Conclusion: Up-regulation of the above miRNAs could represent a biomarker in breast cancer.

