Identifying Cancer Drivers Using DRIVE: A Feature-Based Machine Learning Model for a Pan-Cancer Assessment of Somatic Missense Mutations

Cancers ◽  
2021 ◽  
Vol 13 (11) ◽  
pp. 2779
Author(s):  
Ionut Dragomir ◽  
Adnan Akbar ◽  
John W. Cassidy ◽  
Nirmesh Patel ◽  
Harry W. Clifford ◽  
...  

Sporadic cancer develops from the accrual of somatic mutations. Out of all small-scale somatic aberrations in coding regions, 95% are base substitutions, with 90% being missense mutations. While multiple studies have focused on the importance of this mutation type, a machine learning method based on the number of protein–protein interactions (PPIs) has not been fully explored. This study aims to develop an improved computational method for driver identification, validation and evaluation (DRIVE), and compares it with other methods to assess its performance. DRIVE aims to distinguish between driver and passenger mutations using a feature-based learning approach comprising two levels of biological classification for a pan-cancer assessment of somatic mutations. Gene-level features include the maximum number of protein–protein interactions, the biological process and the type of post-translational modifications (PTMs), while mutation-level features are based on pathogenicity scores. Multiple supervised classification algorithms were trained on Genomics Evidence Neoplasia Information Exchange (GENIE) project data and then tested on an independent dataset from The Cancer Genome Atlas (TCGA) study. Finally, the most powerful classifier using DRIVE was evaluated on a benchmark dataset, where it showed better overall performance than other state-of-the-art methodologies; however, this result must be interpreted with considerable care because of the reduced size of the dataset. DRIVE outlines the outstanding potential that multiple levels of a feature-based learning model hold for the future of oncology-based precision medicine.
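The two feature levels described in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the category vocabularies, the `Mutation` record, the toy cohorts and the nearest-centroid classifier are hypothetical stand-ins, not the features, data or algorithms used by the DRIVE authors. The sketch only shows how gene-level and mutation-level features might be combined into one vector, and how a model fitted on one cohort can be checked on an independent one.

```python
from dataclasses import dataclass

# Hypothetical vocabularies; the abstract does not list the real categories.
PROCESSES = ["cell_cycle", "signal_transduction", "other"]
PTM_TYPES = ["phosphorylation", "ubiquitination", "none"]


@dataclass
class Mutation:
    max_ppi: int          # gene level: maximum number of PPIs
    process: str          # gene level: biological process
    ptm: str              # gene level: type of post-translational modification
    pathogenicity: list   # mutation level: pathogenicity scores
    is_driver: bool       # label: driver (True) or passenger (False)


def one_hot(value, vocabulary):
    """Encode a categorical value as a 0/1 vector over the vocabulary."""
    return [1.0 if value == v else 0.0 for v in vocabulary]


def featurize(m):
    """Concatenate gene-level and mutation-level features into one vector."""
    gene_level = [float(m.max_ppi)] + one_hot(m.process, PROCESSES) + one_hot(m.ptm, PTM_TYPES)
    return gene_level + list(m.pathogenicity)


def centroid(rows):
    """Column-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit(cohort):
    """Nearest-centroid stand-in classifier: one centroid per class."""
    drivers = [featurize(m) for m in cohort if m.is_driver]
    passengers = [featurize(m) for m in cohort if not m.is_driver]
    return centroid(drivers), centroid(passengers)


def predict(model, m):
    """Return True (driver) if the mutation is closer to the driver centroid."""
    driver_centroid, passenger_centroid = model
    x = featurize(m)
    return squared_distance(x, driver_centroid) < squared_distance(x, passenger_centroid)


# Toy cohorts with made-up values, standing in for a training set and an
# independent test set.
train_cohort = [
    Mutation(120, "cell_cycle", "phosphorylation", [0.9, 0.8], True),
    Mutation(90, "signal_transduction", "phosphorylation", [0.8, 0.9], True),
    Mutation(5, "other", "none", [0.1, 0.2], False),
    Mutation(10, "other", "ubiquitination", [0.2, 0.1], False),
]
test_cohort = [
    Mutation(100, "cell_cycle", "phosphorylation", [0.7, 0.9], True),
    Mutation(8, "other", "none", [0.3, 0.1], False),
]

model = fit(train_cohort)
predictions = [predict(model, m) for m in test_cohort]
```

In this sketch the unscaled PPI count dominates the distance; a real pipeline would normally scale the features and use a stronger supervised classifier.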

2016 ◽  
Author(s):  
Héctor Climente-González ◽  
Eduard Porta-Pardo ◽  
Adam Godzik ◽  
Eduardo Eyras

Summary: Alternative splicing changes are frequently observed in cancer and are starting to be recognized as important signatures for tumor progression and therapy. However, their functional impact and relevance to tumorigenesis remain mostly unknown. We carried out a systematic analysis to characterize the potential functional consequences of alternative splicing changes in thousands of tumor samples. This analysis revealed that a subset of alternative splicing changes affect protein domain families that are frequently mutated in tumors and potentially disrupt protein–protein interactions in cancer-related pathways. Moreover, there was a negative correlation between the number of these alternative splicing changes in a sample and the number of somatic mutations in drivers. We propose that a subset of the alternative splicing changes observed in tumors may represent independent oncogenic processes that could help explain the functional transformations in cancer, and that some of them could potentially be considered alternative splicing drivers (AS-drivers).


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Gizem Gulfidan ◽  
Beste Turanli ◽  
Hande Beklen ◽  
Raghu Sinha ◽  
Kazim Yalcin Arga

2019 ◽  
Vol 20 (3) ◽  
pp. 177-184 ◽  
Author(s):  
Nantao Zheng ◽  
Kairou Wang ◽  
Weihua Zhan ◽  
Lei Deng

Background: Targeting critical virus-host Protein-Protein Interactions (PPIs) has enormous application prospects for therapeutics. Using experimental methods to evaluate all possible virus-host PPIs is labor-intensive and time-consuming. Recent growth in computational identification of virus-host PPIs provides new opportunities for gaining biological insights, including applications in disease control. We provide an overview of recent computational approaches for studying virus-host PPIs. Methods: In this review, a variety of computational methods for virus-host PPI prediction have been surveyed. These methods are categorized by the features they use and by machine learning algorithm, including both classical and novel methods. Results: We describe the pivotal and representative features extracted from relevant sources of biological data, mainly including sequence signatures, known domain interactions, protein motifs and protein structure information. We focus on state-of-the-art machine learning algorithms that are used to build binary prediction models for the classification of virus-host protein pairs and discuss their strengths, weaknesses and future directions. Conclusion: The findings of this review confirm the importance of computational methods for finding potential protein-protein interactions between virus and host. Although there has been significant progress in the prediction of virus-host PPIs in recent years, there is still considerable room for improvement.
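As an illustration of the sequence-signature features mentioned in this review, the following minimal sketch builds a feature vector for one virus-host protein pair from k-mer (here dipeptide) frequencies. The choice of k-mer composition, the function names and the two short sequences are assumptions made for the example; the review covers many feature types and this sketch is not taken from it.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def kmer_frequencies(sequence, k=2):
    """Return the normalized frequency of every possible k-mer in a sequence.

    The vector has 20**k entries, ordered as itertools.product yields them.
    K-mers containing non-standard residues are ignored.
    """
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def pair_features(virus_seq, host_seq, k=2):
    """Concatenate the k-mer profiles of the two proteins in a candidate pair."""
    return kmer_frequencies(virus_seq, k) + kmer_frequencies(host_seq, k)


# Hypothetical toy sequences, far shorter than real proteins.
features = pair_features("MKVLAAGIV", "MSTNPKPQRK")
```

A vector like `features`, together with a binary label for each pair (interacting or not), is the kind of input a binary classifier would be trained on.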


Author(s):  
Jonas Defoort ◽  
Yves Van de Peer ◽  
Lorenzo Carretero-Paulet

Abstract: Gene duplicates, generated either through whole genome duplication (WGD) or small-scale duplication (SSD), are prominent in angiosperms and are believed to play an important role in adaptation and in generating evolutionary novelty. Previous studies reported contrasting evolutionary and functional dynamics of duplicate genes depending on the mechanism of origin, a behaviour that is hypothesized to stem from constraints to maintain the relative dosage balance between the genes concerned and their interaction context. However, the mechanisms ultimately influencing loss and retention of gene duplicates over evolutionary time are not yet fully elucidated. Here, by using a robust classification of gene duplicates in Arabidopsis thaliana, Solanum lycopersicum and Zea mays, large RNAseq expression compendia and an extensive protein-protein interaction (PPI) network from Arabidopsis, we investigated the impact of PPIs on the differential evolutionary and functional fate of WGD and SSD duplicates. In all three species, retained WGD duplicates show stronger constraints to diverge at the sequence and expression level than SSD ones, a pattern that is also observed for shared PPI partners between Arabidopsis duplicates. PPIs are preferentially distributed among WGD duplicates and specific functional categories. Furthermore, duplicates with PPIs tend to be under stronger constraints to evolve than their counterparts without PPIs regardless of their mechanism of origin. Our results support dosage balance constraint as a specific property of genes involved in biological interactions, including physical PPIs, and suggest that additional factors may be differently influencing the evolution of genes following duplication, depending on the species, time and mechanism of origin.


2020 ◽  
pp. 193229682092262
Author(s):  
Darpit Dave ◽  
Daniel J. DeSalvo ◽  
Balakrishna Haridas ◽  
Siripoom McKay ◽  
Akhil Shenoy ◽  
...  

2019 ◽  
Vol 47 (W1) ◽  
pp. W338-W344 ◽  
Author(s):  
Carlos H M Rodrigues ◽  
Yoochan Myung ◽  
Douglas E V Pires ◽  
David B Ascher

Abstract: Protein–protein interactions are involved in most fundamental biological processes, with disease-causing mutations enriched at their interfaces. Here we present mCSM-PPI2, a novel machine learning computational tool designed to more accurately predict the effects of missense mutations on protein–protein interaction binding affinity. mCSM-PPI2 uses graph-based structural signatures to model effects of variations on the inter-residue interaction network, evolutionary information, complex network metrics and energetic terms to generate an optimised predictor. We demonstrate that our method outperforms previous methods, ranking first among 26 others on CAPRI blind tests. mCSM-PPI2 is freely available as a user-friendly webserver at http://biosig.unimelb.edu.au/mcsm_ppi2/.


2002 ◽  
Vol 277 (51) ◽  
pp. 49863-49869 ◽  
Author(s):  
Se Bok Jang ◽  
Yeon-Gil Kim ◽  
Yong-Soon Cho ◽  
Pann-Ghill Suh ◽  
Kyung-Hwa Kim ◽  
...  

SEDL is an evolutionarily highly conserved protein in eukaryotic organisms. Deletions or point mutations in the SEDL gene are responsible for the genetic disease spondyloepiphyseal dysplasia tarda (SEDT), an X-linked skeletal disorder. SEDL has been identified as a component of the transport protein particle (TRAPP), critically involved in endoplasmic reticulum-to-Golgi vesicle transport. Herein, we report the 2.4 Å resolution structure of SEDL, which reveals an unexpected similarity to the structures of the N-terminal regulatory domain of two SNAREs, Ykt6p and Sec22b, despite no sequence homology to these proteins. The similarity and the presence of an unusually large number of solvent-exposed apolar residues in SEDL suggest that it serves regulatory and/or adaptor functions through multiple protein-protein interactions. Of the four known missense mutations responsible for SEDT, three (S73L, F83S, V130D) map to the protein interior, where they would disrupt the structure, and the fourth (D47Y) maps to a surface at which the mutation may abrogate functional interactions with a partner protein.


2016 ◽  
Vol 14 (03) ◽  
pp. 1650011 ◽  
Author(s):  
Wajid Arshad Abbasi ◽  
Fayyaz Ul Amir Afsar Minhas

The study of interactions between host and pathogen proteins is important for understanding the underlying mechanisms of infectious diseases and for developing novel therapeutic solutions. Wet-lab techniques for detecting protein–protein interactions (PPIs) can benefit from computational predictions. Machine learning is one of the computational approaches that can assist biologists by predicting promising PPIs. A number of machine-learning-based methods for predicting host–pathogen interactions (HPI) have been proposed in the literature. The techniques used for assessing the accuracy of such predictors are of critical importance in this domain. In this paper, we question the effectiveness of K-fold cross-validation for estimating the generalization ability of HPI prediction for proteins with no known interactions. K-fold cross-validation does not model this scenario, and we demonstrate a sizable difference between its performance and the performance of an alternative evaluation scheme called leave one pathogen protein out (LOPO) cross-validation. LOPO is more effective in modeling the real-world use of HPI predictors, specifically for cases in which no information about the interacting partners of a pathogen protein is available during training. We also point out that currently used metrics such as areas under the precision-recall or receiver operating characteristic curves are not intuitive to biologists and propose simpler and more directly interpretable metrics for this purpose.
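The LOPO scheme described in this abstract can be sketched as a split generator in which every example involving one pathogen protein is held out together. The function name, the tuple layout and the toy pairs below are assumptions made for illustration; the sketch shows the general idea of grouping by pathogen protein and is not the authors' implementation.

```python
from collections import defaultdict


def lopo_splits(pairs):
    """Yield (held_out_pathogen, train_indices, test_indices) for LOPO CV.

    `pairs` is a list of (pathogen_protein, host_protein) tuples. Each fold
    holds out every pair involving one pathogen protein, so the model never
    sees any interaction partner of that protein during training.
    """
    by_pathogen = defaultdict(list)
    for index, (pathogen, _host) in enumerate(pairs):
        by_pathogen[pathogen].append(index)

    for pathogen, test_indices in by_pathogen.items():
        held_out = set(test_indices)
        train_indices = [i for i in range(len(pairs)) if i not in held_out]
        yield pathogen, train_indices, test_indices


# Hypothetical toy pairs: P* are pathogen proteins, H* are host proteins.
pairs = [("P1", "H1"), ("P1", "H2"), ("P2", "H1"), ("P3", "H3"), ("P2", "H4")]
splits = list(lopo_splits(pairs))
```

By contrast, a plain K-fold split over `pairs` could place ("P1", "H1") in training and ("P1", "H2") in testing, which is the leakage between folds that the abstract argues makes K-fold estimates unrepresentative for pathogen proteins with no known interactions.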

