Convolutional neural network approach to lung cancer classification integrating protein interaction network and gene expression profiles

2019 ◽  
Vol 17 (03) ◽  
pp. 1940007 ◽  
Author(s):  
Teppei Matsubara ◽  
Tomoshiro Ochiai ◽  
Morihiro Hayashida ◽  
Tatsuya Akutsu ◽  
Jose C. Nacher

Deep learning technologies are permeating every field from image and speech recognition to computational and systems biology. However, the application of convolutional neural networks (CNNs) to “omics” data poses some difficulties, such as the processing of complex network structures and their integration with transcriptome data. Here, we propose a CNN approach that incorporates spectral clustering information processing to classify lung cancer. The developed spectral-convolutional neural network method integrates protein interaction network data and gene expression profiles to classify lung cancer. Computational experiments suggest that, in terms of accuracy, the predictive performance of the proposed method was better than that of other machine learning methods such as SVM or Random Forest. Moreover, the computational results indicate that the underlying protein network structure helps to enhance the predictions. Data and CNN code can be downloaded from: https://sites.google.com/site/nacherlab/analysis
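
The abstract does not spell out how the spectral clustering step feeds the CNN, so the following is only a minimal sketch of the generic spectral step on a toy network. The function names `fiedler_partition` and `toy_network`, the six-node graph, and the choice of the unnormalized Laplacian are all assumptions made for illustration; they are not taken from the authors' code.

```python
import numpy as np


def fiedler_partition(adj):
    """Split nodes into two clusters by the sign of the Fiedler vector.

    Hypothetical helper: the paper may use a different Laplacian or
    more than two clusters.
    """
    adj = np.asarray(adj, dtype=float)
    laplacian = np.diag(adj.sum(axis=1)) - adj   # unnormalized L = D - A
    _, eigvecs = np.linalg.eigh(laplacian)       # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]                      # eigenvector of 2nd-smallest eigenvalue
    return (fiedler >= 0).astype(int)


def toy_network():
    """Two triangles (nodes 0-2 and 3-5) joined by a single edge 2-3."""
    edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
    adj = np.zeros((6, 6))
    for i, j in edges:
        adj[i, j] = adj[j, i] = 1.0
    return adj


labels = fiedler_partition(toy_network())
print(labels)  # the two triangles receive different labels
```

In a pipeline of this kind, cluster labels (or the spectral coordinates themselves) could be used to arrange genes so that network neighbours sit close together in the CNN input; that arrangement is an assumption here, not a description of the published method.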

2013 ◽  
Vol 2013 ◽  
pp. 1-8 ◽  
Author(s):  
Bi-Qing Li ◽  
Jin You ◽  
Lei Chen ◽  
Jian Zhang ◽  
Ning Zhang ◽  
...  

Lung cancer is one of the leading causes of cancer mortality worldwide. The main types of lung cancer are small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC). In this work, a computational method was proposed for identifying lung-cancer-related genes with a shortest path approach in a protein-protein interaction (PPI) network. Based on the PPI data from STRING, a weighted PPI network was constructed. 54 NSCLC- and 84 SCLC-related genes were retrieved from associated KEGG pathways. Then the shortest paths between each pair of these 54 NSCLC genes and 84 SCLC genes were obtained with Dijkstra’s algorithm. Finally, all the genes on the shortest paths were extracted, and 25 and 38 shortest path genes with a permutation P value less than 0.05 for NSCLC and SCLC, respectively, were selected for further analysis. Some of the shortest path genes have been reported to be related to lung cancer. Intriguingly, the candidate genes we identified from the PPI network contained more cancer genes than those identified from the gene expression profiles. Furthermore, these genes possessed more functional similarity with the known cancer genes than those identified from the gene expression profiles. This study demonstrated the efficiency of the proposed method and showed promising results.
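
The shortest-path step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the gene labels `G1` to `G4`, the edge costs, and the helper names `build_graph`, `dijkstra_path`, and `shortest_path_genes` are hypothetical, and the way STRING confidence scores are turned into edge costs is not specified here. The permutation test used to filter candidates is not included.

```python
import heapq


def build_graph(edges):
    """Build an undirected weighted graph as a dict of dicts."""
    graph = {}
    for a, b, cost in edges:
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph


def dijkstra_path(graph, source, target):
    """Return (cost, path) of the cheapest path, or (inf, []) if unreachable."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue                      # stale heap entry
        done.add(node)
        if node == target:
            break
        for nbr, cost in graph.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if target not in dist:
        return float("inf"), []
    path = [target]
    while path[-1] != source:             # walk predecessors back to the source
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


def shortest_path_genes(graph, seeds):
    """Collect non-seed genes lying on shortest paths between seed pairs."""
    seeds = list(seeds)
    found = set()
    for i, a in enumerate(seeds):
        for b in seeds[i + 1:]:
            _, path = dijkstra_path(graph, a, b)
            found.update(path[1:-1])      # interior nodes only
    return found - set(seeds)


# Toy network with placeholder genes; lower cost means a stronger interaction.
toy_edges = [("G1", "G2", 1.0), ("G2", "G3", 1.0), ("G1", "G3", 5.0),
             ("G3", "G4", 2.0), ("G1", "G4", 10.0)]
toy_graph = build_graph(toy_edges)
print(shortest_path_genes(toy_graph, ["G1", "G4"]))
```

In this toy network the direct edge between the two seed genes is expensive, so the cheapest route runs through `G2` and `G3`, which become the candidate genes.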


2018 ◽  
Vol 6 (4) ◽  
pp. 129-140
Author(s):  
Zhi-Jian Li ◽  
Xing-Ling Sui ◽  
Xue-Bo Yang ◽  
Wen Sun

To reveal the biology of AML, we compared gene-expression profiles between normal hematopoietic cells from 38 healthy donors and leukemic blasts (LBs) from 26 AML patients. We defined the comparison of LB and unselected BM as experiment 1, LB and CD34+ isolated from BM as experiment 2, LB and unselected PB as experiment 3, and LB and CD34+ isolated from PB as experiment 4. Then, a protein–protein interaction network of DEGs was constructed to identify critical genes. Regulatory impact factors were used to identify critical transcription factors from the differential co-expression network constructed by reanalyzing the microarray profile from the perspective of differential co-expression. Gene ontology enrichment was performed to extract biological meaning. The comparison among the number of DEGs obtained in the four experiments showed that cells did not tend toward differentiation and CD34+ was more similar to cancer stem cells. Based on the results of the protein–protein interaction network, CREBBP, F2RL1, MCM2, and TP53 were the key genes in experiments 1, 2, 3, and 4, respectively. From gene ontology analysis, we found that immune response was the most common one in the four stages. Our results might provide a platform for determining the pathology and therapy of AML.


2022 ◽  
Vol 02 ◽  
Author(s):  
Sergey Shityakov ◽  
Jane Pei-Chen Chang ◽  
Ching-Fang Sun ◽  
David Ta-Wei Guu ◽  
Thomas Dandekar ◽  
...  

Background: Omega-3 polyunsaturated fatty acids (PUFAs), such as eicosapentaenoic (EPA) and docosahexaenoic (DHA) acids, have beneficial effects on human health, but their effect on gene expression in elderly individuals (age ≥ 65) is largely unknown. To examine this, gene expression profiles were analyzed in healthy subjects (n = 96) at baseline and after 26 weeks of supplementation with EPA+DHA to determine up-regulated and down-regulated differentially expressed genes (DEGs) triggered by PUFAs. The protein-protein interaction (PPI) networks were constructed by mapping these DEGs to a human interactome and linking them to the specific pathways. Objective: This study aimed to implement supervised machine learning models and protein-protein interaction network analysis of gene expression profiles induced by PUFAs. Methods: The transcriptional profile of GSE12375 was obtained from the Gene Expression Omnibus database, which is based on the Affymetrix NuGO array. The probe cell intensity data were converted into gene expression values, and the background correction was performed by the multi-array average algorithm. The LIMMA (Linear Models for Microarray Data) algorithm was implemented to identify relevant DEGs at baseline and after 26 weeks of supplementation with a p-value < 0.05. The DAVID web server was used to identify and construct the enriched KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways. Finally, machine learning (ML) models, including logistic regression, naïve Bayes, and deep neural networks, were constructed for the analyzed DEGs associated with the specific pathways. Results: The results revealed that up-regulated DEGs were associated with neurotrophin/MAPK signaling, whereas the down-regulated DEGs were linked to cancer, acute myeloid leukemia, and long-term depression pathways. Additionally, ML approaches were able to cluster the EPA/DHA-treated and control groups, with logistic regression performing best. Conclusion: Overall, this study highlights the pivotal changes in DEGs induced by PUFAs and provides the rationale for the implementation of ML algorithms as predictive models for this type of biomedical data.


2019 ◽  
Vol 17 (01) ◽  
pp. 1950001 ◽  
Author(s):  
Wei Zhang ◽  
Jia Xu ◽  
Yuanyuan Li ◽  
Xiufen Zou

The prediction of protein complexes based on the protein interaction network is a fundamental task for the understanding of cellular life as well as the mechanisms underlying complex disease. A great number of methods have been developed to predict protein complexes based on protein–protein interaction (PPI) networks in recent years. However, because the high-throughput data obtained from experimental biotechnology are incomplete and usually contain a large number of spurious interactions, most of the network-based protein complex identification methods are sensitive to the reliability of the PPI network. In this paper, we propose a new method, Identification of Protein Complex based on Refined Protein Interaction Network (IPC-RPIN), which integrates the topology, gene expression profiles and GO functional annotation information to predict protein complexes from the reconstructed networks. To demonstrate the performance of the IPC-RPIN method, we evaluated it on three PPI networks of Saccharomyces cerevisiae and compared it with four state-of-the-art methods. The simulation results show that IPC-RPIN achieved a better result than the other methods on most of the measurements and is able to discover small protein complexes, which have traditionally been neglected.


2021 ◽  
Author(s):  
Chayaporn Suphavilai ◽  
Hatairat Yingtaweesittikul

Background: Transcriptomic profiles have become crucial information in understanding diseases and improving treatments. While dysregulated gene sets are identified via pathway analysis, various machine learning models have been proposed for predicting phenotypes such as disease type and drug response based on gene expression patterns. However, these models still lack interpretability, as well as the ability to integrate prior knowledge from a protein-protein interaction network. Results: We propose Grandline, a graph convolutional neural network that can integrate gene expression data and the structure of the protein interaction network to predict a specific phenotype. Transforming the interaction network into a spectral domain enables convolution of neighbouring genes and pinpointing of high-impact subnetworks, which allows better interpretability of deep learning models. Grandline achieves high phenotype prediction accuracy (67-85% in 8 use cases), comparable to state-of-the-art machine learning models while requiring a smaller number of parameters, allowing it to learn complex but interpretable gene expression patterns from biological datasets. Conclusion: To improve the interpretability of phenotype prediction based on gene expression patterns, we developed Grandline using a graph convolutional neural network technique to integrate protein interaction information. We focus on improving the ability to learn nonlinear relationships between gene expression patterns and a given phenotype and on incorporating prior knowledge, which are the main challenges of machine learning models for biological datasets. The graph convolution allows us to aggregate information from relevant genes and reduces the number of trainable parameters, facilitating model training for a small-sized biological dataset.
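
To make the idea of convolving over neighbouring genes concrete, here is a minimal sketch of a single graph convolution layer. It uses the common first-order propagation rule, ReLU(D^{-1/2}(A + I)D^{-1/2} X W), as a stand-in; the abstract does not state which spectral filter Grandline uses, so this should not be read as its implementation. The names `normalized_adjacency` and `graph_conv`, the three-gene path graph, and the random weights are assumptions for illustration.

```python
import numpy as np


def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, adding self-loops before normalizing."""
    a_hat = np.asarray(adj, dtype=float) + np.eye(len(adj))
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def graph_conv(adj, features, weights):
    """One layer: mix each gene's features with its neighbours', then apply ReLU."""
    return np.maximum(normalized_adjacency(adj) @ features @ weights, 0.0)


# Toy example: 3 genes on a path 0 - 1 - 2, one expression value per gene.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]])
expression = np.array([[2.0], [0.5], [1.0]])                # shape (genes, 1)
weights = np.random.default_rng(0).normal(size=(1, 4))      # shared across all genes

hidden = graph_conv(adj, expression, weights)
print(hidden.shape)  # (3, 4)
```

The weight matrix has shape (input features, output features) and is shared by every gene, so the parameter count does not grow with the size of the network, which is consistent with the abstract's point about fewer trainable parameters.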


2014 ◽  
Vol 2014 ◽  
pp. 1-11 ◽  
Author(s):  
Trevor Clancy ◽  
Eivind Hovig

Recently, the Immunological Genome Project (ImmGen) completed the first phase of the goal to understand the molecular circuitry underlying the immune cell lineage in mice. That milestone resulted in the creation of the most comprehensive collection of gene expression profiles in the immune cell lineage in any model organism of human disease. There is now a need to examine this resource using bioinformatics integration with other molecular information, with the aim of gaining deeper insights into the underlying processes that characterize this immune cell lineage. We present here a bioinformatics approach to study differential protein interaction mechanisms across the entire immune cell lineage, achieved using affinity propagation applied to a protein interaction network similarity matrix. We demonstrate that the integration of protein interaction networks with the most comprehensive database of gene expression profiles of the immune cells can be used to generate hypotheses about the underlying mechanisms governing the differentiation and the differential functional activity across the immune cell lineage. This approach may not only serve as a hypothesis engine to derive understanding of differentiation and mechanisms across the immune cell lineage, but also help identify possible immune lineage-specific and common lineage mechanisms in the cells' protein networks.

