Detection of Changes in Transitive Associations by Shortest-path Analysis of Protein Interaction Networks Integrated with Gene Expression Profiles

Author(s): Hong Qin, Li Yang
2013, Vol 2013, pp. 1-8
Author(s): Bi-Qing Li, Jin You, Lei Chen, Jian Zhang, Ning Zhang, ...

Lung cancer is one of the leading causes of cancer mortality worldwide. The main types of lung cancer are small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC). In this work, a computational method was proposed for identifying lung-cancer-related genes with a shortest path approach in a protein-protein interaction (PPI) network. A weighted PPI network was constructed from STRING interaction data, and 54 NSCLC-related and 84 SCLC-related genes were retrieved from the associated KEGG pathways. The shortest paths between each pair of the 54 NSCLC genes, and between each pair of the 84 SCLC genes, were then computed with Dijkstra's algorithm. Finally, all genes on these shortest paths were extracted, and 25 NSCLC and 38 SCLC shortest-path genes with a permutation P value below 0.05 were selected for further analysis. Some of these shortest-path genes have previously been reported to be related to lung cancer. Notably, the candidate genes we identified from the PPI network contained more known cancer genes than those identified from gene expression profiles, and they also showed greater functional similarity to the known cancer genes. This study demonstrated the efficiency of the proposed method and showed promising results.
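The shortest-path step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the STRING-score-to-weight conversion (weight = 1000 - combined score), the helper names (`build_graph`, `dijkstra_path`, `path_gene_counts`, `permutation_pvalues`), and the exact permutation statistic (fraction of same-size random seed sets in which a gene appears on shortest paths at least as often) are all hypothetical choices.

```python
import heapq
import random
from itertools import combinations


def build_graph(scored_edges):
    """Build an undirected weighted graph from (gene_a, gene_b, score) triples.

    Assumption (hypothetical): scores are STRING-style combined scores in 0..1000
    and weight = 1000 - score, so high-confidence links become short edges.
    """
    graph = {}
    for a, b, score in scored_edges:
        weight = 1000 - score
        graph.setdefault(a, {})[b] = weight
        graph.setdefault(b, {})[a] = weight
    return graph


def dijkstra_path(graph, source, target):
    """Return one minimum-weight path from source to target, or None if unreachable."""
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry left over from an earlier, longer distance
        visited.add(node)
        if node == target:
            break
        for nbr, w in graph.get(node, {}).items():
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if target not in visited:
        return None
    # Walk predecessor links back from the target to recover the path.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def path_gene_counts(graph, seeds):
    """Count how often each non-seed gene lies on a shortest path between two seeds."""
    seed_set = set(seeds)
    counts = {}
    for a, b in combinations(sorted(seed_set), 2):
        path = dijkstra_path(graph, a, b)
        if path is None:
            continue
        for gene in path[1:-1]:  # interior genes only
            if gene not in seed_set:
                counts[gene] = counts.get(gene, 0) + 1
    return counts


def permutation_pvalues(graph, seeds, n_perm=1000, seed=0):
    """Hypothetical permutation test: share of random, same-size seed sets in which
    each candidate gene is hit at least as often as with the real seeds."""
    rng = random.Random(seed)
    observed = path_gene_counts(graph, seeds)
    nodes = sorted(graph)
    hits = {gene: 0 for gene in observed}
    for _ in range(n_perm):
        null = path_gene_counts(graph, rng.sample(nodes, len(set(seeds))))
        for gene, obs in observed.items():
            if null.get(gene, 0) >= obs:
                hits[gene] += 1
    return {gene: hits[gene] / n_perm for gene in observed}
```

On a toy graph where A-B, B-C, and C-D have score 900 and A-E, E-D have score 500, the path A-B-C-D costs 300 versus 1000 through E, so `dijkstra_path(g, "A", "D")` returns `["A", "B", "C", "D"]` and B and C become the candidate shortest-path genes. Genes would then be kept when their permutation value falls below 0.05.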


2005, Vol 03 (06), pp. 1371-1389
Author(s): GUANGHUA XIAO, WEI PAN

Prediction of the biological functions of genes is an important issue in basic biological research and has applications in drug discovery and gene therapy. Previous studies have shown that either gene expression data or protein-protein interaction data alone can be used to predict gene functions. In particular, clustering of gene expression profiles has been widely used for gene function prediction. In this paper, we first propose a new method for gene function prediction using protein-protein interaction data, which facilitates combining its results with predictions based on clustering gene expression profiles. We then propose a new method to combine the prediction results from the two data sources by weighting the evidence provided by each. Using protein-protein interaction data downloaded from the GRID database and published gene expression profiles from 300 microarray experiments on the yeast S. cerevisiae, we show in a cross-validated analysis of the MIPS gene annotations that the combined analysis provides better predictive performance than either data source alone. Finally, we propose a logistic regression method that is flexible enough to combine information from any number of data sources while remaining computationally feasible.
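The idea of combining evidence from several data sources with logistic regression can be sketched as below. This is a minimal illustration, not the paper's method: the PPI score here is plain neighbor counting (the share of a gene's interaction partners annotated with the function), swapped in as a simple stand-in, and the helper names and the toy data are hypothetical. Each feature vector holds one score per data source (for example an expression-based score and a PPI-based score), and the fitted coefficients act as per-source evidence weights.

```python
import math


def neighbor_fraction(gene, function, interactions, annotations):
    """Share of a gene's interaction partners annotated with `function` (0.0 if none).

    Simple neighbor-counting score, used here only as a stand-in PPI feature.
    """
    partners = interactions.get(gene, set())
    if not partners:
        return 0.0
    hits = sum(1 for p in partners if function in annotations.get(p, set()))
    return hits / len(partners)


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss.

    features: one row per (gene, function) pair, one column per data source.
    labels: 1 if the gene is annotated with the function, else 0.
    """
    n, k = len(features), len(features[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            for j in range(k):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Combined probability that a gene has the function, given its per-source scores."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
```

Adding a data source only adds a column to `features`, which is why this formulation scales to any number of sources; in practice the model would be evaluated with cross-validation against held-out annotations.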


2022, Vol 02
Author(s): Sergey Shityakov, Jane Pei-Chen Chang, Ching-Fang Sun, David Ta-Wei Guu, Thomas Dandekar, ...

Background: Omega-3 polyunsaturated fatty acids (PUFAs), such as eicosapentaenoic acid (EPA) and docosahexaenoic acid (DHA), have beneficial effects on human health, but their effect on gene expression in elderly individuals (age ≥ 65) is largely unknown. To examine this, gene expression profiles of healthy subjects (n = 96) were analyzed at baseline and after 26 weeks of EPA+DHA supplementation to identify up-regulated and down-regulated differentially expressed genes (DEGs) triggered by PUFAs. Protein-protein interaction (PPI) networks were constructed by mapping these DEGs to a human interactome and linking them to specific pathways. Objective: This study aimed to apply supervised machine learning models and protein-protein interaction network analysis to the gene expression profiles induced by PUFAs. Methods: The transcriptional profile GSE12375, based on the Affymetrix NuGO array, was obtained from the Gene Expression Omnibus database. The probe cell intensity data were converted into gene expression values, and background correction was performed with the multi-array average algorithm. The LIMMA (Linear Models for Microarray Data) algorithm was used to identify relevant DEGs (p < 0.05) at baseline and after 26 weeks of supplementation. The DAVID web server was used to identify enriched KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways. Finally, machine learning (ML) models, including logistic regression, naïve Bayes, and deep neural networks, were built for the DEGs associated with the specific pathways. Results: Up-regulated DEGs were associated with neurotrophin/MAPK signaling, whereas down-regulated DEGs were linked to cancer, acute myeloid leukemia, and long-term depression pathways. Additionally, the ML approaches were able to distinguish the EPA/DHA-treated and control groups, with logistic regression performing best. Conclusion: Overall, this study highlights the pivotal changes in DEGs induced by PUFAs and provides a rationale for using ML algorithms as predictive models for this type of biomedical data.
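The DEG-to-network step described above can be sketched as follows. This is a minimal illustration under assumptions: per-gene log fold changes and p-values are taken as already computed (for example exported from LIMMA), the interactome is a plain list of gene pairs, and the helper names and placeholder genes (`G1` to `G5`) are hypothetical. The sketch splits DEGs by direction at p < 0.05, keeps only interactome edges whose two endpoints are DEGs, and ranks nodes by degree as a simple hub measure.

```python
def select_degs(stats, p_cutoff=0.05):
    """Split genes passing the p-value cutoff into up- and down-regulated sets.

    stats maps gene -> (log2 fold change, p-value), assumed precomputed by a
    differential-expression tool.
    """
    up, down = set(), set()
    for gene, (log_fc, p_value) in stats.items():
        if p_value < p_cutoff:
            if log_fc > 0:
                up.add(gene)
            elif log_fc < 0:
                down.add(gene)
    return up, down


def induced_subnetwork(interactome, genes):
    """Keep interactions whose two endpoints are both in `genes`, stored as sorted pairs."""
    return {tuple(sorted((a, b))) for a, b in interactome if a in genes and b in genes}


def degrees(edges):
    """Node degree within the subnetwork, a simple way to rank hub genes."""
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg
```

Up- and down-regulated sets can be mapped separately or together; the resulting gene lists would then go to a pathway enrichment service such as DAVID.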


2019, Vol 40 (5), pp. 624-632
Author(s): Ji-Wei Chang, Yuduan Ding, Muhammad Tahir ul Qamar, Yin Shen, Junxiang Gao, ...

Prioritization of cancer-related genes from gene expression profiles and proteomic data is vital for advancing research on targeted therapies. Although computational approaches have complemented high-throughput biological experiments in understanding human diseases, accurately discovering cancer-related proteins/genes through automatic learning from large-scale protein/gene expression data and protein–protein interaction data remains a major challenge. Most existing methods combine network construction with gene expression profiles but ignore the differences between normal samples and disease cell lines. In this study, we introduced a deep learning model based on a sparse auto-encoder to learn the specific characteristics of protein interactions in cancer cell lines integrated with protein expression data. By extracting the characteristics of protein interaction information, the model was able to identify cancer-related proteins/genes from different protein expression profiles, and it could also predict cancer-related protein combinations. Compared with other reported methods, including differential expression and network-based methods, our model achieved the highest area under the curve value (>0.8) in predicting cancer-related genes. Our study prioritized ~500 high-confidence cancer-related genes, 211 of which are already known cancer drug targets, supporting the accuracy of our method. These results indicate that the proposed auto-encoder model can computationally prioritize candidate proteins/genes involved in cancer and improve research on targeted therapies.
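The objective that a sparse auto-encoder minimizes can be sketched as follows. This is the textbook formulation, not the authors' architecture: a single sigmoid hidden layer, an average half-squared reconstruction error, and a KL-divergence penalty that pushes each hidden unit's mean activation toward a small target `rho`. The parameter values (`rho=0.05`, `beta=3.0`), the helper names, and the assumption that inputs are scaled to [0, 1] are illustrative choices; training (backpropagation and weight decay) is omitted.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def init_params(n_input, n_hidden, scale=0.1, seed=0):
    """Small random weights; W1 is n_hidden x n_input, W2 is n_input x n_hidden."""
    rng = random.Random(seed)
    W1 = [[rng.uniform(-scale, scale) for _ in range(n_input)] for _ in range(n_hidden)]
    W2 = [[rng.uniform(-scale, scale) for _ in range(n_hidden)] for _ in range(n_input)]
    return W1, [0.0] * n_hidden, W2, [0.0] * n_input


def kl_sparsity(rho, rho_hat, eps=1e-8):
    """Sum over hidden units of KL(Bernoulli(rho) || Bernoulli(rho_hat_j))."""
    total = 0.0
    for r in rho_hat:
        r = min(max(r, eps), 1.0 - eps)  # avoid log(0) for saturated units
        total += rho * math.log(rho / r) + (1 - rho) * math.log((1 - rho) / (1 - r))
    return total


def sparse_ae_loss(X, W1, b1, W2, b2, rho=0.05, beta=3.0):
    """Average half-squared reconstruction error plus beta times the sparsity penalty.

    Inputs are assumed scaled to [0, 1] because the decoder uses a sigmoid.
    Returns (loss, rho_hat), where rho_hat is each hidden unit's mean activation.
    """
    n = len(X)
    act_sum = [0.0] * len(b1)
    recon = 0.0
    for x in X:
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
        x_hat = [sigmoid(sum(w * hj for w, hj in zip(row, h)) + b) for row, b in zip(W2, b2)]
        recon += 0.5 * sum((xi - xh) ** 2 for xi, xh in zip(x, x_hat))
        for j, hj in enumerate(h):
            act_sum[j] += hj
    rho_hat = [s / n for s in act_sum]
    return recon / n + beta * kl_sparsity(rho, rho_hat), rho_hat
```

The penalty is zero only when every hidden unit's mean activation equals `rho`, so most units stay near-inactive for any given input; the learned hidden activations are the compressed features from which a downstream scorer could rank candidate genes.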

