Simpati: patient classifier identifies signature pathways based on similarity networks for the disease prediction

2021 ◽  
Author(s):  
Luca Giudice

Abstract Background: Pathway-based patient classification is a supervised learning task that supports the decision-making of human experts in biomedical applications by providing signature pathways associated with a patient class characterized by a specific clinical outcome. The task can potentially simulate the way humans reason when predicting patient outcomes from pathways, decipher hidden multivariate relationships among the characteristics of a patient class, and provide more information than a probability value. However, these classifiers are rarely integrated into routine bioinformatics analyses of high-dimensional biological data because they require nontrivial hyper-parameter tuning, are difficult to interpret, and provide few new insights. There is a need for new classifiers that provide novel perspectives on pathways, are easy to apply to different biological omics, and produce new data that enable further analysis of the patients. Results: We propose Simpati, a pathway-based patient classifier that combines network-based propagation, patient similarity networks, cohesive subgroup detection, and pathway enrichment. It exploits a propagation algorithm to classify dense, sparse, and non-homogeneous data. It organizes patient features (e.g. genes, proteins, mutations) into pathways represented by patient similarity networks, which makes the results interpretable, handles missing data, and preserves patient privacy. Each network represents patients as nodes, and a novel similarity measure determines how coordinately every pair of patients acts in a pathway. Simpati detects signature biological processes based on how well the topological properties of the related networks discriminate between the patient classes. In this step, it applies a novel cohesive subgroup detection algorithm to handle patients who do not show the same pathway activity as the other members of their class. An unknown patient is classified based on how similar they are to known patients. Simpati outperforms state-of-the-art classifiers on five cancer datasets, classifies sparse data well, and provides a novel concept of enrichment that calls pathways as up- or down-involved with respect to the patient’s overall biology. Conclusion: Simpati can serve as an interpretable and accurate pathway-based patient classifier to discover novel signature pathways driving a clinical class, to detect biomarkers, and to gain insight into how patients are similar based on their regulation of biological processes. Biomarker detection is made possible by the propagation score (the likelihood of association between a patient’s feature and the outcome) and by the deconvolution of each feature’s contribution to the patient similarities. Pathway enrichment is enhanced by integrating the DisGeNET and Human Protein Atlas databases. We provide an R implementation that starts Simpati with a single function, a graphical interface for navigating the patients’ propagated profiles, and a function that offers an ad hoc visualization of patient similarity networks. The software is available at: https://github.com/LucaGiudice/Simpati
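
The abstract does not say which propagation algorithm Simpati uses. As a rough illustration of network-based propagation in general, the sketch below runs a random walk with restart over a tiny gene network, so that a score observed on one gene spreads to its neighbours. The function name `propagate`, the `restart` parameter, and the toy graph are all assumptions made for this example and are not taken from the Simpati package.

```python
def propagate(adjacency, seed_scores, restart=0.5, iterations=100):
    """Random walk with restart on an undirected graph.

    adjacency:   dict mapping each node to the list of its neighbours
    seed_scores: dict with the initial (observed) score of some nodes
    restart:     probability of jumping back to the initial scores
    """
    initial = {node: float(seed_scores.get(node, 0.0)) for node in adjacency}
    scores = dict(initial)
    for _ in range(iterations):
        updated = {}
        for node in adjacency:
            # Each neighbour shares its current score equally among its own neighbours.
            spread = sum(scores[nb] / len(adjacency[nb]) for nb in adjacency[node])
            updated[node] = (1 - restart) * spread + restart * initial[node]
        scores = updated
    return scores


# Hypothetical three-gene path A - B - C; only gene A carries a signal.
toy_network = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
propagated = propagate(toy_network, {"A": 1.0})
```

In this toy case the unobserved genes B and C receive non-zero scores that decrease with distance from A, which is the property that lets propagation-based methods work with sparse profiles.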

2021 ◽  
Author(s):  
Andrew J Kavran ◽  
Aaron Clauset

Abstract Background: Large-scale biological data sets are often contaminated by noise, which can impede accurate inferences about underlying processes. Such measurement noise can arise from endogenous biological factors like cell cycle and life history variation, and from exogenous technical factors like sample preparation and instrument variation. Results: We describe a general method for automatically reducing noise in large-scale biological data sets. This method uses an interaction network to identify groups of correlated or anti-correlated measurements that can be combined or “filtered” to better recover an underlying biological signal. Similar to the process of denoising an image, a single network filter may be applied to an entire system, or the system may first be decomposed into distinct modules and a different filter applied to each. Applied to synthetic data with known network structure and signal, network filters accurately reduce noise across a wide range of noise levels and structures. Applied to a machine learning task of predicting changes in human protein expression in healthy and cancerous tissues, network filtering prior to training increases accuracy by up to 43% compared to using unfiltered data. Conclusions: Network filters are a general way to denoise biological data and can account for both correlation and anti-correlation between different measurements. Furthermore, we find that partitioning a network prior to filtering can significantly reduce errors in networks with heterogeneous data and correlation patterns, and this approach outperforms existing diffusion-based methods. Our results on proteomics data indicate the broad potential utility of network filters for applications in systems biology.
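
To make the idea of combining correlated measurements concrete, here is a minimal sketch of one possible filter: each node's value is replaced by the mean of its own value and those of its network neighbours. This covers only the correlated case and is an assumption chosen for illustration; the abstract does not give the authors' filter definitions, and `mean_filter` is a hypothetical name.

```python
def mean_filter(values, neighbours):
    """Smooth each measurement by averaging it with its neighbours' measurements.

    values:     dict mapping node -> measured value
    neighbours: dict mapping node -> list of adjacent nodes in the interaction network
    """
    smoothed = {}
    for node, value in values.items():
        group = [value] + [values[nb] for nb in neighbours.get(node, [])]
        smoothed[node] = sum(group) / len(group)
    return smoothed


# Hypothetical path network a - b - c with noisy measurements.
measurements = {"a": 1.0, "b": 3.0, "c": 5.0}
network = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
filtered = mean_filter(measurements, network)
```

Applying a different filter per module, as the abstract describes, would amount to calling a function like this separately on each partition of the network.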


2018 ◽  
Vol 2018 ◽  
pp. 1-20 ◽  
Author(s):  
Zhenyan Song ◽  
Fang Yin ◽  
Biao Xiang ◽  
Bin Lan ◽  
Shaowu Cheng

In traditional Chinese medicine (TCM), Acori Tatarinowii Rhizoma (ATR) is widely used to treat memory and cognitive dysfunction. This study aimed to confirm evidence of the potential therapeutic effect of ATR on Alzheimer’s disease (AD) using a system-level, network-based in silico approach. The results showed that the compounds in ATR are highly connected to AD-related signaling pathways, biological processes, and organs. These findings were confirmed by a compound-target network, a target-organ location network, gene ontology analysis, and KEGG pathway enrichment analysis. Most compounds in ATR have been reported to act against fibrillar amyloid plaques and tau phosphorylation and to have anti-inflammatory effects. Our results indicated that compounds in ATR interact with multiple targets in a synergistic way. Furthermore, the mRNA expression of genes targeted by ATR is significantly elevated in heart, brain, and liver. Our results suggest that the anti-inflammatory and immune-system-enhancing effects of ATR might contribute to its major therapeutic effects on Alzheimer’s disease.


Author(s):  
Sourabh Parmar

Researchers use transcriptomics analyses for biological data mining, interpretation, and presentation. User-friendly Galaxy-based tools are used to analyze transcriptomic data from various complex diseases in order to understand their pathogenesis. This work provides simple methods for differential expression analysis and for analyzing the results with gene ontology and pathway enrichment tools such as DAVID and WebGestalt. The approach is effective for analyzing and understanding transcriptomic data. The transcriptomics analysis was performed on rheumatoid arthritis SRA data. Rheumatoid arthritis (RA) is a systemic autoimmune disease whose pathogenesis is mediated by T cells and autoantibodies. This article discusses the genes that are differentially expressed between healthy (n=50) and diseased (n=51) samples and the functions of those genes in the pathogenesis of RA.
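
As a rough sketch of what a differential expression comparison measures for a single gene, the example below computes a log2 fold change and a Welch t statistic between two groups. It is a simplified illustration under stated assumptions, not the Galaxy workflow used in the article; the function names, the pseudocount, and the expression values are all hypothetical.

```python
import math
from statistics import mean, variance


def log2_fold_change(healthy, diseased, pseudocount=1.0):
    """log2 ratio of mean expression (diseased over healthy), with a pseudocount to avoid log(0)."""
    return math.log2((mean(diseased) + pseudocount) / (mean(healthy) + pseudocount))


def welch_t(healthy, diseased):
    """Welch's t statistic: difference in means scaled by the unpooled standard error."""
    standard_error = math.sqrt(
        variance(healthy) / len(healthy) + variance(diseased) / len(diseased)
    )
    return (mean(diseased) - mean(healthy)) / standard_error


# Made-up expression values for one gene in four healthy and four diseased samples.
healthy_expression = [2.0, 4.0, 3.0, 3.0]
diseased_expression = [14.0, 16.0, 15.0, 15.0]
lfc = log2_fold_change(healthy_expression, diseased_expression)
t_statistic = welch_t(healthy_expression, diseased_expression)
```

Dedicated differential expression tools additionally model count distributions and correct for multiple testing across genes, which this sketch does not attempt.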


Author(s):  
José Caldas ◽  
Samuel Kaski

Biclustering is the unsupervised learning task of mining a data matrix for useful submatrices, for instance groups of genes that are co-expressed under particular biological conditions. As these submatrices are expected to partly overlap, a significant challenge in biclustering is to develop methods that can detect overlapping biclusters. The authors propose a probabilistic mixture modelling framework for biclustering biological data that lends itself to various data types and allows biclusters to overlap. Their framework is akin to the latent feature and mixture-of-experts model families, with inference and parameter estimation performed via a variational expectation-maximization algorithm. The model compares favorably with competing approaches on both a binary DNA copy number variation data set and a miRNA expression data set, indicating that it may be useful as a general problem-solving tool in biclustering.


Author(s):  
Andrea Maffezzoli ◽  
Marco Masseroli

In medical informatics, recent ICT (information and communication technology) tools and systems that support genomics and proteomics, the sciences that study genes, chromosomes, and protein expression levels in various organisms, are becoming necessary for understanding the mechanisms underlying the biological processes that cause disease. This understanding can lead to more effective diagnostic and treatment methods and to personalized pharmacological therapies. To this end, the combined contribution of different sciences, such as biology, medicine, engineering, informatics and mathematics, becomes indispensable. The science that embraces all these fields is bioinformatics, which was conceived for the analysis, storage and processing of huge amounts of biological data. These operations involve the creation of so-called genomic or proteomic databanks, which are a major source of information on nucleotide sequences, as well as of biological, clinical, physiological and bibliographical annotations related to individual sequences. Databanks fall into different types according to their characteristics and features (such as primary and derivative or specialized databanks), and there are several ways to access the data stored in them. There are also specific databank-based bioinformatics tools for searching and extracting significant information, in order to summarize and compare gene annotations related to the causes of a disease and, finally, to identify a list of the genes most likely to cause it.


2020 ◽  
Vol 14 ◽  
pp. 117793222090616
Author(s):  
Badreddine Nouadi ◽  
Yousra Sbaoui ◽  
Mariame El Messal ◽  
Faiza Bennis ◽  
Fatima Chegdani

Nowadays, the integration of biological data is a major challenge for bioinformatics. Many studies have examined gene expression in the intestinal epithelial tissue of breastfed infants born at term, generating a large amount of data. Integrating these data is important for understanding the biological processes involved in bacterial colonization of the newborn’s intestine, particularly through breast milk. This work aims to use bioinformatics approaches to provide a new representation and interpretation of the interactions between differentially expressed genes in the host intestine induced by the microbiota.


2020 ◽  
Author(s):  
Abel Szkalisity ◽  
Filippo Piccinini ◽  
Attila Beleon ◽  
Tamas Balassa ◽  
Istvan Gergely Varga ◽  
...  

Abstract: Biological processes are inherently continuous, and the chance of phenotypic discovery is significantly restricted by discretising them. Using multi-parametric active regression we introduce a novel concept to describe and explore biological data in a continuous manner. We have implemented Regression Plane (RP), the first user-friendly discovery tool enabling class-free phenotypic supervised machine learning.


2021 ◽  
Author(s):  
Huan Chen ◽  
Brian Caffo ◽  
Genevieve Stein-O’Brien ◽  
Jinrui Liu ◽  
Ben Langmead ◽  
...  

Summary: Integrative analysis of multiple data sets has the potential to fully leverage the vast amount of high-throughput biological data being generated. In particular, such analysis will be powerful for making inferences from publicly available collections of genetic, transcriptomic and epigenetic data sets that are designed to study shared biological processes but that vary in their target measurements, biological variation, unwanted noise, and batch variation. Thus, methods that enable the joint analysis of multiple data sets are needed to gain insights into shared biological processes that would otherwise be hidden by unwanted intra-data set variation. Here, we propose a method called two-stage linked component analysis (2s-LCA) to jointly decompose multiple biologically related experimental data sets with biological and technological relationships that can be structured into the decomposition. The consistency of the proposed method is established and its empirical performance is evaluated via simulation studies. We apply 2s-LCA to jointly analyze four data sets focused on human brain development and identify meaningful patterns of gene expression in human neurogenesis that have shared structure across these data sets. The code to conduct 2s-LCA has been compiled into an R package “PJD”, which is available at https://github.com/CHuanSite/PJD.


Entropy ◽  
2020 ◽  
Vol 22 (4) ◽  
pp. 425
Author(s):  
Zejun Sun ◽  
Jinfang Sheng ◽  
Bin Wang ◽  
Aman Ullah ◽  
FaizaRiaz Khawaja

Identifying communities in dynamic networks is essential for exploring latent network structures, understanding network functions, predicting network evolution, and discovering abnormal network events. Many dynamic community detection methods have been proposed from different viewpoints. However, identifying the community structure in dynamic networks is very challenging because parameters are difficult to tune, time complexity is high, and detection accuracy decreases as the number of time slices increases. In this paper, we present a dynamic community detection framework based on information dynamics and develop a dynamic community detection algorithm called DCDID (dynamic community detection based on information dynamics), which uses a batch processing technique to incrementally uncover communities in dynamic networks. DCDID employs the information dynamics model to simulate the exchange of information among nodes and aims to improve the efficiency of community detection by filtering out the unchanged subgraph. To illustrate the effectiveness of DCDID, we extensively test it on synthetic and real-world dynamic networks, and the results demonstrate that the DCDID algorithm is superior to representative methods in the quality of dynamic community detection.


2016 ◽  
Vol 2016 ◽  
pp. 1-11 ◽  
Author(s):  
Yujie Zhu ◽  
Yuxin Lin ◽  
Wenying Yan ◽  
Zhandong Sun ◽  
Zhi Jiang ◽  
...  

Acute coronary syndrome (ACS) is a life-threatening disease that affects more than half a million people in the United States. We currently lack molecular biomarkers to distinguish unstable angina (UA) from acute myocardial infarction (AMI), the two subtypes of ACS. MicroRNAs play significant roles in biological processes and are good candidates for biomarkers. In this work, we collected microRNA datasets from the Gene Expression Omnibus database and, using our novel network-based bioinformatics approach, identified microRNAs specific to individual subtypes and universal microRNAs present in all subtypes. These microRNAs were studied for ACS association by pathway enrichment analysis of their target genes. AMI and UA were associated with 27 and 26 microRNAs, respectively. Nine of them were detected for both AMI and UA, and five from each subtype had been reported previously. The remaining 22 and 21 microRNAs are novel microRNA biomarkers for AMI and UA, respectively. The findings are supported by pathway enrichment analysis of the targets of these microRNAs. These novel microRNAs deserve further validation and will be helpful for personalized ACS diagnosis.
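
The counts reported in this abstract can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the abstract; the variable names are chosen for this example, and the count of distinct microRNAs is derived by inclusion-exclusion rather than quoted from the article.

```python
# Counts stated in the abstract.
ami_associated = 27        # microRNAs associated with AMI
ua_associated = 26         # microRNAs associated with UA
shared = 9                 # detected for both AMI and UA
previously_reported = 5    # per subtype, already reported in the literature

# Novel biomarkers per subtype: associated minus previously reported.
ami_novel = ami_associated - previously_reported
ua_novel = ua_associated - previously_reported

# Derived values (not stated in the abstract): subtype-specific and distinct microRNAs.
ami_specific = ami_associated - shared
ua_specific = ua_associated - shared
distinct_total = ami_associated + ua_associated - shared
```

The novel counts come out as 22 for AMI and 21 for UA, matching the abstract.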

