overlappingCGM: Automatic detection and analysis of overlapping co-expressed gene modules

2021 ◽  
Author(s):  
Quang-Huy Nguyen ◽  
Duc-Hau Le

Co-expressed gene module detection faces two typical challenges: overlap between identified modules and local co-expression in only a subset of biological samples. A recent study has reported that decomposition methods are the most appropriate for addressing these challenges. In this study, we present an R tool, termed overlapping co-expressed gene module (overlappingCGM), which implements those methods within a fully automatic analysis framework to help non-technical users easily perform complicated statistical analyses and obtain robust results. We also develop a novel auxiliary statistical approach to select the optimal number of principal components using a permutation procedure. Two example datasets, related to human breast cancer and mouse metabolic syndrome, are used to illustrate the straightforward use of the tool. Computational experiment results show that overlappingCGM outperforms state-of-the-art techniques. The R scripts used in the study, including all information on the tool and its usage, are publicly available at https://github.com/huynguyen250896/overlappingCGM.
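The abstract does not describe how the permutation procedure works, so the following is only a minimal sketch of one common way to pick the number of principal components by permutation, not the authors' implementation. It assumes that each gene's values are shuffled independently across samples to build a null distribution of eigenvalues, and that components are kept while the observed eigenvalue exceeds the 95th percentile of that null. All function names, the toy data, and the parameters `n_perm` and `quantile` are hypothetical.

```python
import random


def covariance(rows):
    """Sample covariance of the columns (genes) of `rows` (one row per sample)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_eigenvalues(mat, k, iters=100):
    """Approximate top-k eigenvalues of a symmetric PSD matrix
    using power iteration with deflation."""
    p = len(mat)
    mat = [row[:] for row in mat]  # work on a copy
    values = []
    for _component in range(k):
        v = [1.0 + 0.1 * i for i in range(p)]
        lam = 0.0
        for _step in range(iters):
            w = [sum(mat[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = sum(x * x for x in w) ** 0.5
            if norm < 1e-12:  # nothing left to extract
                lam = 0.0
                break
            v = [x / norm for x in w]
            # Rayleigh quotient of the unit vector v
            lam = sum(v[i] * sum(mat[i][j] * v[j] for j in range(p))
                      for i in range(p))
        values.append(lam)
        # Deflate: remove the component that was just found.
        for i in range(p):
            for j in range(p):
                mat[i][j] -= lam * v[i] * v[j]
    return values


def permute_columns(rows, rng):
    """Shuffle each gene independently across samples (breaks co-expression)."""
    cols = [list(col) for col in zip(*rows)]
    for col in cols:
        rng.shuffle(col)
    return [list(r) for r in zip(*cols)]


def select_n_components(rows, n_perm=30, quantile=0.95, seed=0):
    """Count leading components whose eigenvalue beats the permutation null."""
    rng = random.Random(seed)
    p = len(rows[0])
    observed = leading_eigenvalues(covariance(rows), p)
    null = [leading_eigenvalues(covariance(permute_columns(rows, rng)), p)
            for _ in range(n_perm)]
    selected = 0
    for k in range(p):
        null_k = sorted(run[k] for run in null)
        threshold = null_k[min(n_perm - 1, int(quantile * n_perm))]
        if observed[k] <= threshold:
            break
        selected += 1
    return selected


# Toy data (hypothetical): 30 samples, 6 genes driven by one shared factor.
_rng = random.Random(1)
toy = []
for _ in range(30):
    factor = _rng.gauss(0, 1)
    toy.append([factor + _rng.gauss(0, 0.2) for _ in range(6)])

print(select_n_components(toy))
```

Because the toy genes share a single strong latent factor, one would expect only the first component to clear the permutation threshold. On real expression data the number of permutations and the quantile would need tuning.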

BMC Genomics ◽  
2022 ◽  
Vol 23 (1) ◽  
Author(s):  
Quang-Huy Nguyen ◽  
Duc-Hau Le

Abstract Background Co-expressed gene module detection faces two typical challenges: overlap between identified modules and local co-expression in only a subset of biological samples. Module detection is, by nature, an application of unsupervised clustering approaches and algorithms. These methods are undoubtedly advanced, but a clustering method must be selected separately for the sample-clustering and gene-clustering tasks, and the latter is often more complicated. Results This study presented an R package, Overlapping CoExpressed gene Module (oCEM), equipped with decomposition methods to address the challenges above. We also developed a novel auxiliary statistical approach to select the optimal number of principal components using a permutation procedure. We additionally showed that oCEM outperformed state-of-the-art techniques in its ability to detect biologically relevant modules. Conclusions oCEM helped non-technical users easily perform complicated statistical analyses and obtain robust results. oCEM and its applications, along with example data, were freely provided at https://github.com/huynguyen250896/oCEM.


Author(s):  
Isabelle Bichindaritz ◽  
Christopher Bartlett ◽  
Guanghui Liu

There is usually a trade-off between predictive performance and transparency, where the reasoning process behind an algorithm is hidden inside a "black box." In medical domains, experts who are responsible for their decisions need to understand the reasons behind machine-generated recommendations. This paper presents a transparent case-based survival analysis framework that automatically retrieves an optimal number of solved survival cases and adapts them to predict the survival of a new case. With this methodology, the retrieved and adapted survival cases give insight into which cases a prediction is based on. Our framework is capable of integrating DNA methylation, gene expression, and their combination in breast cancer. Additionally, we test our approach with and without feature selection and demonstrate the usefulness of the adaptation phase. We demonstrate that our framework performs at least as effectively as other state-of-the-art methods while affording greater explainability.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Xinyu Li ◽  
Wei Zhang ◽  
Jianming Zhang ◽  
Guang Li

Abstract Background Given expression data, gene regulatory network (GRN) inference approaches try to determine regulatory relations. However, current inference methods ignore the inherent topological characteristics of GRNs to some extent, leading to structures that lack clear biological explanation. To increase the biophysical meaning of inferred networks, this study performed data-driven module detection before network inference. Gene modules were identified by decomposition-based methods. Results ICA-decomposition-based module detection methods have been used to detect functional modules directly from transcriptomic data. Experiments on time-series expression, curated, and scRNA-seq datasets suggested advantages of the proposed ModularBoost method over established methods, especially in efficiency and accuracy. For scRNA-seq datasets, the ModularBoost method outperformed other candidate inference algorithms. Conclusions As a complicated task, GRN inference can be decomposed into several tasks of reduced complexity. Using identified gene modules as topological constraints, the initial inference problem can be solved by inferring intra-modular and inter-modular interactions separately. Experimental outcomes suggest that the proposed ModularBoost method can improve the accuracy and efficiency of inference algorithms by introducing topological constraints.


2014 ◽  
Author(s):  
Xiaoquan Wen ◽  
Francesca Luca ◽  
Roger Pique-Regi

Mapping expression quantitative trait loci (eQTLs) has been shown to be a powerful tool to uncover the genetic underpinnings of many complex traits at the molecular level. In this paper, we present an integrative analysis approach that leverages eQTL data collected from multiple population groups. In particular, our approach effectively identifies multiple independent cis-eQTL signals that are consistently present across populations, accounting for heterogeneity in allele frequencies and patterns of linkage disequilibrium. Furthermore, our analysis framework enables integrating high-resolution functional annotations into the analysis of eQTLs. We applied our statistical approach to analyze the GEUVADIS data consisting of samples from five population groups. From this analysis, we concluded that i) joint analysis across population groups greatly improves the power of eQTL discovery and the resolution of fine mapping of causal eQTLs; ii) many genes harbor multiple independent eQTLs in their cis regions; iii) genetic variants that disrupt transcription factor binding are significantly enriched in eQTLs (p-value = 4.93 × 10^-22).


2020 ◽  
Author(s):  
Thijs Dhollander ◽  
Adam Clemente ◽  
Mervyn Singh ◽  
Frederique Boonstra ◽  
Oren Civier ◽  
...  

Diffusion MRI has provided the neuroimaging community with a powerful tool to acquire in-vivo data sensitive to microstructural features of white matter, up to 3 orders of magnitude smaller than typical voxel sizes. The key to extracting such valuable information lies in complex modelling techniques, which form the link between the rich diffusion MRI data and various metrics related to the microstructural organisation. Over time, increasingly advanced techniques have been developed, up to the point where some diffusion MRI models can now provide access to properties specific to individual fibre populations in each voxel in the presence of multiple "crossing" fibre pathways. While highly valuable, such fibre-specific information poses unique challenges for typical image processing pipelines and statistical analysis. In this work, we review the "fixel-based analysis" (FBA) framework that implements bespoke solutions to this end, and has recently seen a stark increase in adoption for studies of both typical (healthy) populations as well as a wide range of clinical populations. We describe the main concepts related to fixel-based analyses, as well as the methods and specific steps involved in a state-of-the-art FBA pipeline, with a focus on providing researchers with practical advice on how to interpret results. We also include an overview of the scope of current fixel-based analysis studies (until August 2020), categorised across a broad range of neuroscientific domains, listing key design choices and summarising their main results and conclusions. Finally, we critically discuss several aspects and challenges involved with the fixel-based analysis framework, and outline some directions and future opportunities.


1985 ◽  
Vol 31 (1) ◽  
pp. 106-108 ◽  
Author(s):  
S S Ehrmeyer ◽  
R H Laessig

Abstract We report a new technique for realistic assessment of laboratory performance as measured by proficiency testing. Interlaboratory results accumulated from 129 participants during 18 months provide the baseline data from which we established "state-of-the-art" performance criteria for three ranges of pH, pCO2, and pO2. By concurrent use of two statistical measurement techniques, the cumulative percentile rank and the algebraic and absolute mean error, laboratories can accurately evaluate their performance in terms of acceptable state-of-the-art criteria, total error, or medical usefulness. The approach facilitates assessment of the nature of the errors that have led to inferior performance and identification of probable areas where improvement is possible. If criteria based on regulatory standards or medical usefulness goals are included, the system can provide a basis for licensure or professional quality improvement.
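The two statistical measures named in this abstract can be illustrated with a short sketch. The abstract gives no formulas, so the definitions here are assumptions: the algebraic mean error is taken as the mean signed deviation from a target value, the absolute mean error as the mean unsigned deviation, and the cumulative percentile rank as the percentage of participants whose score is at or below a given score. Function names and sample values are hypothetical.

```python
def mean_errors(results, target):
    """Return (algebraic mean error, absolute mean error) against `target`.

    The algebraic (signed) mean reveals systematic bias; the absolute mean
    reflects the overall size of the errors regardless of direction.
    """
    diffs = [r - target for r in results]
    algebraic = sum(diffs) / len(diffs)
    absolute = sum(abs(d) for d in diffs) / len(diffs)
    return algebraic, absolute


def percentile_rank(score, all_scores):
    """Percentage of scores in `all_scores` that are <= `score`."""
    at_or_below = sum(1 for s in all_scores if s <= score)
    return 100.0 * at_or_below / len(all_scores)


# Hypothetical pH results reported by one laboratory for a 7.40 target.
bias, magnitude = mean_errors([7.38, 7.42, 7.41], 7.40)

# Hypothetical absolute mean errors of four participating laboratories.
rank = percentile_rank(0.02, [0.01, 0.02, 0.03, 0.04])
```

Reading the two error measures together separates bias from imprecision: a small algebraic error with a large absolute error points to scatter rather than a consistent offset.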


2020 ◽  
Author(s):  
John Lee ◽  
Manthan Shah ◽  
Sara Ballouz ◽  
Megan Crow ◽  
Jesse Gillis

Abstract Co-expression analysis has provided insight into gene function in organisms from Arabidopsis to Zebrafish. Comparison across species has the potential to enrich these results, for example by prioritizing among candidate human disease genes based on their network properties, or by finding alternative model systems where their co-expression is conserved. Here, we present CoCoCoNet as a tool for identifying conserved gene modules and comparing co-expression networks. CoCoCoNet is a resource for both data and methods, providing gold-standard networks and sophisticated tools for on-the-fly comparative analyses across 14 species. We show how CoCoCoNet can be used in two use cases. In the first, we demonstrate deep conservation of a nucleolus gene module across very divergent organisms, and in the second, we show how the heterogeneity of autism mechanisms in humans can be broken down by functional groups, and translated to model organisms. CoCoCoNet is free to use and available to all at https://milton.cshl.edu/CoCoCoNet, with data and R scripts available at ftp://milton.cshl.edu/data.


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e10594
Author(s):  
Qian Zhao ◽  
Yan Zhang ◽  
Shichun Shao ◽  
Yeqing Sun ◽  
Zhengkui Lin

Background Hepatocellular carcinoma (HCC), the main type of liver cancer in humans, is one of the most prevalent and deadly malignancies in the world. The present study aimed to identify hub genes and key biological pathways by integrated bioinformatics analysis. Methods A bioinformatics pipeline based on gene co-expression network (GCN) analysis was built to analyze the gene expression profile of HCC. Firstly, differentially expressed genes (DEGs) were identified and a GCN was constructed with Pearson correlation analysis. Then, the gene modules were identified with 3 different community detection algorithms, and the correlation analysis between gene modules and clinical indicators was performed. Moreover, we used the Search Tool for the Retrieval of Interacting Genes (STRING) database to construct a protein-protein interaction (PPI) network of the key gene module, and we identified the hub genes using nine topology analysis algorithms based on this PPI network. Further, we used Oncomine analysis, survival analysis, a GEO data set and a random forest algorithm to verify the important roles of hub genes in HCC. Lastly, we explored the methylation changes of hub genes using another GEO data set (GSE73003). Results Firstly, among the expression profiles, 4,130 up-regulated genes and 471 down-regulated genes were identified. Next, the multi-level algorithm, which had the highest modularity, divided the GCN into nine gene modules. Also, a key gene module (m1) was identified. The biological processes of GO enrichment of m1 mainly included the processes of mitosis and meiosis and the functions of catalytic and exodeoxyribonuclease activity. Besides, these genes were enriched in the cell cycle and mitotic pathway. Furthermore, we identified 11 hub genes, MCM3, TRMT6, AURKA, CDC20, TOP2A, ECT2, TK1, MCM2, FEN1, NCAPD2 and KPNA2, which played key roles in HCC. The results of multiple verification methods indicated that the 11 hub genes had high diagnostic efficiency in distinguishing tumors from normal tissues. Lastly, the methylation changes of the genes CDC20, TOP2A, TK1 and FEN1 in HCC samples were statistically significant (P-value < 0.05). Conclusion MCM3, TRMT6, AURKA, CDC20, TOP2A, ECT2, TK1, MCM2, FEN1, NCAPD2 and KPNA2 could be potential biomarkers or therapeutic targets for HCC. Meanwhile, the metabolic pathway, the cell cycle and the mitotic pathway might play vital roles in the progression of HCC.
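The Methods step of building a gene co-expression network from Pearson correlations can be sketched as follows. This is an illustrative outline only: the abstract does not state the correlation cutoff or how edges were weighted, so the `threshold` value, the use of the absolute correlation, the function names, and the toy expression values are all assumptions.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:  # a constant profile has no defined correlation
        return 0.0
    return sum(a * b for a, b in zip(dx, dy)) / denom


def coexpression_edges(expr, threshold=0.8):
    """Return (gene_a, gene_b, r) for gene pairs with |r| >= threshold.

    `expr` maps a gene name to its expression values across samples.
    """
    edges = []
    for a, b in combinations(sorted(expr), 2):
        r = pearson(expr[a], expr[b])
        if abs(r) >= threshold:
            edges.append((a, b, r))
    return edges


# Toy expression profiles (hypothetical genes, four samples each).
toy_expr = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 1.0, 3.0, 2.0],
}
edges = coexpression_edges(toy_expr)
```

The resulting edge list is the kind of input a community detection algorithm would take to partition the network into gene modules.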


2020 ◽  
pp. 1-9
Author(s):  
Guanghui Wang ◽  
Fenglong Bie ◽  
Guangxu Li ◽  
Junping Shi ◽  
Yanwu Zeng ◽  
...  

BACKGROUND: Metastasis is commonly a marker of disease progression in cancer. Some metastatic sites are associated with significantly worse clinical outcomes in non-small cell lung cancer (NSCLC). Whether these outcomes are caused by tissue-specific (TS) or non-tissue-specific (NTS) mechanisms is still unclear. OBJECTIVE: To explore co-expression gene modules of non-small cell lung cancer metastases. METHODS: Weighted Correlation Network Analysis (WGCNA) was used to identify the gene modules among the metastases of NSCLC. The clinical significance of those gene modules was evaluated with the Cox proportional hazards model using another independent dataset. Functions of each gene module were analyzed with gene ontology. Typical genes were further studied. RESULTS: Two TS gene modules and two NTS gene modules were identified. One TS gene module (green module) and one NTS gene module (purple module) were significantly correlated with survival. This NTS gene module (purple module) was significantly enriched in the epithelial-to-mesenchymal transition (EMT) process. Higher expression of the typical genes (CA14, SOX10, TWIST1, and ALX1) from the EMT process was significantly associated with worse survival. CONCLUSION: The lethality of NSCLC metastases was caused by TS gene modules and NTS gene modules, among which the EMT-related gene module was critical for a worse clinical outcome.

