Balanced Functional Module Detection in Genomic Data

2020 ◽  
Author(s):  
David Tritchler ◽  
Lorin M Towle-Miller ◽  
Jeffrey C Miecznikowski

Abstract: High-dimensional genomic data can be analyzed to understand the effects of multiple variables on a target variable such as a clinical outcome, risk factor or diagnosis. Of special interest are functional modules, cooperating sets of variables affecting the target. Graphical models of various types are often useful for characterizing such networks of variables. In other applications such as social networks, the concept of balance in undirected signed graphs characterizes the consistency of associations within the network. To extend this concept to applications where a set of predictor variables influences an outcome variable, we define balance for functional modules. This property specifies that the module variables have a joint effect on the target outcome with no internal conflict, an efficiency that evolution may use for selection in biological networks. We show that for this class of graphs, observed correlations directly reflect paths in the underlying graph. Consequences of the balance property are exploited to implement a new module discovery algorithm, bFMD, which selects a subset of variables from high-dimensional data that compose a balanced functional module. Our bFMD algorithm performed favorably in simulations compared to other module detection methods that do not consider balance properties. Additionally, bFMD produced interpretable results in a real application to RNA-seq data obtained from The Cancer Genome Atlas (TCGA) for Uterine Corpus Endometrial Carcinoma, using the percentage of tumor invasion as the target outcome of interest. By leveraging balance properties used in other graphical applications, bFMD detects sparse sets of variables within high-dimensional datasets, which may make its results more interpretable than those of similar methods.
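The notion of balance in undirected signed graphs mentioned in this abstract can be illustrated with a minimal sketch. It assumes the classical structural-balance definition: a signed graph is balanced when its nodes can be split into two groups with positive edges inside each group and negative edges between them. It is not the bFMD algorithm, nor the authors' definition of balance for functional modules. The function name `is_balanced` and the edge format are illustrative assumptions.

```python
from collections import deque


def is_balanced(n_nodes, signed_edges):
    """Return True if the undirected signed graph is structurally balanced.

    signed_edges: iterable of (u, v, sign) with sign in {+1, -1}
    (hypothetical input format chosen for this sketch).
    """
    adj = [[] for _ in range(n_nodes)]
    for u, v, sign in signed_edges:
        adj[u].append((v, sign))
        adj[v].append((u, sign))

    side = [0] * n_nodes  # 0 = unvisited, otherwise +1 or -1
    for start in range(n_nodes):
        if side[start] != 0:
            continue
        side[start] = 1
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, sign in adj[u]:
                # A positive edge keeps v on u's side, a negative edge flips it.
                expected = side[u] * sign
                if side[v] == 0:
                    side[v] = expected
                    queue.append(v)
                elif side[v] != expected:
                    return False  # some cycle has an odd number of negative edges
    return True
```

Under this definition, a triangle with exactly one negative edge is unbalanced, while a triangle with two negative edges is balanced.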

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Xinyu Li ◽  
Wei Zhang ◽  
Jianming Zhang ◽  
Guang Li

Abstract. Background: Given expression data, gene regulatory network (GRN) inference approaches try to determine regulatory relations. However, current inference methods ignore the inherent topological characteristics of GRNs to some extent, leading to structures that lack clear biological explanation. To increase the biophysical meaning of inferred networks, this study performed data-driven module detection before network inference. Gene modules were identified by decomposition-based methods. Results: ICA decomposition-based module detection methods have been used to detect functional modules directly from transcriptomic data. Experiments on time-series expression, curated, and scRNA-seq datasets suggested advantages of the proposed ModularBoost method over established methods, especially in efficiency and accuracy. For scRNA-seq datasets, the ModularBoost method outperformed other candidate inference algorithms. Conclusions: As a complicated task, GRN inference can be decomposed into several tasks of reduced complexity. Using identified gene modules as topological constraints, the initial inference problem can be accomplished by inferring intra-modular and inter-modular interactions respectively. Experimental outcomes suggest that the proposed ModularBoost method can improve the accuracy and efficiency of inference algorithms by introducing topological constraints.
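The decomposition described in the conclusions, where inference is split into intra-modular and inter-modular interactions, might be sketched as follows. This is only an illustration of partitioning candidate regulator-target pairs once modules are known. It is not the ModularBoost implementation, and the function name `split_candidate_edges` and the `module_of` mapping are hypothetical.

```python
from itertools import permutations


def split_candidate_edges(module_of):
    """Partition all ordered gene pairs into intra- and inter-modular candidates.

    module_of: dict mapping gene name -> module label (hypothetical input,
    e.g. produced by an earlier module detection step).
    """
    intra, inter = [], []
    for regulator, target in permutations(sorted(module_of), 2):
        if module_of[regulator] == module_of[target]:
            intra.append((regulator, target))
        else:
            inter.append((regulator, target))
    return intra, inter
```

Each of the two candidate lists could then be passed to a separate, smaller inference step, which is the sense in which module membership acts as a topological constraint here.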


F1000Research ◽  
2017 ◽  
Vol 6 ◽  
pp. 319
Author(s):  
Erin K. Wagner ◽  
Satyajeet Raje ◽  
Liz Amos ◽  
Jessica Kurata ◽  
Abhijit S. Badve ◽  
...  

Data sharing is critical to advancing genomic research: it reduces the demand to collect new data by allowing existing data to be reused and combined, and it promotes reproducible research. The Cancer Genome Atlas (TCGA) is a popular resource for individual-level genotype-phenotype cancer-related data. The Database of Genotypes and Phenotypes (dbGaP) contains many datasets similar to those in TCGA. We have created a software pipeline that will allow researchers to discover relevant genomic data from dbGaP, based on matching TCGA metadata. The resulting research provides an easy-to-use tool to connect these two data sources.


2020 ◽  
pp. 34-42
Author(s):  
Liudmyla Kryvoplias-Volodina ◽  
Oleksandr Gavva ◽  
Anastasiia Derenivska ◽  
Oleksandr Volodin

A set of technical tools has been developed for the optimization-based synthesis of a packing machine assembled from separate functional modules. A method for synthesizing a packing machine is proposed, based on criterion-based assessment of the separate functional modules (FM), combined into two main assessment groups. An FM may be selected and calculated by the program of consumption, based on the overall equipment effectiveness (OEE) criterion. An example of synthesis using the proposed method considers alternative choices among ready-made functional modules, based on the hierarchical structure of the module for roll packing material supply. The method applies a systems approach to analyzing the design of equipment for packing small-piece and piece food products in consumer packages. FM assemblies are synthesized as abstract conceptual models that reflect the structure of the design and the connections between its separate elements, the functional devices (FD). The optimal assembly of the functional device within the functional module for roll packing material supply has been determined. As a result of solving this problem, an FM1 prototype has been created. For comparative analysis with existing equipment, the automatic functional device has been modeled. The use of the OEE criterion with joint properties, which reflects a generalized assessment of a packing machine or functional module, together with a maximin (minimax) criterion applied according to the compromise principle, has been substantiated. The analysis rests on the idea that each module or device of the food-packing machine remains optimal as each successive functional module is added to it. A program has been developed for calculating the assessment of packaging equipment with the complex OEE criterion for different FMi assemblies of machines for packing piece and small-piece products.
An FM for roll film material supply has been developed that uses a microprocessor control device to maintain a sinusoidal law of motion for the stretching roll of the packing machine. Optimal characteristics of the technical system have been determined. Results obtained from processing the experimental data confirm the adequacy of the proposed method for assessing assembly solutions.
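The combination of an OEE score with a maximin choice mentioned in this abstract can be illustrated with a small sketch. It assumes the commonly used definition of OEE as the product of availability, performance, and quality ratios, and a plain maximin rule that picks the option with the best worst-case score. The abstract does not give the authors' exact formulation, so the function names and the example module labels are hypothetical.

```python
def oee(availability, performance, quality):
    """Commonly used OEE definition: product of three ratios, each in [0, 1]."""
    return availability * performance * quality


def maximin_choice(scores_by_option):
    """Pick the option whose worst score is largest (maximin rule).

    scores_by_option: dict mapping option name -> list of scores, for example
    OEE values of one candidate assembly under several operating conditions.
    """
    return max(scores_by_option, key=lambda name: min(scores_by_option[name]))


# Hypothetical candidate assemblies, each scored under two conditions.
candidates = {
    "assembly_a": [oee(0.90, 0.85, 0.98), oee(0.70, 0.80, 0.95)],
    "assembly_b": [oee(0.85, 0.85, 0.97), oee(0.80, 0.82, 0.96)],
}
best = maximin_choice(candidates)
```

The maximin rule favors the assembly that degrades least in its worst condition, rather than the one with the highest single score.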


Author(s):  
Tiara Bunga Mayang Permata ◽  
Sri Mutya Sekarutami ◽  
Endang Nuryadi ◽  
Angela Giselvania ◽  
Soehartati Gondhowiardjo

In the current big data era, massive genomic cancer data are available for open access from anywhere in the world. They are obtained from popular platforms, such as The Cancer Genome Atlas, which provides genetic information from clinical samples, and Cancer Cell Line Encyclopedia, which offers genomic data of cancer cell lines. For convenient analysis, user-friendly tools, such as the Tumor Immune Estimation Resource (TIMER), which can be used to analyze tumor-infiltrating immune cells comprehensively, are also emerging. In clinical practice, clinical sequencing has been recommended for patients with cancer in many countries. Despite its many challenges, it enables the application of precision medicine, especially in medical oncology. In this review, several efforts devoted to accomplishing precision oncology and applying big data for use in Indonesia are discussed. Utilizing open access genomic data in writing research articles is also described.


2019 ◽  
Author(s):  
Hongzhu Cui ◽  
Suhas Srinivasan ◽  
Dmitry Korkin

Abstract: Progress in high-throughput -omics technologies moves us one step closer to the datacalypse in life sciences. In spite of the already generated volumes of data, our knowledge of the molecular mechanisms underlying complex genetic diseases remains limited. Increasing evidence shows that biological networks are essential, albeit not sufficient, for the better understanding of these mechanisms. The identification of disease-specific functional modules in the human interactome can provide a more focused insight into the mechanistic nature of the disease. However, carving a disease network module from the whole interactome is a difficult task. In this paper, we propose a computational framework, DIMSUM, which enables the integration of genome-wide association studies (GWAS), functional effects of mutations, and the protein-protein interaction (PPI) network to improve disease module detection. Specifically, our approach incorporates and propagates the functional impact of non-synonymous single nucleotide polymorphisms (nsSNPs) on PPIs to implicate the genes that are most likely influenced by the disruptive mutations, and to identify the module with the greatest impact. Comparison against state-of-the-art seed-based module detection methods shows that our approach could yield modules that are biologically more relevant and have stronger association with the studied disease. We expect our method to become a part of the common toolbox for disease module analysis, facilitating discovery of new disease markers.
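The idea of propagating an impact score across a PPI network can be sketched generically. The example below assumes a random-walk-with-restart style update on an undirected graph. It is not the DIMSUM procedure, whose details are not given in the abstract, and the function name `propagate`, its parameters, and the toy network are illustrative assumptions.

```python
def propagate(adjacency, seed_scores, alpha=0.5, n_iter=50):
    """Random-walk-with-restart style score propagation on an undirected graph.

    adjacency: dict node -> list of neighbouring nodes (assumed symmetric).
    seed_scores: dict node -> initial score (missing nodes start at 0).
    alpha: fraction of a node's score taken from its neighbours at each step.
    """
    nodes = list(adjacency)
    initial = {n: float(seed_scores.get(n, 0.0)) for n in nodes}
    scores = dict(initial)
    for _ in range(n_iter):
        updated = {}
        for n in nodes:
            # Each neighbour shares its score equally among its own neighbours.
            incoming = sum(scores[m] / len(adjacency[m]) for m in adjacency[n])
            updated[n] = alpha * incoming + (1 - alpha) * initial[n]
        scores = updated
    return scores


# Toy path network a - b - c with all initial impact placed on gene "a".
toy_network = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
result = propagate(toy_network, {"a": 1.0})
```

In this toy case the score decays with distance from the seed gene, so genes closer to the perturbed node rank higher.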


2019 ◽  
Vol 116 (38) ◽  
pp. 18962-18970 ◽  
Author(s):  
Sushant Kumar ◽  
Declan Clarke ◽  
Mark B. Gerstein

Large-scale exome sequencing of tumors has enabled the identification of cancer drivers using recurrence-based approaches. Some of these methods also employ 3D protein structures to identify mutational hotspots in cancer-associated genes. In determining such mutational clusters in structures, existing approaches overlook protein dynamics, despite its essential role in protein function. We present a framework to identify cancer driver genes using a dynamics-based search of mutational hotspot communities. Mutations are mapped to protein structures, which are partitioned into distinct residue communities. These communities are identified in a framework where residue–residue contact edges are weighted by correlated motions (as inferred by dynamics-based models). We then search for signals of positive selection among these residue communities to identify putative driver genes, while applying our method to the TCGA (The Cancer Genome Atlas) PanCancer Atlas missense mutation catalog. Overall, we predict 1 or more mutational hotspots within the resolved structures of proteins encoded by 434 genes. These genes were enriched among biological processes associated with tumor progression. Additionally, a comparison between our approach and existing cancer hotspot detection methods using structural data suggests that including protein dynamics significantly increases the sensitivity of driver detection.

