Virtual Gene Concept and a Corresponding Pragmatic Research Program in Genetical Data Science

Entropy ◽  
2021 ◽  
Vol 24 (1) ◽  
pp. 17
Author(s):  
Łukasz Huminiecki

Mendel proposed an experimentally verifiable paradigm of particle-based heredity that has been influential for over 150 years. The historical arguments have been reflected in the near past as Mendel’s concept has been diversified by new types of omics data. As an effect of the accumulation of omics data, a virtual gene concept forms, giving rise to genetical data science. The concept integrates genetical, functional, and molecular features of the Mendelian paradigm. I argue that the virtual gene concept should be deployed pragmatically. Indeed, the concept has already inspired a practical research program related to systems genetics. The program includes questions about functionality of structural and categorical gene variants, about regulation of gene expression, and about roles of epigenetic modifications. The methodology of the program includes bioinformatics, machine learning, and deep learning. Education, funding, careers, standards, benchmarks, and tools to monitor research progress should be provided to support the research program.

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Mario Zanfardino ◽  
Rossana Castaldo ◽  
Katia Pane ◽  
Ornella Affinito ◽  
Marco Aiello ◽  
...  

Analysis of large-scale omics data along with biomedical images has gained considerable interest for predicting phenotypic conditions in personalized medicine. Multiple layers of investigation, such as genomics, transcriptomics and proteomics, have led to high dimensionality and heterogeneity of data. Multi-omics data integration can contribute meaningfully to early diagnosis and to an accurate estimate of prognosis and treatment in cancer. Some multi-layer data structures have been developed to integrate multi-omics biological information, but none of these has been developed and evaluated to include radiomic data. We proposed to use MultiAssayExperiment (MAE) as an integrated data structure to combine multi-omics data, facilitating the exploration of heterogeneous data. We improved the usability of the MAE by developing a Multi-omics Statistical Approaches (MuSA) tool that uses a Shiny graphical user interface and simplifies the management and analysis of radiogenomic datasets. The capabilities of MuSA were shown using public breast cancer datasets from the TCGA-TCIA databases. The MuSA architecture is modular and is divided into Pre-processing and Downstream analysis. The pre-processing section allows data filtering and normalization. The downstream analysis section contains modules for data science such as correlation, clustering (i.e., heatmap) and feature selection methods. The results are dynamically shown in MuSA. The MuSA tool provides an easy-to-use way to create, manage and analyze radiogenomic data. The application is specifically designed to guide non-programmer researchers through different computational steps. Integration analysis is implemented in a modular structure, making MuSA easily extensible open-source software.
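
The two stages named in this abstract (pre-processing, then downstream analysis) can be illustrated with a minimal sketch. It is not MuSA's code, which is an R/Shiny application; the feature names, the toy values, and the choice of a standard-deviation filter, z-score normalization, and Pearson correlation are all assumptions made for illustration.

```python
from statistics import mean, pstdev

def filter_low_variance(features, min_sd):
    """Pre-processing: keep features whose population SD exceeds min_sd."""
    return {name: vals for name, vals in features.items() if pstdev(vals) > min_sd}

def z_score(values):
    """Pre-processing: centre to mean 0 and scale to population SD 1."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]

def pearson(x, y):
    """Downstream: Pearson correlation as the mean product of z-scores."""
    zx, zy = z_score(x), z_score(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)

# Hypothetical toy data: one row per feature, one column per patient.
omics = {"gene_A": [1.0, 2.0, 3.0, 4.0], "gene_B": [5.0, 5.0, 5.0, 5.0]}
radiomics = {"texture_1": [2.0, 4.0, 6.0, 8.0]}

kept = filter_low_variance(omics, min_sd=0.1)  # the constant gene_B is dropped
r = pearson(kept["gene_A"], radiomics["texture_1"])
print(sorted(kept), round(r, 6))
```

Filtering before normalization matters here: a constant feature has zero standard deviation and would cause a division by zero in `z_score`.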


2021 ◽  
Author(s):  
Mai Adachi Nakazawa ◽  
Yoshinori Tamada ◽  
Yoshihisa Tanaka ◽  
Marie Ikeguchi ◽  
Kako Higashihara ◽  
...  

The identification of cancer subtypes is important for the understanding of tumor heterogeneity. In recent years, numerous computational methods have been proposed for this problem based on the multi-omics data of patients. It is widely accepted that different cancer subtypes are induced by different molecular regulatory networks. However, only a few methods incorporate the differences between patients' molecular systems into the classification process. In this study, we present a novel method to classify cancer subtypes based on patient-specific molecular systems. Our method quantifies patient-specific gene networks, which are estimated from the patients' transcriptome data. By clustering the quantified networks, our method allows for cancer subtyping that takes into consideration the differences in the molecular systems of patients. Comprehensive analyses applying our method to The Cancer Genome Atlas (TCGA) datasets confirmed that it identified more clinically meaningful cancer subtypes than the existing subtypes, and that the identified subtypes comprised different molecular features. Our findings show that the proposed method, based on a simple classification using the patient-specific molecular systems, can identify cancer subtypes even with single omics data, which cannot otherwise be captured by existing methods using multi-omics data.


2020 ◽  
Vol 3 (1) ◽  
pp. 43-59
Author(s):  
Peter M. Kasson

Infectious disease research spans scales from the molecular to the global—from specific mechanisms of pathogen drug resistance, virulence, and replication to the movement of people, animals, and pathogens around the world. All of these research areas have been impacted by the recent growth of large-scale data sources and data analytics. Some of these advances rely on data or analytic methods that are common to most biomedical data science, while others leverage the unique nature of infectious disease, namely its communicability. This review outlines major research progress in the past few years and highlights some remaining opportunities, focusing on data or methodological approaches particular to infectious disease.


BMC Genomics ◽  
2019 ◽  
Vol 20 (S11) ◽  
Author(s):  
Tianle Ma ◽  
Aidong Zhang

Background: Comprehensive molecular profiling of various cancers and other diseases has generated vast amounts of multi-omics data. Each type of -omics data corresponds to one feature space, such as gene expression, miRNA expression, DNA methylation, etc. Integrating multi-omics data can link different layers of molecular feature spaces and is crucial to elucidate molecular pathways underlying various diseases. Machine learning approaches to mining multi-omics data hold great promise in uncovering intricate relationships among molecular features. However, due to the “big p, small n” problem (i.e., small sample sizes with high-dimensional features), training a large-scale generalizable deep learning model with multi-omics data alone is very challenging. Results: We developed a method called Multi-view Factorization AutoEncoder (MAE) with network constraints that can seamlessly integrate multi-omics data and domain knowledge such as molecular interaction networks. Our method learns feature and patient embeddings simultaneously with deep representation learning. Both feature representations and patient representations are subject to certain constraints specified as regularization terms in the training objective. By incorporating domain knowledge into the training objective, we implicitly introduced a good inductive bias into the machine learning model, which helps improve model generalizability. We performed extensive experiments on the TCGA datasets and demonstrated the power of integrating multi-omics data and biological interaction networks using our proposed method for predicting target clinical variables. Conclusions: To alleviate the overfitting problem in deep learning on multi-omics data with the “big p, small n” problem, it is helpful to incorporate biological domain knowledge into the model as inductive biases. It is very promising to design machine learning models that facilitate the seamless integration of large-scale multi-omics data and biomedical domain knowledge for uncovering intricate relationships among molecular features and clinical features.
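
One common way to express a network constraint as a regularization term is a graph-smoothness penalty: embeddings of features that interact in the network are pulled towards each other. The sketch below assumes that form. The abstract does not specify the exact regularizer, so the function names, the edge weights, and the `strength` parameter are hypothetical.

```python
def network_penalty(embeddings, edges):
    """Weighted sum of squared distances between embeddings of linked features."""
    total = 0.0
    for i, j, weight in edges:
        total += weight * sum((a - b) ** 2 for a, b in zip(embeddings[i], embeddings[j]))
    return total

def regularized_loss(reconstruction_error, embeddings, edges, strength):
    """Training objective: data-fit term plus the network regularization term."""
    return reconstruction_error + strength * network_penalty(embeddings, edges)

# Hypothetical learned feature embeddings and a toy interaction network.
embeddings = {"g1": [0.0, 1.0], "g2": [0.0, 1.0], "g3": [3.0, 5.0]}
edges = [("g1", "g2", 1.0), ("g2", "g3", 0.5)]  # (feature, feature, weight)

penalty = network_penalty(embeddings, edges)
loss = regularized_loss(2.0, embeddings, edges, strength=0.5)
print(penalty, loss)  # 12.5 8.25
```

Here `g1` and `g2` already share an embedding and contribute nothing, while the linked but distant pair `g2` and `g3` contributes 0.5 × 25 = 12.5. Minimizing the combined loss therefore trades reconstruction quality against agreement with the interaction network, which is the inductive bias the abstract describes.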


2016 ◽  
Vol 1 (3-4) ◽  
pp. 177-197 ◽  
Author(s):  
Bonnie J. Dorr ◽  
Craig S. Greenberg ◽  
Peter Fontana ◽  
Mark Przybocki ◽  
Marion Le Bras ◽  
...  

2021 ◽  
Author(s):  
Jialin Meng ◽  
Xiaofan Lu ◽  
Chen Jin ◽  
Yujie Zhou ◽  
Qintao Ge ◽  
...  

Prostate cancer (PCa), the second most common male malignancy, is the fifth leading cause of cancer-related death and places notable burdens on medical resources. Most previous subtyping schemes focused on only one or a few types of data, or ignored the genomic heterogeneity among PCa patients with diverse genetic backgrounds. Therefore, it is essential to precisely identify the specific molecular features and judge potential clinical outcomes based on multi-omics data. In the current study, we first identified the PCa multi-omics classification (PMOC) system based on multi-omics data, including mRNA, miRNA, lncRNA, DNA methylation, and gene mutation, using a total of ten state-of-the-art clustering algorithms. The PMOC1 subtype, also called the inflammatory subtype, shows the highest expression levels of immune checkpoint proteins and moderately activated immune-associated pathways. The PMOC2 tumor-activated subtype demonstrated the worst prognosis, which might be impacted by the activated cell cycle and DNA repair pathways, and is also characterized by the most genetic alterations, including mutant TP53, mutant APC and copy number alteration of the 8q24.21 region. The PMOC3 subtype is likely to be a balanced subtype, with activated oncogenic signaling pathways, including hypoxia, angiogenesis, epithelial mesenchymal transition, and PI3K/AKT pathways, as well as activated proinflammatory pathways, including IL6/JAK/STAT3, IL2/STAT5, Notch and TNF-α signaling. Additionally, the PMOC3 subtype is also linked with the activation of the androgen response and a high response rate to ARSI treatment. Taken together, we defined the PMOC system for PCa patients via multi-omics data and the consensus results of ten algorithms. This multi-omics consensus PCa molecular classification can further assist in precise clinical treatment and the development of targeted therapy.
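
The idea of combining the results of several clustering algorithms into one consensus can be sketched with a co-association matrix: for each pair of patients, count the fraction of algorithms that place them in the same cluster. This is a minimal illustration and not the authors' pipeline. The three toy labelings, the 0.5 threshold, and the grouping of linked pairs by connected components are assumptions.

```python
from itertools import combinations

def co_association(labelings):
    """Fraction of labelings in which each pair of samples shares a cluster."""
    n = len(labelings[0])
    return {
        (i, j): sum(labels[i] == labels[j] for labels in labelings) / len(labelings)
        for i, j in combinations(range(n), 2)
    }

def consensus_groups(labelings, threshold=0.5):
    """Link pairs whose agreement exceeds threshold; return connected components."""
    n = len(labelings[0])
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), fraction in co_association(labelings).items():
        if fraction > threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical cluster labels for five patients from three algorithms.
# Label values are arbitrary; only co-membership matters.
labelings = [
    [0, 0, 1, 1, 2],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
]
groups = consensus_groups(labelings)
print(groups)  # [[0, 1], [2, 3, 4]]
```

Connected components behave like single linkage: patients 2 and 4 agree in only one of three labelings but still end up together because each agrees with patient 3 in two. A stricter consensus would cluster the co-association matrix hierarchically instead.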


2019 ◽  
Author(s):  
Soumita Ghosh ◽  
Abhik Datta ◽  
Hyungwon Choi

Emerging multi-omics experiments pose new challenges for the exploration of quantitative data sets. We present multiSLIDE, a web-based interactive tool for simultaneous heatmap visualization of interconnected molecular features in multi-omics data sets. multiSLIDE operates by keyword search for visualizing biologically connected molecular features, such as genes in pathways and Gene Ontologies, offering convenient functionalities to rearrange, filter, and cluster data sets in a web browser in real time. Various built-in querying mechanisms make it adaptable to diverse omics types, and visualizations are fully customizable. We demonstrate the versatility of the tool through three example studies, each of which showcases its applicability to a wide range of multi-omics data sets, its ability to visualize the links between molecules at different granularities of measurement units, and its interface for incorporating inter-molecular relationships from external data sources into the visualization. Online and standalone versions of multiSLIDE are available at https://github.com/soumitag/multiSLIDE.

