scholarly journals A network of networks approach for modeling interconnected brain tissue-specific networks

2019 ◽  
Vol 35 (17) ◽  
pp. 3092-3101 ◽  
Author(s):  
Hideko Kawakubo ◽  
Yusuke Matsui ◽  
Itaru Kushima ◽  
Norio Ozaki ◽  
Teppei Shimamura

Abstract Motivation Recent sequence-based analyses have identified a lot of gene variants that may contribute to neurogenetic disorders such as autism spectrum disorder and schizophrenia. Several state-of-the-art network-based analyses have been proposed for mechanical understanding of genetic variants in neurogenetic disorders. However, these methods were mainly designed for modeling and analyzing single networks that do not interact with or depend on other networks, and thus cannot capture the properties between interdependent systems in brain-specific tissues, circuits and regions which are connected each other and affect behavior and cognitive processes. Results We introduce a novel and efficient framework, called a ‘Network of Networks’ approach, to infer the interconnectivity structure between multiple networks where the response and the predictor variables are topological information matrices of given networks. We also propose Graph-Oriented SParsE Learning, a new sparse structural learning algorithm for network data to identify a subset of the topological information matrices of the predictors related to the response. We demonstrate on simulated data that propose Graph-Oriented SParsE Learning outperforms existing kernel-based algorithms in terms of F-measure. On real data from human brain region-specific functional networks associated with the autism risk genes, we show that the ‘Network of Networks’ model provides insights on the autism-associated interconnectivity structure between functional interaction networks and a comprehensive understanding of the genetic basis of autism across diverse regions of the brain. Availability and implementation Our software is available from https://github.com/infinite-point/GOSPEL. Supplementary information Supplementary data are available at Bioinformatics online.

2018 ◽  
Author(s):  
Hideko Kawakubo ◽  
Yusuke Matsui ◽  
Itaru Kushima ◽  
Norio Ozaki ◽  
Teppei Shimamura

AbstractMotivationRecent sequence-based analyses have identified a lot of gene variants that may contribute to neurogenetic disorders such as autism spectrum disorder and schizophrenia. Several state-of-the-art network-based analyses have been proposed for mechanical understanding of genetic variants in neurogenetic disorders. However, these methods were mainly designed for modeling and analyzing single networks that do not interact with or depend on other networks, and thus cannot capture the properties between interdependent systems in brain-specific tissues, circuits, and regions which are connected each other and affect behavior and cognitive processes.ResultsWe introduce a novel and efficient framework, called a “Network of Networks” (NoN) approach, to infer the interconnectivity structure between multiple networks where the response and the predictor variables are topological information matrices of given networks. We also propose Graph-Oriented SParsE Learning (GOSPEL), a new sparse structural learning algorithm for network graph data to identify a subset of the topological information matrices of the predictors related to the response. We demonstrate on simulated data that GOSPEL outperforms existing kernel-based algorithms in terms of F-measure. On real data from human brain region-specific functional networks associated with the autism risk genes, we show that the NoN model provides insights on the autism-associated interconnectivity structure between functional interaction networks and a comprehensive understanding of the genetic basis of autism across diverse regions of the brain.AvailabilityOur software is available from https://github.com/infinite-point/[email protected], [email protected] informationSupplementary data are available at Bioinformatics online.


Entropy ◽  
2021 ◽  
Vol 23 (4) ◽  
pp. 384
Author(s):  
Rocío Hernández-Sanjaime ◽  
Martín González ◽  
Antonio Peñalver ◽  
Jose J. López-Espín

The presence of unaccounted heterogeneity in simultaneous equation models (SEMs) is frequently problematic in many real-life applications. Under the usual assumption of homogeneity, the model can be seriously misspecified, and it can potentially induce an important bias in the parameter estimates. This paper focuses on SEMs in which data are heterogeneous and tend to form clustering structures in the endogenous-variable dataset. Because the identification of different clusters is not straightforward, a two-step strategy that first forms groups among the endogenous observations and then uses the standard simultaneous equation scheme is provided. Methodologically, the proposed approach is based on a variational Bayes learning algorithm and does not need to be executed for varying numbers of groups in order to identify the one that adequately fits the data. We describe the statistical theory, evaluate the performance of the suggested algorithm by using simulated data, and apply the two-step method to a macroeconomic problem.


2020 ◽  
Vol 36 (Supplement_2) ◽  
pp. i831-i839
Author(s):  
Dong-gi Lee ◽  
Myungjun Kim ◽  
Sang Joon Son ◽  
Chang Hyung Hong ◽  
Hyunjung Shin

Abstract Motivation Recently, various approaches for diagnosing and treating dementia have received significant attention, especially in identifying key genes that are crucial for dementia. If the mutations of such key genes could be tracked, it would be possible to predict the time of onset of dementia and significantly aid in developing drugs to treat dementia. However, gene finding involves tremendous cost, time and effort. To alleviate these problems, research on utilizing computational biology to decrease the search space of candidate genes is actively conducted. In this study, we propose a framework in which diseases, genes and single-nucleotide polymorphisms are represented by a layered network, and key genes are predicted by a machine learning algorithm. The algorithm utilizes a network-based semi-supervised learning model that can be applied to layered data structures. Results The proposed method was applied to a dataset extracted from public databases related to diseases and genes with data collected from 186 patients. A portion of key genes obtained using the proposed method was verified in silico through PubMed literature, and the remaining genes were left as possible candidate genes. Availability and implementation The code for the framework will be available at http://www.alphaminers.net/. Supplementary information Supplementary data are available at Bioinformatics online.


Entropy ◽  
2020 ◽  
Vol 22 (9) ◽  
pp. 949
Author(s):  
Jiangyi Wang ◽  
Min Liu ◽  
Xinwu Zeng ◽  
Xiaoqiang Hua

Convolutional neural networks have powerful performances in many visual tasks because of their hierarchical structures and powerful feature extraction capabilities. SPD (symmetric positive definition) matrix is paid attention to in visual classification, because it has excellent ability to learn proper statistical representation and distinguish samples with different information. In this paper, a deep neural network signal detection method based on spectral convolution features is proposed. In this method, local features extracted from convolutional neural network are used to construct the SPD matrix, and a deep learning algorithm for the SPD matrix is used to detect target signals. Feature maps extracted by two kinds of convolutional neural network models are applied in this study. Based on this method, signal detection has become a binary classification problem of signals in samples. In order to prove the availability and superiority of this method, simulated and semi-physical simulated data sets are used. The results show that, under low SCR (signal-to-clutter ratio), compared with the spectral signal detection method based on the deep neural network, this method can obtain a gain of 0.5–2 dB on simulated data sets and semi-physical simulated data sets.


Author(s):  
Yang Xu ◽  
Priyojit Das ◽  
Rachel Patton McCord

Abstract Motivation Deep learning approaches have empowered single-cell omics data analysis in many ways and generated new insights from complex cellular systems. As there is an increasing need for single cell omics data to be integrated across sources, types, and features of data, the challenges of integrating single-cell omics data are rising. Here, we present an unsupervised deep learning algorithm that learns discriminative representations for single-cell data via maximizing mutual information, SMILE (Single-cell Mutual Information Learning). Results Using a unique cell-pairing design, SMILE successfully integrates multi-source single-cell transcriptome data, removing batch effects and projecting similar cell types, even from different tissues, into the shared space. SMILE can also integrate data from two or more modalities, such as joint profiling technologies using single-cell ATAC-seq, RNA-seq, DNA methylation, Hi-C, and ChIP data. When paired cells are known, SMILE can integrate data with unmatched feature, such as genes for RNA-seq and genome wide peaks for ATAC-seq. Integrated representations learned from joint profiling technologies can then be used as a framework for comparing independent single source data. Supplementary information Supplementary data are available at Bioinformatics online. The source code of SMILE including analyses of key results in the study can be found at: https://github.com/rpmccordlab/SMILE.


2021 ◽  
Author(s):  
Yuen Ler Chow ◽  
Shantanu Singh ◽  
Anne E Carpenter ◽  
Gregory P. Way

A variational autoencoder (VAE) is a machine learning algorithm, useful for generating a compressed and interpretable latent space. These representations have been generated from various biomedical data types and can be used to produce realistic-looking simulated data. However, standard vanilla VAEs suffer from entangled and uninformative latent spaces, which can be mitigated using other types of VAEs such as β-VAE and MMD-VAE. In this project, we evaluated the ability of VAEs to learn cell morphology characteristics derived from cell images. We trained and evaluated these three VAE variants-Vanilla VAE, β-VAE, and MMD-VAE-on cell morphology readouts and explored the generative capacity of each model to predict compound polypharmacology (the interactions of a drug with more than one target) using an approach called latent space arithmetic (LSA). To test the generalizability of the strategy, we also trained these VAEs using gene expression data of the same compound perturbations and found that gene expression provides complementary information. We found that the β-VAE and MMD-VAE disentangle morphology signals and reveal a more interpretable latent space. We reliably simulated morphology and gene expression readouts from certain compounds thereby predicting cell states perturbed with compounds of known polypharmacology. Inferring cell state for specific drug mechanisms could aid researchers in developing and identifying targeted therapeutics and categorizing off-target effects in the future.


Author(s):  
Alexandre Todorov

The aim of the RELIEF algorithm is to filter out features (e.g., genes, environmental factors) that are relevant to a trait of interest, starting from a set of that may include thousands of irrelevant features. Though widely used in many fields, its application to the study of gene-environment interaction studies has been limited thus far. We provide here an overview of this machine learning algorithm and some of its variants. Using simulated data, we then compare of the performance of RELIEF to that of logistic regression for screening for gene-environment interactions in SNP data. Even though performance degrades in larger sets of markers, RELIEF remains a competitive alternative to logistic regression, and shows clear promise as a tool for the study of gene-environment interactions. Areas for further improvements of the algorithm are then suggested.


2020 ◽  
Vol 36 (19) ◽  
pp. 4894-4901 ◽  
Author(s):  
Yi Shi ◽  
Zehua Guo ◽  
Xianbin Su ◽  
Luming Meng ◽  
Mingxuan Zhang ◽  
...  

Abstract Motivation The mutations of cancers can encode the seeds of their own destruction, in the form of T-cell recognizable immunogenic peptides, also known as neoantigens. It is computationally challenging, however, to accurately prioritize the potential neoantigen candidates according to their ability of activating the T-cell immunoresponse, especially when the somatic mutations are abundant. Although a few neoantigen prioritization methods have been proposed to address this issue, advanced machine learning model that is specifically designed to tackle this problem is still lacking. Moreover, none of the existing methods considers the original DNA loci of the neoantigens in the perspective of 3D genome which may provide key information for inferring neoantigens’ immunogenicity. Results In this study, we discovered that DNA loci of the immunopositive and immunonegative MHC-I neoantigens have distinct spatial distribution patterns across the genome. We therefore used the 3D genome information along with an ensemble pMHC-I coding strategy, and developed a group feature selection-based deep sparse neural network model (DNN-GFS) that is optimized for neoantigen prioritization. DNN-GFS demonstrated increased neoantigen prioritization power comparing to existing sequence-based approaches. We also developed a webserver named deepAntigen (http://yishi.sjtu.edu.cn/deepAntigen) that implements the DNN-GFS as well as other machine learning methods. We believe that this work provides a new perspective toward more accurate neoantigen prediction which eventually contribute to personalized cancer immunotherapy. Availability and implementation Data and implementation are available on webserver: http://yishi.sjtu.edu.cn/deepAntigen. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 35 (18) ◽  
pp. 3279-3286 ◽  
Author(s):  
Enrico Siragusa ◽  
Niina Haiminen ◽  
Richard Finkers ◽  
Richard Visser ◽  
Laxmi Parida

Abstract Summary Haplotype assembly of polyploids is an open issue in plant genomics. Recent experimental studies on highly heterozygous autotetraploid potato have shown that available methods do not deliver satisfying results in practice. We propose an optimal method to assemble haplotypes of highly heterozygous polyploids from Illumina short-sequencing reads. Our method is based on a generalization of the existing minimum fragment removal model to the polyploid case and on new integer linear programs to reconstruct optimal haplotypes. We validate our methods experimentally by means of a combined evaluation on simulated and experimental data based on 83 previously sequenced autotetraploid potato cultivars. Results on simulated data show that our methods produce highly accurate haplotype assemblies, while results on experimental data confirm a sensible improvement over the state of the art. Availability and implementation Executables for Linux at http://github.com/Computational Genomics/HaplotypeAssembler. Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Wilson McKerrow ◽  
David Fenyö

Abstract Motivation LINE-1 elements are retrotransposons that are capable of copying their sequence to new genomic loci. LINE-1 derepression is associated with a number of disease states, and has the potential to cause significant cellular damage. Because LINE-1 elements are repetitive, it is difficult to quantify LINE-1 RNA at specific loci and to separate transcripts with protein coding capability from other sources of LINE-1 RNA. Results We provide a tool, L1EM that uses the expectation maximization algorithm to quantify LINE-1 RNA at each genomic locus, separating transcripts that are capable of generating retrotransposition from those that are not. We show the accuracy of L1EM on simulated data and against long read sequencing from HEK cells. Availability and implementation L1EM is written in python. The source code along with the necessary annotations are available at https://github.com/FenyoLab/L1EM and distributed under GPLv3. Supplementary information Supplementary data are available at Bioinformatics online.


Sign in / Sign up

Export Citation Format

Share Document