scholarly journals IsoResolve: predicting splice isoform functions by integrating gene and isoform-level features with domain adaptation

Author(s):  
Hong-Dong Li ◽  
Changhuo Yang ◽  
Zhimin Zhang ◽  
Mengyun Yang ◽  
Fang-Xiang Wu ◽  
...  

Abstract Motivation High resolution annotation of gene functions is a central goal in functional genomics. A single gene may produce multiple isoforms with different functions through alternative splicing. Conventional approaches, however, consider a gene as a single entity without differentiating these functionally different isoforms. Towards understanding gene functions at higher resolution, recent efforts have focused on predicting the functions of isoforms. However, the performance of existing methods is far from satisfactory mainly because of the lack of isoform-level functional annotation. Results We present IsoResolve, a novel approach for isoform function prediction, which leverages the information from gene function prediction models with domain adaptation (DA). IsoResolve treats gene-level and isoform-level features as source and target domains, respectively. It uses DA to project the two domains into a latent variable space in such a way that the latent variables from the two domains have similar distribution, which enables the gene domain information to be leveraged for isoform function prediction. We systematically evaluated the performance of IsoResolve in predicting functions. Compared with five state-of-the-art methods, IsoResolve achieved significantly better performance. IsoResolve was further validated by case studies of genes with isoform-level functional annotation. Availability and implementation IsoResolve is freely available at https://github.com/genemine/IsoResolve. Supplementary information Supplementary data are available at Bioinformatics online.

2018 ◽  
Vol 35 (15) ◽  
pp. 2535-2544 ◽  
Author(s):  
Dipan Shaw ◽  
Hao Chen ◽  
Tao Jiang

AbstractMotivationIsoforms are mRNAs produced from the same gene locus by alternative splicing and may have different functions. Although gene functions have been studied extensively, little is known about the specific functions of isoforms. Recently, some computational approaches based on multiple instance learning have been proposed to predict isoform functions from annotated gene functions and expression data, but their performance is far from being desirable primarily due to the lack of labeled training data. To improve the performance on this problem, we propose a novel deep learning method, DeepIsoFun, that combines multiple instance learning with domain adaptation. The latter technique helps to transfer the knowledge of gene functions to the prediction of isoform functions and provides additional labeled training data. Our model is trained on a deep neural network architecture so that it can adapt to different expression distributions associated with different gene ontology terms.ResultsWe evaluated the performance of DeepIsoFun on three expression datasets of human and mouse collected from SRA studies at different times. On each dataset, DeepIsoFun performed significantly better than the existing methods. In terms of area under the receiver operating characteristics curve, our method acquired at least 26% improvement and in terms of area under the precision-recall curve, it acquired at least 10% improvement over the state-of-the-art methods. In addition, we also study the divergence of the functions predicted by our method for isoforms from the same gene and the overall correlation between expression similarity and the similarity of predicted functions.Availability and implementationhttps://github.com/dls03/DeepIsoFun/Supplementary informationSupplementary data are available at Bioinformatics online.


Genes ◽  
2020 ◽  
Vol 11 (11) ◽  
pp. 1264
Author(s):  
Stavros Makrodimitris ◽  
Roeland C. H. J. van Ham ◽  
Marcel J. T. Reinders

The current rate at which new DNA and protein sequences are being generated is too fast to experimentally discover the functions of those sequences, emphasizing the need for accurate Automatic Function Prediction (AFP) methods. AFP has been an active and growing research field for decades and has made considerable progress in that time. However, it is certainly not solved. In this paper, we describe challenges that the AFP field still has to overcome in the future to increase its applicability. The challenges we consider are how to: (1) include condition-specific functional annotation, (2) predict functions for non-model species, (3) include new informative data sources, (4) deal with the biases of Gene Ontology (GO) annotations, and (5) maximally exploit the GO to obtain performance gains. We also provide recommendations for addressing those challenges, by adapting (1) the way we represent proteins and genes, (2) the way we represent gene functions, and (3) the algorithms that perform the prediction from gene to function. Together, we show that AFP is still a vibrant research area that can benefit from continuing advances in machine learning with which AFP in the 2020s can again take a large step forward reinforcing the power of computational biology.


Author(s):  
Amelia Villegas-Morcillo ◽  
Stavros Makrodimitris ◽  
Roeland C H J van Ham ◽  
Angel M Gomez ◽  
Victoria Sanchez ◽  
...  

Abstract Motivation Protein function prediction is a difficult bioinformatics problem. Many recent methods use deep neural networks to learn complex sequence representations and predict function from these. Deep supervised models require a lot of labeled training data which are not available for this task. However, a very large amount of protein sequences without functional labels is available. Results We applied an existing deep sequence model that had been pretrained in an unsupervised setting on the supervised task of protein molecular function prediction. We found that this complex feature representation is effective for this task, outperforming hand-crafted features such as one-hot encoding of amino acids, k-mer counts, secondary structure and backbone angles. Also, it partly negates the need for complex prediction models, as a two-layer perceptron was enough to achieve competitive performance in the third Critical Assessment of Functional Annotation benchmark. We also show that combining this sequence representation with protein 3D structure information does not lead to performance improvement, hinting that 3D structure is also potentially learned during the unsupervised pretraining. Availability and implementation Implementations of all used models can be found at https://github.com/stamakro/GCN-for-Structure-and-Function. Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Meet Barot ◽  
Vladimir Gligorijević ◽  
Kyunghyun Cho ◽  
Richard Bonneau

Abstract Motivation Transferring knowledge between species is challenging: different species contain distinct proteomes and cellular architectures, which cause their proteins to carry out different functions via different interaction networks. Many approaches to protein functional annotation use sequence similarity to transfer knowledge between species. These approaches cannot produce accurate predictions for proteins without homologues of known function, as many functions require cellular context for meaningful prediction. To supply this context, network-based methods use protein-protein interaction (PPI) networks as a source of information for inferring protein function and have demonstrated promising results in function prediction. However, most of these methods are tied to a network for a single species, and many species lack biological networks. Results In this work, we integrate sequence and network information across multiple species by computing IsoRank similarity scores to create a meta-network profile of the proteins of multiple species. We use this integrated multispecies meta-network as input to train a maxout neural network with Gene Ontology terms as target labels. Our multispecies approach takes advantage of more training examples, and consequently leads to significant improvements in function prediction performance compared to two network-based methods, a deep learning sequence-based method, and the BLAST annotation method used in the Critial Assessment of Functional Annotation. We are able to demonstrate that our approach performs well even in cases where a species has no network information available: when an organism’s PPI network is left out we can use our multi-species method to make predictions for the left-out organism with good performance. Availability The code is freely available at https://github.com/nowittynamesleft/NetQuilt Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Author(s):  
Amelia Villegas-Morcillo ◽  
Stavros Makrodimitris ◽  
Roeland C.H.J. van Ham ◽  
Angel M. Gomez ◽  
Victoria Sanchez ◽  
...  

AbstractMotivationProtein function prediction is a difficult bioinformatics problem. Many recent methods use deep neural networks to learn complex sequence representations and predict function from these. Deep supervised models require a lot of labeled training data which are not available for this task. However, a very large amount of protein sequences without functional labels is available.ResultsWe applied an existing deep sequence model that had been pre-trained in an unsupervised setting on the supervised task of protein function prediction. We found that this complex feature representation is effective for this task, outperforming hand-crafted features such as one-hot encoding of amino acids, k-mer counts, secondary structure and backbone angles. Also, it partly negates the need for deep prediction models, as a two-layer perceptron was enough to achieve state-of-the-art performance in the third Critical Assessment of Functional Annotation benchmark. We also show that combining this sequence representation with protein 3D structure information does not lead to performance improvement, hinting that three-dimensional structure is also potentially learned during the unsupervised pre-training.AvailabilityImplementations of all used models can be found at https://github.com/stamakro/GCN-for-Structure-and-Function.Contactameliavm@ugr.esSupplementary informationSupplementary data are available online.


Methodology ◽  
2011 ◽  
Vol 7 (4) ◽  
pp. 157-164
Author(s):  
Karl Schweizer

Probability-based and measurement-related hypotheses for confirmatory factor analysis of repeated-measures data are investigated. Such hypotheses comprise precise assumptions concerning the relationships among the true components associated with the levels of the design or the items of the measure. Measurement-related hypotheses concentrate on the assumed processes, as, for example, transformation and memory processes, and represent treatment-dependent differences in processing. In contrast, probability-based hypotheses provide the opportunity to consider probabilities as outcome predictions that summarize the effects of various influences. The prediction of performance guided by inexact cues serves as an example. In the empirical part of this paper probability-based and measurement-related hypotheses are applied to working-memory data. Latent variables according to both hypotheses contribute to a good model fit. The best model fit is achieved for the model including latent variables that represented serial cognitive processing and performance according to inexact cues in combination with a latent variable for subsidiary processes.


2019 ◽  
Author(s):  
Kevin Constante ◽  
Edward Huntley ◽  
Emma Schillinger ◽  
Christine Wagner ◽  
Daniel Keating

Background: Although family behaviors are known to be important for buffering youth against substance use, research in this area often evaluates a particular type of family interaction and how it shapes adolescents’ behaviors, when it is likely that youth experience the co-occurrence of multiple types of family behaviors that may be protective. Methods: The current study (N = 1716, 10th and 12th graders, 55% female) examined associations between protective family context, a latent variable comprised of five different measures of family behaviors, and past 12 months substance use: alcohol, cigarettes, marijuana, and e-cigarettes. Results: A multi-group measurement invariance assessment supported protective family context as a coherent latent construct with partial (metric) measurement invariance among Black, Latinx, and White youth. A multi-group path model indicated that protective family context was significantly associated with less substance use for all youth, but of varying magnitudes across ethnic-racial groups. Conclusion: These results emphasize the importance of evaluating psychometric properties of family-relevant latent variables on the basis of group membership in order to draw appropriate inferences on how such family variables relate to substance use among diverse samples.


Electronics ◽  
2021 ◽  
Vol 10 (11) ◽  
pp. 1223
Author(s):  
Ilianna Kollia ◽  
Jack Stevenson ◽  
Stefanos Kollias

This paper provides a review of an emerging field in the food processing sector, referring to efficient and safe food supply chains, ’from farm to fork’, as enabled by Artificial Intelligence (AI). The field is of great significance from economic, food safety and public health points of views. The paper focuses on effective food production, food maintenance energy management and food retail packaging labeling control, using recent advances in machine learning. Appropriate deep neural architectures are adopted and used for this purpose, including Fully Convolutional Networks, Long Short-Term Memories and Recurrent Neural Networks, Auto-Encoders and Attention mechanisms, Latent Variable extraction and clustering, as well as Domain Adaptation. Three experimental studies are presented, illustrating the ability of these AI methodologies to produce state-of-the-art performance in the whole food supply chain. In particular, these concern: (i) predicting plant growth and tomato yield in greenhouses, thus matching food production to market needs and reducing food waste or food unavailability; (ii) optimizing energy consumption across large networks of food retail refrigeration systems, through optimal selection of systems that can be shut-down and through prediction of the respective food de-freezing times, during peaks of power demand load; (iii) optical recognition and verification of food consumption expiry date in automatic inspection of retail packaged food, thus ensuring safety of food and people’s health.


2021 ◽  
Vol 13 (2) ◽  
pp. 51
Author(s):  
Lili Sun ◽  
Xueyan Liu ◽  
Min Zhao ◽  
Bo Yang

Variational graph autoencoder, which can encode structural information and attribute information in the graph into low-dimensional representations, has become a powerful method for studying graph-structured data. However, most existing methods based on variational (graph) autoencoder assume that the prior of latent variables obeys the standard normal distribution which encourages all nodes to gather around 0. That leads to the inability to fully utilize the latent space. Therefore, it becomes a challenge on how to choose a suitable prior without incorporating additional expert knowledge. Given this, we propose a novel noninformative prior-based interpretable variational graph autoencoder (NPIVGAE). Specifically, we exploit the noninformative prior as the prior distribution of latent variables. This prior enables the posterior distribution parameters to be almost learned from the sample data. Furthermore, we regard each dimension of a latent variable as the probability that the node belongs to each block, thereby improving the interpretability of the model. The correlation within and between blocks is described by a block–block correlation matrix. We compare our model with state-of-the-art methods on three real datasets, verifying its effectiveness and superiority.


Author(s):  
Matteo Chiara ◽  
Federico Zambelli ◽  
Marco Antonio Tangaro ◽  
Pietro Mandreoli ◽  
David S Horner ◽  
...  

Abstract Summary While over 200 000 genomic sequences are currently available through dedicated repositories, ad hoc methods for the functional annotation of SARS-CoV-2 genomes do not harness all currently available resources for the annotation of functionally relevant genomic sites. Here, we present CorGAT, a novel tool for the functional annotation of SARS-CoV-2 genomic variants. By comparisons with other state of the art methods we demonstrate that, by providing a more comprehensive and rich annotation, our method can facilitate the identification of evolutionary patterns in the genome of SARS-CoV-2. Availabilityand implementation Galaxy   http://corgat.cloud.ba.infn.it/galaxy; software: https://github.com/matteo14c/CorGAT/tree/Revision_V1; docker: https://hub.docker.com/r/laniakeacloud/galaxy_corgat. Supplementary information Supplementary data are available at Bioinformatics online.


Sign in / Sign up

Export Citation Format

Share Document