Optimization of co-evolution analysis through phylogenetic profiling reveals pathway-specific signals

2020 ◽  
Vol 36 (14) ◽  
pp. 4116-4125 ◽  
Author(s):  
Idit Bloch ◽  
Dana Sherill-Rofe ◽  
Doron Stupp ◽  
Irene Unterman ◽  
Hodaya Beer ◽  
...  

Abstract
Summary: Given the exponential growth in available genomic data, a million genomes are expected to be fully sequenced within the coming decade. Developing and improving methods to analyze these genomes and exploit their utility is of major interest in a wide variety of fields, such as comparative and functional genomics, evolution and bioinformatics. Phylogenetic profiling is an established method for predicting functional interactions between proteins based on similarities in their evolutionary patterns across species. Proteins that function together (e.g. form complexes, interact in the same pathways or improve adaptation to environmental niches) tend to show coordinated evolution across the tree of life. The normalized phylogenetic profiling (NPP) method takes into account minute changes in proteins across species to identify protein co-evolution. Despite the success of this method, it is still not clear which set of parameters is required for optimal use of co-evolution in predicting functional interactions. Moreover, it is not clear whether pathway evolution or function should direct parameter choice. Here, we create a reliable and usable NPP construction pipeline. We explore the effect of parameter selection on functional interaction prediction using NPPs built from 1028 genomes, testing parameters both separately and in various combinations of values. We identify several parameter sets that optimize performance for pathways with particular biological annotations. This work reveals the importance of choosing the right parameters, based on biological context, for optimized function prediction. Availability and implementation: Source code and documentation are available on GitHub: https://github.com/iditam/CompareNPPs. Supplementary information: Supplementary data are available at Bioinformatics online.
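The normalization idea behind NPP can be illustrated with a minimal Python sketch. It assumes one common construction: each protein's best BLAST bit score in every genome is divided by the protein's self-hit score, log2-transformed, and then z-scored per genome, and co-evolution between two proteins is scored by the Pearson correlation of their rows. The floor `min_ratio` and the toy scores in the usage note are hypothetical, and the parameter variants this work evaluates are not reproduced here.

```python
import math
from statistics import mean, pstdev

def build_npp(bitscores, self_scores, min_ratio=0.01):
    """Normalize a proteins x genomes matrix of BLAST bit scores.

    bitscores[p][g]: best-hit bit score of protein p in genome g (0 if absent).
    self_scores[p]: bit score of protein p against itself (length normalization).
    min_ratio: hypothetical floor that keeps log2 finite for absent homologs.
    """
    # Step 1: divide by the self score and take log2 of the ratio.
    logs = [
        [math.log2(max(s / self_scores[p], min_ratio)) for s in row]
        for p, row in enumerate(bitscores)
    ]
    # Step 2: z-score each genome (column) across proteins, which corrects
    # for the overall phylogenetic distance of that genome.
    for g in range(len(logs[0])):
        col = [row[g] for row in logs]
        mu, sd = mean(col), pstdev(col)
        for row in logs:
            row[g] = (row[g] - mu) / sd if sd > 0 else 0.0
    return logs

def pearson(x, y):
    """Pearson correlation between two normalized profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)
```

For example, `build_npp([[100, 80, 0, 0], [200, 150, 10, 0], [50, 0, 45, 40]], [100, 200, 50])` returns a 3 × 4 matrix whose columns each have mean zero; `pearson(npp[0], npp[1])` then scores how similarly the first two proteins are conserved and lost across the four genomes.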

2021 ◽  
Vol 3 (2) ◽  
Author(s):  
Tomer Tsaban ◽  
Doron Stupp ◽  
Dana Sherill-Rofe ◽  
Idit Bloch ◽  
Elad Sharon ◽  
...  

Abstract
Mapping co-evolved genes via phylogenetic profiling (PP) is a powerful approach to uncover functional interactions between genes and to associate them with pathways. Despite many successful endeavors, our understanding of co-evolutionary signals in eukaryotes remains partial. Our hypothesis is that ‘Clades’, branches of the tree of life (e.g. primates and mammals), encompass signals that cannot be detected by PP using all eukaryotes. If so, integrating information from different clades should reveal local co-evolution signals and improve function prediction. Accordingly, we analyzed 1028 genomes in 66 clades and demonstrated that the co-evolutionary signal was scattered across clades. We showed that functionally related genes frequently co-evolve in only parts of the eukaryotic tree and that clades are complementary in detecting functional interactions within pathways. We examined the non-homologous end joining pathway and the UFM1 ubiquitin-like protein pathway and showed that both exhibit distinct co-evolution patterns in specific clades. Our research offers a different way to look at co-evolution across eukaryotes and points to the importance of modular co-evolution analysis. We developed the ‘CladeOScope’ PP method, which integrates information from 16 clades across over 1000 eukaryotic genomes and is accessible via an easy-to-use web server at http://cladeoscope.cs.huji.ac.il.
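The clade idea can be sketched by restricting the correlation to the genomes of each clade and keeping the best-scoring clade. This is a simplified illustration, not CladeOScope's scoring: the clade names and memberships, the `min_genomes` threshold and the max-over-clades integration rule are all assumptions.

```python
import math
from statistics import mean

def pearson(x, y):
    """Pearson correlation; a flat profile is treated as carrying no signal."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def clade_scores(profile_a, profile_b, clades, min_genomes=3):
    """Correlate two gene profiles within each clade.

    clades: mapping clade name -> list of genome (column) indices.
    Returns {clade: r} for clades with at least `min_genomes` genomes.
    """
    scores = {}
    for name, cols in clades.items():
        if len(cols) < min_genomes:
            continue
        a = [profile_a[i] for i in cols]
        b = [profile_b[i] for i in cols]
        scores[name] = pearson(a, b)
    return scores

def best_clade(profile_a, profile_b, clades, min_genomes=3):
    """Hypothetical integration rule: report the clade with the strongest signal."""
    scores = clade_scores(profile_a, profile_b, clades, min_genomes)
    return max(scores.items(), key=lambda kv: kv[1])
```

For example, with profiles a = [1, 2, 3, 4, 1, 5, 2, 4] and b = [2, 4, 6, 8, 3, 3, 3, 3] split into two four-genome clades, the correlation is 1.0 within the first clade, 0.0 within the second (where b is flat) and only about 0.38 across all eight genomes: a local signal that a single all-genome correlation dilutes.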


Author(s):  
Matteo Chiara ◽  
Federico Zambelli ◽  
Marco Antonio Tangaro ◽  
Pietro Mandreoli ◽  
David S Horner ◽  
...  

Abstract
Summary: While over 200 000 genomic sequences are currently available through dedicated repositories, ad hoc methods for the functional annotation of SARS-CoV-2 genomes do not harness all currently available resources for annotating functionally relevant genomic sites. Here, we present CorGAT, a novel tool for the functional annotation of SARS-CoV-2 genomic variants. By comparison with other state-of-the-art methods, we demonstrate that, by providing a richer and more comprehensive annotation, our method can facilitate the identification of evolutionary patterns in the genome of SARS-CoV-2. Availability and implementation: Galaxy: http://corgat.cloud.ba.infn.it/galaxy; software: https://github.com/matteo14c/CorGAT/tree/Revision_V1; docker: https://hub.docker.com/r/laniakeacloud/galaxy_corgat. Supplementary information: Supplementary data are available at Bioinformatics online.
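Functional annotation of a single-nucleotide variant starts with locating it in a gene and working out the codon and amino acid change; a minimal Python sketch of that step is given here. It is not CorGAT's implementation: the coding coordinates are those commonly reported for the SARS-CoV-2 reference genome NC_045512.2 (only S, E, M and N are listed, which sidesteps the ORF1ab ribosomal frameshift) and should be checked against the GenBank record, and the reference codon is passed in rather than read from a genome file.

```python
# Standard genetic code, with codons enumerated in TCAG order at each position.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# 1-based, inclusive CDS coordinates on NC_045512.2 (verify before use).
GENES = {"S": (21563, 25384), "E": (26245, 26472),
         "M": (26523, 27191), "N": (28274, 29533)}

def locate(pos):
    """Return (gene, codon number, position within codon), or None if unlisted."""
    for gene, (start, end) in GENES.items():
        if start <= pos <= end:
            offset = pos - start
            return gene, offset // 3 + 1, offset % 3 + 1
    return None

def annotate(pos, ref_codon, alt_base):
    """Describe a single-nucleotide variant, e.g. 'S:D614G'; synonymous changes are flagged."""
    hit = locate(pos)
    if hit is None:
        return "non-coding or unlisted region"
    gene, codon_no, codon_pos = hit
    # Substitute the alternative base at its position inside the codon.
    alt_codon = ref_codon[:codon_pos - 1] + alt_base + ref_codon[codon_pos:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    label = f"{gene}:{ref_aa}{codon_no}{alt_aa}"
    return label if ref_aa != alt_aa else label + " (synonymous)"
```

For example, `annotate(23403, "GAT", "G")` returns `"S:D614G"`: position 23403 is the second base of spike codon 614, and GAT (Asp) becomes GGT (Gly).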


Author(s):  
John Zobolas ◽  
Vasundra Touré ◽  
Martin Kuiper ◽  
Steven Vercruysse

Abstract
Summary: We present a set of software packages that provide uniform access to diverse biological vocabulary resources that are instrumental for current biocuration efforts and tools. The Unified Biological Dictionaries (UniBioDicts or UBDs) provide a single query interface for accessing the online API services of leading biological data providers. Given a search string, UBDs return a list of matching units, each consisting of a term, an identifier and metadata, from databases (e.g. UniProt), controlled vocabularies (e.g. PSI-MI) and ontologies (e.g. GO, via BioPortal). This functionality can be connected to input fields (user-interface components) that offer autocomplete lookup for these dictionaries. UBDs create a unified gateway for accessing life science concepts, helping curators find annotation terms across resources (based on descriptive metadata and unambiguous identifiers) and helping data users search for and retrieve the right query terms. Availability and implementation: The UBDs are available through npm, and the code is available in the GitHub organisation UniBioDicts (https://github.com/UniBioDicts) under the Affero GPL license. Supplementary information: Supplementary data are available at Bioinformatics online.
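The single-query-interface design can be sketched as a common abstract class that every dictionary implements, plus a combiner that fans a search string out to all of them. The real UBDs are JavaScript packages distributed through npm; the Python class and method names and the in-memory toy dictionaries here are hypothetical, do not reproduce their API and do not call any online service.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass(frozen=True)
class Match:
    """One matching unit: source dictionary, identifier and term."""
    dict_id: str
    concept_id: str
    term: str

class Dictionary(ABC):
    dict_id: str

    @abstractmethod
    def search(self, text: str) -> list[Match]:
        """Return matches for a case-insensitive search string."""

class InMemoryDictionary(Dictionary):
    """Stand-in for a wrapper around one provider's online API."""

    def __init__(self, dict_id: str, entries: dict[str, str]):
        self.dict_id = dict_id
        self.entries = entries  # concept_id -> term

    def search(self, text: str) -> list[Match]:
        q = text.lower()
        return [Match(self.dict_id, cid, term)
                for cid, term in self.entries.items() if q in term.lower()]

class UnifiedDictionary(Dictionary):
    """Queries several dictionaries through the same interface and merges the results."""
    dict_id = "combined"

    def __init__(self, dictionaries: list[Dictionary]):
        self.dictionaries = dictionaries

    def search(self, text: str) -> list[Match]:
        q = text.lower()
        results = [m for d in self.dictionaries for m in d.search(text)]
        # Prefix matches first (useful for autocomplete), then alphabetical by term.
        return sorted(results,
                      key=lambda m: (not m.term.lower().startswith(q), m.term.lower()))
```

For example, with a toy GO dictionary containing 'apoptotic process' (GO:0006915) and 'cell death' (GO:0008219) and a toy UniProt dictionary containing 'Cellular tumor antigen p53' (P04637), searching 'ce' returns the two prefix matches first and 'apoptotic process' last. Because the combiner is itself a `Dictionary`, an autocomplete field can be wired to one dictionary or to many without changing its code.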


Author(s):  
Kexin Huang ◽  
Tianfan Fu ◽  
Lucas M Glass ◽  
Marinka Zitnik ◽  
Cao Xiao ◽  
...  

Abstract
Summary: Accurate prediction of drug–target interactions (DTI) is crucial for drug discovery. Recently, deep learning (DL) models have shown promising performance for DTI prediction. However, these models can be difficult to use, both for computer scientists entering the biomedical field and for bioinformaticians with limited DL experience. We present DeepPurpose, a comprehensive and easy-to-use DL library for DTI prediction. DeepPurpose supports training of customized DTI prediction models by implementing 15 compound and protein encoders and over 50 neural architectures, along with many other useful features. We demonstrate state-of-the-art performance of DeepPurpose on several benchmark datasets. Availability and implementation: https://github.com/kexinhuang12345/DeepPurpose. Supplementary information: Supplementary data are available at Bioinformatics online.
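The encoder-plus-predictor pattern behind such a library can be illustrated with a deliberately tiny, dependency-free Python sketch: one encoder turns a compound's SMILES string into a vector, another turns a protein sequence into a vector, and a classifier scores their concatenation. This is not DeepPurpose's API; the hashed character 3-gram and amino acid composition encoders and the logistic-regression predictor are simple stand-ins for the neural encoders and architectures the library provides, and the training pairs are toy data.

```python
import math
import random
import zlib

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def encode_compound(smiles, dim=32, n=3):
    """Hashed character n-gram counts, L1-normalized (stand-in compound encoder)."""
    vec = [0.0] * dim
    grams = [smiles[i:i + n] for i in range(max(len(smiles) - n + 1, 1))]
    for g in grams:
        vec[zlib.crc32(g.encode()) % dim] += 1.0  # crc32 is deterministic across runs
    total = sum(vec)
    return [v / total for v in vec]

def encode_protein(seq):
    """Amino acid composition, a 20-dim frequency vector (stand-in protein encoder)."""
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts) or 1
    return [c / total for c in counts]

def featurize(smiles, seq):
    """Concatenate the two encodings into one input vector for the predictor."""
    return encode_compound(smiles) + encode_protein(seq)

class LogisticPredictor:
    """Logistic regression trained by stochastic gradient descent on log loss."""

    def __init__(self, n_features, lr=0.5, seed=0):
        rng = random.Random(seed)
        self.w = [rng.uniform(-0.01, 0.01) for _ in range(n_features)]
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, X, y, epochs=200):
        for _ in range(epochs):
            for xi, yi in zip(X, y):
                err = self.predict_proba(xi) - yi  # d(log loss)/dz
                self.w = [wj - self.lr * err * xj for wj, xj in zip(self.w, xi)]
                self.b -= self.lr * err
        return self
```

Swapping `encode_compound` or `encode_protein` for another function with the same signature changes the representation without touching the predictor, which is the kind of modularity a library of interchangeable encoders is organized around.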


2020 ◽  
Vol 36 (20) ◽  
pp. 5061-5067
Author(s):  
Ali Akbar Jamali ◽  
Anthony Kusalik ◽  
Fang-Xiang Wu

Abstract
Motivation: Evidence has shown that microRNAs, a class of small biomolecules, regulate gene expression levels and play an important role in the development or treatment of diseases. Drugs, as important chemical compounds, can interact with microRNAs and change their functions. The experimental identification of microRNA–drug interactions is time-consuming and expensive, so effective computational approaches for predicting them are appealing. Results: In this study, a matrix factorization-based method, the microRNA–drug interaction prediction approach (MDIPA), is proposed for predicting unknown interactions between microRNAs and drugs. Specifically, MDIPA uses experimentally validated interactions between drugs and microRNAs, drug similarity and microRNA similarity to predict undiscovered interactions. A path-based microRNA similarity matrix is constructed, while the structural information of drugs is used to establish a drug similarity matrix. To evaluate its performance, MDIPA is compared with four state-of-the-art prediction methods using an independent dataset and cross-validation. The results of both evaluations confirm the superior performance of MDIPA over the other methods. Finally, molecular docking results in a breast cancer case study confirm the efficacy of our approach. In conclusion, MDIPA can be effective in predicting potential microRNA–drug interactions. Availability and implementation: All code and data are freely available from https://github.com/AliJam82/MDIPA. Supplementary information: Supplementary data are available at Bioinformatics online.
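Similarity-aware matrix factorization can be sketched in plain Python: the known interaction matrix Y (microRNAs × drugs) is approximated by U Vᵀ, with an L2 penalty and a graph-smoothness penalty that pulls the latent factors of similar microRNAs (and of similar drugs) together. This is a generic graph-regularized factorization given for illustration; MDIPA's exact objective, similarity measures and hyperparameters may differ, and the values of `rank`, `lam`, `alpha`, `lr` and the toy matrices are assumptions.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def predict(U, V):
    """Score matrix U V^T: rows are microRNAs, columns are drugs."""
    return [[dot(u, v) for v in V] for u in U]

def loss(Y, U, V, Sm, Sd, lam, alpha):
    """0.5*squared error + 0.5*lam*L2 + 0.25*alpha*sum_ik S_ik ||F_i - F_k||^2 per side."""
    P = predict(U, V)
    fit = 0.5 * sum((Y[i][j] - P[i][j]) ** 2
                    for i in range(len(Y)) for j in range(len(Y[0])))
    l2 = 0.5 * lam * (sum(x * x for u in U for x in u) + sum(x * x for v in V for x in v))

    def smooth(F, S):
        return 0.25 * alpha * sum(S[i][k] * sum((a - b) ** 2 for a, b in zip(F[i], F[k]))
                                  for i in range(len(F)) for k in range(len(F)))
    return fit + l2 + smooth(U, Sm) + smooth(V, Sd)

def grad_factor(F, G, R, S, lam, alpha):
    """Gradient for the rows of F; R[i][j] is the residual between row i of F and row j of G."""
    rank = len(F[0])
    grads = []
    for i, f in enumerate(F):
        g = [0.0] * rank
        for j, h in enumerate(G):
            for r in range(rank):
                g[r] -= R[i][j] * h[r]  # data-fit term
        for r in range(rank):
            g[r] += lam * f[r]  # L2 penalty
            # Similarity smoothing (S assumed symmetric).
            g[r] += alpha * sum(S[i][k] * (f[r] - F[k][r]) for k in range(len(F)))
        grads.append(g)
    return grads

def factorize(Y, Sm, Sd, rank=2, lam=0.05, alpha=0.1, lr=0.05, iters=500, seed=0):
    """Full-batch gradient descent; Sm and Sd are microRNA and drug similarity matrices."""
    rng = random.Random(seed)
    U = [[rng.uniform(0, 0.5) for _ in range(rank)] for _ in Y]
    V = [[rng.uniform(0, 0.5) for _ in range(rank)] for _ in Y[0]]
    for _ in range(iters):
        P = predict(U, V)
        R = [[Y[i][j] - P[i][j] for j in range(len(Y[0]))] for i in range(len(Y))]
        Rt = [list(col) for col in zip(*R)]  # residuals seen from the drug side
        gU = grad_factor(U, V, R, Sm, lam, alpha)
        gV = grad_factor(V, U, Rt, Sd, lam, alpha)
        U = [[u - lr * g for u, g in zip(ur, gr)] for ur, gr in zip(U, gU)]
        V = [[v - lr * g for v, g in zip(vr, gr)] for vr, gr in zip(V, gV)]
    return U, V
```

Entries of `predict(U, V)` at positions where Y is 0 can then serve as scores for candidate, not-yet-observed interactions.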


2020 ◽  
Vol 36 (10) ◽  
pp. 2986-2992 ◽  
Author(s):  
Qiang Kang ◽  
Jun Meng ◽  
Jun Cui ◽  
Yushi Luan ◽  
Ming Chen

Abstract
Motivation: Studies have indicated that not only do microRNAs (miRNAs) and long non-coding RNAs (lncRNAs) play important roles in biological activities, but their interactions also affect biological processes. A growing number of studies focus on miRNA–lncRNA interactions, but few of them address plants. Predicting these interactions is important for understanding the mechanism of miRNA–lncRNA interaction in plants. Results: This article proposes a new method for plant miRNA–lncRNA interaction prediction (PmliPred). A deep learning model and a shallow machine learning model are trained using raw sequences and manually extracted features, respectively. The two models are then hybridized by fuzzy decision for prediction. PmliPred shows better performance and generalization ability than existing methods. Several new miRNA–lncRNA interactions in Solanum lycopersicum, selected from the candidates predicted by PmliPred, were successfully validated by quantitative real-time polymerase chain reaction, further verifying its effectiveness. Availability and implementation: The source code of PmliPred is freely available at http://bis.zju.edu.cn/PmliPred/. Supplementary information: Supplementary data are available at Bioinformatics online.
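The hybrid design can be illustrated by two of its ingredients: k-mer frequency features of the kind a shallow model is typically trained on, and a final step that merges the two models' probabilities. The fusion in this Python sketch is a simple confidence-weighted rule written in the spirit of a fuzzy decision; it is an assumption for illustration, not PmliPred's published decision scheme, and the function names are hypothetical.

```python
from itertools import product

def kmer_frequencies(seq, k=3, alphabet="ACGU"):
    """Normalized counts of all 4**k possible k-mers (a typical hand-crafted feature)."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    counts = [0] * len(index)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in index:  # skip k-mers with ambiguous bases such as N
            counts[index[kmer]] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]

def confidence(p):
    """Membership-style confidence in [0, 1]: 0 at p = 0.5, 1 at p = 0 or p = 1."""
    return abs(2.0 * p - 1.0)

def fuse(p_deep, p_shallow):
    """Combine two interaction probabilities into (probability, decision).

    If both models fall on the same side of 0.5, average them. If they
    disagree, weight each by its confidence so the more certain model
    dominates (at least one weight is then nonzero).
    """
    if (p_deep >= 0.5) == (p_shallow >= 0.5):
        p = (p_deep + p_shallow) / 2.0
    else:
        w_d, w_s = confidence(p_deep), confidence(p_shallow)
        p = (w_d * p_deep + w_s * p_shallow) / (w_d + w_s)
    return p, p >= 0.5
```

For example, `fuse(0.9, 0.4)` gives a probability of about 0.8 and a positive call, because the deep model is far more confident than the shallow one, whereas `fuse(0.55, 0.1)` gives about 0.15 and a negative call.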


2019 ◽  
Vol 35 (14) ◽  
pp. i305-i314 ◽  
Author(s):  
Muhao Chen ◽  
Chelsea J -T Ju ◽  
Guangyu Zhou ◽  
Xuelu Chen ◽  
Tianran Zhang ◽  
...  

Abstract
Motivation: Sequence-based protein–protein interaction (PPI) prediction is a fundamental problem in computational biology. To address it, extensive research efforts have been made to extract predefined features from the sequences. Based on these features, statistical models are trained to classify the PPIs. However, such explicit features are usually costly to extract and typically have limited coverage of PPI information. Results: We present an end-to-end framework, PIPR (Protein–Protein Interaction Prediction Based on Siamese Residual RCNN), for PPI prediction using only the protein sequences. PIPR incorporates a deep residual recurrent convolutional neural network in a Siamese architecture, which leverages both robust local features and contextualized information, both of which are significant for capturing the mutual influence of protein sequences. PIPR reduces the data pre-processing effort required by other systems and generalizes well to different application scenarios. Experimental evaluations show that PIPR outperforms various state-of-the-art systems on the binary PPI prediction problem. Moreover, it shows promising performance on the more challenging problems of interaction type prediction and binding affinity estimation, where existing approaches fall short. Availability and implementation: The implementation is available at https://github.com/muhaochen/seq_ppi.git. Supplementary information: Supplementary data are available at Bioinformatics online.
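The Siamese idea, applying one encoder with shared weights to both proteins and combining the two embeddings symmetrically, can be shown with a small pure-Python sketch. The one-hot encoding, single convolution layer with ReLU and max pooling, element-wise product and random untrained weights are simplifications chosen for illustration; PIPR itself uses a deep residual recurrent convolutional encoder, and none of the names here come from its code.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot(seq):
    """Encode a protein as a list of 20-dim one-hot vectors (unknown residues -> zeros)."""
    return [[1.0 if aa == a else 0.0 for a in AMINO_ACIDS] for aa in seq.upper()]

class SharedEncoder:
    """1D convolution + ReLU + global max pooling; the same weights encode every protein."""

    def __init__(self, n_filters=8, width=3, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.filters = [[[rng.gauss(0, 0.3) for _ in AMINO_ACIDS] for _ in range(width)]
                        for _ in range(n_filters)]

    def __call__(self, seq):
        x = one_hot(seq)
        n_windows = max(len(x) - self.width + 1, 1)
        pooled = []
        for filt in self.filters:
            best = 0.0  # ReLU floor, so pooled activations are non-negative
            for start in range(n_windows):
                window = x[start:start + self.width]
                act = sum(w * v for wrow, xrow in zip(filt, window)
                          for w, v in zip(wrow, xrow))
                best = max(best, act)
            pooled.append(best)
        return pooled

class SiameseScorer:
    """Scores a protein pair from the shared embeddings of both sequences."""

    def __init__(self, encoder, seed=1):
        rng = random.Random(seed)
        self.encoder = encoder
        self.w = [rng.gauss(0, 0.5) for _ in encoder.filters]
        self.b = 0.0

    def score(self, seq_a, seq_b):
        ea, eb = self.encoder(seq_a), self.encoder(seq_b)
        # The element-wise product is symmetric, so score(a, b) == score(b, a).
        joint = [u * v for u, v in zip(ea, eb)]
        z = self.b + sum(w * j for w, j in zip(self.w, joint))
        return 1.0 / (1.0 + math.exp(-z))
```

Because the two branches share one encoder and the combination is commutative, the predicted interaction score does not depend on the order in which the pair is presented; training the weights is omitted here.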


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Safia Zeghbib ◽  
Róbert Herczeg ◽  
Gábor Kemenesi ◽  
Brigitta Zana ◽  
Kornélia Kurucz ◽  
...  

Abstract
Bats are reservoirs of numerous zoonotic viruses. The Picornaviridae family comprises important pathogens that may infect both humans and animals. In this study, a bat-related picornavirus was detected in Algerian Miniopterus schreibersii bats, the first such detection in the country. Molecular analyses revealed that the new virus belongs to the Mischivirus genus. Using the acquired sequence and all available data on bat picornaviruses, we performed a co-evolutionary analysis of mischiviruses and their hosts to reveal evolutionary patterns within this genus. Building on this analysis, we enlarged the dataset and examined the co-evolutionary history of all bat-related picornaviruses and their hosts to compile all possible species-jumping events during their evolution. Furthermore, we explored the association of the phylogeny with geographical location, host genus and host species in both datasets.


1970 ◽  
Vol 21 (1) ◽  
pp. 163 ◽  
Author(s):  
RH Wharton ◽  
KBW Utech ◽  
HG Turner

An Australian Illawarra Shorthorn (AIS) herd of 24 cows was mated in three consecutive years with an AIS bull. The cows and their progeny were rated for tick resistance at frequent intervals from August 1959 to December 1965 by counting the number of semi-engorged female ticks on the right side of each animal. The mean of all log counts on a particular animal was adopted as the reference value for its degree of susceptibility. The ranking of cattle generally showed a high level of consistency, with a mean repeatability of counts of r = 0.47 (P < 0.01). Discrimination between animals was more reliable (P < 0.01) in summer (r = 0.52) than in winter (r = 0.27). The repeatability of tick counts increased with mean count, from r = 0.27 when the mean count was 3 to r = 0.67 when it was 100. The reliability of counts on the cows decreased with age and with lactation. Supplementary information on a larger herd showed no effect of pregnancy on mean count or on discrimination between susceptible and resistant animals, but showed a partial breakdown of resistance during lactation. In naturally infested calves, no effects of age or sex on tick counts or their repeatability were detected, though male calves yielded significantly more ticks than females when infested artificially. Following two artificial infestations with known numbers of larvae, the mean yield of mature female ticks on the cows ranged from 0.2 to 27.4% of the potential. Natural and artificial assessments of susceptibility were closely correlated. The rank of the bull was similar to that of the more resistant cows. Mean estimates of the heritability of tick resistance based on single counts were 39% from dam–calf correlations and 49% from full-sib correlations. Estimates based on summer counts only were 42% and 64%, respectively. These results provide strong encouragement for selecting for tick resistance.
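The two statistics reported here can be illustrated with standard quantitative-genetics formulas. Repeatability is the intraclass correlation of repeated counts on the same animal, r = (MS_between − MS_within) / (MS_between + (k − 1) MS_within) for k counts per animal. Heritability follows from the correlation between relatives: h² = 2r for a single parent and its offspring, who share half their additive genetic variance, and h² ≈ 2t for the full-sib intraclass correlation t, an estimate inflated by any dominance or common-environment variance. This Python sketch assumes equal numbers of counts per animal and uses log10(count + 1) to handle zero counts; the abstract does not say how zeros were treated, and no data from the study are used.

```python
import math
from statistics import mean

def log_counts(counts):
    """Transform raw tick counts; log10(count + 1) is an assumption to handle zeros."""
    return [math.log10(c + 1) for c in counts]

def repeatability(records):
    """Intraclass correlation from a one-way ANOVA.

    records: one list of (transformed) counts per animal, all of equal length k.
    """
    n, k = len(records), len(records[0])
    grand = mean(x for rec in records for x in rec)
    animal_means = [mean(rec) for rec in records]
    ss_between = k * sum((m - grand) ** 2 for m in animal_means)
    ss_within = sum((x - m) ** 2 for rec, m in zip(records, animal_means) for x in rec)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

def h2_from_parent_offspring(r):
    """A single parent and its offspring share half their additive genetic variance."""
    return 2.0 * r

def h2_from_full_sibs(t):
    """Biased upward if dominance or common-environment variance is present."""
    return 2.0 * t
```

With three animals whose repeated counts never vary, `repeatability` returns 1.0; within-animal noise pulls it toward 0 and can even make it slightly negative.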


2021 ◽  
Vol 15 ◽  
Author(s):  
Yifan Li ◽  
Mingrui Li ◽  
Yue Feng ◽  
Xiaomeng Ma ◽  
Xin Tan ◽  
...  

Objective: We aimed to explore whether the percent amplitude of fluctuation (PerAF) could provide information supplementary to the amplitude of low-frequency fluctuation (ALFF) about altered spontaneous brain activity in type 2 diabetes mellitus (T2DM) subjects without mild cognitive impairment (MCI). We further evaluated synchronization using functional connectivity (FC) to characterize brain changes in T2DM more comprehensively. Methods: Thirty T2DM subjects without MCI and thirty well-matched healthy subjects were recruited. Subjects’ clinical data, neuropsychological test results and resting-state functional magnetic resonance imaging (rs-fMRI) data were acquired. Voxel-based group comparisons of PerAF and ALFF were conducted. Seed-based FC was then computed between the brain regions identified by PerAF and ALFF and the rest of the whole brain. Results: Compared with the healthy group, the T2DM group had significantly decreased PerAF in the bilateral middle occipital gyrus and the right calcarine, increased ALFF in the right orbital inferior frontal gyrus and decreased ALFF in the right calcarine. Seed-based FC analysis showed that the right middle occipital gyrus of T2DM subjects exhibited significantly decreased FC with the right caudate nucleus and right putamen. Partial correlation analyses showed that hemoglobin A1c (HbA1c) and immediate memory scores on the auditory verbal learning test (AVLT) were negatively correlated in the T2DM group, whereas total cholesterol was positively correlated with symbol digit test (SDT) scores. Conclusion: PerAF and ALFF may have different sensitivities in detecting abnormal spontaneous brain activity in T2DM subjects. We suggest that PerAF values may add supplementary information and indicate additional potential changes in neuronal spontaneous activity in T2DM subjects without MCI, which may provide new insights into the neuroimaging mechanisms underlying early diabetes-associated cognitive decline.
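The voxel-wise measures compared in this study have short definitions that a pure-Python sketch makes concrete. PerAF averages the absolute percentage deviation of each time point from the series mean, PerAF = (100/n) Σ |(xᵢ − μ)/μ|; ALFF is computed here as the mean spectral amplitude (square root of power) of the mean-removed signal between 0.01 and 0.08 Hz; and seed-based FC is the Pearson correlation between a seed time series and another voxel's series, Fisher z-transformed. The band limits are the conventional ones, the repetition time `tr` is an input, a naive DFT stands in for an FFT, the amplitude scaling is arbitrary, and preprocessing such as detrending, filtering and nuisance regression is omitted.

```python
import cmath
import math
from statistics import mean

def peraf(ts):
    """Percent amplitude of fluctuation: mean |deviation from mean| as % of the mean."""
    mu = mean(ts)
    return 100.0 * mean(abs((x - mu) / mu) for x in ts)

def alff(ts, tr, band=(0.01, 0.08)):
    """Mean spectral amplitude of the mean-removed series within `band` (Hz).

    tr: repetition time in seconds, so the sampling frequency is 1/tr.
    Uses a naive O(n^2) DFT; an FFT routine would be used in practice.
    """
    n = len(ts)
    mu = mean(ts)
    x = [v - mu for v in ts]
    amps = []
    for k in range(1, n // 2 + 1):
        freq = k / (n * tr)
        if band[0] <= freq <= band[1]:
            coef = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            amps.append(abs(coef) / n)
    return mean(amps) if amps else 0.0

def pearson(a, b):
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))

def seed_fc(seed_ts, voxel_ts):
    """Seed-based FC as Fisher z = atanh(r); undefined when |r| = 1."""
    return math.atanh(pearson(seed_ts, voxel_ts))
```

For example, `peraf([1, 3])` is 50.0, since both points deviate from the mean of 2 by 50%, and `seed_fc([1, 2, 3, 4], [1, 3, 2, 4])` returns ln 3 ≈ 1.10, the Fisher z of r = 0.8. Because PerAF is scaled by the mean signal, it is less sensitive than raw ALFF to voxel-to-voxel differences in baseline intensity.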

