predict gene function
Recently Published Documents


TOTAL DOCUMENTS: 15 (FIVE YEARS: 10)

H-INDEX: 6 (FIVE YEARS: 1)

2021 ◽  
Author(s):  
Georgia Tsagkogeorga ◽  
Helena Santos Rosa ◽  
Andrej Alendar ◽  
Dan Leggate ◽  
Oliver Rausch ◽  
...  

RNA methylation plays an important role in the functional regulation of RNAs and has thus attracted increasing interest in biology and drug discovery. Here, we collected and collated transcriptomic, proteomic, structural and physical interaction data from the Harmonizome database, and applied supervised machine learning to predict novel genes associated with RNA methylation pathways in humans. We selected five types of classifiers, which we trained and evaluated using cross-validation on multiple training sets. The best models reached 88% accuracy based on cross-validation, and an average 91% accuracy on the test set. Using protein-protein interaction data, we propose six molecular sub-networks linking model predictions to previously known RNA methylation genes, with roles in mRNA methylation, tRNA processing and rRNA processing, as well as protein and chromatin modifications. Our study exemplifies how access to large omics datasets, combined with machine learning methods, can be used to predict gene function.
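The cross-validation step mentioned above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the five classifier types, so a simple nearest-centroid classifier stands in for them, and the feature matrix `X`, labels `y`, and helper names are hypothetical toy values rather than Harmonizome data.

```python
import numpy as np

def nearest_centroid_predict(X_train, y_train, X_test):
    """Assign each test row to the class whose training centroid is closest."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]

def cross_val_accuracy(X, y, k=5, seed=0):
    """Return the held-out accuracy of each of k shuffled folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    accuracies = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        pred = nearest_centroid_predict(X[train], y[train], X[test])
        accuracies.append(float((pred == y[test]).mean()))
    return accuracies

# Hypothetical toy data: 10 "negative" and 10 "positive" genes, 2 features each.
offsets = np.arange(10) * 0.1
X = np.concatenate([
    np.column_stack([offsets, offsets]),            # class 0, near (0, 0)
    np.column_stack([10 + offsets, 10 + offsets]),  # class 1, near (10, 10)
])
y = np.array([0] * 10 + [1] * 10)

fold_accuracies = cross_val_accuracy(X, y, k=5)
mean_accuracy = sum(fold_accuracies) / len(fold_accuracies)
```

Averaging the per-fold accuracies gives a single cross-validation estimate of the kind reported in the abstract; a separate held-out test set would be scored once, after model selection.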


2021 ◽  
Author(s):  
Peng Ken Lim ◽  
Emilia E. Davey ◽  
Sean Wee ◽  
Wei Song Seetoh ◽  
Jong Ching Goh ◽  
...  

The bacterial kingdom comprises unicellular prokaryotes able to establish symbioses ranging from mutualism to parasitism. To combat bacterial pathogenicity, we need an enhanced understanding of gene function and regulation, which will aid the development of novel antimicrobials. Gene expression can predict gene function, but no database currently enables expansive inter- and intraspecific exploration of gene expression profiles and co-expression networks for bacteria. To address this, we integrated the genomic and transcriptomic data of the 17 most notorious and studied bacterial pathogens, creating bacteria.guru, an interactive database that can identify, visualize, and compare gene expression profiles, co-expression networks, functionally enriched clusters, and gene families across species. By illustrating antibiotic resistance mechanisms in P. aeruginosa, we demonstrate that bacteria.guru could potentially aid the discovery of multi-faceted antibiotic targets. Hence, we believe bacteria.guru will facilitate future bacterial research. Availability: The database and co-expression networks are freely available from https://bacteria.guru/. The sample annotations are found in the supplemental data.


2021 ◽  
Author(s):  
Alexander Lachmann ◽  
Kaeli Rizzo ◽  
Alon Bartal ◽  
Minji Jeon ◽  
Daniel J. B. Clarke ◽  
...  

Gene co-expression correlations from mRNA-sequencing (RNA-seq) can be used to predict gene function based on the covariance structure that exists within such data. In the past, we showed that RNA-seq co-expression data is highly predictive of gene function and protein-protein interactions. We demonstrated that the performance of such predictions is dependent on the source of the gene expression data. Furthermore, since genes function in different cellular contexts, predictions derived from tissue-specific gene co-expression data outperform predictions derived from cross-tissue gene co-expression data. However, the identification of the optimal tissue type to maximize gene function predictions for all mammalian genes is not trivial. Here we introduce and validate an approach we term Partitioning RNA-seq data Into Segments for Massive co-EXpression-based gene function Predictions (PrismExp), for improved gene function prediction based on RNA-seq co-expression data. With co-expression data from ARCHS4, we apply PrismExp to predict a wide variety of gene functions, including pathway membership, phenotypic associations, and protein-protein interactions. PrismExp outperforms the cross-tissue co-expression correlation matrix approach on all tested domains. Hence, PrismExp can enhance machine learning methods that utilize RNA-seq co-expression correlations to impute knowledge about understudied genes and proteins.
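As a rough illustration of co-expression-based function prediction, the sketch below scores each gene by its mean Pearson correlation with the known members of a gene set, first over all samples (the cross-tissue baseline) and then separately within each sample partition. This is not the PrismExp implementation: the abstract does not state how partitions are defined or how per-partition scores are combined, so the partition labels, the function names, and the toy expression matrix are all assumptions.

```python
import numpy as np

def coexpression_scores(expr, member_idx):
    """Mean Pearson correlation of every gene (row) with the gene-set members."""
    corr = np.corrcoef(expr)        # genes x genes correlation matrix
    np.fill_diagonal(corr, np.nan)  # a gene must not vote for itself
    return np.nanmean(corr[:, member_idx], axis=1)

def partitioned_scores(expr, sample_labels, member_idx):
    """One score column per sample partition (hypothetical partition labels)."""
    labels = np.asarray(sample_labels)
    columns = [
        coexpression_scores(expr[:, labels == lab], member_idx)
        for lab in np.unique(labels)
    ]
    return np.column_stack(columns)

# Hypothetical toy matrix: 5 genes (rows) x 8 samples (columns).
base = np.arange(1.0, 9.0)
alternating = np.array([1.0, -1.0] * 4)
expr = np.vstack([
    base,                                      # gene 0: known set member
    2 * base + 1,                              # gene 1: known set member
    base + 0.1 * alternating,                  # gene 2: co-expressed candidate
    alternating,                               # gene 3: unrelated pattern
    np.array([3.0, 1, 4, 1, 5, 9, 2, 6]),      # gene 4: unrelated pattern
])
members = [0, 1]

global_scores = coexpression_scores(expr, members)
partition_features = partitioned_scores(expr, ["a"] * 4 + ["b"] * 4, members)
```

The per-partition columns could serve as features for a downstream learner, which is one plausible way context-specific correlations might be exploited; ranking genes by `global_scores` alone corresponds to the single correlation matrix baseline.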


2020 ◽  
Vol 30 (20) ◽  
pp. 3961-3971.e6
Author(s):  
Aditya C. Bandekar ◽  
Sishir Subedi ◽  
Thomas R. Ioerger ◽  
Christopher M. Sassetti

2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Matej Mihelčić ◽  
Tomislav Šmuc ◽  
Fran Supek

Abstract Genes with similar roles in the cell cluster on chromosomes, thus benefiting from coordinated regulation. This allows gene function to be inferred by transferring annotations from genomic neighbors, following the guilt-by-association principle. We performed a systematic search for co-occurrence of >1000 gene functions in genomic neighborhoods across 1669 prokaryotic, 49 fungal and 80 metazoan genomes, revealing prevalent patterns that cannot be explained by clustering of functionally similar genes. It is a very common occurrence that pairs of dissimilar gene functions – corresponding to semantically distant Gene Ontology terms – are significantly co-located on chromosomes. These neighborhood associations are often as conserved across genomes as the known associations between similar functions, suggesting selective benefits from clustering of certain diverse functions, which may conceivably play complementary roles in the cell. We propose a simple encoding of chromosomal gene order, the neighborhood function profiles (NFP), which draws on diverse gene clustering patterns to predict gene function and phenotype. NFPs yield a 26–46% increase in predictive power over state-of-the-art approaches that propagate function across neighborhoods, thus providing hundreds of novel, high-confidence gene function inferences per genome. Furthermore, we demonstrate that copy number-neutral structural variation that shapes gene function distribution across chromosomes can predict phenotype of individuals from their genome sequence.
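One way to picture a neighborhood-based encoding is to count the functional annotations carried by the genes flanking a target gene on the chromosome. The sketch below does exactly that and nothing more: the window size, the function name `neighborhood_profile`, and the toy gene names and labels are assumptions, and the abstract does not specify the actual NFP encoding, so this should not be read as the authors' method.

```python
from collections import Counter

def neighborhood_profile(gene_order, annotations, index, window=2):
    """Count function labels among genes within `window` positions of a gene.

    gene_order  -- gene identifiers in chromosomal order
    annotations -- dict mapping gene identifier to a set of function labels
    index       -- position of the target gene in gene_order
    """
    lo = max(0, index - window)
    hi = min(len(gene_order), index + window + 1)
    profile = Counter()
    for j in range(lo, hi):
        if j == index:
            continue  # the target gene's own labels are excluded
        profile.update(annotations.get(gene_order[j], ()))
    return profile

# Hypothetical chromosome segment; g3 is the unannotated target gene.
gene_order = ["g1", "g2", "g3", "g4", "g5"]
annotations = {
    "g1": {"transport"},
    "g2": {"transport", "kinase"},
    "g4": {"dna_repair"},
    "g5": {"transport"},
}

profile_g3 = neighborhood_profile(gene_order, annotations, index=2, window=2)
```

A profile of this kind keeps the counts of dissimilar neighboring functions instead of only transferring the most similar label, which is the property the abstract attributes to drawing on diverse clustering patterns.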


2019 ◽  
Vol 48 (D1) ◽  
pp. D768-D775 ◽  
Author(s):  
Qiao Wen Tan ◽  
Marek Mutwil

Abstract Malaria is a tropical parasitic disease caused by the Plasmodium genus, which resulted in an estimated 219 million cases of malaria and 435 000 malaria-related deaths in 2017. Despite the availability of the Plasmodium falciparum genome since 2002, 74% of the genes remain uncharacterized. To remedy this paucity of functional information, we used transcriptomic data to build gene co-expression networks for two Plasmodium species (P. falciparum and P. berghei), and included genomic data of four other Plasmodium species, P. yoelii, P. knowlesi, P. vivax and P. cynomolgi, as well as two non-Plasmodium species from the Apicomplexa, Toxoplasma gondii and Theileria parva. The genomic and transcriptomic data were incorporated into the resulting database, malaria.tools, which is preloaded with tools that allow the identification and cross-species comparison of co-expressed gene neighbourhoods, clusters and life stage-specific expression, thus providing sophisticated tools to predict gene function. Moreover, we exemplify how the tools can be used to easily identify genes relevant for pathogenicity and various life stages of the malaria parasite. The database is freely available at www.malaria.tools.


2019 ◽  
Vol 47 (W1) ◽  
pp. W571-W577 ◽  
Author(s):  
Alexander Lachmann ◽  
Brian M Schilder ◽  
Megan L Wojciechowicz ◽  
Denis Torre ◽  
Maxim V Kuleshov ◽  
...  

Abstract The frequency with which genes are studied correlates with the prior knowledge accumulated about them. This leads to an imbalance in research attention where some genes are highly investigated while others are ignored. Geneshot is a search engine developed to illuminate this gap and to promote attention to the under-studied genome. Through a simple web interface, Geneshot enables researchers to enter arbitrary search terms and receive ranked lists of genes relevant to those terms. The returned ranked gene lists contain genes that were previously published in association with the search terms, as well as genes predicted to be associated with the terms based on data integration from multiple sources. The search results are presented with interactive visualizations. To predict gene function, Geneshot utilizes gene–gene similarity matrices from processed RNA-seq data, or from gene–gene co-occurrence data obtained from multiple sources. In addition, Geneshot can be used to analyze the novelty of gene sets and augment gene sets with additional relevant genes. The Geneshot web-server and API are freely and openly available from https://amp.pharm.mssm.edu/geneshot.


2019 ◽  
Author(s):  
Qiao Wen Tan ◽  
Marek Mutwil

ABSTRACT Malaria is a tropical parasitic disease caused by the Plasmodium genus, which resulted in an estimated 219 million cases of malaria and 435,000 malaria-related deaths in 2017. Despite the availability of the P. falciparum genome since 2002, almost 50% of the genes remain unannotated. To remedy this paucity of functional information, we used transcriptomic data to build gene co-expression networks for two Plasmodium species (P. falciparum and P. berghei), and included genomic data of four other Plasmodium species, P. yoelii, P. knowlesi, P. vivax and P. cynomolgi, as well as two non-Plasmodium species from the Apicomplexa, Toxoplasma gondii and Theileria parva. The database is preloaded with tools that allow the identification and cross-species comparison of co-expressed gene neighborhoods, clusters, and life stage-specific expression, thus providing sophisticated tools to predict gene function. Moreover, we exemplify how the tools can be used to easily identify genes relevant for pathogenicity and various life stages of the malaria parasite. The database is freely available at www.malaria.tools.


2019 ◽  
Author(s):  
Matej Mihelčić ◽  
Tomislav Šmuc ◽  
Fran Supek

Abstract Genes with similar roles in the cell are known to cluster on chromosomes, thus benefiting from coordinated regulation. This allows gene function to be inferred by transferring annotations from genomic neighbors, following the guilt-by-association principle. We performed a systematic search for co-occurrence of >1000 gene functions in genomic neighborhoods across 1669 prokaryotic, 49 fungal and 80 metazoan genomes, revealing prevalent patterns that cannot be explained by clustering of functionally similar genes. It is a very common occurrence that pairs of dissimilar gene functions – corresponding to semantically distant Gene Ontology terms – are significantly co-located on chromosomes. These neighborhood associations are often as conserved across genomes as the known associations between similar functions, suggesting selective benefits from clustering of certain diverse functions, which may conceivably play complementary roles in the cell. We propose a simple encoding of chromosomal gene order, the neighborhood function profiles (NFP), which draws on diverse gene clustering patterns to predict gene function and phenotype. NFPs yield a 26-46% increase in predictive power over state-of-the-art approaches that propagate function across neighborhoods, thus providing hundreds of novel, high-confidence gene function inferences per genome. Furthermore, we demonstrate that the effect of structural variation on gene function distribution across chromosomes may be used to predict phenotype of individuals from their genome sequence.

