Machine learning can differentiate venom toxins from other proteins having non-toxic physiological functions

2016 ◽  
Vol 2 ◽  
pp. e90 ◽  
Author(s):  
Ranko Gacesa ◽  
David J. Barlow ◽  
Paul F. Long

Ascribing function to sequence in the absence of biological data is an ongoing challenge in bioinformatics. Differentiating the toxins of venomous animals from homologues having other physiological functions is particularly problematic, as there are no universally accepted methods by which to attribute toxin function using sequence data alone. The bioinformatics tools that do exist are difficult to implement for researchers with little bioinformatics training. Here we announce a machine learning tool called ‘ToxClassifier’ that enables simple and consistent discrimination of toxins from non-toxin sequences with >99% accuracy, and we compare it to commonly used toxin annotation methods. ‘ToxClassifier’ also reports the best-hit annotation, allowing placement of a toxin into the most appropriate toxin protein family, or relates it to the non-toxic protein having the closest homology, giving enhanced curation of existing biological databases and new venomics projects. ‘ToxClassifier’ is available for free, either to download (https://github.com/rgacesa/ToxClassifier) or to use on a web-based server (http://bioserv7.bioinfo.pbf.hr/ToxClassifier/).


2012 ◽  
pp. 616-630
Author(s):  
Yoshihiro Yamanishi ◽  
Hisashi Kashima

In silico prediction of compound-protein interactions from heterogeneous biological data is critical in the process of drug development. In this chapter the authors review several supervised machine learning methods to predict unknown compound-protein interactions from chemical structure and genomic sequence information simultaneously. The authors review several kernel-based algorithms from two different viewpoints: binary classification and dimension reduction. In the results, they demonstrate the usefulness of the methods on the prediction of drug-target interactions and ligand-protein interactions from chemical structure data and genomic sequence data.
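The abstract does not spell out which kernels the chapter uses, so the following is only a minimal sketch of one common way to build a kernel over compound-protein pairs: multiply a compound-compound similarity by a protein-protein similarity. The feature vectors, the RBF kernel, the `gamma` value, and the function names are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))


def pairwise_kernel(K_compound, K_protein, pairs):
    """Kernel between (compound, protein) pairs.

    K((c, p), (c', p')) = K_compound[c, c'] * K_protein[p, p'],
    i.e. the entries of the Kronecker product restricted to `pairs`.
    """
    c = np.array([pair[0] for pair in pairs])
    p = np.array([pair[1] for pair in pairs])
    return K_compound[np.ix_(c, c)] * K_protein[np.ix_(p, p)]


# Toy, randomly generated descriptors (hypothetical, not real data).
rng = np.random.default_rng(0)
compounds = rng.normal(size=(4, 8))   # 4 compounds, 8 chemical features
proteins = rng.normal(size=(3, 5))    # 3 proteins, 5 sequence features

K_c = rbf_kernel(compounds, gamma=0.1)
K_p = rbf_kernel(proteins, gamma=0.1)

# Each pair is (compound index, protein index).
pairs = [(0, 0), (1, 2), (3, 1), (2, 2)]
K_pairs = pairwise_kernel(K_c, K_p, pairs)
```

A matrix such as `K_pairs` could then be handed to any kernel classifier that accepts a precomputed kernel, which corresponds to the binary-classification viewpoint mentioned in the abstract.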


2020 ◽  
Author(s):  
Abhishek Agarwal ◽  
Piyush Agrawal ◽  
Aditi Sharma ◽  
Vinod Kumar ◽  
Chirag Mugdal ◽  
...  

IndiaBioDb (https://webs.iiitd.edu.in/raghava/indiabiodb/) is a manually curated, comprehensive repository of bioinformatics resources developed and maintained by Indian researchers. The repository holds information on 543 freely accessible, functional resources, including around 258 biological databases. Each entry gives full details of a resource: its name, web link, publication details, corresponding author, institute, and type of resource. A user-friendly search module allows users to search the repository on any field, and a browsing facility retrieves the information by category. The database can be used to survey the current state of bioinformatics in India across research labs funded by government and private bodies. In addition to the web interface, we also developed a mobile application to serve the scientific community.


2020 ◽  
Author(s):  
Hualin Liu ◽  
Jinshui Zheng ◽  
Dexin Bo ◽  
Yun Yu ◽  
Weixing Ye ◽  
...  

Bacillus thuringiensis (Bt), a spore-forming Gram-positive bacterium, has been used as the most successful microbial pesticide for decades. Its toxin genes (cry) have been successfully used to develop GM crops that resist pests. We previously developed a web-based insecticidal gene mining tool, BtToxin_scanner, which has proved to be the most important method for mining cry genes from Bt genome sequences. To facilitate efficient mining of major toxin genes and novel virulence factors from large-scale Bt genomic data, we redesigned this tool with a new workflow. Here we present BtToxin_Digger, a comprehensive, high-throughput, and easy-to-use Bt toxin mining tool. It runs fast and produces rich, accurate, and useful results for downstream analysis and experiment design. Moreover, with the addition of other query sequences, it can also be used to mine other target genes from large-scale genome and metagenome data. Availability and Implementation: The BtToxin_Digger code and instructions are freely available at https://github.com/BMBGenomics/BtToxin_Digger. A web server of BtToxin_Digger can be found at http://bcam.hzau.edu.cn/


2018 ◽  
Author(s):  
Zhao Li ◽  
Jin Li ◽  
Peng Yu

Metadata curation has become increasingly important for biological discovery and biomedical research because a large amount of heterogeneous biological data is currently freely available. To facilitate efficient metadata curation, we developed an easy-to-use web-based curation application, GEOMetaCuration, for curating the metadata of Gene Expression Omnibus datasets. It can eliminate mechanical operations that consume precious curation time and can help coordinate curation efforts among multiple curators. It improves the curation process by introducing various features that are critical to metadata curation, such as a back-end curation management system and a curator-friendly front-end. The application is based on a commonly used web development framework of Python/Django and is open-sourced under the GNU General Public License V3. GEOMetaCuration is expected to benefit the biocuration community and to contribute to computational generation of biological insights using large-scale biological data. An example use case can be found at the demo website: http://geometacuration.yubiolab.org. Source code URL: https://bitbucket.com/yubiolab/GEOMetaCuration


2020 ◽  
Author(s):  
Benjamin A. Braun ◽  
Catherine H. Schein ◽  
Werner Braun

Motivation: There is a need for rapid, easy-to-use, alignment-free methods to cluster large groups of protein sequence data. Commonly used phylogenetic trees based on alignments can be used to visualize only a limited number of protein sequences. DGraph, introduced here, is a dynamic programming application developed to generate 2D maps based on similarity scores for sequences. The program automatically calculates and graphically displays property distance (PD) scores based on physico-chemical property (PCP) similarities from an unaligned list of FASTA files. Such “PD-graphs” show the interrelatedness of the sequences, whereby clusters can reveal deeper connectivities. Results: PD-graphs generated for flavivirus (FV), enterovirus (EV), and coronavirus (CoV) sequences from complete polyproteins or individual proteins are consistent with biological data on vector types, hosts, cellular receptors and disease phenotypes. PD-graphs separate the tick-borne from the mosquito-borne FV and cluster viruses that infect bats, camels, seabirds and humans separately, and the clusters correlate with disease phenotype. The PD method segregates the β-CoV spike proteins of SARS, SARS-CoV-2, and MERS sequences from other human pathogenic CoV, with clustering consistent with cellular receptor usage. The graphs also suggest evolutionary relationships that may be difficult to determine with conventional bootstrapping methods that require postulating an ancestral sequence. Availability and implementation: DGraph is written in Java, compatible with the Java 5 runtime or newer. Source code and executables are available from the GitHub website (https://github.com/bjmnbraun/DGraph/releases). Documentation for installation and use of the software is available from the Readme.md file at https://github.com/bjmnbraun/DGraph. Supplementary information: Table S1 and Fig. S1 are available online.
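The abstract does not describe how DGraph computes its PD scores or lays out its maps, so the sketch below is not DGraph's method. It only illustrates the general idea of turning a matrix of pairwise distances into 2D coordinates, using classical multidimensional scaling as a stand-in technique. The function name and the toy distance matrix are assumptions made for illustration.

```python
import numpy as np


def classical_mds(dist, dims=2):
    """Embed points in `dims` dimensions from a symmetric distance matrix."""
    n = dist.shape[0]
    # Double-centre the squared distances to obtain a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    # eigh returns eigenvalues in ascending order; keep the largest `dims`.
    vals, vecs = np.linalg.eigh(gram)
    order = np.argsort(vals)[::-1][:dims]
    top_vals = np.clip(vals[order], 0.0, None)
    return vecs[:, order] * np.sqrt(top_vals)


# Toy distances between four hypothetical sequences, chosen so that an
# exact 2D layout exists (the corners of a 3 x 4 rectangle).
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

coords = classical_mds(dist, dims=2)
```

Plotting the rows of `coords` would give a 2D map in which nearby points correspond to sequences with small pairwise distances; the orientation of the map is arbitrary, since only the distances are preserved.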


2019 ◽  
Author(s):  
Ana Claudia Sima ◽  
Tarcisio Mendes de Farias ◽  
Erich Zbinden ◽  
Maria Anisimova ◽  
Manuel Gil ◽  
...  

Motivation: Data integration promises to be one of the main catalysts in enabling new insights to be drawn from the wealth of biological data available publicly. However, the heterogeneity of the different data sources, both at the syntactic and the semantic level, still poses significant challenges for achieving interoperability among biological databases. Results: We introduce an ontology-based federated approach for data integration. We applied this approach to three heterogeneous data stores that span different areas of biological knowledge: 1) Bgee, a gene expression relational database; 2) OMA, a Hierarchical Data Format 5 (HDF5) orthology data store; and 3) UniProtKB, a Resource Description Framework (RDF) store containing protein sequence and functional information. To enable federated queries across these sources, we first defined a new semantic model for gene expression called GenEx. We then show how the relational data in Bgee can be expressed as a virtual RDF graph, instantiating GenEx, through dedicated relational-to-RDF mappings. By applying these mappings, Bgee data are now accessible through a public SPARQL endpoint. Similarly, the materialised RDF data of OMA, expressed in terms of the Orthology ontology, is made available in a public SPARQL endpoint. We identified and formally described intersection points (i.e. virtual links) among the three data sources. These allow performing joint queries across the data stores. Finally, we lay the groundwork to enable nontechnical users to benefit from the integrated data, by providing a natural language template-based search interface. Project URL: http://biosoda.expasy.org, https://github.com/biosoda/bioquery

