PreBINDS: An Interactive Web Tool to Create Appropriate Datasets for Predicting Compound–Protein Interactions

2021 ◽  
Vol 8 ◽  
Author(s):  
Kazuyoshi Ikeda ◽  
Takuo Doi ◽  
Masami Ikeda ◽  
Kentaro Tomii

Despite abundant computational resources and a huge amount of data on compound–protein interactions (CPIs), constructing appropriate datasets for training and evaluating CPI prediction models is not always easy. For this study, we have developed a web server to facilitate the development and evaluation of prediction models by providing an appropriate dataset according to the task. Our web server provides an environment and dataset that aid model developers and evaluators in obtaining a suitable dataset for both proteins and compounds, in addition to attributes necessary for deep learning. With the web server interface, users can customize the CPI dataset derived from ChEMBL by setting positive and negative thresholds according to their own definitions. We have also implemented a function that displays the distribution of activity values in the dataset as a histogram, which helps users set appropriate thresholds for positive and negative examples. These functions enable effective development and evaluation of models. Furthermore, users can prepare their task-specific datasets by selecting a set of target proteins based on various criteria such as Pfam families, ChEMBL’s classification, and sequence similarities. The accuracy and efficiency of in silico screening and drug design using machine learning, including deep learning, can therefore be improved by facilitating access to an appropriate dataset prepared using our web server (https://binds.lifematics.work/).
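The thresholding and histogram steps described in this abstract can be illustrated with a minimal Python sketch. It is not the PreBINDS implementation: the record layout, the function names `label_by_threshold` and `histogram`, and the assumption that activity is a negative-log molar value (higher means more active) are all hypothetical choices made for illustration.

```python
from collections import Counter


def label_by_threshold(records, positive_min, negative_max):
    """Label (compound, protein, activity) records as 1 (positive) or 0 (negative).

    Assumption: activity is a -log10 molar value, so higher means more active.
    Records that fall strictly between the two thresholds are dropped.
    """
    labeled = []
    for compound, protein, activity in records:
        if activity >= positive_min:
            labeled.append((compound, protein, 1))
        elif activity <= negative_max:
            labeled.append((compound, protein, 0))
    return labeled


def histogram(values, bin_width=1.0):
    """Count values per bin; keys are the lower edge of each non-empty bin."""
    counts = Counter(int(v // bin_width) for v in values)
    return {b * bin_width: counts[b] for b in sorted(counts)}


# Hypothetical toy records, not taken from ChEMBL.
records = [
    ("C1", "P1", 8.2),
    ("C2", "P1", 4.1),
    ("C3", "P2", 6.0),
    ("C4", "P2", 7.5),
    ("C5", "P3", 3.9),
]

hist = histogram([activity for _, _, activity in records])
dataset = label_by_threshold(records, positive_min=7.0, negative_max=5.0)
```

Inspecting `hist` before choosing `positive_min` and `negative_max` mirrors the workflow the abstract describes: the distribution of activity values guides where the positive and negative cut-offs are placed.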

2019 ◽  
Vol 21 (5) ◽  
pp. 1798-1805 ◽  
Author(s):  
Kai Yu ◽  
Qingfeng Zhang ◽  
Zekun Liu ◽  
Yimeng Du ◽  
Xinjiao Gao ◽  
...  

Abstract Protein lysine acetylation is an important molecular mechanism for regulating cellular processes and plays critical physiological and pathological roles in cancers and other diseases. Although a large number of acetylation sites have been identified through experiments and high-throughput proteomics techniques, their enzyme-specific regulation remains largely unknown. Here, we developed the deep learning-based protein lysine acetylation modification prediction (Deep-PLA) software for histone acetyltransferase (HAT)/histone deacetylase (HDAC)-specific acetylation prediction. Experimentally identified substrates and sites of several HATs and HDACs were curated from the literature to generate enzyme-specific data sets. We integrated various protein sequence features with a deep neural network and optimized the hyperparameters with particle swarm optimization, which achieved satisfactory performance. In comparisons based on cross-validation and testing data sets, the model outperformed previous studies. Meanwhile, we found that protein–protein interactions could enrich enzyme-specific acetylation regulatory relations and visualized this information in the Deep-PLA web server. Furthermore, a cross-cancer analysis of acetylation-associated mutations revealed that acetylation regulation was intensively disrupted by mutations in cancers and heavily implicated in the regulation of cancer signaling. These prediction and analysis results might provide helpful information to reveal the regulatory mechanism of protein acetylation in various biological processes and to promote research on the prognosis and treatment of cancers. Therefore, the Deep-PLA predictor and protein acetylation interaction networks could provide helpful information for studying the regulation of protein acetylation. The Deep-PLA web server can be accessed at http://deeppla.cancerbio.info.


2020 ◽  
Vol 48 (W1) ◽  
pp. W580-W585 ◽  
Author(s):  
Priyanka Banerjee ◽  
Mathias Dunkel ◽  
Emanuel Kemmler ◽  
Robert Preissner

Abstract Cytochrome P450 enzyme (CYP)-mediated drug metabolism influences drug pharmacokinetics and results in adverse outcomes in patients through drug–drug interactions (DDIs). Absorption, distribution, metabolism, excretion and toxicity (ADMET) issues are the leading causes of drug failure in clinical trials. As details on their metabolism are known for just half of the approved drugs, a tool for reliable prediction of CYP specificity is needed. The SuperCYPsPred web server is currently focused on five major CYP isoenzymes, namely CYP1A2, CYP2C19, CYP2D6, CYP2C9 and CYP3A4, which are responsible for more than 80% of the metabolism of clinical drugs. The prediction models for classification of CYP inhibition are based on well-established machine learning methods. The models were validated both by cross-validation and on external validation sets and achieved good performance. The web server takes a 2D chemical structure as input and reports the CYP inhibition profile of the chemical for 10 models using different molecular fingerprints, along with confidence scores, similar compounds, known CYP information on drugs published in the literature, a detailed interaction profile of individual cytochromes including a DDIs table, and an overall CYPs prediction radar chart (http://insilico-cyp.charite.de/SuperCYPsPred/). The web server does not require log in or registration and is free to use.


2021 ◽  
Vol 8 ◽  
Author(s):  
Mariela González-Avendaño ◽  
Simón Zúñiga-Almonacid ◽  
Ian Silva ◽  
Boris Lavanderos ◽  
Felipe Robinson ◽  
...  

Mass spectrometry-based proteomics methods are widely used to identify and quantify protein complexes involved in diverse biological processes. Specifically, tandem mass spectrometry methods represent an accurate and sensitive strategy for identifying protein-protein interactions. However, most of these approaches provide only lists of peptide fragments associated with a target protein, without performing further analyses to discriminate physical or functional protein-protein interactions. Here, we present the PPI-MASS web server, which provides an interactive analytics platform to identify protein-protein interactions with pharmacological potential by filtering a large protein set according to different biological features. Starting from a list of proteins detected by MS-based methods, PPI-MASS integrates an automated pipeline to obtain information on each protein from freely accessible databases. The collected data include protein sequence, functional and structural properties, associated pathologies and drugs, as well as location and expression in human tissues. Based on this information, users can manipulate different filters in the web platform to identify candidate proteins that may establish physical contacts with a target protein. Thus, our server offers a simple but powerful tool to detect novel protein-protein interactions, avoiding tedious and time-consuming data postprocessing. To test the web server, we employed the interactome of the TRPM4 and TMPRSS11a proteins as a use case. From these data, protein-protein interactions were identified, which have been validated through biochemical and bioinformatic studies. Accordingly, our web platform provides a comprehensive and complementary tool for identifying protein-protein complexes, assisting the future design of associated therapies.


2018 ◽  
Author(s):  
Shaun C. D'Souza

Cognitive neuroscience is the study of how the human brain functions on tasks like decision making, language, perception and reasoning. Deep learning is a class of machine learning algorithms that use neural networks, which are designed to model the responses of neurons in the human brain. Learning can be supervised or unsupervised. N-gram token models are used extensively in language prediction. N-grams are probabilistic models used to predict the next word or token. They are statistical models of sequences of words or tokens and are called language models (LMs). N-grams are essential in creating language prediction models. We are exploring a broader sandbox ecosystem for enabling AI, specifically around deep learning applications on unstructured content on the web.
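As a rough illustration of how an n-gram model predicts the next token, the sketch below builds a bigram (n = 2) model from counts and estimates next-token probabilities by relative frequency. The corpus and the function names `train_bigrams`, `next_token_probs` and `predict_next` are hypothetical and are not taken from the author's work; real language models would also need smoothing for unseen tokens.

```python
from collections import Counter, defaultdict


def train_bigrams(tokens):
    """Count how often each token follows each preceding token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_token_probs(counts, prev):
    """Relative-frequency estimate of P(next | prev); empty dict if prev is unseen."""
    following = counts.get(prev)
    if not following:
        return {}
    total = sum(following.values())
    return {tok: n / total for tok, n in following.items()}


def predict_next(counts, prev):
    """Return the most probable next token, or None if prev was never seen."""
    probs = next_token_probs(counts, prev)
    return max(probs, key=probs.get) if probs else None


# Hypothetical toy corpus.
corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigrams(corpus)

print(predict_next(model, "the"))
print(next_token_probs(model, "the"))
```

In this toy corpus "the" is followed by "cat" twice and "mat" once, so the model assigns "cat" a probability of 2/3 and predicts it as the next token.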


2019 ◽  
Vol 47 (W1) ◽  
pp. W295-W299 ◽  
Author(s):  
Ralf Gabriels ◽  
Lennart Martens ◽  
Sven Degroeve

Abstract MS²PIP is a data-driven tool that accurately predicts peak intensities for a given peptide's fragmentation mass spectrum. Since the release of the MS²PIP web server in 2015, we have brought significant updates to both the tool and the web server. In addition to the original models for CID and HCD fragmentation, we have added specialized models for the TripleTOF 5600+ mass spectrometer, for TMT-labeled peptides, for iTRAQ-labeled peptides, and for iTRAQ-labeled phosphopeptides. Because the fragmentation pattern is heavily altered in each of these cases, these additional models greatly improve the prediction accuracy for their corresponding data types. We have also substantially reduced the computational resources required to run MS²PIP, and have completely rebuilt the web server, which now allows predictions of up to 100 000 peptide sequences in a single request. The MS²PIP web server is freely available at https://iomics.ugent.be/ms2pip/.


2019 ◽  
Author(s):  
Ralf Gabriels ◽  
Lennart Martens ◽  
Sven Degroeve

Abstract MS2PIP is a data-driven tool that accurately predicts peak intensities for a given peptide’s fragmentation mass spectrum. Since the release of the MS2PIP web server in 2015, we have brought significant updates to both the tool and the web server. In addition to the original models for CID and HCD fragmentation, we have added specific models for the TripleTOF 5600+ mass spectrometer, for TMT-labeled peptides, for iTRAQ-labeled peptides and for iTRAQ-labeled phosphopeptides. Because the fragmentation pattern is heavily altered in each of these cases, these additional models greatly improve the prediction accuracy for their corresponding data types. We have also substantially reduced the computational resources required to run MS2PIP, and have completely rebuilt the web server, which now allows predictions of up to 100 000 peptide sequences in a single request. The MS2PIP web server is freely available at https://iomics.ugent.be/ms2pip/.


2016 ◽  
Vol 1 (1) ◽  
pp. 001
Author(s):  
Harry Setya Hadi

String searching is a common operation in computing because text is the main form of data storage. Boyer-Moore, which searches the string from right to left, is considered the most efficient method in practice, and among algorithms that match the string in a specified direction it has the best theoretical results. A web server connected to a computer network is accessed by many users in different places, with both good and bad intentions. Every activity performed by a user is stored in the web server logs. The log reports on the web server can help the server administrator search for web request errors. A web server log is a record of the activities of a website and contains data such as the IP address, time of access, pages opened, activities, and access methods. The large amount of data in the resulting log can yield useful information.
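To make the right-to-left matching idea concrete, here is a minimal Python sketch that scans log lines for an error marker. It uses the Boyer-Moore-Horspool simplification (bad-character shifts only), not the full Boyer-Moore algorithm with its good-suffix rule, and it is not the author's implementation. The log lines, the function name `horspool_search`, and the `'" 404 '` search pattern are hypothetical.

```python
def horspool_search(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1.

    Boyer-Moore-Horspool: compare the pattern right to left, and on a
    mismatch shift by a distance looked up from the text character that
    is aligned with the last pattern position.
    """
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    # Shift for each character: distance from its rightmost occurrence
    # (excluding the final pattern position) to the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    pos = 0
    while pos <= n - m:
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            return pos
        # Characters absent from the pattern allow a full-length shift.
        pos += shift.get(text[pos + m - 1], m)
    return -1


# Hypothetical access-log lines in a common-log-like layout.
log_lines = [
    '192.0.2.1 - - [10/Oct/2016:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '192.0.2.7 - - [10/Oct/2016:13:56:01 +0000] "GET /missing.html HTTP/1.1" 404 209',
    '192.0.2.9 - - [10/Oct/2016:13:57:12 +0000] "POST /login HTTP/1.1" 500 0',
]

# Keep only the lines whose status field is 404.
errors = [line for line in log_lines if horspool_search(line, '" 404 ') != -1]
for line in errors:
    print(line)
```

Including the closing quote and surrounding spaces in the pattern is a simple way to match the status field specifically, so that a "404" appearing inside a URL or byte count is not reported as an error.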


2020 ◽  
Author(s):  
Priyanka Meel ◽  
Farhin Bano ◽  
Dr. Dinesh K. Vishwakarma
