Adaptive one-class Gaussian processes allow accurate prioritization of oncology drug targets

Author(s):  
Antonio de Falco ◽  
Zoltan Dezso ◽  
Francesco Ceccarelli ◽  
Luigi Cerulo ◽  
Angelo Ciaramella ◽  
...  

Abstract
Motivation: The cost of drug development has dramatically increased in the last decades, with the number of new drugs approved per billion US dollars spent on R&D halving every year or less. The selection and prioritization of targets is one of the most influential decisions in drug discovery. Here we present a Gaussian process model for the prioritization of drug targets, cast as a problem of learning with only positive and unlabeled examples.
Results: Since the absence of negative samples rules out standard methods for automatic hyperparameter selection, we propose a novel approach for selecting the kernel hyperparameters in one-class Gaussian processes. We compare our method with state-of-the-art approaches on benchmark datasets and then show its application to druggability prediction for oncology drugs. Our score reaches an AUC of 0.90 on a set of clinical trial targets, starting from a small training set of 102 validated oncology targets. It recovers the majority of known drug targets and can be used to identify a novel set of proteins as drug target candidates.
Availability: Source code implemented in Python is freely available for download at https://github.com/AntonioDeFalco/Adaptive-OCGP.
Supplementary information: Supplementary data are available at Bioinformatics online.
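The one-class setting described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation (the linked repository holds that). It assumes a zero-mean GP prior, an RBF kernel with a fixed, hand-picked `length_scale` in place of the adaptive hyperparameter selection the abstract proposes, and a score equal to the GP predictive mean when every positive training example is given the label 1. All function and parameter names are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / length_scale**2)


def one_class_gp_scores(positives, candidates, length_scale=1.0, noise=0.1):
    """Score candidates by the GP predictive mean, with all targets set to 1.

    Under a zero-mean prior the mean decays towards 0 away from the
    positive examples, so a higher score means "more like the positives".
    """
    k_train = rbf_kernel(positives, positives, length_scale)
    k_train += noise * np.eye(len(positives))
    alpha = np.linalg.solve(k_train, np.ones(len(positives)))
    k_cross = rbf_kernel(candidates, positives, length_scale)
    return k_cross @ alpha


# Toy data (assumed): a cluster of "known targets" and two candidates,
# one inside the cluster and one far away from it.
rng = np.random.default_rng(0)
positives = rng.normal(loc=0.0, scale=0.5, size=(30, 2))
candidates = np.array([[0.0, 0.0], [5.0, 5.0]])
scores = one_class_gp_scores(positives, candidates)
```

In this toy setup the candidate inside the cluster receives the higher score. How sharply the score falls off depends entirely on the kernel hyperparameters, which is why their selection without negative examples matters.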

2019 ◽  
Vol 22 (64) ◽  
pp. 63-84
Author(s):  
JanapatyI Naga Muneiah ◽  
Ch D V SubbaRao

Enterprises often classify their customers by degree of profitability, in decreasing order, as C1, C2, ..., Cn. Customers in class Cn generally yield zero profit because they migrate to competitors. They are called attritors (or churners) and are the prime cause of huge losses for enterprises. Customers in the intermediate classes, meanwhile, are reluctant, contribute only small and varying amounts of profit, and introduce uncertainty. Data mining models built from customer profiles, such as decision trees, are limited to classifying customers as attritors or non-attritors and do not provide profitable, actionable knowledge. In this paper, we present an efficient algorithm for the automatic extraction of profit-maximizing knowledge for business applications with multi-class customers by postprocessing a probability estimation decision tree (PET). When the PET predicts that a customer belongs to one of the less profitable classes, our algorithm suggests cost-sensitive actions to move her/him to the highest attainable profit class. In the proposed approach, the PET is represented in compressed form as a bit-pattern matrix, and postprocessing is performed on the bit patterns with bitwise AND operations. The computational performance of the method is strong owing to these effective data structures. Substantial experiments on UCI datasets, real mobile phone service data and other benchmark datasets demonstrate that the proposed method remarkably outperforms state-of-the-art methods.
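The abstract does not spell out the bit-pattern layout, so the following is only a sketch of how bitwise AND can drive this kind of postprocessing. It assumes each tree leaf is encoded as an integer whose set bits are the conditions on its root-to-leaf path, and each customer as the conditions they currently satisfy. The condition names, the encoding and all function names are hypothetical, and the cost-sensitive ranking of actions is left out.

```python
# Hypothetical vocabulary of binary conditions; each one owns a bit position.
CONDITIONS = ["plan=basic", "plan=premium", "contract=monthly", "contract=yearly"]
BIT = {name: 1 << i for i, name in enumerate(CONDITIONS)}


def encode(conditions):
    """Pack a collection of condition names into one integer bit pattern."""
    pattern = 0
    for name in conditions:
        pattern |= BIT[name]
    return pattern


def decode(pattern):
    """List the condition names whose bits are set in pattern."""
    return [name for name in CONDITIONS if pattern & BIT[name]]


def reaches(leaf_pattern, customer_pattern):
    """A customer reaches a leaf when every condition on its path holds."""
    return (leaf_pattern & customer_pattern) == leaf_pattern


def missing_conditions(leaf_pattern, customer_pattern):
    """Conditions on the leaf's path that the customer does not yet satisfy."""
    return decode(leaf_pattern & ~customer_pattern)


customer = encode(["plan=basic", "contract=monthly"])
low_profit_leaf = encode(["plan=basic", "contract=monthly"])
high_profit_leaf = encode(["plan=premium", "contract=monthly"])

# Conditions the customer would have to acquire to land in the better leaf.
changes = missing_conditions(high_profit_leaf, customer)
```

Here `changes` names the single condition separating the customer from the more profitable leaf. A full method would also weigh the cost of each change against the expected profit gain.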


Author(s):  
Shaohan Huang ◽  
Yu Wu ◽  
Furu Wei ◽  
Zhongzhi Luan

An intuitive way for a human to write paraphrase sentences is to replace words or phrases in the original sentence with their synonyms and then make the changes needed to keep the new sentences fluent and grammatically correct. We propose a novel approach to modeling this process with dictionary-guided editing networks, which rewrite the source sentence to generate paraphrase sentences. The model jointly learns to select appropriate word-level and phrase-level paraphrase pairs from an off-the-shelf dictionary in the context of the original sentence, and to generate fluent natural language sentences. Specifically, the system retrieves a set of word-level and phrase-level paraphrase pairs derived from the Paraphrase Database (PPDB) for the original sentence; this set guides the decision of which words might be deleted or inserted, using a soft attention mechanism under the sequence-to-sequence framework. We conduct experiments on two benchmark datasets for paraphrase generation, namely the MSCOCO and Quora datasets. The automatic evaluation results demonstrate that our dictionary-guided editing networks outperform the baseline methods. Human evaluation indicates that the generated paraphrases are grammatically correct and relevant to the input sentence.


2018 ◽  
Vol 19 (9) ◽  
pp. 2817 ◽  
Author(s):  
Haixia Long ◽  
Bo Liao ◽  
Xingyu Xu ◽  
Jialiang Yang

Protein hydroxylation is a type of post-translational modification (PTM) that plays critical roles in human diseases. Protein sequences are known to contain many uncharacterized proline and lysine residues. The question that needs to be answered is which of these residues can be hydroxylated and which cannot. The answer will not only help explain the mechanism of hydroxylation but can also benefit the development of new drugs. In this paper, we propose a novel approach for predicting hydroxylation using a hybrid deep learning model that integrates a convolutional neural network (CNN) and a long short-term memory network (LSTM). We employed a pseudo amino acid composition (PseAAC) method to construct valid benchmark datasets based on a sliding window strategy and used the position-specific scoring matrix (PSSM) to represent samples as inputs to the deep learning model. In addition, we compared our method with popular predictors including CNN, iHyd-PseAAC, and iHyd-PseCp. The results of 5-fold cross-validation all demonstrated that our method significantly outperforms the other methods in prediction accuracy.
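The sliding window strategy mentioned above can be sketched as follows. The abstract does not give the window length or how sequence ends are handled, so the flank size and the `X` padding character are assumptions, and the function name is hypothetical. The sketch only extracts the candidate windows; PSSM encoding and the CNN/LSTM model are out of scope.

```python
def residue_windows(sequence, targets="PK", flank=6, pad="X"):
    """Return (position, window) pairs centred on each target residue.

    Each window has length 2 * flank + 1; positions that would fall outside
    the sequence are filled with the pad character.
    """
    padded = pad * flank + sequence + pad * flank
    windows = []
    for i, residue in enumerate(sequence):
        if residue in targets:
            # Index i in `sequence` corresponds to index i + flank in `padded`,
            # so the slice below is centred on the target residue.
            windows.append((i, padded[i : i + 2 * flank + 1]))
    return windows


# Toy sequence with one lysine (K) and one proline (P).
example = residue_windows("MKTAPLLG", flank=2)
# example == [(1, "XMKTA"), (4, "TAPLL")]
```

Each window becomes one sample, labelled by whether its central proline or lysine is hydroxylated.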


2014 ◽  
Vol 23 (04) ◽  
pp. 1460011 ◽  
Author(s):  
Slim Bouker ◽  
Rabie Saidi ◽  
Sadok Ben Yahia ◽  
Engelbert Mephu Nguifo

The growth of databases raises an urgent need for more accurate methods to understand the stored data. In this context, association rules have been used extensively to analyze and comprehend huge amounts of data. However, the number of generated rules is too large to be analyzed and explored efficiently in any further process. To overcome this obstacle, an efficient selection of rules has to be performed. Since selection is necessarily based on evaluation, many interestingness measures have been proposed. However, the abundance of these measures has given rise to a new problem: their evaluation results are heterogeneous, which confuses the decision. We therefore propose a novel approach that discovers interesting association rules without favoring or excluding any measure, by adopting the notion of dominance between association rules. Our approach bypasses the problem of measure heterogeneity and finds a compromise between the measures' evaluations. The proposed approach also avoids another non-trivial problem, namely the specification of threshold values. Extensive experiments carried out on benchmark datasets show the benefits of the introduced approach.
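A minimal sketch of dominance-based selection follows, assuming the standard Pareto definition: one rule dominates another when it is at least as good on every measure and strictly better on at least one. The paper's exact definition may differ, and the rules and measure values below are invented for illustration.

```python
def dominates(a, b):
    """True if a is at least as good as b on every measure and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def undominated(rules):
    """Keep the rules that no other rule dominates."""
    return {
        name: scores
        for name, scores in rules.items()
        if not any(
            dominates(other, scores)
            for other_name, other in rules.items()
            if other_name != name
        )
    }


# Hypothetical rules scored on (support, confidence, lift); higher is better.
rules = {
    "A -> B": (0.30, 0.80, 1.2),
    "A -> C": (0.25, 0.90, 1.5),
    "B -> C": (0.20, 0.70, 1.1),
    "C -> D": (0.30, 0.80, 1.0),
}
front = undominated(rules)
# front keeps "A -> B" and "A -> C"; the other two are dominated by "A -> B".
```

No measure is weighted and no threshold is set: a rule survives simply because no other rule beats it across the board.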


2021 ◽  
Author(s):  
Pritee Nivrutti Hulule

Test case prioritization (TCP) strategies schedule test cases so as to reduce the cost of regression testing and to enhance a specific objective function. Test cases are prioritized so that those most important under a given criterion are run earlier in the retesting process. Many strategies in the literature focus on achieving various testing objectives and thereby reduce testing cost. In practice, however, testers often select from only a few well-known strategies for prioritizing test cases. The main reason is the lack of guidelines for selecting TCP strategies. Therefore, this part of the study introduces a novel approach to classifying TCP strategies, based on the concept of ambiguity, to support the effective selection of strategies for prioritizing test cases. This work extends existing selection schemes for test case prioritization.


Author(s):  
Gül Gökay Emel ◽  
Gülcan Petriçli

In the late 1980s, the proportion of outsourced materials in the cost of high-tech products was around 80%. With increasing globalization and ever-expanding supply chains, interdependencies between organizations have grown, and the selection of suppliers has become more important than ever. This exploratory study develops a novel approach for a specific type of supplier selection problem, namely supplier pre-evaluation. A two-stage, multi-layer feedforward neural network (NN) algorithm for pattern recognition was used to pre-evaluate suppliers under strategy-based organizational and technical criteria. Data for training, validating and testing the network were collected from a global Tier-1 manufacturing company in the automotive industry. The results show that the proposed approach can classify candidate suppliers into three separate groups: risky, potential and preferred. With this classification, it becomes feasible to eliminate risky suppliers before doing business with them.


2020 ◽  
Vol 36 (16) ◽  
pp. 4490-4497
Author(s):  
Siqi Liang ◽  
Haiyuan Yu

Abstract
Motivation: In silico drug target prediction provides valuable information for drug repurposing, for understanding side effects and for expanding the druggable genome. In particular, the discovery of actionable drug targets is critical to developing targeted therapies for diseases.
Results: Here, we develop a robust method for drug target prediction by leveraging a class imbalance-tolerant machine learning framework with a novel training scheme. We incorporate novel features, including drug–gene phenotype similarity and gene expression profile similarity, that capture information orthogonal to other features. We show that our classifier achieves robust performance and is able to predict gene targets for new drugs as well as drugs that potentially target unexplored genes. By providing newly predicted drug–target associations, we uncover novel opportunities for drug repurposing that may benefit cancer treatment through action on either known drug targets or currently undrugged genes.
Supplementary information: Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 22 (64) ◽  
pp. 47-62
Author(s):  
Mariela Morveli Espinoza ◽  
Juan Carlos Nieves ◽  
Ayslan Possebom ◽  
Cesar Augusto Tacla

Considering rational agents, we focus on the problem of selecting goals from a set of incompatible ones. We consider three forms of incompatibility introduced by Castelfranchi and Paglieri: terminal, instrumental (or resource-based), and superfluity. We represent the agent's plans by means of structured arguments whose premises are pervaded with uncertainty, and we measure the strength of these arguments in order to determine the set of compatible goals. We propose two novel ways of calculating this strength, depending on the kind of incompatibility that exists between the arguments. The first is the logical strength value, a three-dimensional vector calculated from a probabilistic interval associated with each argument. The vector represents the precision of the interval, its location, and the combination of precision and location. This representation and treatment of the strength of a structured argument has not previously been defined in the literature. The second is based on the cost of the plans (in terms of the necessary resources) and the preference of the goals associated with the plans. Building on this novel approach for measuring the strength of structured arguments, we propose a semantics for the selection of plans and goals based on Dung's abstract argumentation theory. Finally, we present a theoretical evaluation of our proposal.


2020 ◽  
Vol 36 (Supplement_2) ◽  
pp. i779-i786
Author(s):  
Yahui Long ◽  
Min Wu ◽  
Yong Liu ◽  
Chee Keong Kwoh ◽  
Jiawei Luo ◽  
...  

Abstract
Motivation: Human microbes are closely involved in a wide variety of complex human diseases and have become new drug targets. In silico methods for identifying potential microbe–drug associations provide an effective complement to conventional experimental methods: they can benefit the screening of candidate compounds for drug development and also facilitate knowledge discovery about microbe–drug interaction mechanisms. Moreover, the recent increase in available biomedical data for microbes and drugs provides a great opportunity for machine learning approaches to predict microbe–drug associations. We are thus highly motivated to integrate these data sources to improve prediction accuracy. In addition, it is extremely challenging to predict interactions for new drugs or new microbes, which have no existing microbe–drug associations.
Results: In this work, we leverage various sources of biomedical information and construct multiple networks (graphs) for microbes and drugs. We then develop a novel ensemble framework of graph attention networks with a hierarchical attention mechanism, denoted EGATMDA, for microbe–drug association prediction from the constructed microbe–drug graphs. In particular, for each input graph, we design a graph convolutional network with node-level attention to learn embeddings for nodes (i.e. microbes and drugs). To effectively aggregate node embeddings from multiple input graphs, we implement graph-level attention to learn the importance of the different input graphs. Experimental results under different cross-validation settings (e.g. the setting for predicting associations for new drugs) showed that our proposed method outperformed seven state-of-the-art methods. Case studies on predicted microbe–drug associations further demonstrated the effectiveness of the proposed EGATMDA method.
Availability: Source code and supplementary materials are available at: https://github.com/longyahui/EGATMDA/
Supplementary information: Supplementary data are available at Bioinformatics online.
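The graph-level attention step can be pictured as a softmax-weighted sum of per-graph node embeddings. The sketch below is a simplified illustration, not the EGATMDA implementation (see the linked repository for that). It assumes the embeddings are already computed, and it uses a fixed `query` vector as a stand-in for the attention parameters a real model would learn. All names are hypothetical.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    shifted = np.exp(x - np.max(x))
    return shifted / shifted.sum()


def aggregate_graph_embeddings(embeddings, query):
    """Combine per-graph node embeddings with one attention weight per graph.

    embeddings: array of shape (num_graphs, num_nodes, dim)
    query:      array of shape (dim,); stands in for learned attention parameters
    """
    # Score each graph by how well its mean node embedding matches the query.
    graph_summaries = embeddings.mean(axis=1)          # (num_graphs, dim)
    weights = softmax(graph_summaries @ query)         # (num_graphs,)
    # Weighted sum over the graph axis yields one fused embedding per node.
    fused = np.tensordot(weights, embeddings, axes=1)  # (num_nodes, dim)
    return fused, weights


rng = np.random.default_rng(0)
embeddings = rng.normal(size=(3, 5, 4))   # 3 graphs, 5 nodes, 4-dim embeddings
query = rng.normal(size=4)
fused, weights = aggregate_graph_embeddings(embeddings, query)
```

The weights sum to one, so they can be read as the relative importance assigned to each input graph.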


2019 ◽  
Vol 35 (22) ◽  
pp. 4640-4646 ◽  
Author(s):  
Xi Han ◽  
Xiaonan Wang ◽  
Kang Zhou

Abstract
Motivation: Protein activity is a significant characteristic of recombinant proteins that can be used as biocatalysts, and high activity reduces the cost of biocatalysts. A model that can predict protein activity from amino acid sequence is highly desired, as it would aid the experimental improvement of proteins. However, only limited data on protein activity are currently available, which prevents the development of such models. Since protein activity and solubility are correlated for some proteins, the publicly available solubility dataset may be used to develop models that predict protein solubility from sequence. Such models could serve as a tool to indirectly predict protein activity from sequence. In the literature, predicting protein solubility from sequence has been intensively explored, but the existing models predict solubility as a binary value, which is not suitable for guiding experimental designs to improve protein solubility. Here we propose new machine learning (ML) models for improving protein solubility in vivo.
Results: We first implemented a novel approach that predicts protein solubility as a continuous numerical value instead of a binary one. After combining it with various ML algorithms, we achieved an R² of 0.4115 when a support vector machine algorithm was used. Continuous values of solubility are more meaningful in protein engineering, as they enable researchers to choose proteins with higher predicted solubility for experimental validation, whereas binary values cannot distinguish between proteins with the same value (with only two possible values, many proteins share one).
Availability and implementation: We present the ML workflow as a series of IPython notebooks hosted on GitHub (https://github.com/xiaomizhou616/protein_solubility). The workflow can be used as a template for analysis of other expression and solubility datasets.
Supplementary information: Supplementary data are available at Bioinformatics online.
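For readers unfamiliar with the metric, here is a sketch of how an R² score for continuous predictions is commonly computed, assuming the usual coefficient-of-determination definition (1 minus residual sum of squares over total sum of squares). Whether the authors used exactly this definition is not stated in the abstract, and the solubility values below are invented.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical continuous solubility values and model predictions.
measured = [0.10, 0.40, 0.55, 0.90]
predicted = [0.20, 0.35, 0.60, 0.80]
score = r_squared(measured, predicted)
```

A score of 1 means perfect prediction and 0 means no better than always predicting the mean, so the reported 0.4115 indicates the model explains part, but well under half, of the variance in solubility.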

