scoring functions Latest Research Papers

Preferences Single-Peaked on a Tree: Multiwinner Elections and Structural Results

Journal of Artificial Intelligence Research ◽ 10.1613/jair.1.12332 ◽ 2022 ◽ Vol 73 ◽ pp. 231-276

Author(s): Dominik Peters ◽ Lan Yu ◽ Hau Chan ◽ Edith Elkind

Keyword(s): Polynomial Time ◽ Scoring Function ◽ Large Family ◽ Np Hard ◽ Scoring Functions ◽ Winner Determination ◽ Polynomial Time Algorithms ◽ Minimum Number ◽ Number Of Leaves ◽ Positive Results

A preference profile is single-peaked on a tree if the candidate set can be equipped with a tree structure so that the preferences of each voter are decreasing from their top candidate along all paths in the tree. This notion was introduced by Demange (1982), and subsequently Trick (1989b) described an efficient algorithm for deciding if a given profile is single-peaked on a tree. We study the complexity of multiwinner elections under several variants of the Chamberlin–Courant rule for preferences single-peaked on trees. We show that in this setting the egalitarian version of this rule admits a polynomial-time winner determination algorithm. For the utilitarian version, we prove that winner determination remains NP-hard for the Borda scoring function; indeed, this hardness result extends to a large family of scoring functions. However, a winning committee can be found in polynomial time if either the number of leaves or the number of internal vertices of the underlying tree is bounded by a constant. To benefit from these positive results, we need a procedure that can determine whether a given profile is single-peaked on a tree that has additional desirable properties (e.g., a small number of leaves). To address this challenge, we develop a structural approach that enables us to compactly represent all trees with respect to which a given profile is single-peaked. We show how to use this representation to efficiently find the best tree for a given profile for use with our winner determination algorithms: given a profile, we can efficiently find a tree with the minimum number of leaves, or a tree with the minimum number of internal vertices, among trees on which the profile is single-peaked. We then explore the power and limitations of this framework: we develop polynomial-time algorithms to find trees with the smallest maximum degree, diameter, or pathwidth, but show that it is NP-hard to check whether a given profile is single-peaked on a tree that is isomorphic to a given tree, or on a regular tree.
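To make the objects in this abstract concrete, the following is a minimal Python sketch of the Chamberlin–Courant rule with Borda scores, evaluated by brute force over all committees. It is not the paper's polynomial-time algorithm and makes no use of the tree structure; the function names, the `aggregate` parameter, and the toy profile (which happens to be single-peaked on the path a-b-c-d) are illustrative assumptions.

```python
from itertools import combinations

def borda_scores(ranking):
    """Map each candidate to its Borda score: m-1 for the top choice, 0 for the last."""
    m = len(ranking)
    return {c: m - 1 - pos for pos, c in enumerate(ranking)}

def chamberlin_courant(profile, k, aggregate=sum):
    """Return (committee, value) by exhaustive search over all size-k committees.

    aggregate=sum gives the utilitarian variant, aggregate=min the egalitarian one.
    Exhaustive search is exponential in k; it is only meant for tiny examples.
    """
    candidates = sorted(profile[0])
    scores = [borda_scores(ranking) for ranking in profile]
    best, best_value = None, None
    for committee in combinations(candidates, k):
        # Each voter is represented by their most preferred committee member.
        value = aggregate(max(s[c] for c in committee) for s in scores)
        if best_value is None or value > best_value:
            best, best_value = committee, value
    return best, best_value

# Hypothetical toy profile: each list is one voter's ranking, best first.
profile = [
    ["a", "b", "c", "d"],
    ["b", "a", "c", "d"],
    ["c", "b", "a", "d"],
    ["d", "c", "b", "a"],
]

utilitarian = chamberlin_courant(profile, 2)                 # maximise total satisfaction
egalitarian = chamberlin_courant(profile, 2, aggregate=min)  # maximise the worst-off voter
```

Ties are broken by enumeration order here, which is an arbitrary choice of this sketch.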

Learning how to search: generating effective test cases through adaptive fitness function selection

Empirical Software Engineering ◽ 10.1007/s10664-021-10048-8 ◽ 2022 ◽ Vol 27 (2)

Author(s): Hussein Almulla ◽ Gregory Gay

Keyword(s): Test Generation ◽ Fitness Function ◽ Generation Process ◽ Scoring Functions ◽ Strategic Choices ◽ Case Examples ◽ Fitness Functions ◽ Function Selection ◽ Test Suites ◽ Effective Fitness

Search-based test generation is guided by feedback from one or more fitness functions, that is, scoring functions that judge solution optimality. Choosing informative fitness functions is crucial to meeting the goals of a tester. Unfortunately, many goals (such as forcing the class-under-test to throw exceptions, increasing test suite diversity, and attaining Strong Mutation Coverage) do not have effective fitness function formulations. We propose that meeting such goals requires treating fitness function identification as a secondary optimization step. An adaptive algorithm that can vary the selection of fitness functions could adjust its selection throughout the generation process to maximize goal attainment, based on the current population of test suites. To test this hypothesis, we have implemented two reinforcement learning algorithms in the EvoSuite unit test generation framework, and used these algorithms to dynamically set the fitness functions used during generation for the three goals identified above. We have evaluated our framework, EvoSuiteFIT, on a set of Java case examples. EvoSuiteFIT techniques attain significant improvements for two of the three goals, and show limited improvements on the third when the number of generations of evolution is fixed. Additionally, for two of the three goals, EvoSuiteFIT detects faults missed by the other techniques. The ability to adjust fitness functions allows strategic choices that efficiently produce more effective test suites, and examining these choices offers insight into how to attain our testing goals. We find that adaptive fitness function selection is a powerful technique to apply when an effective fitness function does not already exist for achieving a testing goal.

A Review on Parallel Virtual Screening Softwares for High-Performance Computers

Pharmaceuticals ◽ 10.3390/ph15010063 ◽ 2022 ◽ Vol 15 (1) ◽ pp. 63

Author(s): Natarajan Arul Murugan ◽ Artur Podobas ◽ Davide Gadioli ◽ Emanuele Vitali ◽ Gianluca Palermo ◽ ...

Keyword(s): Drug Discovery ◽ Virtual Screening ◽ High Throughput Screening ◽ High Performance ◽ Parallel Implementation ◽ Scoring Function ◽ Scoring Functions ◽ Screening Programs ◽ Lead Compounds ◽ High Performance Computers

Drug discovery, which aims to identify and optimize lead compounds from large chemical libraries, is the most expensive, time-consuming, and challenging undertaking in biopharmaceutical companies. The lead compounds should have high-affinity binding and specificity for a target associated with a disease, and, in addition, they should have favorable pharmacodynamic and pharmacokinetic properties (grouped as ADMET properties). Overall, drug discovery is a multivariable optimization problem and can be carried out on supercomputers using a reliable scoring function, which is a measure of the binding affinity or inhibition potential of the drug-like compound. The major problem is that the number of compounds in chemical space is huge, making computational drug discovery very demanding. However, it is cheaper and less time-consuming than experimental high-throughput screening. As the problem is to find the most stable (global) minima for numerous protein–ligand complexes (on the order of 10^6 to 10^12), the parallel implementation of in silico virtual screening can be exploited to complete drug discovery in an affordable time. In this review, we discuss such implementations of parallelization algorithms in virtual screening programs. The nature of different scoring functions and search algorithms is discussed, together with a performance analysis of several docking programs ported to high-performance computing architectures.

Soil Quality and Evaluation of Spatial Variability in a Semi-Arid Ecosystem in a Region of the Southeastern Iberian Peninsula (Spain)

Land ◽ 10.3390/land11010005 ◽ 2021 ◽ Vol 11 (1) ◽ pp. 5

Author(s): Fernando Santos-Francés ◽ Antonio Martínez-Graña ◽ Carmelo Ávila-Zarza ◽ Marco Criado ◽ Yolanda Sánchez-Sánchez

Keyword(s): Organic Carbon ◽ Soil Properties ◽ Soil Quality ◽ Water Retention ◽ Mediterranean Ecosystem ◽ Arid Ecosystem ◽ Scoring Functions ◽ Data Set ◽ Preliminary Estimation ◽ Semi Arid

In the last two decades, as the importance of soil has been recognized as a key component of any ecosystem, there has been an increased global demand to establish criteria for determining soil quality and to develop quantitative indices that can be used to classify and compare that quality in different places. The preliminary estimation of the attributes involved in soil quality was made taking into account the opinion of the experts and our own experience in semi-arid ecosystems. In this study, 16 soil properties have been selected as potential indicators of soil quality, in a region between Campo de Montiel and Sierra de Alcaraz (Spain): sand and clay percentage, pH, electrical conductivity (EC), soil organic carbon (OC), exchangeable bases (Na, K, Ca and Mg), cation exchange capacity (CEC), calcium carbonate equivalent (CCE), bulk density (BD), water retention at 33 kPa field capacity and 1500 kPa permanent wilting point (GWC33 kPa and GWC1500 kPa), coefficient of linear extensibility (COLE) and soil erodibility factor (K). The main objective has been to develop an adequate index to characterize the quality of the soils in a semi-arid Mediterranean ecosystem. To develop the Soil Quality Index (SQI), we combined two indicator selection approaches (total data set, TDS, and minimum data set, MDS), two scoring functions (linear, L, and nonlinear, NL) and three methods (additive, A; additive weighted, W; and Nemoro, N). The quality indices have been calculated, considering the properties of the soil control section (between 0 and 100 cm depth), using 185 samples, belonging to horizons A, B and C of 51 soil profiles. The results have shown that the selection of the soil properties, both of the topsoil and subsoil, is an important help in establishing a good relationship between quality, soil functions and agricultural management. The Kriging method has been used to determine the spatial distribution of the soil quality grades. The indices that best reflect the state of soil quality are TDS-L-W and TDS-L-A, as they are the most accurate indices and provide the most consistent results. These indices are especially indicated when carrying out detailed or semi-detailed studies. However, the MDS-L-W and MDS-L-A indices, which use only a limited number of indicators, are best for large-scale studies. The indicators with the greatest influence on soil quality for different land uses and those developed on different rocks, using linear scoring functions, are the following: (Clay), (GWC1500 kPa) and (Ca). These results can also be expressed as follows: the best soils in this region are deep soils, with a clay texture, with high water retention and a neutral or slightly basic pH. However, the indicators with the greatest influence on soil quality, using nonlinear scoring functions, are: (OC Stock), (Ca) and (CaCO3). In other words, the most important indicator is the organic carbon content, which is not logical in the case of a region in which the soils have an excessively low SOC content (0.86%).
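The combination of a linear scoring function with an additive or additive weighted method can be illustrated with a short Python sketch. The abstract does not give the formulas, so everything here is an assumption: a "more is better" linear score defined as the value divided by a reference maximum, an additive index taken as the mean score, and a weighted index taken as the weight-normalised sum. The indicator values, reference maxima and weights are hypothetical and are not data from the study.

```python
def linear_score(value, max_value):
    """Assumed 'more is better' linear scoring: scale an indicator into [0, 1]."""
    return min(value / max_value, 1.0)

def additive_index(scores):
    """Assumed additive method: unweighted mean of the indicator scores."""
    return sum(scores) / len(scores)

def weighted_additive_index(scores, weights):
    """Assumed additive weighted method: weights are normalised to sum to one."""
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, scores)) / total

# Hypothetical indicators as (measured value, reference maximum) pairs.
indicators = [(30.0, 60.0), (12.0, 16.0), (1.0, 2.0)]
scores = [linear_score(v, m) for v, m in indicators]

sqi_additive = additive_index(scores)
sqi_weighted = weighted_additive_index(scores, weights=[2.0, 1.0, 1.0])
```

A "less is better" indicator (for example bulk density) would need a decreasing score instead; that case is left out to keep the sketch short.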

A Comparison between Enrichment Optimization Algorithm (EOA)-Based and Docking-Based Virtual Screening

International Journal of Molecular Sciences ◽ 10.3390/ijms23010043 ◽ 2021 ◽ Vol 23 (1) ◽ pp. 43

Author(s): Jacob Spiegel ◽ Hanoch Senderowitz

Keyword(s): Molecular Docking ◽ Virtual Screening ◽ Optimization Algorithm ◽ 3D Structure ◽ Predictive Ability ◽ Material Design ◽ 3D Qsar ◽ Small Scale ◽ Scoring Functions ◽ Qsar Models

Virtual screening (VS) is a well-established method in the initial stages of many drug and material design projects. VS is typically performed using structure-based approaches such as molecular docking, or various ligand-based approaches. Most docking tools were designed to be as global as possible, and consequently only require knowledge of the 3D structure of the biotarget. In contrast, many ligand-based approaches (e.g., 3D-QSAR and pharmacophore) require prior development of project-specific predictive models. Depending on the type of model (e.g., classification or regression), predictive ability is typically evaluated using metrics of performance on either the training set (e.g., Q^2_CV) or the test set (e.g., specificity, selectivity or Q^2_F1/F2/F3). However, none of these metrics were developed with VS in mind, and consequently, their ability to reliably assess the performance of a model in the context of VS is at best limited. With this in mind we have recently reported the development of the enrichment optimization algorithm (EOA). EOA derives QSAR models in the form of multiple linear regression (MLR) equations for VS by optimizing an enrichment-based metric in the space of the descriptors. Here we present an improved version of the algorithm which better handles active compounds and which also takes into account information on inactive (either known inactive or decoy) compounds. We compared the improved EOA in small-scale VS experiments with three common docking tools, namely, Glide-SP, GOLD and AutoDock Vina, employing five molecular targets (acetylcholinesterase, human immunodeficiency virus type 1 protease, MAP kinase p38 alpha, urokinase-type plasminogen activator, and trypsin I). We found that EOA consistently outperformed all docking tools in terms of the area under the ROC curve (AUC) and EF1% metrics that measured the overall and initial success of the VS process, respectively. This was the case both when the docking metrics were calculated based on a consensus approach and when they were calculated based on two different sets of single crystal structures. Finally, we propose that EOA could be combined with molecular docking to derive target-specific scoring functions.
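The two evaluation metrics named in this abstract, the enrichment factor at a top fraction (EF1% uses 1%) and the area under the ROC curve, can be sketched in plain Python as follows. This only illustrates the standard definitions; it is not the EOA algorithm. The sketch assumes that a higher score means a better-ranked compound and that labels are 1 for actives and 0 for inactives or decoys; the function names and the ten-compound example are hypothetical.

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """Hit rate in the top-ranked fraction divided by the hit rate in the whole library."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, int(round(fraction * len(ranked))))
    hits_top = sum(label for _, label in ranked[:n_top])
    return (hits_top / n_top) / (sum(labels) / len(labels))

def roc_auc(scores, labels):
    """AUC as the probability that a random active outscores a random inactive.

    Ties count as half a win. The pairwise loop is quadratic, which is fine for
    small examples but not for large screening libraries.
    """
    actives = [s for s, l in zip(scores, labels) if l == 1]
    inactives = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(
        1.0 if a > d else 0.5 if a == d else 0.0
        for a in actives
        for d in inactives
    )
    return wins / (len(actives) * len(inactives))

# Hypothetical screening result: ten compounds, two of them active.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

ef_top10 = enrichment_factor(scores, labels, fraction=0.1)
auc = roc_auc(scores, labels)
```

With only ten compounds the example uses a 10% cut-off, because a 1% cut-off needs a library of at least a hundred compounds to be meaningful.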

Identification of African Swine Fever Virus Inhibitors through High Performance Virtual Screening Using Machine Learning

International Journal of Molecular Sciences ◽ 10.3390/ijms222413414 ◽ 2021 ◽ Vol 22 (24) ◽ pp. 13414

Author(s): Jiwon Choi ◽ Dongseob Tark ◽ Yun-Sook Lim ◽ Soon B. Hwang

Keyword(s): Machine Learning ◽ High Performance ◽ African Swine Fever Virus ◽ Antiviral Drug ◽ African Swine Fever ◽ Principal Component ◽ Antiviral Drugs ◽ Fever Virus ◽ Scoring Functions

African swine fever virus (ASFV) is a highly contagious virus that causes severe hemorrhagic viral disease resulting in high mortality in domestic and wild pigs. To date, few antiviral agents can inhibit ASFV infections. Thus, new anti-ASFV drugs need to be urgently identified. Recently, we identified pentagastrin as a potential antiviral drug against ASFVs using molecular docking and machine learning models. However, the scoring functions are easily influenced by properties of protein pockets, resulting in a scoring bias. Here, we employed the 5′-P binding pocket of AsfvPolX as a potential binding site to identify antiviral drugs and classified 13 AsfvPolX structures into three classes based on pocket parameters calculated by the SiteMap module. We then applied principal component analysis to eliminate this scoring bias, which was effective in making the SP Glide score more balanced across the 13 AsfvPolX structures in the dataset. As a result, we identified cangrelor and fostamatinib as potential antiviral drugs against ASFVs. Furthermore, the classification of the pocket properties of the AsfvPolX protein can provide an alternative approach to identifying novel antiviral drugs by optimizing the scoring function of the docking programs. Here, we report a novel machine learning-based approach to generate high binding affinity compounds that are individually matched to the available classification of the pocket properties of the AsfvPolX protein.

Sequence based prediction of protein phase separation into disordered condensates using machine learning

10.1101/2021.12.13.472521 ◽ 2021

Author(s): Pratik Mullick ◽ Antonio Trovato

Keyword(s): Phase Separation ◽ Area Under The Curve ◽ Protein Sequences ◽ Protein Molecule ◽ Simplex Algorithm ◽ Amino Acid Sequences ◽ Protein Chain ◽ Scoring Functions ◽ Relevant Variables ◽ Liquid Liquid Phase Separation

Several proteins that are responsible for neurodegenerative disorders (Alzheimer's, Parkinson's, etc.) have been shown to undergo a mechanism known as liquid-liquid phase separation (LLPS). In this work we build a predictor that answers whether a protein molecule will undergo LLPS or not. For this we used protein sequences for which the answer was already known. Those that undergo LLPS were taken as the positive set and those that do not as the negative set. Based on knowledge of the amino-acid sequences, we identified some relevant variables in the context of LLPS, e.g., the number of amino acids, the length of the best pairings, and the average register shifts. Using these variables we built a number of scoring functions, which were essentially analytic functions of these variables, and we also combined some scores already existing in the literature. We considered a total of 43636 protein sequences, of which only 121 were positive. We applied logistic regression and performed cross validation, where 25% of the data were used as the training set and the performance of the obtained results was tested on the remaining 75% of the data. In the training process, we used the Simplex algorithm to maximize the area under the curve (AUC) in receiver operating characteristic (ROC) space for each of the scores we defined. The optimised parameters were then used to evaluate the AUC on the test set to check the accuracy. The best-performing score was identified as the predictive model for answering whether a protein chain will exhibit phase-separating behavior or not.

Mining sequences with exceptional transition behaviour of varying order using quality measures based on information-theoretic scoring functions

Data Mining and Knowledge Discovery ◽ 10.1007/s10618-021-00808-x ◽ 2021

Author(s): Rianne M. Schouten ◽ Marcos L. P. Bueno ◽ Wouter Duivesteijn ◽ Mykola Pechenizkiy

Keyword(s): Markov Chain ◽ Blood Glucose ◽ Markov Chains ◽ Transition Probabilities ◽ Diabetes Type 2 ◽ Quality Measures ◽ Sequential Data ◽ Scoring Functions ◽ Information Theoretic ◽ Transition Behaviour

Discrete Markov chains are frequently used to analyse transition behaviour in sequential data. Here, the transition probabilities can be estimated using varying order Markov chains, where order k specifies the length of the sequence history that is used to model these probabilities. Generally, such a model is fitted to the entire dataset, but in practice it is likely that some heterogeneity in the data exists and that some sequences would be better modelled with alternative parameter values, or with a Markov chain of a different order. We use the framework of Exceptional Model Mining (EMM) to discover these exceptionally behaving sequences. In particular, we propose an EMM model class that allows for discovering subgroups with transition behaviour of varying order. To that end, we propose three new quality measures based on information-theoretic scoring functions. Our findings from controlled experiments show that all three quality measures find exceptional transition behaviour of varying order and are reasonably sensitive. The quality measure based on Akaike’s Information Criterion is most robust to the number of observations. We furthermore add to existing work by seeking subgroups of sequences, as opposed to subgroups of transitions. Since we use sequence-level descriptive attributes, we form subgroups of entire sequences, which is practically relevant in situations where you want to identify the originators of exceptional sequences, such as patients. We show this relevance by analysing sequences of blood glucose values of adult persons with diabetes type 2. In the experiments, we find subgroups of patients based on age and glycated haemoglobin (HbA1c), a measure known to correlate with average blood glucose values. Clinicians and domain experts confirmed the transition behaviour as estimated by the fitted Markov chain models.
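As background for the information-theoretic scoring functions mentioned here, the sketch below shows how Akaike's Information Criterion can compare Markov chains of different order on the same sequences. It covers only plain order selection and is not the paper's quality measures or its EMM search. The function names and the `skip` parameter are assumptions of this sketch: `skip` makes every candidate order score the same positions, so that the likelihoods are comparable.

```python
import math
from collections import Counter

def markov_log_likelihood(sequences, order, skip):
    """Maximised log-likelihood of an order-k Markov chain.

    Only positions >= skip are scored; skip should be at least the largest
    order being compared, so that all orders are evaluated on the same data.
    """
    transitions, contexts = Counter(), Counter()
    for seq in sequences:
        for i in range(max(skip, order), len(seq)):
            ctx = tuple(seq[i - order:i])  # the previous `order` states
            transitions[(ctx, seq[i])] += 1
            contexts[ctx] += 1
    # Maximum-likelihood transition probabilities are relative frequencies.
    return sum(n * math.log(n / contexts[ctx]) for (ctx, _), n in transitions.items())

def aic(sequences, order, n_states, skip):
    """AIC = 2 * (number of free parameters) - 2 * log-likelihood; lower is better."""
    n_params = (n_states ** order) * (n_states - 1)
    return 2 * n_params - 2 * markov_log_likelihood(sequences, order, skip)

# Hypothetical data: a strictly alternating sequence over two states.
sequences = ["AB" * 10]
best_order = min(range(3), key=lambda k: aic(sequences, k, n_states=2, skip=2))
```

For the alternating sequence, an order-1 chain already predicts every transition, so the extra parameters of an order-2 chain only add to the penalty term.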

Molecular generation by Fast Assembly of (Deep)SMILES fragments

Journal of Cheminformatics ◽ 10.1186/s13321-021-00566-4 ◽ 2021 ◽ Vol 13 (1)

Author(s): Francois Berenger ◽ Koji Tsuda

Keyword(s): High Frequency ◽ Molecular Diversity ◽ Molecular Design ◽ Peak Performance ◽ Scoring Functions ◽ Training Set ◽ Simple Method ◽ Distribution Matching ◽ Property Profile ◽ Speed Training

Background: In recent years, in silico molecular design has been regaining interest. To generate molecules with optimized properties on a computer, scoring functions can be coupled with a molecular generator to design novel molecules with a desired property profile. Results: In this article, a simple method is described to generate only valid molecules at high frequency (>300,000 molecules/s using a single CPU core), given a molecular training set. The proposed method generates diverse SMILES (or DeepSMILES) encoded molecules while also showing some propensity for training set distribution matching. When working with DeepSMILES, the method reaches peak performance (>340,000 molecules/s) because it relies almost exclusively on string operations. The “Fast Assembly of SMILES Fragments” software is released as open-source at https://github.com/UnixJunkie/FASMIFRA. Experiments regarding speed, training set distribution matching, molecular diversity and a benchmark against several other methods are also shown.

Binding affinity prediction for protein–ligand complex using deep attention mechanism based on intermolecular interactions

BMC Bioinformatics ◽ 10.1186/s12859-021-04466-0 ◽ 2021 ◽ Vol 22 (1)

Author(s): Sangmin Seo ◽ Jonghwan Choi ◽ Sanghyun Park ◽ Jaegyoon Ahn

Keyword(s): Deep Learning ◽ Binding Affinity ◽ Prediction Models ◽ Attention Mechanism ◽ Scoring Functions ◽ Ligand Complex ◽ Structure Based Drug Design ◽ Binding Affinity Prediction ◽ Affinity Prediction ◽ Proposed Model

Background: Accurate prediction of protein–ligand binding affinity is important for lowering the overall cost of drug discovery in structure-based drug design. For accurate predictions, many classical scoring functions and machine learning-based methods have been developed. However, these techniques tend to have limitations, mainly resulting from a lack of sufficient energy terms to describe the complex interactions between proteins and ligands. Recent deep-learning techniques can potentially solve this problem. However, the search for more efficient and appropriate deep-learning architectures and methods to represent protein–ligand complexes is ongoing. Results: In this study, we proposed a deep neural network model to improve the prediction accuracy of protein–ligand complex binding affinity. The proposed model has two important features: descriptor embeddings with information on the local structures of a protein–ligand complex, and an attention mechanism to highlight important descriptors for binding affinity prediction. The proposed model performed better than existing binding affinity prediction models on most benchmark datasets. Conclusions: We confirmed that an attention mechanism can capture the binding sites in a protein–ligand complex to improve prediction performance. Our code is available at https://github.com/Blue1993/BAPA.
