scoring function
Recently Published Documents

2022 ◽ Vol 16 (4) ◽ pp. 1-22
Siddharth Bhatia ◽ Rui Liu ◽ Bryan Hooi ◽ Minji Yoon ◽ Kijung Shin

Given a stream of graph edges from a dynamic graph, how can we assign anomaly scores to edges in an online manner, for the purpose of detecting unusual behavior, using constant time and memory? Existing approaches aim to detect individually surprising edges. In this work, we propose Midas, which focuses on detecting microcluster anomalies, or suddenly arriving groups of suspiciously similar edges, such as lockstep behavior, including denial-of-service attacks in network traffic data. We further propose Midas-F, to solve the problem whereby anomalies are incorporated into the algorithm’s internal states, creating a “poisoning” effect that can allow future anomalies to slip through undetected. Midas-F introduces two modifications: (1) we modify the anomaly scoring function, aiming to reduce the “poisoning” effect of newly arriving edges; (2) we introduce a conditional merge step, which updates the algorithm’s data structures after each time tick, but only if the anomaly score is below a threshold value, also to reduce the “poisoning” effect. Experiments show that Midas-F has significantly higher accuracy than Midas. In general, the algorithms proposed in this work have the following properties: (a) they detect microcluster anomalies while providing theoretical guarantees about the false positive probability; (b) they are online, processing each edge in constant time and constant memory, and process the data orders of magnitude faster than state-of-the-art approaches; and (c) they provide up to 62% higher area under the receiver operating characteristic curve than state-of-the-art approaches.
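The core idea, counting edge occurrences in constant memory and scoring sudden bursts against the historical mean, can be sketched as follows. This is a simplified illustration, not the authors' implementation: the chi-squared-style score follows the MIDAS formula, but the `CountMinSketch` class is a toy stand-in.

```python
import random

class CountMinSketch:
    """Approximate edge counts in constant memory (toy version)."""
    def __init__(self, rows=2, cols=1024, seed=0):
        rng = random.Random(seed)
        self.salts = [rng.randrange(1 << 30) for _ in range(rows)]
        self.cols = cols
        self.tables = [[0] * cols for _ in range(rows)]

    def add(self, key):
        for salt, table in zip(self.salts, self.tables):
            table[hash((salt, key)) % self.cols] += 1

    def query(self, key):
        # Minimum over rows bounds the true count from above.
        return min(table[hash((salt, key)) % self.cols]
                   for salt, table in zip(self.salts, self.tables))

def midas_score(a, s, t):
    """Chi-squared-style anomaly score for an edge at time tick t.

    a: count of this edge in the current tick; s: its count over all ticks.
    A burst concentrated in the current tick yields a high score.
    """
    if t <= 1 or s == 0:
        return 0.0
    return (a - s / t) ** 2 * t ** 2 / (s * (t - 1))
```

For example, an edge whose 10 total occurrences all arrive in the current tick (`midas_score(10, 10, 10)`) scores far higher than one whose occurrences are spread evenly over 10 ticks (`midas_score(1, 10, 10)`).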

2022 ◽ Vol 9
Zackary Falls ◽ Jonathan Fine ◽ Gaurav Chopra ◽ Ram Samudrala

The human immunodeficiency virus 1 (HIV-1) protease is an important target for treating HIV infection. Our goal was to benchmark a novel molecular docking protocol and determine its effectiveness as a therapeutic repurposing tool by predicting inhibitor potency against this target. To accomplish this, we predicted the relative binding scores of various inhibitors of the protease using CANDOCK, a hierarchical fragment-based docking protocol with a knowledge-based scoring function. We first used a set of 30 HIV-1 protease complexes as an initial benchmark to optimize the parameters for CANDOCK. We then compared the results from CANDOCK to those of two other popular molecular docking protocols, AutoDock Vina and Smina. Our results showed that CANDOCK is superior to both of these protocols in correlating predicted binding scores with experimental binding affinities, with a Pearson coefficient of 0.62 compared to 0.48 and 0.49 for Vina and Smina, respectively. We further leveraged the Database of Useful Decoys: Enhanced (DUD-E) HIV protease set to ascertain the effectiveness of each protocol in discriminating active from decoy ligands for proteases. CANDOCK again displayed better efficacy than the other commonly used molecular docking protocols, with an area under the receiver operating characteristic curve (AUROC) of 0.94 compared to 0.71 and 0.74 for Vina and Smina, respectively. These findings support the utility of CANDOCK to help discover novel therapeutics that effectively inhibit HIV-1 and possibly other retroviral proteases.
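The Pearson coefficient used above to compare predicted scores with experimental affinities is a standard statistic; a minimal self-contained version (not CANDOCK code) is:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

A value near 1 means predicted binding scores track experimental affinities closely; near 0 means no linear relationship.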

Semantic Web ◽ 2022 ◽ pp. 1-24
Marlene Goncalves ◽ David Chaves-Fraga ◽ Oscar Corcho

With the increase in data volume in the heterogeneous datasets being published following Open Data initiatives, new operators are necessary to help users find the subset of data that best satisfies their preference criteria. Quantitative approaches such as top-k queries may not be the most appropriate, as they require the user to assign weights to a scoring function, and these weights may not be known beforehand. Under the qualitative approach, which includes the well-known skyline, preference criteria are more intuitive in certain cases and can be expressed more naturally. In this paper, we address the problem of evaluating SPARQL qualitative preference queries over an Ontology-Based Data Access (OBDA) approach, which provides uniform access over multiple heterogeneous data sources. Our main contribution is Morph-Skyline++, a framework for processing SPARQL qualitative preference queries by directly querying relational databases. Our framework implements a technique that translates SPARQL qualitative preference queries directly into queries that can be evaluated by a relational database management system. We evaluate our approach over different scenarios, reporting the effects of data distribution, data size, and query complexity on the performance of our proposed technique in comparison with state-of-the-art techniques. The results suggest that execution time can be reduced by up to two orders of magnitude with respect to current techniques, scaling to larger datasets while precisely identifying the result set.
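The skyline operator mentioned above returns the Pareto-optimal points: those not dominated by any other point across all preference dimensions. A minimal sketch, using a naive nested-loop algorithm rather than the Morph-Skyline++ implementation, and assuming lower values are preferred in every dimension:

```python
def dominates(a, b):
    """a dominates b if a is at least as good everywhere (lower is better)
    and strictly better in at least one dimension."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(points):
    """Return the Pareto-optimal subset: points dominated by no other point."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]
```

For hotel tuples of (price, distance), for instance, (5, 5) is dominated by (3, 3) and drops out, while the mutually incomparable (1, 9), (3, 3), and (9, 1) all survive.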

2022 ◽ Vol 73 ◽ pp. 231-276
Dominik Peters ◽ Lan Yu ◽ Hau Chan ◽ Edith Elkind

A preference profile is single-peaked on a tree if the candidate set can be equipped with a tree structure so that the preferences of each voter are decreasing from their top candidate along all paths in the tree. This notion was introduced by Demange (1982), and subsequently Trick (1989b) described an efficient algorithm for deciding if a given profile is single-peaked on a tree. We study the complexity of multiwinner elections under several variants of the Chamberlin–Courant rule for preferences single-peaked on trees. We show that in this setting the egalitarian version of this rule admits a polynomial-time winner determination algorithm. For the utilitarian version, we prove that winner determination remains NP-hard for the Borda scoring function; indeed, this hardness result extends to a large family of scoring functions. However, a winning committee can be found in polynomial time if either the number of leaves or the number of internal vertices of the underlying tree is bounded by a constant. To benefit from these positive results, we need a procedure that can determine whether a given profile is single-peaked on a tree that has additional desirable properties (such as a small number of leaves). To address this challenge, we develop a structural approach that enables us to compactly represent all trees with respect to which a given profile is single-peaked. We show how to use this representation to efficiently find the best tree for a given profile for use with our winner determination algorithms: given a profile, we can efficiently find a tree with the minimum number of leaves, or a tree with the minimum number of internal vertices, among trees on which the profile is single-peaked.
We then explore the power and limitations of this framework: we develop polynomial-time algorithms to find trees with the smallest maximum degree, diameter, or pathwidth, but show that it is NP-hard to check whether a given profile is single-peaked on a tree that is isomorphic to a given tree, or on a regular tree.
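To make the rule concrete, here is a brute-force sketch of utilitarian Chamberlin–Courant winner determination under Borda scoring. It enumerates all committees, so it is exponential in the committee size and purely illustrative; the paper's polynomial-time algorithms instead exploit the single-peaked-on-a-tree structure.

```python
from itertools import combinations

def borda(ranking, candidate):
    """Borda score: m-1 points for a voter's top choice, down to 0 for last."""
    return len(ranking) - 1 - ranking.index(candidate)

def cc_utilitarian_score(profile, committee):
    """Utilitarian Chamberlin-Courant: each voter is represented by their
    best-ranked committee member and contributes that member's Borda score."""
    return sum(max(borda(ranking, c) for c in committee) for ranking in profile)

def best_committee(profile, k):
    """Exhaustive winner determination over all size-k committees."""
    candidates = profile[0]  # candidate set, read off the first ranking
    return max(combinations(candidates, k),
               key=lambda com: cc_utilitarian_score(profile, com))
```

With three voters ranking candidates a, b, c, the committee {a, b} lets every voter be represented by a candidate they rank first or second, which is why it maximizes the utilitarian score.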

2022 ◽ Vol 1
Zhi-Hao Guo ◽ Li Yuan ◽ Ya-Lan Tan ◽ Ben-Gong Zhang ◽ Ya-Zhou Shi

The 3D architectures of RNAs are essential for understanding their cellular functions. While an accurate scoring function based on the statistics of known RNA structures is a key component of successful RNA structure prediction or evaluation, there are few tools or web servers that can be used directly for comprehensive statistical analysis of RNA 3D structures. In this work, we developed RNAStat, an integrated tool for computing statistics on RNA 3D structures. For given RNA structures, RNAStat automatically calculates structural properties such as size and shape and shows their distributions. Based on the RNA structure annotation from DSSR, RNAStat provides statistical information on RNA secondary structure motifs, including canonical/non-canonical base pairs, stems, and various loops. In particular, the geometry of base pairing/stacking can be calculated in RNAStat by constructing a local coordinate system for each base. In addition, RNAStat supplies the distribution of distances between any atoms, to help users build distance-based RNA statistical potentials. To test the usability of the tool, we established a non-redundant RNA 3D structure dataset, and based on this dataset, we made a comprehensive statistical analysis of RNA structures, which could provide guidance for RNA structure modeling. The Python code of RNAStat, the dataset used in this work, and the corresponding statistical data files are freely available on GitHub.
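The atom-distance distributions that underlie distance-based statistical potentials can be illustrated with a toy pairwise-distance histogram. This is a sketch of the general idea, not RNAStat's actual code; atoms are assumed to be (x, y, z) coordinate tuples.

```python
import math
from collections import Counter

def atom_distance(a, b):
    """Euclidean distance between two 3D coordinates."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_histogram(atoms, bin_width=1.0):
    """Bin all pairwise atom distances into fixed-width bins.

    Such histograms, normalized against a reference state, are the raw
    material of distance-based statistical potentials.
    """
    hist = Counter()
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            d = atom_distance(atoms[i], atoms[j])
            hist[int(d // bin_width)] += 1
    return hist
```

For three atoms at (0,0,0), (3,4,0), and (0,0,1), the three pairwise distances are 5, 1, and √26 ≈ 5.1, so bin 5 holds two pairs and bin 1 holds one.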

2022 ◽ Vol 15 (1) ◽ pp. 63
Natarajan Arul Murugan ◽ Artur Podobas ◽ Davide Gadioli ◽ Emanuele Vitali ◽ Gianluca Palermo

Drug discovery is the most expensive, time-consuming, and challenging undertaking in biopharmaceutical companies; it aims at the identification and optimization of lead compounds from large chemical libraries. The lead compounds should have high-affinity binding and specificity for a target associated with a disease and, in addition, favorable pharmacodynamic and pharmacokinetic properties (grouped as ADMET properties). Overall, drug discovery is a multivariable optimization and can be carried out on supercomputers using a reliable scoring function, which is a measure of the binding affinity or inhibition potential of a drug-like compound. The major problem is that the number of compounds in chemical space is huge, making computational drug discovery very demanding. However, it is cheaper and less time-consuming than experimental high-throughput screening. As the problem is to find the most stable (global) minima for numerous protein–ligand complexes (on the order of 10^6 to 10^12), the parallel implementation of in silico virtual screening can be exploited to deliver drug discovery in an affordable time. In this review, we discuss such implementations of parallelization algorithms in virtual screening programs. The nature of different scoring functions and search algorithms is discussed, together with a performance analysis of several docking software packages ported to high-performance computing architectures.
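The parallel map at the heart of virtual screening (score every ligand, keep the best) can be sketched as below. Here `mock_score` is a hypothetical stand-in for a real docking scoring function, and a production screen would distribute the map across cluster nodes or GPUs rather than local threads.

```python
from concurrent.futures import ThreadPoolExecutor

def mock_score(ligand):
    """Hypothetical stand-in for a docking score (lower = stronger predicted
    binding). A real pipeline would invoke a docking engine here."""
    return len(ligand) * 1.5 - ligand.count("N")

def screen(ligands, top_n=2, workers=8):
    """Score a ligand library in parallel and return the top_n best hits."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(mock_score, ligands))
    ranked = sorted(zip(scores, ligands))
    return [lig for _, lig in ranked[:top_n]]
```

Because each ligand is scored independently, the workload is embarrassingly parallel, which is exactly what makes screening libraries of 10^6 to 10^12 compounds feasible on HPC systems.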

2022 ◽ pp. 29-35
Jianping Du

With the development of the Internet, the electronic resume has gradually replaced the paper one. A basic recruitment requirement for enterprises is to retrieve, quickly and without omission, the talent information that fulfills their criteria. Based on the Spring Boot framework and the Lucene full-text search engine, this paper implements an intelligent resume filtering algorithm, which improves the query speed of the system by establishing an index database. At the same time, the scoring function improves the accuracy of the filtering results, reduces the pressure of high concurrency on the database, improves the work efficiency of the Human Resources department, and avoids talent loss.
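Lucene-style relevance scoring is built on TF-IDF statistics: a query term contributes more when it is frequent in a document but rare across the corpus. A toy version of the idea (not Lucene's actual formula, which adds document-length normalization and boost factors):

```python
import math

def tf_idf_score(query_terms, doc_terms, all_docs):
    """Toy TF-IDF relevance score: rare query terms that appear often
    in a document (e.g. a resume) push its score up."""
    n = len(all_docs)
    score = 0.0
    for term in query_terms:
        tf = doc_terms.count(term)                      # term frequency
        df = sum(1 for d in all_docs if term in d)      # document frequency
        if tf and df:
            score += tf * math.log(1 + n / df)
    return score
```

Given resumes tokenized as term lists, a query like `["java"]` ranks a resume mentioning "java" twice above one mentioning it once, and scores a resume without the term zero.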

Electronics ◽ 2021 ◽ Vol 10 (24) ◽ pp. 3177
Venkat Anil Adibhatla ◽ Yu-Chieh Huang ◽ Ming-Chung Chang ◽ Hsu-Chi Kuo ◽ Abhijeet Utekar

Deep learning methods are currently used in industry to improve the efficiency and quality of products. Detecting defects on printed circuit boards (PCBs) is a challenging task and is usually addressed by automated visual inspection, automated optical inspection, manual inspection, and supervised learning methods such as you-only-look-once (YOLO) variants: tiny YOLO, YOLOv2, YOLOv3, YOLOv4, and YOLOv5. Previously described methods for defect detection in PCBs require large numbers of labeled images, which makes training computationally expensive and demands a great deal of human effort to label the data. This paper introduces a new unsupervised learning method for the detection of defects in PCBs using student–teacher feature pyramid matching, in which a pre-trained image classification model learns the distribution of anomaly-free images. We then distill this knowledge into a student network that has the same architecture as the teacher network. This one-step transfer retains key clues as much as possible. In addition, we incorporate a multi-scale feature matching strategy into the framework: a mixture of multi-level knowledge from the feature pyramid is passed to the student network under a stronger form of supervision, known as hierarchical feature alignment, which allows anomalies of various sizes to be detected. A scoring function reflects the probability of the occurrence of anomalies. This framework helped us to achieve accurate anomaly detection. Apart from accuracy, its inference speed also reached around 100 frames per second.
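The scoring function in such student–teacher schemes is typically the distance between teacher and student features, averaged over pyramid levels: on anomaly-free data the student has learned to mimic the teacher, so the distance (and hence the score) stays low, while defects produce large discrepancies. A simplified illustration, with plain lists standing in for CNN feature maps:

```python
def anomaly_score(teacher_feats, student_feats):
    """Anomaly score: mean squared distance between teacher and student
    feature vectors, averaged over feature pyramid levels.

    teacher_feats/student_feats: one flat feature vector per pyramid level.
    """
    per_level = []
    for t_vec, s_vec in zip(teacher_feats, student_feats):
        dist = sum((t - s) ** 2 for t, s in zip(t_vec, s_vec)) / len(t_vec)
        per_level.append(dist)
    return sum(per_level) / len(per_level)
```

Identical features score 0.0 (no anomaly); the larger the mismatch at any pyramid level, the higher the score, so a threshold on this value flags defective boards.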

Molecules ◽ 2021 ◽ Vol 26 (23) ◽ pp. 7369
Jocelyn Sunseri ◽ David Ryan Koes

Virtual screening—predicting which compounds within a specified compound library bind to a target molecule, typically a protein—is a fundamental task in the field of drug discovery. Doing virtual screening well provides tangible practical benefits, including reduced drug development costs, faster time to therapeutic viability, and fewer unforeseen side effects. As with most applied computational tasks, the algorithms currently used to perform virtual screening feature inherent tradeoffs between speed and accuracy. Furthermore, even theoretically rigorous, computationally intensive methods may fail to account for important effects relevant to whether a given compound will ultimately be usable as a drug. Here we investigate the virtual screening performance of the recently released Gnina molecular docking software, which uses deep convolutional networks to score protein-ligand structures. We find, on average, that Gnina outperforms conventional empirical scoring. The default scoring in Gnina outperforms the empirical AutoDock Vina scoring function on 89 of the 117 targets of the DUD-E and LIT-PCBA virtual screening benchmarks with a median 1% early enrichment factor that is more than twice that of Vina. However, we also find that issues of bias linger in these sets, even when not used directly to train models, and this bias obfuscates to what extent machine learning models are achieving their performance through a sophisticated interpretation of molecular interactions versus fitting to non-informative simplistic property distributions.
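The early enrichment factor reported above measures how concentrated the active compounds are at the very top of the ranked list. A minimal sketch, assuming higher scores rank first and labels are 1 for actives and 0 for decoys:

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """Enrichment factor at a given fraction of the ranked list:
    (actives rate in the top slice) / (actives rate overall).
    1.0 = no better than random; higher = better screen."""
    ranked = sorted(zip(scores, labels), reverse=True)
    n_top = max(1, int(round(len(ranked) * fraction)))
    actives_top = sum(lab for _, lab in ranked[:n_top])
    actives_all = sum(labels)
    return (actives_top / n_top) / (actives_all / len(labels))
```

With 2 actives among 4 compounds and the single top-ranked compound being active, the 25% enrichment factor is (1/1)/(2/4) = 2.0, i.e. twice the random baseline.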

Processes ◽ 2021 ◽ Vol 9 (12) ◽ pp. 2158
Minghao Zhang ◽ Li Shi ◽ Xiangzhi Zhuo ◽ Yuan Liu

Supplier network collaborative efficiency evaluation is an important part of the transformation and upgrading of intelligent manufacturing enterprises. To address the shortcomings of existing methods, this paper proposes a new method, based on complex network theory, for evaluating the collaborative efficiency of the internal members of a complex supplier network. Based on an analysis of the characteristics of the complex supplier network, and from a systems perspective, the macro supplier network is divided into multiple multi-level supplier micro subsystems with manufacturing enterprises at the core. To reasonably quantify the collaboration relationships of members in the subsystem structure model, collaboration entropy is introduced as a measurement tool and combined with a hesitant fuzzy scoring function to construct the collaborative evaluation model of the complex supplier network. By quantifying the collaboration relationships among the members of each subsystem and aggregating them iteratively, level by level, the collaborative efficiency of the complex supplier network is evaluated from the local to the overall scale. Finally, taking a large battery manufacturing enterprise in China as an example, the proposed method is used to calculate the collaboration entropy, collaborative efficiency, and collaboration ratio of members at different supplier network levels. The results verify the effectiveness of the model.
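A minimal sketch of the two measurement tools named above, under common textbook definitions (the paper's exact formulations may differ): Shannon-style entropy over a member's normalized collaboration weights, and the hesitant fuzzy score function as the mean of an element's possible membership degrees.

```python
import math

def collaboration_entropy(weights):
    """Shannon entropy of a member's normalized collaboration weights.

    Weights concentrated on one partner give low entropy (ordered,
    tightly coordinated); evenly spread weights give high entropy.
    """
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log(p) for p in probs)

def hesitant_fuzzy_score(memberships):
    """Score function of a hesitant fuzzy element: the mean of its
    possible membership degrees (a common textbook definition)."""
    return sum(memberships) / len(memberships)
```

For example, a member collaborating only with one partner has entropy 0, while equal weights over two partners give entropy ln 2; a hesitant fuzzy element {0.2, 0.4, 0.6} scores 0.4.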
