A New Binary Classifier: Clustering-Launched Classification

Comments are an integral part of software development; they are natural language descriptions associated with source code elements. Understanding explicit associations can be useful in improving code comprehensibility and maintaining the consistency between code and comments. As an initial step towards this larger goal, we address the task of associating entities in Javadoc comments with elements in Java source code. We propose an approach for automatically extracting supervised data using revision histories of open source projects and present a manually annotated evaluation dataset for this task. We develop a binary classifier and a sequence labeling model by crafting a rich feature set which encompasses various aspects of code, comments, and the relationships between them. Experiments show that our systems outperform several baselines learning from the proposed supervision.


The Computational Complexity of Understanding Binary Classifier Decisions

Journal of Artificial Intelligence Research ◽

10.1613/jair.1.12359 ◽

2021 ◽

Vol 70 ◽

Author(s):

Stephan Waeldchen ◽

Jan Macdonald ◽

Sascha Hauch ◽

Gitta Kutyniok

Keyword(s):

Polynomial Time Algorithm ◽

Time Algorithm ◽

Boolean Circuits ◽

Binary Classifier ◽

Minimal Set ◽

Limited Size ◽

Approximation Factor ◽

Special Cases ◽

Relevant Variables ◽

Binary Classifiers

For a d-ary Boolean function $\Phi\colon \{0,1\}^d \to \{0,1\}$ and an assignment to its variables $x = (x_1, x_2, \ldots, x_d)$ we consider the problem of finding those subsets of the variables that are sufficient to determine the function value with a given probability $\delta$. This is motivated by the task of interpreting predictions of binary classifiers described as Boolean circuits, which can be seen as special cases of neural networks. We show that the problem of deciding whether such subsets of relevant variables of limited size $k \le d$ exist is complete for the complexity class $\mathrm{NP}^{\mathrm{PP}}$ and thus, generally, unfeasible to solve. We then introduce a variant, in which it suffices to check whether a subset determines the function value with probability at least $\delta$ or at most $\delta - \gamma$ for $0 < \gamma < \delta$. This promise of a probability gap reduces the complexity to the class $\mathrm{NP}^{\mathrm{BPP}}$. Finally, we show that finding the minimal set of relevant variables cannot be reasonably approximated, i.e. with an approximation factor $d^{1-\alpha}$ for $\alpha > 0$, by a polynomial time algorithm unless P = NP. This holds even with the promise of a probability gap.
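
To make the problem statement in this abstract concrete, the following is a minimal brute-force sketch, not the authors' method. It assumes that the variables outside the chosen subset are drawn independently and uniformly from {0, 1}, which the abstract does not spell out. The function names and the example function `phi` are hypothetical. The enumeration is exponential in d, in line with the hardness results stated above.

```python
from fractions import Fraction
from itertools import combinations, product


def agreement_probability(phi, x, subset):
    """Probability that phi(y) == phi(x) when y equals x on `subset`
    and every other coordinate is uniform on {0, 1} (an assumption)."""
    free = [i for i in range(len(x)) if i not in subset]
    target = phi(x)
    hits = 0
    for bits in product((0, 1), repeat=len(free)):
        y = list(x)
        for i, b in zip(free, bits):
            y[i] = b
        if phi(tuple(y)) == target:
            hits += 1
    return Fraction(hits, 2 ** len(free))


def smallest_relevant_subsets(phi, x, delta):
    """All subsets of minimal size whose agreement probability is >= delta."""
    d = len(x)
    for k in range(d + 1):
        found = [s for s in combinations(range(d), k)
                 if agreement_probability(phi, x, set(s)) >= delta]
        if found:
            return found
    return []


# Hypothetical example: phi(v) = (v0 AND v1) OR v2, evaluated at x = (1, 1, 0).
phi = lambda v: int((v[0] and v[1]) or v[2])
x = (1, 1, 0)
print(smallest_relevant_subsets(phi, x, Fraction(3, 4)))  # [(0,), (1,)]
print(smallest_relevant_subsets(phi, x, 1))               # [(0, 1)]
```

Exact fractions are used so that the comparison against δ is not affected by floating-point rounding.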


A nonparametric ensemble binary classifier and its statistical properties

Statistics & Probability Letters ◽

10.1016/j.spl.2019.01.021 ◽

2019 ◽

Vol 149 ◽

pp. 16-23 ◽

Cited By ~ 6

Author(s):

Tanujit Chakraborty ◽

Ashis Kumar Chakraborty ◽

C.A. Murthy

Keyword(s):

Statistical Properties ◽

Binary Classifier


Binary Classifier Inspired by Quantum Theory

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.330110051 ◽

2019 ◽

Vol 33 ◽

pp. 10051-10052 ◽

Cited By ~ 3

Author(s):

Prayag Tiwari ◽

Massimo Melucci

Keyword(s):

Machine Learning ◽

Quantum Theory ◽

State Of The Art ◽

Substantial Improvement ◽

The State ◽

Food Technology ◽

Binary Classifier ◽

Raw Data ◽

Probability And Statistics ◽

Agricultural Food

Machine Learning (ML) helps us recognize patterns in raw data. ML is used in numerous domains, e.g., biomedicine, agriculture, and food technology. Despite recent technological advancements, there is still room for substantial improvement in prediction. Current ML models are based on classical theories of probability and statistics, which can now be replaced by Quantum Theory (QT) with the aim of improving the effectiveness of ML. In this paper, we propose the Binary Classifier Inspired by Quantum Theory (BCIQT) model, which outperforms state-of-the-art classification in terms of recall for every category.


Fido-SNP: the first webserver for scoring the impact of single nucleotide variants in the dog genome

Nucleic Acids Research ◽

10.1093/nar/gkz420 ◽

2019 ◽

Vol 47 (W1) ◽

pp. W136-W141 ◽

Cited By ~ 1

Author(s):

Emidio Capriotti ◽

Ludovica Montanucci ◽

Giuseppe Profiti ◽

Ivan Rossi ◽

Diana Giannuzzi ◽

...

Keyword(s):

Matthews Correlation Coefficient ◽

Genomic Variation ◽

Gradient Boosting ◽

Binary Classifier ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Coding Regions ◽

Variation Data ◽

Boosting Algorithm ◽

The Impact

As the amount of genomic variation data increases, tools that can score the functional impact of single nucleotide variants become increasingly necessary. While several prediction servers are available for interpreting the effects of variants in the human genome, only a few have been developed for other species, and none were specifically designed for species of veterinary interest such as the dog. Here, we present Fido-SNP, the first predictor able to discriminate between Pathogenic and Benign single-nucleotide variants in the dog genome. Fido-SNP is a binary classifier based on the Gradient Boosting algorithm. It is able to classify and score the impact of variants in both coding and non-coding regions based on sequence features within seconds. When validated on a previously unseen set of annotated variants from the OMIA database, Fido-SNP reaches 88% overall accuracy, a Matthews correlation coefficient of 0.77, and an area under the ROC curve of 0.91.
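
The three figures of merit quoted in this abstract (overall accuracy, Matthews correlation coefficient, area under the ROC curve) can be computed for any binary classifier as in the sketch below. This is a generic illustration, not the Fido-SNP evaluation code. The counts and scores are made-up placeholders, and returning 0.0 when the MCC denominator is zero is a common convention assumed here.

```python
from math import sqrt


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention for a degenerate confusion matrix
    return (tp * tn - fp * fn) / denom


def roc_auc(pos_scores, neg_scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (quadratic-time pairwise form)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical counts and scores, for illustration only.
print(accuracy(tp=40, tn=45, fp=5, fn=10))
print(matthews_corrcoef(tp=40, tn=45, fp=5, fn=10))
print(roc_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, which makes it more informative when the two classes are imbalanced.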
