Quantum algorithms for nearest-neighbor methods for supervised and unsupervised learning

2015 ◽  
Vol 15 (3&4) ◽  
pp. 316-356
Author(s):  
Nathan Wiebe ◽  
Ashish Kapoor ◽  
Krysta M. Svore

We present quantum algorithms for performing nearest-neighbor learning and $k$-means clustering. At the core of our algorithms are fast and coherent quantum methods for computing the Euclidean distance both directly and via the inner product, which we couple with methods for performing amplitude estimation that do not require measurement. We prove upper bounds on the number of queries to the input data required to compute such distances and find the nearest vector to a given test example. In the worst case, our quantum algorithms lead to polynomial reductions in query complexity relative to Monte Carlo algorithms. We also study the performance of our quantum nearest-neighbor algorithms on several real-world binary classification tasks and find that the classification accuracy is competitive with classical methods.
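
As a purely classical illustration (not the quantum algorithm itself), the sketch below shows the inner-product identity $\|u-v\|^2 = \|u\|^2 + \|v\|^2 - 2\langle u,v\rangle$ that the paper's coherent subroutines estimate, applied to a nearest-neighbor lookup. All names and the random data are illustrative assumptions.

```python
# Classical sketch of nearest-neighbor search via the inner-product form of the
# squared Euclidean distance: ||u - v||^2 = ||u||^2 + ||v||^2 - 2<u, v>.
import numpy as np

def nearest_neighbor_via_inner_product(test_vec, train_vecs, train_labels):
    """Return the label of the training vector closest to test_vec."""
    # Squared norms of every training vector and of the test vector.
    train_sq_norms = np.einsum("ij,ij->i", train_vecs, train_vecs)
    test_sq_norm = test_vec @ test_vec
    # Inner products between the test vector and each training vector.
    inner = train_vecs @ test_vec
    # Squared Euclidean distances recovered from the inner products.
    sq_dists = train_sq_norms + test_sq_norm - 2.0 * inner
    return train_labels[int(np.argmin(sq_dists))]

# Tiny usage example with random data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = rng.integers(0, 2, size=100)
print(nearest_neighbor_via_inner_product(rng.normal(size=8), X, y))
```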

Quantum ◽  
2021 ◽  
Vol 5 ◽  
pp. 543
Author(s):  
Mark Bun ◽  
Robin Kothari ◽  
Justin Thaler

We give new quantum algorithms for evaluating composed functions whose inputs may be shared between bottom-level gates. Let $f$ be an $m$-bit Boolean function and consider an $n$-bit function $F$ obtained by applying $f$ to conjunctions of possibly overlapping subsets of $n$ variables. If $f$ has quantum query complexity $Q(f)$, we give an algorithm for evaluating $F$ using $\tilde{O}(Q(f)\cdot\sqrt{n})$ quantum queries. This improves on the bound of $O(Q(f)\cdot n)$ that follows by treating each conjunction independently, and our bound is tight for worst-case choices of $f$. Using completely different techniques, we prove a similar tight composition theorem for the approximate degree of $f$. By recursively applying our composition theorems, we obtain a nearly optimal $\tilde{O}(n^{1-2^{-d}})$ upper bound on the quantum query complexity and approximate degree of linear-size depth-$d$ AC$^0$ circuits. As a consequence, such circuits can be PAC learned in subexponential time, even in the challenging agnostic setting. Prior to our work, a subexponential-time algorithm was not known even for linear-size depth-3 AC$^0$ circuits. As an additional consequence, we show that AC$^0\circ\oplus$ circuits of depth $d+1$ require size $\tilde{\Omega}(n^{1/(1-2^{-d})}) \geq \omega(n^{1+2^{-d}})$ to compute the Inner Product function even on average. The previous best size lower bound was $\Omega(n^{1+4^{-(d+1)}})$ and only held in the worst case (Cheraghchi et al., JCSS 2018).


Quantum ◽  
2021 ◽  
Vol 5 ◽  
pp. 566
Author(s):  
Patrick Rall

We consider performing phase estimation under the following conditions: we are given only one copy of the input state, the input state does not have to be an eigenstate of the unitary, and the state must not be measured. Most quantum estimation algorithms make assumptions that render them unsuitable for this 'coherent' setting, leaving only the textbook approach. We present novel algorithms for phase, energy, and amplitude estimation that are both conceptually and computationally simpler than the textbook method, featuring both a smaller query complexity and a smaller ancilla footprint. They do not require a quantum Fourier transform, and they do not require a quantum sorting network to compute the median of several estimates. Instead, they use block-encoding techniques to compute the estimate one bit at a time, performing all amplification via singular value transformation. These improved subroutines accelerate the performance of quantum Metropolis sampling and quantum Bayesian inference.


2004 ◽  
Author(s):  
Lyle E. Bourne ◽  
Alice F. Healy ◽  
James A. Kole ◽  
William D. Raymond

2020 ◽  
Author(s):  
Cameron Hargreaves ◽  
Matthew Dyer ◽  
Michael Gaultois ◽  
Vitaliy Kurlin ◽  
Matthew J Rosseinsky

It is a core problem in any field to reliably tell how close two objects are to being the same, and once such a relation has been established we can use it to precisely quantify potential relationships, both analytically and with machine learning (ML). For inorganic solids, the chemical composition is a fundamental descriptor, which can be represented as a vector of the ratios of each element in the material. These vectors are a convenient mathematical data structure for measuring similarity, but unfortunately the standard metric (the Euclidean distance) gives little to no variation in the resulting distances between chemically dissimilar compositions. We present the Earth Mover’s Distance (EMD) for inorganic compositions, a well-defined metric which enables the measurement of chemical similarity in an explainable fashion. We compute the EMD between two compositions from the ratio of each of the elements and the absolute distance between the elements on the modified Pettifor scale. This simple metric shows clear strength at distinguishing compounds and is efficient to compute in practice. The resulting distances align better with chemical understanding than the Euclidean distance, as we demonstrate on the binary compositions of the Inorganic Crystal Structure Database (ICSD). The EMD is a reliable numeric measure of chemical similarity that can be incorporated into automated workflows for a range of ML techniques. We have found that, with no supervision, the use of this metric gives a distinct partitioning of binary compounds into clear trends and families of chemical properties, with future applications in nearest-neighbor search queries for chemical database retrieval systems and in supervised ML techniques.
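
A minimal sketch of a compositional Earth Mover's Distance of this kind is shown below, using 1-D optimal transport on an element scale. The `pettifor` values are illustrative placeholders, not the real modified Pettifor scale numbers, and the helper name is an assumption.

```python
# Sketch: EMD between two compositions, each given as {element: fraction},
# where elements are placed on a 1-D scale (here: placeholder Pettifor-like values).
from scipy.stats import wasserstein_distance

# Hypothetical scale positions; similar elements are placed near each other.
pettifor = {"Na": 11.0, "K": 10.0, "Cl": 99.0, "Br": 98.0, "O": 101.0, "Ti": 51.0}

def composition_emd(comp_a, comp_b, scale=pettifor):
    """Earth Mover's Distance between two compositions on the element scale."""
    elems_a, weights_a = zip(*comp_a.items())
    elems_b, weights_b = zip(*comp_b.items())
    return wasserstein_distance(
        [scale[e] for e in elems_a],
        [scale[e] for e in elems_b],
        u_weights=weights_a,
        v_weights=weights_b,
    )

# NaCl should come out much closer to KBr than to TiO2 under this metric.
print(composition_emd({"Na": 0.5, "Cl": 0.5}, {"K": 0.5, "Br": 0.5}))
print(composition_emd({"Na": 0.5, "Cl": 0.5}, {"Ti": 1 / 3, "O": 2 / 3}))
```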


2015 ◽  
pp. 435-452
Author(s):  
Andris Ambainis ◽  
Jozef Gruska ◽  
Shenggen Zheng

It has been proved that almost all $n$-bit Boolean functions have exact classical query complexity $n$. However, the situation is very different for exact quantum query complexity. In this paper, we prove that almost all $n$-bit Boolean functions can be computed by an exact quantum algorithm with fewer than $n$ queries. More exactly, we prove that AND$_n$ is the only $n$-bit Boolean function, up to isomorphism, that requires $n$ queries.


2021 ◽  
Author(s):  
Eswara Venkata Kumar Dhulipala

A Dubins Travelling Salesman Problem (DTSP) of finding a minimum-length tour through a given set of points is considered. The DTSP involves a Dubins vehicle, which is capable of moving only forward at constant speed. In this paper, first, a worst-case upper bound is obtained on the DTSP tour length by assuming the DTSP tour sequence to be the same as the Euclidean Travelling Salesman Problem (ETSP) tour sequence. It is noted that, in the worst case, \emph{any algorithm that uses the ETSP tour sequence} is a constant-factor approximation algorithm for the DTSP. Next, two new algorithms are introduced, viz., the Angle Bisector Algorithm (ABA) and the Modified Dynamic Programming Algorithm (MDPA). In ABA, the ETSP tour sequence is used as the DTSP tour sequence and the orientation angle at each point $i_k$ is calculated using the angle bisector of the relative angle formed between the rays $i_{k}i_{k-1}$ and $i_{k}i_{k+1}$. In MDPA, the tour sequence and orientation angles are computed in an integrated manner. It is shown that ABA and MDPA are constant-factor approximation algorithms and that ABA provides an improved upper bound compared to the Alternating Algorithm (AA) \cite{savla2008traveling}. Through numerical simulations, we show that ABA provides an improved tour length compared to AA, the Single Vehicle Algorithm (SVA) \cite{rathinam2007resource} and the Optimized Heading Algorithm (OHA) \cite{babel2020new,manyam2018tightly} when the Euclidean distance between any two points in the given set is at least $4\rho$, where $\rho$ is the minimum turning radius. The time complexity of ABA is comparable with that of AA and SVA and is better than that of OHA. We also show that MDPA provides an improved tour length compared to AA and SVA and is comparable with OHA when there is no constraint on the Euclidean distance between the points. In particular, ABA gives a tour length that is at most $4\%$ more than the ETSP tour length when the Euclidean distance between any two points in the given set is at least $4\rho$.
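
A rough sketch of an angle-bisector heading assignment in the spirit of ABA is given below: for a fixed ETSP visiting order, each point gets a heading along the bisector of its incoming and outgoing travel directions. The exact convention in the paper may differ; the function name and the example tour are illustrative.

```python
# Sketch: assign a heading to each point of an ETSP tour via an angle bisector.
import numpy as np

def bisector_headings(tour):
    """tour: (n, 2) array of points in ETSP visiting order. Returns headings in radians."""
    tour = np.asarray(tour, dtype=float)
    n = len(tour)
    headings = np.empty(n)
    for k in range(n):
        prev_pt, next_pt = tour[(k - 1) % n], tour[(k + 1) % n]
        # Unit vectors of the incoming (prev -> current) and outgoing
        # (current -> next) travel directions; their bisector keeps the
        # vehicle heading roughly along the direction of travel.
        d_in = tour[k] - prev_pt
        d_out = next_pt - tour[k]
        d_in /= np.linalg.norm(d_in)
        d_out /= np.linalg.norm(d_out)
        bisector = d_in + d_out
        if np.allclose(bisector, 0.0):  # incoming and outgoing directions are opposite
            bisector = d_in
        headings[k] = np.arctan2(bisector[1], bisector[0])
    return headings

# Usage: headings for a square tour of side 10.
print(np.degrees(bisector_headings([(0, 0), (10, 0), (10, 10), (0, 10)])))
```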


2018 ◽  
Vol 8 (4) ◽  
pp. 3203-3208
Author(s):  
P. N. Smyrlis ◽  
D. C. Tsouros ◽  
M. G. Tsipouras

Classification-via-clustering (CvC) is a widely used approach in which a clustering procedure is used to perform classification tasks. In this paper, a novel K-Means-based CvC algorithm is presented, analysed and evaluated. Two additional techniques are employed to reduce the effects of the limitations of K-Means: a hypercube of constraints is defined for each centroid, and weights are computed for each attribute of each class, enabling the use of a weighted Euclidean distance as the similarity criterion in the clustering procedure. Experiments are performed on 42 well-known classification datasets. The experimental results demonstrate that the proposed algorithm outperforms CvC with simple K-Means.
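
A minimal sketch of the two ingredients named above follows: a weighted Euclidean distance and a hypercube constraint applied to a centroid. How the per-class weights and bounds are derived follows the paper; the values and function names here are illustrative assumptions.

```python
# Sketch: weighted Euclidean distance and hypercube-constrained centroid update.
import numpy as np

def weighted_euclidean(x, centroid, weights):
    """Weighted Euclidean distance between a sample and a class centroid."""
    diff = np.asarray(x) - np.asarray(centroid)
    return float(np.sqrt(np.sum(np.asarray(weights) * diff ** 2)))

def clip_to_hypercube(centroid, lower, upper):
    """Constrain a centroid to its class-specific hypercube of allowed values."""
    return np.minimum(np.maximum(centroid, lower), upper)

# Usage: clip a freshly updated centroid, then measure a sample's distance to it
# under class-specific attribute weights.
x = [1.0, 2.0, 3.0]
c = clip_to_hypercube([0.5, 2.5, 9.0], lower=[0.0, 0.0, 0.0], upper=[5.0, 5.0, 5.0])
print(weighted_euclidean(x, c, weights=[1.0, 0.5, 2.0]))
```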


Processes ◽  
2020 ◽  
Vol 8 (5) ◽  
pp. 595
Author(s):  
Cătălin Buiu ◽  
Vlad-Rareş Dănăilă ◽  
Cristina Nicoleta Răduţă

Women’s cancers remain a major challenge for many health systems. Between 1991 and 2017, the death rate for all major cancers fell continuously in the United States, excluding uterine cervix and uterine corpus cancers. Together with HPV (Human Papillomavirus) testing and cytology, colposcopy has played a central role in cervical cancer screening. This medical procedure allows physicians to view the cervix at a magnification of up to 10×. This paper presents an automated colposcopy image analysis framework for the classification of precancerous and cancerous lesions of the uterine cervix. This framework is based on an ensemble of MobileNetV2 networks. Our experimental results show that this method achieves accuracies of 83.33% and 91.66% on the four-class and binary classification tasks, respectively. These results are promising for the future use of automatic classification methods based on deep learning as tools to support medical doctors.
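
A hedged sketch of an ensemble of MobileNetV2 classifiers in tf.keras is shown below, averaging the softmax outputs of independently trained members. The input size, ensemble size, head architecture, and training details are illustrative assumptions, not the paper's exact setup.

```python
# Sketch: ensemble of MobileNetV2 classifiers with prediction averaging.
import tensorflow as tf

NUM_CLASSES = 4   # four-class lesion task; use 2 for the binary task
NUM_MEMBERS = 3   # illustrative ensemble size

def make_member(seed):
    """Build one ensemble member: MobileNetV2 backbone + small classification head."""
    base = tf.keras.applications.MobileNetV2(
        input_shape=(224, 224, 3), include_top=False, weights="imagenet")
    inputs = tf.keras.Input(shape=(224, 224, 3))
    x = tf.keras.applications.mobilenet_v2.preprocess_input(inputs)
    x = base(x)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dropout(0.2, seed=seed)(x)
    outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

members = [make_member(seed=i) for i in range(NUM_MEMBERS)]
# Each member would be trained separately (e.g. member.fit(train_ds, ...)) on the
# colposcopy images; at inference time the ensemble averages the members' outputs.
def ensemble_predict(images):
    probs = [m.predict(images, verbose=0) for m in members]
    return tf.reduce_mean(tf.stack(probs, axis=0), axis=0)
```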


BMC Genomics ◽  
2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Davide Chicco ◽  
Giuseppe Jurman

Background: To evaluate binary classifications and their confusion matrices, scientific researchers can employ several statistical rates, according to the goal of the experiment they are investigating. Despite being a crucial issue in machine learning, no widespread consensus has yet been reached on a single preferred measure. Accuracy and F1 score computed on confusion matrices have been (and still are) among the most popular metrics adopted in binary classification tasks. However, these statistical measures can dangerously show overoptimistic, inflated results, especially on imbalanced datasets. Results: The Matthews correlation coefficient (MCC), instead, is a more reliable statistical rate which produces a high score only if the prediction obtained good results in all four confusion matrix categories (true positives, false negatives, true negatives, and false positives), proportionally both to the size of the positive elements and the size of the negative elements in the dataset. Conclusions: In this article, we show how MCC produces a more informative and truthful score in evaluating binary classifications than accuracy and F1 score, by first explaining its mathematical properties and then illustrating the advantages of MCC in six synthetic use cases and in a real genomics scenario. We believe that the Matthews correlation coefficient should be preferred to accuracy and F1 score by all scientific communities in evaluating binary classification tasks.
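
The small example below illustrates the point made above: on an imbalanced dataset, a trivial "always positive" classifier scores high on accuracy and F1 while MCC exposes that the prediction carries no information. The data are synthetic and only for illustration.

```python
# Accuracy and F1 look strong on imbalanced data even for a useless classifier; MCC does not.
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, matthews_corrcoef

y_true = np.array([1] * 95 + [0] * 5)   # 95 positives, 5 negatives
y_pred = np.ones_like(y_true)           # classifier that always predicts "positive"

print("accuracy:", accuracy_score(y_true, y_pred))     # 0.95
print("F1      :", f1_score(y_true, y_pred))           # ~0.974
print("MCC     :", matthews_corrcoef(y_true, y_pred))  # 0.0
```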

