Quantum cluster algorithm for data classification

We present a quantum algorithm for data classification based on the nearest-neighbor learning algorithm. The classification algorithm has two steps. First, data in the same class are divided into smaller groups with sublabels, which helps build boundaries between data with different labels. Second, we construct a quantum circuit for classification that contains multi-controlled gates. The algorithm is easy to implement and efficient in predicting the labels of test data. To illustrate the power and efficiency of this approach, we construct the phase transition diagram for the metal-insulator transition of VO2 using limited experimental training data. VO2 is a typical strongly correlated electron material, and its metal-insulator phase transition has drawn much attention in condensed matter physics. Moreover, we demonstrate our algorithm on the classification of randomly generated data and on the classification of entanglement for various Werner states, where the training sets cannot be divided by a single curve; instead, more than one curve is required to separate them perfectly. Our preliminary results show considerable potential for various classification problems, particularly for constructing different phases in materials.
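The two-step idea (split each class into sublabelled groups, then classify by the nearest group) can be illustrated with a purely classical sketch. This is not the quantum circuit from the paper: the use of a small k-means routine to form the subgroups, the function names and the toy data are all assumptions made for illustration, since the abstract does not say how the subgroups are built.

```python
import math

def kmeans(points, k, iters=20):
    """Tiny deterministic k-means (assumed subgrouping method, not from the paper)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Farthest-point seeding: pick the point farthest from its nearest centroid.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            groups[idx].append(p)
        # Recompute each centroid as the mean of its group (keep old one if empty).
        centroids = [
            tuple(sum(x) / len(g) for x in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids

def fit_sublabels(data, k):
    """data: {label: [points]} -> list of (sub-centroid, parent label) pairs."""
    return [(c, label) for label, pts in data.items() for c in kmeans(pts, k)]

def predict(model, x):
    """Return the parent label of the sub-centroid nearest to x."""
    return min(model, key=lambda pair: math.dist(x, pair[0]))[1]

# Hypothetical XOR-like data: no single curve through the middle separates A from B.
train = {
    "A": [(0.0, 0.0), (0.2, 0.1), (4.0, 4.0), (3.9, 4.2)],
    "B": [(0.0, 4.0), (0.1, 3.8), (4.0, 0.0), (4.2, 0.1)],
}
model = fit_sublabels(train, k=2)
print(predict(model, (0.3, 0.3)), predict(model, (3.8, 0.4)))
```

With one centroid per class both classes would be centred near the middle of the toy data, which is why the sublabels matter here.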


Exploring the Uncertainty Space of Ensemble Classifiers in Face Recognition

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001415560029 ◽

2015 ◽

Vol 29 (03) ◽

pp. 1556002 ◽

Cited By ~ 7

Author(s):

Juan Luis Fernández-Martínez ◽

Ana Cernea

Keyword(s):

Face Recognition ◽

Nearest Neighbor ◽

Learning Algorithm ◽

Majority Voting ◽

Borda Count ◽

Discrete Wavelet ◽

Final Decision ◽

Ensemble Classifiers ◽

Classification Problems ◽

Uncertainty Space

In this paper, we present a supervised ensemble learning algorithm, called SCAV1, and its application to face recognition. This algorithm exploits the uncertainty space of the ensemble classifiers. Its design includes six different nearest-neighbor (NN) classifiers that are based on different and diverse image attributes: histogram, variogram, texture analysis, edges, bidimensional discrete wavelet transform and Zernike moments. In this approach, each attribute, together with its corresponding type of analysis (local or global) and the distance criterion (p-norm), induces a different individual NN classifier. The ensemble classifier SCAV1 depends on a set of parameters: the number of candidate images used by each individual method to perform the final classification, and the weights given to each individual classifier. SCAV1 parameters are optimized/sampled using a supervised approach via the regressive particle swarm optimization algorithm (RR-PSO). The final classifier exploits the uncertainty space of SCAV1 and uses majority voting (Borda Count) as a final decision rule. We show the application of this algorithm to the ORL and PUT image databases, obtaining very high and stable accuracies (100% median accuracy and almost null interquartile range). In conclusion, exploring the uncertainty space of ensemble classifiers provides optimum results and seems to be the appropriate strategy to adopt for face recognition and other classification problems.
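The final decision rule can be illustrated with a minimal weighted Borda count. This is a generic sketch, not the SCAV1 implementation: the function name, the scoring convention (best candidate gets n points, last gets 1) and the example identities are assumptions.

```python
from collections import defaultdict

def borda_count(rankings, weights=None):
    """Combine ranked candidate lists (best first), one list per classifier."""
    weights = weights or [1.0] * len(rankings)
    scores = defaultdict(float)
    for ranking, w in zip(rankings, weights):
        n = len(ranking)
        for pos, candidate in enumerate(ranking):
            scores[candidate] += w * (n - pos)  # best gets n points, last gets 1
    # Highest total wins; ties are broken alphabetically for determinism.
    return max(sorted(scores), key=lambda c: scores[c])

# Hypothetical candidate lists returned by three individual NN classifiers.
rankings = [
    ["anna", "ben", "carl"],
    ["ben", "anna", "carl"],
    ["ben", "carl", "anna"],
]
print(borda_count(rankings))                     # equal weights
print(borda_count(rankings, weights=[5, 1, 1]))  # first classifier trusted more
```

The second call shows how the per-classifier weights mentioned in the abstract can change the outcome of the vote.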


Application of K-Nearest Neighbor Algorithm on Classification of Disk Hernia and Spondylolisthesis in Vertebral Column

Indonesian Journal of Information Systems ◽

10.24002/ijis.v2i1.2352 ◽

2019 ◽

Vol 2 (1) ◽

pp. 57 ◽

Cited By ~ 1

Author(s):

Irma Handayani

Keyword(s):

Vertebral Column ◽

Nearest Neighbor ◽

Average Length ◽

Data Classification ◽

The Body ◽

Training Data ◽

K Nearest Neighbor ◽

Sample Data ◽

K Nearest Neighbor Algorithm

The vertebral column, as part of the backbone, plays an important role in the human body. Trauma to the vertebral column can affect the spinal cord's ability to send and receive messages between the brain and the body systems that control sensory and motor function. Disk hernia and spondylolisthesis are examples of pathologies of the vertebral column. Research on classifying pathologies or damage to the bones and joints of the skeletal system is rare, even though such a classification system can serve radiologists as a second opinion and thereby improve their productivity and diagnostic consistency. This research used the Vertebral Column dataset from the UCI Machine Learning Repository, which has three classes (Disk Hernia, Spondylolisthesis and Normal). The K-NN algorithm was applied to classify disk hernia and spondylolisthesis in the vertebral column. The data were then grouped for a different but related classification task with two classes: “normal” and “abnormal”. The K-NN algorithm classifies data by using sample data as a reference training set, producing the vertebral column classification from the learning process. The results showed that the accuracy of the K-NN classifier was 83%. The average time the K-NN classifier needed for classification was 0.000212303 seconds.
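For readers unfamiliar with K-NN, the following is a minimal sketch of the majority-vote rule. The two-dimensional feature vectors are invented for illustration and are not taken from the Vertebral Column dataset; the function name and the choice of Euclidean distance are likewise assumptions.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (features, label). Returns the majority label of the k nearest."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical feature vectors, not real measurements.
train = [
    ((1.0, 1.0), "normal"), ((1.2, 0.9), "normal"), ((0.8, 1.1), "normal"),
    ((3.0, 3.2), "abnormal"), ((3.1, 2.9), "abnormal"), ((2.9, 3.0), "abnormal"),
]
print(knn_predict(train, (1.1, 1.0)), knn_predict(train, (3.0, 3.0)))
```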


Rough set based ensemble learning algorithm for agricultural data classification

Filomat ◽

10.2298/fil1805917s ◽

2018 ◽

Vol 32 (5) ◽

pp. 1917-1930 ◽

Cited By ~ 1

Author(s):

Lei Shi ◽

Qiguo Duan ◽

Juanjuan Zhang ◽

Lei Xi ◽

Hongbo Qiao ◽

...

Keyword(s):

Ensemble Learning ◽

Rough Set ◽

Rough Set Theory ◽

Learning Algorithm ◽

Uncertain Data ◽

Data Classification ◽

Research Area ◽

Experimental Comparison ◽

Classification Problems ◽

Ensemble Learning Algorithm

Agricultural data classification is attracting increasing attention in the research area of intelligent agriculture. As an important class of machine learning methods, ensemble learning uses multiple base classifiers to deal with classification problems. Rough set theory is a powerful mathematical approach to processing unclear and uncertain data. In this paper, a rough set based ensemble learning algorithm is proposed to classify agricultural data effectively and efficiently. An experimental comparison of different algorithms is conducted on four agricultural datasets. The experimental results indicate that the proposed algorithm clearly improves performance.


Analyzing Heuristic-based Randomized Search Strategies for the Quantum Circuit Compilation Problem

Fundamenta Informaticae ◽

10.3233/fi-2020-1942 ◽

2020 ◽

Vol 174 (3-4) ◽

pp. 259-281

Author(s):

Angelo Oddi ◽

Riccardo Rasconi

Keyword(s):

Quantum Computing ◽

Nearest Neighbor ◽

Quantum Algorithm ◽

Quantum Circuit ◽

Quantum Circuits ◽

Quantum Gates ◽

Ranking Functions ◽

Solution Quality ◽

Circuit Realization ◽

Definition Of

In this work we investigate the performance of greedy randomised search (GRS) techniques applied to the problem of compiling quantum circuits to emerging quantum hardware. Quantum computing (QC) represents the next big step towards power consumption minimisation and CPU speed boost in the future of computing machines. Quantum computing uses quantum gates that manipulate multi-valued bits (qubits). A quantum circuit is composed of a number of qubits and a series of quantum gates that operate on those qubits, and whose execution realises a specific quantum algorithm. Current quantum computing technologies limit the qubit interaction distance, allowing the execution of gates between adjacent qubits only. This has opened the way to the exploration of possible techniques aimed at guaranteeing nearest-neighbor (NN) compliance in any quantum circuit through the addition of a number of so-called swap gates between adjacent qubits. In addition, technological limitations (decoherence effect) impose that the overall duration (makespan) of the quantum circuit realization be minimized. One core contribution of the paper is the definition of two lexicographic ranking functions for quantum gate selection, using two keys: one key acts as a global closure metric to minimise the solution makespan; the second one is a local metric, which favours the mutual approach of the closest qstate pairs. We present a GRS procedure that synthesises NN-compliant quantum circuit realizations, starting from a set of benchmark instances of different size belonging to the Quantum Approximate Optimization Algorithm (QAOA) class tailored for the MaxCut problem. We propose a comparison between the presented meta-heuristics and the approaches used in the recent literature against the same benchmarks, both from the CPU efficiency and from the solution quality standpoint. In particular, we compare our approach against a reference benchmark initially proposed and subsequently expanded in [1] by considering: (i) variable qubit state initialisation and (ii) crosstalk constraints that further restrict parallel gate execution.
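To make the NN-compliance constraint concrete, here is a naive routing sketch on a linear array of qubits. It is not the GRS procedure or the ranking functions of the paper: it simply moves the first qubit of each gate towards the second with swaps, and the function name and operation encoding are assumptions.

```python
def make_nn_compliant(gates, n_qubits):
    """Naively route two-qubit gates on a line where only neighbours interact.

    gates: list of (a, b) logical qubit pairs.
    Returns operations: ("swap", p, p + 1) on physical sites,
    or ("gate", a, b) on logical qubits once they are adjacent.
    """
    position = list(range(n_qubits))  # position[q] = physical site of logical q
    occupant = list(range(n_qubits))  # occupant[p] = logical qubit at site p
    ops = []
    for a, b in gates:
        while abs(position[a] - position[b]) > 1:
            p = position[a]
            step = 1 if position[b] > p else -1
            other = occupant[p + step]
            # Swap logical qubit a with its neighbour in the direction of b.
            occupant[p], occupant[p + step] = other, a
            position[a], position[other] = p + step, p
            ops.append(("swap", min(p, p + step), max(p, p + step)))
        ops.append(("gate", a, b))
    return ops

ops = make_nn_compliant([(0, 3), (0, 1)], n_qubits=4)
for op in ops:
    print(op)
```

The number of inserted swaps is only a crude proxy for the makespan discussed above; a real compiler would also schedule independent gates in parallel and choose which qubit to move.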


A Data Classification Model: For Effective Classification of Intrusion in an Intrusion Detection System Based on Decision Tree Learning Algorithm

Information and Communication Technology for Sustainable Development - Lecture Notes in Networks and Systems ◽

10.1007/978-981-10-3932-4_7 ◽

2017 ◽

pp. 61-66

Author(s):

Latika Mehrotra ◽

Prashant Sahai Saxena ◽

Nitika Vats Doohan

Keyword(s):

Intrusion Detection ◽

Decision Tree ◽

Intrusion Detection System ◽

Learning Algorithm ◽

Detection System ◽

Data Classification ◽

Classification Model ◽

Decision Tree Learning


DEVELOPMENT AND COMPARATIVE ANALYSIS OF SEMI-SUPERVISED LEARNING ALGORITHMS ON A SMALL AMOUNT OF LABELED DATA

Bulletin of National Technical University KhPI Series System Analysis Control and Information Technologies ◽

10.20998/2079-0023.2021.01.16 ◽

2021 ◽

pp. 98-103

Author(s):

Klym Yamkovyi

Keyword(s):

Supervised Learning ◽

Nearest Neighbor ◽

Learning Algorithm ◽

Center Of Mass ◽

Unlabeled Data ◽

Learning Approaches ◽

Classification Problems ◽

K Nearest Neighbor ◽

Supervised Learning Algorithms ◽

Label Information

The paper is dedicated to the development and comparative experimental analysis of semi-supervised learning approaches, based on a mix of unsupervised and supervised approaches, for the classification of datasets with a small amount of labeled data, namely, identifying to which of a set of categories a new observation belongs using a training set of observations whose category membership is known. Semi-supervised learning is an approach to machine learning that combines a small amount of labeled data with a large amount of unlabeled data during training. Unlabeled data, when used in combination with a small quantity of labeled data, can produce significant improvement in learning accuracy. The goal is to develop and analyze semi-supervised methods and to compare their accuracy and robustness on different synthetic datasets. The first proposed approach is based on the unsupervised K-medoids method, also known as the Partitioning Around Medoids algorithm; however, unlike K-medoids, the proposed algorithm first calculates medoids using only labeled data and then processes the unlabeled points, assigning each the label of its nearest medoid. Another proposed approach is a mix of the supervised K-nearest neighbor method and unsupervised K-means. Thus, the proposed learning algorithm uses information about both the nearest points and the class centers of mass. The methods were implemented in the Python programming language and experimentally investigated on classification problems using datasets with different distributions and spatial characteristics. Datasets were generated using the scikit-learn library. The developed approaches were compared by their average accuracy over all these datasets. It was shown that even small amounts of labeled data allow the use of semi-supervised learning, and that the proposed modifications improve accuracy and algorithm performance, as demonstrated in the experiments. As the amount of available label information increases, the accuracy of the algorithms grows. Thus, the developed algorithms use a distance metric that takes the available label information into account. Keywords: unsupervised learning, supervised learning, semi-supervised learning, clustering, distance, distance function, nearest neighbor, medoid, center of mass.
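The medoid-based variant described above can be sketched in a few lines. This is an illustrative reading of the abstract, not the author's code: the function names, the Euclidean distance and the toy points are assumptions.

```python
import math

def medoid(points):
    """The member of the group with the smallest total distance to all others."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))

def label_by_nearest_medoid(labeled, unlabeled):
    """labeled: {label: [points]}. Returns [(point, label)] for the unlabeled points."""
    # Step 1: medoids are computed from labeled data only.
    medoids = {label: medoid(pts) for label, pts in labeled.items()}
    # Step 2: each unlabeled point takes the label of its nearest medoid.
    return [
        (x, min(medoids, key=lambda lab: math.dist(x, medoids[lab])))
        for x in unlabeled
    ]

# Hypothetical data: a handful of labeled points and three unlabeled ones.
labeled = {
    "red": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
    "blue": [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0)],
}
result = label_by_nearest_medoid(labeled, [(0.5, 0.5), (5.5, 5.2), (2.0, 2.0)])
print([label for _, label in result])
```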


Population-Based Feature Selection for Biomedical Data Classification

Data Mining and Analysis in the Engineering Field - Advances in Data Mining and Database Management ◽

10.4018/978-1-4666-6086-1.ch016 ◽

2014 ◽

pp. 296-326 ◽

Cited By ~ 2

Author(s):

Seyed Jalaleddin Mousavirad ◽

Hossein Ebrahimpour-Komleh

Keyword(s):

Feature Selection ◽

Learning Algorithm ◽

Selection Process ◽

Data Classification ◽

Population Based ◽

Statistical Characteristics ◽

Biomedical Data ◽

Filter Methods ◽

Embedded Methods

Classification of biomedical data plays a significant role in the prediction and diagnosis of disease. The existence of redundant and irrelevant features is one of the major problems in biomedical data classification. Excluding these features can improve the performance of the classification algorithm. Feature selection is the problem of selecting a subset of features without reducing the accuracy obtained with the original set of features. Feature selection algorithms are divided into three categories: wrapper, filter, and embedded methods. Wrapper methods use the learning algorithm to select features, while filter methods use statistical characteristics of the data. In embedded methods, the feature selection process is combined with the learning process. Population-based metaheuristics can be applied for wrapper feature selection. In these algorithms, a population of candidate solutions is created, and operators are then applied to improve the objective function. This chapter presents the application of population-based feature selection to deal with issues of high dimensionality in biomedical data classification. The results show that population-based feature selection achieves acceptable performance in biomedical data classification.


Multi-Level Comparison of Machine Learning Classifiers and Their Performance Metrics

Molecules ◽

10.3390/molecules24152811 ◽

2019 ◽

Vol 24 (15) ◽

pp. 2811 ◽

Cited By ~ 4

Author(s):

Rácz ◽

Bajusz ◽

Héberger

Keyword(s):

Machine Learning ◽

Performance Metrics ◽

Learning Algorithm ◽

Classification Problems ◽

Machine Learning Classification ◽

Learning Tasks ◽

Sum Of Ranking Differences ◽

Multi Level

Machine learning classification algorithms are widely used for the prediction and classification of the different properties of molecules such as toxicity or biological activity. The prediction of toxic vs. non-toxic molecules is important because testing on living animals has ethical and cost drawbacks. The quality of classification models can be determined with several performance parameters, which often give conflicting results. In this study, we performed a multi-level comparison with the use of different performance metrics and machine learning classification methods. Well-established and standardized protocols for the machine learning tasks were used in each case. The comparison was applied to three datasets (acute and aquatic toxicities) and the robust, yet sensitive, sum of ranking differences (SRD) and analysis of variance (ANOVA) were applied for evaluation. The effect of dataset composition (balanced vs. imbalanced) and 2-class vs. multiclass classification scenarios was also studied. Most of the performance metrics are sensitive to dataset composition, especially in 2-class classification problems. The optimal machine learning algorithm also depends significantly on the composition of the dataset.
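A small worked example shows why a metric can be sensitive to dataset composition. It compares plain accuracy with balanced accuracy for a trivial classifier that always predicts the negative class. The data are synthetic and the choice of these two metrics is an assumption made for illustration; the study itself compares many more.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, balanced_accuracy) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    return accuracy, (sensitivity + specificity) / 2

always_negative = [0] * 100           # a classifier that never predicts "toxic"
imbalanced = [1] * 5 + [0] * 95       # 5 % positives
balanced = [1] * 50 + [0] * 50        # 50 % positives

print(binary_metrics(imbalanced, always_negative))  # (0.95, 0.5)
print(binary_metrics(balanced, always_negative))    # (0.5, 0.5)
```

Accuracy changes from 0.95 to 0.5 with the class ratio, while balanced accuracy stays at 0.5 for the same uninformative classifier.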


A comparative analysis of text data classification accuracy and speed using neural networks, Bloom filter and naive Bayes

Technology audit and production reserves ◽

10.15587/2706-5448.2021.237767 ◽

2021 ◽

Vol 5 (2(61)) ◽

pp. 6-8

Author(s):

Olena Hryshchenko ◽

Vadym Yaremenko

Keyword(s):

Neural Networks ◽

Data Classification ◽

Bloom Filter ◽

Data Sets ◽

Classification Problems ◽

Text Data ◽

Textual Data ◽

Fast Classification ◽

Parameter Values

The object of research is the methods of fast classification for solving text data classification problems. The need for this study is due to the rapid growth of textual data, both in digital and printed forms. Thus, there is a need to process such data using software, since human resources are not able to process such an amount of data in full. A large number of data classification approaches have been developed. The conducted research applies the following text classification methods to a set of text data in order to classify it into categories: Bloom filter, naive Bayes classifier and neural networks. Each method has both advantages and disadvantages. This paper shows the strengths and weaknesses of each method on a specific example. The algorithms were compared with one another in terms of speed and efficiency, that is, the accuracy with which a text is assigned to a class. Each method was run on the same data sets while varying the amount of training and test data and the number of classification groups. The dataset used contains the following classes: world, business, sports, and science and technology. In real conditions, the number of categories is much larger than that considered in this work, and categories may contain subcategories. In the course of this study, each method was analyzed using different parameter values to obtain the best result. According to the results obtained, the neural network gave the best results for the classification of text data.
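The abstract does not say how the Bloom filter is used as a classifier. One plausible scheme, shown here purely as an assumption, keeps one filter per class filled with that class's training vocabulary and assigns a text to the class whose filter recognises the most tokens. The class names, filter size and hashing scheme below are illustrative choices.

```python
import hashlib

class BloomFilter:
    """Fixed-size bit array with several hash functions; no false negatives."""

    def __init__(self, size=1024, n_hashes=3):
        self.size, self.n_hashes = size, n_hashes
        self.bits = [False] * size

    def _indexes(self, item):
        # Derive n_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for i in self._indexes(item):
            self.bits[i] = True

    def __contains__(self, item):
        return all(self.bits[i] for i in self._indexes(item))

def train_filters(examples):
    """examples: list of (label, text). Builds one vocabulary filter per label."""
    filters = {}
    for label, text in examples:
        bloom = filters.setdefault(label, BloomFilter())
        for token in text.lower().split():
            bloom.add(token)
    return filters

def classify(filters, text):
    """Pick the label whose filter reports the most tokens as (probably) seen."""
    tokens = text.lower().split()
    return max(filters, key=lambda lab: sum(tok in filters[lab] for tok in tokens))

# Hypothetical miniature training set.
filters = train_filters([
    ("sports", "team scored goal match"),
    ("business", "market shares profit bank"),
])
print(classify(filters, "The team scored a late goal"))
```

Membership tests can return false positives, so a scheme like this trades some accuracy for speed and memory, which is consistent with the speed versus accuracy comparison described above.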


COVID-19 Pneumonia Level Detection using Deep Learning Algorithm

10.36227/techrxiv.12619193.v1 ◽

2020 ◽

Cited By ~ 2

Author(s):

Kayhan Ghafoor

Keyword(s):

Lung Inflammation ◽

Nearest Neighbor ◽

Learning Algorithm ◽

Second Phase ◽

K Nearest Neighbor ◽

Deep Learning Algorithm ◽

Severe Stage ◽

Efficient Detection ◽

Two Phases

The first confirmed COVID-19 case was reported in Wuhan, China, and the disease spread across the globe with unprecedented impact on humanity. Since this pandemic requires pervasive diagnosis, it is important to develop smart, fast and efficient detection techniques. To this end, we developed an Artificial Intelligence (AI) engine to classify the lung inflammation level (mild, progressive, severe stage) of confirmed COVID-19 patients. In particular, the developed model consists of two phases. In the first phase, we calculate the volume and density of lesions and opacities in the CT images of the confirmed COVID-19 patient using morphological approaches. In the second phase, the model classifies the pneumonia level of the confirmed COVID-19 patient. To achieve precise classification of lung inflammation, we use a modified Convolutional Neural Network (CNN) and k-Nearest Neighbor (kNN). The experimental results show that the models achieve accuracies of up to 95.65% for the CNN and 91.304% for kNN.
