scholarly journals Quantum cluster algorithm for data classification

2021 ◽  
Vol 5 (1) ◽  
Author(s):  
Junxu Li ◽  
Sabre Kais

AbstractWe present a quantum algorithm for data classification based on the nearest-neighbor learning algorithm. The classification algorithm is divided into two steps: Firstly, data in the same class is divided into smaller groups with sublabels assisting building boundaries between data with different labels. Secondly we construct a quantum circuit for classification that contains multi control gates. The algorithm is easy to implement and efficient in predicting the labels of test data. To illustrate the power and efficiency of this approach, we construct the phase transition diagram for the metal-insulator transition of VO2, using limited trained experimental data, where VO2 is a typical strongly correlated electron materials, and the metallic-insulating phase transition has drawn much attention in condensed matter physics. Moreover, we demonstrate our algorithm on the classification of randomly generated data and the classification of entanglement for various Werner states, where the training sets can not be divided by a single curve, instead, more than one curves are required to separate them apart perfectly. Our preliminary result shows considerable potential for various classification problems, particularly for constructing different phases in materials.

Author(s):  
Juan Luis Fernández-Martínez ◽  
Ana Cernea

In this paper, we present a supervised ensemble learning algorithm, called SCAV1, and its application to face recognition. This algorithm exploits the uncertainty space of the ensemble classifiers. Its design includes six different nearest-neighbor (NN) classifiers that are based on different and diverse image attributes: histogram, variogram, texture analysis, edges, bidimensional discrete wavelet transform and Zernike moments. In this approach each attribute, together with its corresponding type of the analysis (local or global), and the distance criterion (p-norm) induces a different individual NN classifier. The ensemble classifier SCAV1 depends on a set of parameters: the number of candidate images used by each individual method to perform the final classification and the individual weights given to each individual classifier. SCAV1 parameters are optimized/sampled using a supervised approach via the regressive particle swarm optimization algorithm (RR-PSO). The final classifier exploits the uncertainty space of SCAV1 and uses majority voting (Borda Count) as a final decision rule. We show the application of this algorithm to the ORL and PUT image databases, obtaining very high and stable accuracies (100% median accuracy and almost null interquartile range). In conclusion, exploring the uncertainty space of ensemble classifiers provides optimum results and seems to be the appropriate strategy to adopt for face recognition and other classification problems.


2019 ◽  
Vol 2 (1) ◽  
pp. 57 ◽  
Author(s):  
Irma Handayani

Vertebral column as a part of backbone has important role in human body. Trauma in vertebral column can affect spinal cord capability to send and receive messages from brain to the body system that controls sensory and motoric movement. Disk hernia and spondylolisthesis are examples of pathologies on the vertebral column. Research about pathology or damage bones and joints of skeletal system classification is rare whereas the classification system can be used by radiologists as a second opinion so that can improve productivity and diagnosis consistency of the radiologists. This research used dataset Vertebral Column that has three classes (Disk Hernia, Spondylolisthesis and Normal) and instances in UCI Machine Learning. This research applied the K-NN algorithm for classification of disk hernia and spondylolisthesis in vertebral column. The data were then classified into two different but related classification tasks: “normal” and “abnormal”. K-NN algorithm adopts the approach of data classification by optimizing sample data that can be used as a reference for training data to produce vertebral column data classification based on the learning process. The results showed that the accuracy of K-NN classifier was 83%. The average length of time needed to classify the K-NN classifier was 0.000212303 seconds.


Filomat ◽  
2018 ◽  
Vol 32 (5) ◽  
pp. 1917-1930 ◽  
Author(s):  
Lei Shi ◽  
Qiguo Duan ◽  
Juanjuan Zhang ◽  
Lei Xi ◽  
Hongbo Qiao ◽  
...  

Agricultural data classification attracts more and more attention in the research area of intelligent agriculture. As a kind of important machine learning methods, ensemble learning uses multiple base classifiers to deal with classification problems. The rough set theory is a powerful mathematical approach to process unclear and uncertain data. In this paper, a rough set based ensemble learning algorithm is proposed to classify the agricultural data effectively and efficiently. An experimental comparison of different algorithms is conducted on four agricultural datasets. The results of experiment indicate that the proposed algorithm improves performance obviously.


2020 ◽  
Vol 174 (3-4) ◽  
pp. 259-281
Author(s):  
Angelo Oddi ◽  
Riccardo Rasconi

In this work we investigate the performance of greedy randomised search (GRS) techniques to the problem of compiling quantum circuits to emerging quantum hardware. Quantum computing (QC) represents the next big step towards power consumption minimisation and CPU speed boost in the future of computing machines. Quantum computing uses quantum gates that manipulate multi-valued bits (qubits). A quantum circuit is composed of a number of qubits and a series of quantum gates that operate on those qubits, and whose execution realises a specific quantum algorithm. Current quantum computing technologies limit the qubit interaction distance allowing the execution of gates between adjacent qubits only. This has opened the way to the exploration of possible techniques aimed at guaranteeing nearest-neighbor (NN) compliance in any quantum circuit through the addition of a number of so-called swap gates between adjacent qubits. In addition, technological limitations (decoherence effect) impose that the overall duration (makespan) of the quantum circuit realization be minimized. One core contribution of the paper is the definition of two lexicographic ranking functions for quantum gate selection, using two keys: one key acts as a global closure metric to minimise the solution makespan; the second one is a local metric, which favours the mutual approach of the closest qstates pairs. We present a GRS procedure that synthesises NN-compliant quantum circuits realizations, starting from a set of benchmark instances of different size belonging to the Quantum Approximate Optimization Algorithm (QAOA) class tailored for the MaxCut problem. We propose a comparison between the presented meta-heuristics and the approaches used in the recent literature against the same benchmarks, both from the CPU efficiency and from the solution quality standpoint. In particular, we compare our approach against a reference benchmark initially proposed and subsequently expanded in [1] by considering: (i) variable qubit state initialisation and (ii) crosstalk constraints that further restrict parallel gate execution.


Author(s):  
Klym Yamkovyi

The paper is dedicated to the development and comparative experimental analysis of semi-supervised learning approaches based on a mix of unsupervised and supervised approaches for the classification of datasets with a small amount of labeled data, namely, identifying to which of a set of categories a new observation belongs using a training set of data containing observations whose category membership is known. Semi-supervised learning is an approach to machine learning that combines a small amount of labeled data with a large amount of unlabeled data during training. Unlabeled data, when used in combination with a small quantity of labeled data, can produce significant improvement in learning accuracy. The goal is semi-supervised methods development and analysis along with comparing their accuracy and robustness on different synthetics datasets. The proposed approach is based on the unsupervised K-medoids methods, also known as the Partitioning Around Medoid algorithm, however, unlike Kmedoids the proposed algorithm first calculates medoids using only labeled data and next process unlabeled classes – assign labels of nearest medoid. Another proposed approach is the mix of the supervised method of K-nearest neighbor and unsupervised K-Means. Thus, the proposed learning algorithm uses information about both the nearest points and classes centers of mass. The methods have been implemented using Python programming language and experimentally investigated for solving classification problems using datasets with different distribution and spatial characteristics. Datasets were generated using the scikit-learn library. Was compared the developed approaches to find average accuracy on all these datasets. It was shown, that even small amounts of labeled data allow us to use semi-supervised learning, and proposed modifications ensure to improve accuracy and algorithm performance, which was demonstrated during experiments. And with the increase of available label information accuracy of the algorithms grows up. Thus, the developed algorithms are using a distance metric that considers available label information. Keywords: Unsupervised learning, supervised learning. semi-supervised learning, clustering, distance, distance function, nearest neighbor, medoid, center of mass.


Author(s):  
Seyed Jalaleddin Mousavirad ◽  
Hossein Ebrahimpour-Komleh

Classification of biomedical data plays a significant role in prediction and diagnosis of disease. The existence of redundant and irrelevant features is one of the major problems in biomedical data classification. Excluding these features can improve the performance of classification algorithm. Feature selection is the problem of selecting a subset of features without reducing the accuracy of the original set of features. These algorithms are divided into three categories: wrapper, filter, and embedded methods. Wrapper methods use the learning algorithm for selection of features while filter methods use statistical characteristics of data. In the embedded methods, feature selection process combines with the learning process. Population-based metaheuristics can be applied for wrapper feature selection. In these algorithms, a population of candidate solutions is created. Then, they try to improve the objective function using some operators. This chapter presents the application of population-based feature selection to deal with issues of high dimensionality in the biomedical data classification. The result shows that population-based feature selection has presented acceptable performance in biomedical data classification.


Molecules ◽  
2019 ◽  
Vol 24 (15) ◽  
pp. 2811 ◽  
Author(s):  
Rácz ◽  
Bajusz ◽  
Héberger

Machine learning classification algorithms are widely used for the prediction and classification of the different properties of molecules such as toxicity or biological activity. the prediction of toxic vs. non-toxic molecules is important due to testing on living animals, which has ethical and cost drawbacks as well. The quality of classification models can be determined with several performance parameters. which often give conflicting results. In this study, we performed a multi-level comparison with the use of different performance metrics and machine learning classification methods. Well-established and standardized protocols for the machine learning tasks were used in each case. The comparison was applied to three datasets (acute and aquatic toxicities) and the robust, yet sensitive, sum of ranking differences (SRD) and analysis of variance (ANOVA) were applied for evaluation. The effect of dataset composition (balanced vs. imbalanced) and 2-class vs. multiclass classification scenarios was also studied. Most of the performance metrics are sensitive to dataset composition, especially in 2-class classification problems. The optimal machine learning algorithm also depends significantly on the composition of the dataset.


2021 ◽  
Vol 5 (2(61)) ◽  
pp. 6-8
Author(s):  
Olena Hryshchenko ◽  
Vadym Yaremenko

The object of research is the methods of fast classification for solving text data classification problems. The need for this study is due to the rapid growth of textual data, both in digital and printed forms. Thus, there is a need to process such data using software, since human resources are not able to process such an amount of data in full. A large number of data classification approaches have been developed. The conducted research is based on the application of the following methods of classification of text data: Bloom filter, naive Bayesian classifier and neural networks to a set of text data in order to classify them into categories. Each method has both disadvantages and advantages. This paper will reflect the strengths and weaknesses of each method on a specific example. These algorithms were comparatively among themselves in terms of speed and efficiency, that is, the accuracy of determining the belonging of a text to a certain class of classification. The work of each method was considered on the same data sets with a change in the amount of training and test data, as well as with a change in the number of classification groups. The dataset used contains the following classes: world, business, sports, and science and technology. In real conditions of the classification of such data, the number of categories is much larger than that considered in the work, and may have subcategories in its composition. In the course of this study, each method was analyzed using different parameter values to obtain the best result. Analyzing the results obtained, the best results for the classification of text data were obtained using a neural network.


Author(s):  
Kayhan Ghafoor

The first COVID-19 confirmed case is reported in Wuhan, China and spread across the globe with unprecedented impact on humanity. Since this pandemic requires pervasive diagnosis, it is significant to develop smart, fast and efficient detection technique. To this end, we developed an Artificial Intelligence (AI) engine to classify the lung inflammation level (mild, progressive, severe stage) of the COVID-19 confirmed patient. In particular, the developed model consists of two phases; in the first phase, we calculate the volume and density of lesions and opacities of the CT images of the confirmed COVID-19 patient using Morphological approaches. In the second phase, the second phase classifies the pneumonia level of the confirmed COVID-19 patient. To achieve precise classification of lung inflammation, we use modified Convolution Neural Network (CNN) and k-Nearest Neighbor (kNN). The result of the experiments show that the utilized models can provide the accuracy up to 95.65\% and 91.304 \% of CNN and kNN respectively.<br>


Sign in / Sign up

Export Citation Format

Share Document