Quantum-Inspired Classification Algorithm from DBSCAN–Deutsch–Jozsa Support Vectors and Ising Prediction Model

Applied Sciences ◽  
2021 ◽  
Vol 11 (23) ◽  
pp. 11386
Author(s):  
Kodai Shiba ◽  
Chih-Chieh Chen ◽  
Masaru Sogabe ◽  
Katsuyoshi Sakamoto ◽  
Tomah Sogabe

Quantum computing has been suggested as a new tool for dealing with large data sets in machine learning applications. However, many quantum algorithms are too expensive to fit into the small-scale quantum hardware available today, and loading big classical data into small quantum memory remains an unsolved obstacle. These difficulties motivate the study of quantum-inspired techniques that use classical computation. In this work, we propose a new classification method based on support vectors from a DBSCAN–Deutsch–Jozsa ranking and an Ising prediction model. The proposed algorithm scales better than a standard classical SVM with respect to the number of training samples during the training phase. The method can be executed on a purely classical computer and can be accelerated in a hybrid quantum–classical computing environment. We demonstrate the applicability of the proposed algorithm with simulations and theory.
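
As a hedged illustration of the classical part of this idea, the sketch below ranks candidate support vectors using DBSCAN cluster structure and classifies new points against them. The Deutsch–Jozsa ranking and the Ising prediction model are replaced by simple classical stand-ins (boundary-point selection and a nearest-support-vector vote), and all parameter values are illustrative.

```python
# Minimal classical sketch: DBSCAN-derived support vectors + a simple
# prediction rule standing in for the paper's Ising model.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=300, centers=2, random_state=0)

# Cluster each class; points flagged as non-core sit near cluster
# boundaries and serve as candidate support vectors.
support_X, support_y = [], []
for label in np.unique(y):
    pts = X[y == label]
    db = DBSCAN(eps=1.0, min_samples=5).fit(pts)
    core = np.zeros(len(pts), dtype=bool)
    core[db.core_sample_indices_] = True
    boundary = pts[~core] if (~core).any() else pts
    support_X.append(boundary)
    support_y.append(np.full(len(boundary), label))

support_X = np.vstack(support_X)
support_y = np.concatenate(support_y)

def predict(x):
    # Stand-in for the Ising prediction model: nearest support vector votes.
    d = np.linalg.norm(support_X - x, axis=1)
    return support_y[np.argmin(d)]

print(predict(np.array([0.0, 0.0])))
```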

Electronics ◽  
2021 ◽  
Vol 10 (14) ◽  
pp. 1690
Author(s):  
Teague Tomesh ◽  
Pranav Gokhale ◽  
Eric R. Anschuetz ◽  
Frederic T. Chong

Many quantum algorithms for machine learning require access to classical data in superposition. However, for many natural data sets and algorithms, the overhead required to load the data set in superposition can erase any potential quantum speedup over classical algorithms. Recent work by Harrow introduces a new paradigm in hybrid quantum-classical computing to address this issue, relying on coresets to minimize the data-loading overhead of quantum algorithms. We investigated using this paradigm to perform k-means clustering on near-term quantum computers by casting it as a QAOA optimization instance over a small coreset. We used numerical simulations to compare the performance of this approach to classical k-means clustering. We were able to find data sets where coresets work well relative to random sampling and where QAOA could potentially outperform standard k-means on a coreset. However, finding data sets where both coresets and QAOA work well, which is necessary for a quantum advantage over k-means on the entire data set, appears to be challenging.
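
As a rough sketch of the classical half of this pipeline, the code below builds a small coreset by importance sampling and runs weighted k-means on it, comparing against k-means on the full data. The QAOA step is omitted, and the lightweight-coreset construction is an assumption standing in for whatever coreset algorithm the authors used.

```python
# Lightweight coreset + weighted k-means vs. k-means on the full data.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 2)) + rng.choice([-4, 4], size=(10_000, 1))

def lightweight_coreset(X, m):
    # Sample proportional to 1/(2n) + d(x, mean)^2 / (2 * total squared distance).
    mu = X.mean(axis=0)
    d2 = ((X - mu) ** 2).sum(axis=1)
    q = 0.5 / len(X) + 0.5 * d2 / d2.sum()
    idx = rng.choice(len(X), size=m, p=q, replace=True)
    w = 1.0 / (m * q[idx])          # importance weights
    return X[idx], w

C, w = lightweight_coreset(X, 50)
full = KMeans(n_clusters=2, n_init=10).fit(X)
core = KMeans(n_clusters=2, n_init=10).fit(C, sample_weight=w)
print(full.cluster_centers_, core.cluster_centers_, sep="\n")
```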


Monthly Notices of the Royal Astronomical Society ◽  
2020 ◽  
Vol 492 (1) ◽  
pp. 1421-1431 ◽  
Author(s):  
Zhicheng Yang ◽  
Ce Yu ◽  
Jian Xiao ◽  
Bo Zhang

Radio frequency interference (RFI) detection and excision are key steps in the data-processing pipeline of the Five-hundred-meter Aperture Spherical radio Telescope (FAST). Because of its high sensitivity and large data rate, FAST requires more accurate and efficient RFI flagging methods than its counterparts. In recent decades, approaches based upon artificial intelligence (AI), such as codes using convolutional neural networks (CNNs), have been proposed to identify RFI more reliably and efficiently. However, RFI flagging of FAST data with such methods has often proved to be erroneous, with further manual inspections required. In addition, network construction as well as preparation of training data sets for effective RFI flagging has imposed significant additional workloads. Therefore, rapid deployment and adjustment of AI approaches for different observations is impractical with existing algorithms. To overcome these problems, we propose a model called RFI-Net. Taking raw data as input without any processing, RFI-Net detects RFI automatically, producing corresponding masks without any alteration of the original data. Experiments with RFI-Net using simulated astronomical data show that our model outperforms existing methods in terms of both precision and recall. Moreover, compared with other models, our method can reach the same relative accuracy with less training data, reducing the effort and time required to prepare the training data set. Further, the training process of RFI-Net can be accelerated, with overfitting minimized, compared with other CNN codes. The performance of RFI-Net has also been evaluated with observational data obtained by FAST and the Bleien Observatory. Our results demonstrate the ability of RFI-Net to accurately identify RFI with fine-grained, high-precision masks that require no further modification.
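
As a toy illustration of this input/output contract, the sketch below defines a small CNN (not the published RFI-Net architecture) that maps a raw time-frequency block to a same-sized boolean RFI mask; all layer sizes are assumptions.

```python
# Toy mask-producing CNN: raw spectrogram in, per-pixel RFI mask out.
import torch
import torch.nn as nn

class TinyRFINet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1),  # per-pixel logit for "RFI here?"
        )

    def forward(self, x):          # x: (batch, 1, freq, time)
        return torch.sigmoid(self.net(x))

model = TinyRFINet()
spectrogram = torch.randn(1, 1, 64, 256)   # fake raw data block
mask = model(spectrogram) > 0.5            # boolean RFI mask, same shape
print(mask.shape)
```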


Author(s):  
Ignasi Echaniz Soldevila ◽  
Victor L. Knoop ◽  
Serge Hoogendoorn

Traffic engineers rely on microscopic traffic models to design, plan, and operate a wide range of traffic applications. Recently, large data sets, albeit incomplete and covering only small spatial regions, have become available thanks to technological improvements and governmental efforts. With this study we aim to gain new empirical insights into longitudinal driving behavior and to formulate a model that can benefit from these new but challenging data sources. This paper proposes an application of an existing formulation, Gaussian process regression (GPR), to describe the individual longitudinal driving behavior of drivers. The method integrates a parametric and a non-parametric mathematical formulation. The model predicts an individual driver's acceleration given a set of variables, using GPR to make predictions whenever new inputs correlate with the training data set. The data-driven model benefits from a large training data set to capture all longitudinal driver behavior, which would be difficult to fit with fixed parametric equations. The methodology allows us to train models with new variables without altering the model formulation. Importantly, the model falls back on existing traditional parametric car-following models to predict acceleration when no similar situations are found in the training data set. A case study using radar data in an urban environment shows that the hybrid model performs better than a parametric model alone, and suggests that traffic-light status over time influences drivers' acceleration. This methodology can help engineers use large data sets and find new variables to describe traffic behavior.
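
A minimal sketch of the hybrid idea follows, assuming three input variables (speed, gap, closing speed), a simple intelligent driver model (IDM) as the parametric fallback, and an uncertainty threshold of 0.5 m/s^2, all of which are illustrative choices rather than the paper's.

```python
# GPR where the training data covers the situation; parametric IDM otherwise.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def idm_accel(v, gap, dv, v0=15.0, T=1.2, a=1.5, b=2.0, s0=2.0):
    # Intelligent driver model, standing in for the paper's parametric part.
    s_star = s0 + v * T + v * dv / (2 * np.sqrt(a * b))
    return a * (1 - (v / v0) ** 4 - (s_star / gap) ** 2)

rng = np.random.default_rng(1)
X = rng.uniform([0, 5, -3], [15, 60, 3], size=(500, 3))  # speed, gap, closing speed
y = np.array([idm_accel(*x) for x in X]) + rng.normal(0, 0.1, 500)

gp = GaussianProcessRegressor(RBF([5, 20, 2]) + WhiteKernel(0.01)).fit(X, y)

def hybrid_accel(x, std_max=0.5):
    mean, std = gp.predict(x.reshape(1, -1), return_std=True)
    # Trust the GP only where the training set contains similar situations.
    return mean[0] if std[0] < std_max else idm_accel(*x)

print(hybrid_accel(np.array([10.0, 30.0, 0.5])))
```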


Thorax ◽  
2019 ◽  
Vol 74 (12) ◽  
pp. 1161-1167 ◽  
Author(s):  
Youchao Dai ◽  
Wanshui Shan ◽  
Qianting Yang ◽  
Jiubiao Guo ◽  
Rihong Zhai ◽  
...  

Background: Perturbed iron homeostasis is a risk factor for tuberculosis (TB) progression and an indicator of TB treatment failure and mortality. Few studies have evaluated iron homeostasis as a TB diagnostic biomarker. Methods: We recruited participants with TB, latent TB infection (LTBI), cured TB (RxTB), pneumonia (PN) and healthy controls (HCs). We measured serum levels of three iron biomarkers (serum iron, ferritin and transferrin), then established and validated our prediction model. Results: We observed and verified that the levels of the three iron biomarkers correlated with patient status (TB, HC, LTBI, RxTB or PN) and with the degree of lung damage and bacillary load in patients with TB. We then built a neural network (NNET) TB prediction model incorporating the three iron biomarkers. The model showed good performance for diagnosis of TB, with 83% (95% CI 77 to 87) sensitivity and 86% (95% CI 83 to 89) specificity in the training data set (n=663) and 70% (95% CI 58 to 79) sensitivity and 92% (95% CI 86 to 96) specificity in the test data set (n=220). The areas under the curve (AUCs) of the NNET model for discriminating TB from HC, LTBI, RxTB and PN were all >0.83. Independent validation of the NNET model in a separate cohort (n=967) produced an AUC of 0.88 (95% CI 0.85 to 0.91) with 74% (95% CI 71 to 77) sensitivity and 92% (95% CI 87 to 96) specificity. Conclusions: The established NNET TB prediction model discriminated TB from HC, LTBI, RxTB and PN in a large cohort of patients. This diagnostic assay may augment current TB diagnostics.
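
As a hedged sketch of what such a model-building step can look like, the code below trains a small neural network on three synthetic biomarker features and reports an AUC. The architecture, synthetic data, and split are illustrative assumptions, not the paper's NNET.

```python
# Small neural network on three features (stand-ins for iron, ferritin,
# transferrin), evaluated by ROC AUC on a held-out test split.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 883
y = rng.integers(0, 2, n)                 # 1 = TB, 0 = other (fake labels)
X = rng.normal(size=(n, 3)) + y[:, None]  # three synthetic biomarker values

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0),
).fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```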


Electronics ◽  
2021 ◽  
Vol 10 (16) ◽  
pp. 1995
Author(s):  
Pingakshya Goswami ◽  
Dinesh Bhatia

Design closure in general VLSI physical design flows and FPGA physical design flows is an important and time-consuming problem. Routing itself can consume as much as 70% of the total design time. Accurate congestion estimation during the early stages of the design flow can help alleviate last-minute routing-related surprises. This paper describes a methodology for a post-placement, machine learning-based routing congestion prediction model for FPGAs. Routing congestion is modeled as a regression problem. We describe our methods for generating training data, extracting features, training regression models, validation, and deployment. We test our prediction model using the ISPD 2016 FPGA benchmarks. Our prediction method reports a very accurate localized congestion value in each channel around a configurable logic block (CLB), in both the vertical and horizontal directions. We demonstrate the effectiveness of our model on completely unseen designs that are not part of the training data set. The results show significant improvements in accuracy, measured as mean absolute error, and in prediction time when compared against the latest state-of-the-art works.
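
A minimal sketch of congestion prediction as regression follows, assuming a random forest and three made-up per-channel features; the paper evaluates its own feature set and regression models on the ISPD 2016 benchmarks.

```python
# One feature vector per CLB channel in, one continuous congestion value out.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
# e.g. pin density, net count, wirelength estimate per channel window (fake)
X = rng.uniform(size=(n, 3))
y = 0.6 * X[:, 0] + 0.3 * X[:, 1] + 0.1 * rng.normal(size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("MAE:", mean_absolute_error(y_te, model.predict(X_te)))
```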


Author(s):  
Xingjie Fang ◽  
Liping Wang ◽  
Don Beeson ◽  
Gene Wiggs

Radial basis function (RBF) metamodels have recently attracted increased interest due to their significant advantages over other types of non-parametric metamodels. However, because of the interpolating nature of the RBF mathematics, the accuracy of the model may deteriorate dramatically if the training data set contains duplicate information, noise or outliers. In addition, constructing the metamodel may be time-consuming whenever the training data set is large or a high-dimensional model is required. In this paper, we propose a robust and efficient RBF metamodeling approach based on data pre-processing techniques that alleviate the accuracy and efficiency issues commonly encountered when RBF models are used in typical real engineering situations. These techniques include (1) removing duplicate information from the training data, (2) generating smaller, uniformly distributed subsets of training data from large data sets, and (3) quantifying and identifying outliers by principal component analysis (PCA) and Hotelling statistics. Simulation results are used to validate the generalization accuracy and efficiency of the proposed approach.
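
A sketch of technique (3) followed by RBF fitting on the cleaned data is given below, assuming a chi-squared control limit and scipy's RBFInterpolator as stand-ins for the paper's implementation details.

```python
# Hotelling T^2 outlier screening, then RBF interpolation on the kept points.
import numpy as np
from scipy.stats import chi2
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(0, 0.05, 200)
X[:5] += 8.0                                   # inject a few gross outliers

# Hotelling T^2 on standardized data (equivalent to keeping all principal
# components of the PCA).
Z = (X - X.mean(0)) / X.std(0)
cov = np.cov(Z, rowvar=False)
T2 = np.einsum("ij,jk,ik->i", Z, np.linalg.inv(cov), Z)
keep = T2 < chi2.ppf(0.99, df=X.shape[1])      # 99% control limit

rbf = RBFInterpolator(X[keep], y[keep], kernel="thin_plate_spline")
print(f"kept {keep.sum()} of {len(X)} points;", rbf(X[:1]))
```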


2014 ◽  
Vol 2014 ◽  
pp. 1-7
Author(s):  
Yongming Cai ◽  
Lei Song ◽  
Tingwei Wang ◽  
Qing Chang

Support vector machines (SVMs) are a promising alternative to traditional regression estimation approaches. However, when dealing with massive-scale data sets, problems such as long training times and excessive memory demands arise, making the standard SVM algorithm poorly suited to financial time series data. To solve these problems, a directed-weighted chunking SVM algorithm is proposed. In this algorithm, the whole training data set is split into several chunks, and support vectors are obtained on each subset. Weighted support vector regressions are then computed to obtain the forecast model on the new working data set. Our directed-weighted chunking algorithm provides a new way of decomposing and combining support vectors according to the importance of the chunks, which improves operation speed without reducing prediction accuracy. Finally, IBM daily closing stock prices are used to verify the validity of the proposed algorithm.
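
A hedged sketch of the chunking idea: fit an SVR per chunk, pool each chunk's support vectors, and fit a final SVR on the much smaller working set. The directed weighting of chunks is simplified to uniform pooling here, and all hyperparameters are illustrative.

```python
# Chunked SVR: train per chunk, keep only support vectors, retrain on the pool.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(6000, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, 6000)

sv_X, sv_y = [], []
for chunk in np.array_split(np.arange(len(X)), 6):   # 6 chunks of ~1000
    m = SVR(C=10, epsilon=0.05).fit(X[chunk], y[chunk])
    sv = chunk[m.support_]                           # this chunk's support vectors
    sv_X.append(X[sv]); sv_y.append(y[sv])

final = SVR(C=10, epsilon=0.05).fit(np.vstack(sv_X), np.concatenate(sv_y))
print("working-set size:", sum(len(s) for s in sv_y))
```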


2021 ◽  
Vol 2021 ◽  
pp. 1-16
Author(s):  
Mengmeng Huang ◽  
Fang Liu ◽  
Xianfa Meng

Synthetic aperture radar (SAR), one of the most important methods for obtaining target characteristics in the field of remote sensing, has been applied in many areas, including intelligence gathering, topographic surveying, mapping, and geological survey. Within this field, SAR automatic target recognition (SAR ATR) is a challenging problem of high application value. The development of deep learning has enabled it to be applied to SAR ATR. Some researchers have pointed out that existing convolutional neural networks (CNNs) pay more attention to texture information, which is often less discriminative than shape information. This study therefore designs an enhanced-shape CNN, which enhances the target shape at the input and uses an improved attention module so that the network can highlight the target shape in SAR images. To address the small scale of existing SAR data sets, a small-sample experiment is conducted: the enhanced-shape CNN achieves a recognition rate of 99.29% when trained on the full training set, and 89.93% when trained on one-eighth of the training data.
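
As a hedged illustration of the general mechanism that attention modules of this kind build on, the sketch below implements a plain channel-attention (squeeze-and-excitation style) block; the paper's improved module and its shape-enhanced input branch are not reproduced here.

```python
# Squeeze-and-excitation style channel attention over CNN feature maps.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                # x: (batch, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))  # squeeze: global average pool
        return x * w[:, :, None, None]   # excite: reweight channels

feat = torch.randn(2, 16, 32, 32)        # e.g. SAR feature maps
print(ChannelAttention(16)(feat).shape)
```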


2021 ◽  
Author(s):  
Dong Wang ◽  
JinBo Li ◽  
Yali Sun ◽  
Xianfei Ding ◽  
Xiaojuan Zhang ◽  
...  

Background: Although numerous studies are conducted every year on how to reduce the fatality rate associated with sepsis, it is still a major challenge faced by patients, clinicians, and medical systems worldwide. Early identification and prediction of patients at risk of sepsis and its associated adverse outcomes are critical. We aimed to develop an artificial intelligence algorithm that can predict sepsis early. Methods: This was a secondary analysis of an observational cohort study from the Intensive Care Unit of the First Affiliated Hospital of Zhengzhou University. A total of 4449 infected patients were randomly assigned to the development and validation data sets at a ratio of 4:1. After extracting electronic medical record data, a set of 55 features (variables) was calculated and passed to the random forest algorithm to predict the onset of sepsis. Results: The pre-procedure clinical variables were used to build a prediction model from the training data set using the random forest machine learning method; 5-fold cross-validation was used to evaluate the prediction accuracy of the model. Finally, we tested the model using the validation data set. The area under the receiver operating characteristic (ROC) curve (AUC) obtained by the model was 0.91, the sensitivity was 87%, and the specificity was 89%. Conclusions: The newly established model can accurately predict the onset of sepsis in ICU patients in clinical settings as early as possible. Prospective studies are necessary to determine the clinical utility of the proposed sepsis prediction model.
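
A minimal sketch of the modeling pipeline described above, assuming scikit-learn's random forest, synthetic stand-ins for the 55 features, and the 4:1 development/validation split:

```python
# Random forest + 5-fold CV on the development set, then held-out validation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(4449, 55))                # fake feature matrix
y = (X[:, :5].sum(axis=1) + rng.normal(size=4449) > 0).astype(int)  # fake label

# 4:1 development/validation split, as in the study design.
X_dev, X_val, y_dev, y_val = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=300, random_state=0)
print("5-fold CV AUC:",
      cross_val_score(clf, X_dev, y_dev, cv=5, scoring="roc_auc").mean())
clf.fit(X_dev, y_dev)
print("validation AUC:", roc_auc_score(y_val, clf.predict_proba(X_val)[:, 1]))
```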

