Smooth input preparation for quantum and quantum-inspired machine learning

2021 ◽  
Vol 3 (1) ◽  
Author(s):  
Zhikuan Zhao ◽  
Jack K. Fitzsimons ◽  
Patrick Rebentrost ◽  
Vedran Dunjko ◽  
Joseph F. Fitzsimons

Machine learning has recently emerged as a fruitful area for finding potential quantum computational advantage. Many of the quantum-enhanced machine learning algorithms critically hinge upon the ability to efficiently produce states proportional to high-dimensional data points stored in a quantum accessible memory. Even given query access to exponentially many entries stored in a database, the construction of which is considered a one-off overhead, it has been argued that the cost of preparing such amplitude-encoded states may offset any exponential quantum advantage. Here we prove using smoothed analysis that if the data analysis algorithm is robust against small entry-wise input perturbation, state preparation can always be achieved with a constant number of queries. This criterion is typically satisfied in realistic machine learning applications, where input data is subject to moderate noise. Our results are equally applicable to the recent seminal progress in quantum-inspired algorithms, where specially constructed databases suffice for polylogarithmic-time classical algorithms in the low-rank case. The consequence of our finding is that, for the purpose of practical machine learning, polylogarithmic processing time is possible under a general and flexible input model with quantum algorithms, or with quantum-inspired classical algorithms in the low-rank case.
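As a rough illustration of the robustness criterion (not taken from the paper), the sketch below encodes a data vector as a normalized amplitude vector and checks that a small entry-wise perturbation barely changes the encoded state; the vector size and noise level are arbitrary choices.

```python
# A minimal sketch, assuming nothing about the paper's construction, of why
# amplitude encoding is robust to small entry-wise noise.
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=1024)                 # high-dimensional classical data point
noise = 1e-3 * rng.normal(size=x.shape)   # moderate entry-wise noise

state = x / np.linalg.norm(x)                        # amplitude-encoded state |x>
state_noisy = (x + noise) / np.linalg.norm(x + noise)

# The overlap (fidelity) between ideal and perturbed states stays close to 1,
# which is the kind of robustness the smoothed-analysis argument relies on.
fidelity = abs(np.dot(state, state_noisy))
print(f"fidelity = {fidelity:.6f}")
```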

2021 ◽  
Vol 28 (1) ◽  
pp. e100251
Author(s):  
Ian Scott ◽  
Stacey Carter ◽  
Enrico Coiera

Machine learning algorithms are being used to screen and diagnose disease, prognosticate and predict therapeutic responses. Hundreds of new algorithms are being developed, but whether they improve clinical decision making and patient outcomes remains uncertain. If clinicians are to use algorithms, they need to be reassured that key issues relating to their validity, utility, feasibility, safety and ethical use have been addressed. We propose a checklist of 10 questions that clinicians can ask of those advocating for the use of a particular algorithm, but which does not expect clinicians, as non-experts, to demonstrate mastery of what can be highly complex statistical and computational concepts. The questions are: (1) What is the purpose and context of the algorithm? (2) How good were the data used to train the algorithm? (3) Were there sufficient data to train the algorithm? (4) How well does the algorithm perform? (5) Is the algorithm transferable to new clinical settings? (6) Are the outputs of the algorithm clinically intelligible? (7) How will this algorithm fit into and complement current workflows? (8) Has use of the algorithm been shown to improve patient care and outcomes? (9) Could the algorithm cause patient harm? and (10) Does use of the algorithm raise ethical, legal or social concerns? We provide examples where an algorithm may raise concerns and apply the checklist to a recent review of diagnostic imaging applications. This checklist aims to assist clinicians in assessing algorithm readiness for routine care and in identifying situations where further refinement and evaluation are required prior to large-scale use.


2021 ◽  
pp. 1-16
Author(s):  
Kevin Kloos

The use of machine learning algorithms at national statistical institutes has increased significantly over the past few years. Applications range from new imputation schemes to new statistical output based entirely on machine learning. The results are promising, but recent studies have shown that the use of machine learning in official statistics always introduces a bias, known as misclassification bias. Misclassification bias does not occur in traditional applications of machine learning and therefore it has received little attention in the academic literature. In earlier work, we collected existing methods that are able to correct misclassification bias and compared their statistical properties, including bias, variance and mean squared error. In this paper, we present a new generic method to correct misclassification bias for time series and we derive its statistical properties. Moreover, we show numerically that it has a lower mean squared error than the existing alternatives in a wide variety of settings. We believe that our new method may improve machine learning applications in official statistics and we hope that our work will stimulate further methodological research in this area.
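To make the notion of misclassification bias correction concrete, the sketch below applies one well-known existing correction, the calibration estimator, to aggregate counts; it is not the new method proposed in this paper, and the counts are hypothetical.

```python
# A minimal sketch of the classical calibration estimator for correcting
# misclassification bias in aggregate statistics: estimate P(true class |
# predicted class) on a labelled test set and apply it to predicted counts.
# The numbers are hypothetical and this is NOT the paper's new method.
import numpy as np

# Rows = predicted class, columns = true class, counts from a labelled test set.
test_counts = np.array([[900,  80],    # predicted 0: 900 truly 0, 80 truly 1
                        [ 50, 470]])   # predicted 1: 50 truly 0, 470 truly 1

# Calibration probabilities P(true = j | predicted = i).
calib = test_counts / test_counts.sum(axis=1, keepdims=True)

# Predicted class counts on the full (unlabelled) target population.
predicted_counts = np.array([61000, 39000])

# Bias-corrected estimate of the true class totals.
corrected_counts = predicted_counts @ calib
print(corrected_counts)
```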


2021 ◽  
Vol 3 (2) ◽  
pp. 43-50
Author(s):  
Safa SEN ◽  
Sara Almeida de Figueiredo

Predicting bank failures has been an essential subject in the literature due to the significance of banks for the economic prosperity of a country. Acting as intermediary players in the economy, banks channel funds between creditors and debtors. In that sense, banks are considered the backbone of economies; hence, it is important to create early warning systems that distinguish insolvent banks from solvent ones, so that insolvent banks can apply for assistance and avoid bankruptcy in financially turbulent times. In this paper, we focus on two different machine learning disciplines, boosting and cost-sensitive methods, to predict bank failures. Boosting methods are widely used in the literature due to their better prediction capability. However, the Cost-Sensitive Forest is relatively new to the literature and was originally invented to solve imbalance problems in software defect detection. Our results show that, compared to the boosting methods, the Cost-Sensitive Forest classifies failed banks more accurately in particular. Thus, we suggest using the Cost-Sensitive Forest when predicting bank failures with imbalanced datasets.
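A minimal sketch of this kind of comparison on synthetic imbalanced data is shown below. The Cost-Sensitive Forest is not available in scikit-learn, so a class-weighted random forest is used purely as a stand-in for a cost-sensitive method; the data and hyperparameters are placeholders.

```python
# Boosting vs. a cost-sensitive stand-in on synthetic imbalanced data.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Roughly 5% minority ("failed bank") class.
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.95, 0.05],
                           random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

boosting = GradientBoostingClassifier(random_state=42).fit(X_tr, y_tr)
cost_sensitive = RandomForestClassifier(class_weight="balanced",
                                        random_state=42).fit(X_tr, y_tr)

# Recall on the minority (failed-bank) class is the metric of interest.
for name, model in [("boosting", boosting), ("cost-sensitive", cost_sensitive)]:
    print(name, recall_score(y_te, model.predict(X_te)))
```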


Author(s):  
Mikhail Krechetov ◽  
Jakub Marecek ◽  
Yury Maximov ◽  
Martin Takac

Low-rank methods for semi-definite programming (SDP) have gained a lot of interest recently, especially in machine learning applications. Their analysis often involves determinant-based or Schatten-norm penalties, which are difficult to implement in practice due to high computational efforts. In this paper, we propose Entropy-Penalized Semi-Definite Programming (EP-SDP), which provides a unified framework for a broad class of penalty functions used in practice to promote a low-rank solution. We show that EP-SDP problems admit an efficient numerical algorithm, having (almost) linear time complexity of the gradient computation; this makes it useful for many machine learning and optimization problems. We illustrate the practical efficiency of our approach on several combinatorial optimization and machine learning problems.
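As a hedged illustration of what an entropy penalty on a semi-definite variable looks like (the exact EP-SDP formulation is in the paper), the sketch below evaluates the von Neumann entropy of a trace-normalized PSD matrix: the penalty is near zero for (near) low-rank matrices and maximal for a flat spectrum.

```python
# A minimal sketch, under my own assumptions, of an entropy penalty that
# promotes low rank: the entropy -sum_i p_i log p_i of the normalized
# eigenvalue distribution is small when the spectrum is concentrated.
import numpy as np

def entropy_penalty(X: np.ndarray, eps: float = 1e-12) -> float:
    """Von Neumann entropy of the trace-normalized PSD matrix X."""
    eigvals = np.clip(np.linalg.eigvalsh(X), 0.0, None)
    p = eigvals / max(eigvals.sum(), eps)   # normalized spectrum
    p = p[p > eps]
    return float(-(p * np.log(p)).sum())

# A rank-1 matrix has (near) zero entropy; the full-rank identity is maximal.
v = np.random.default_rng(0).normal(size=(5, 1))
print(entropy_penalty(v @ v.T))        # ~0.0
print(entropy_penalty(np.eye(5)))      # log(5) ~ 1.609
```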


Author(s):  
Danielle Bradley ◽  
Erin Landau ◽  
Adam Wolfberg ◽  
Alex Baron

BACKGROUND The rise of highly engaging digital health mobile apps over the past few years has created repositories containing billions of patient-reported data points that have the potential to inform clinical research and advance medicine. OBJECTIVE To determine if self-reported data could be leveraged to create machine learning algorithms to predict the presence of, or risk for, obstetric outcomes and related conditions. METHODS More than 10 million women have downloaded Ovia Health’s three mobile apps (Ovia Fertility, Ovia Pregnancy, and Ovia Parenting). Data points logged by app users can include information about menstrual cycle, health history, current health status, nutrition habits, exercise activity, symptoms, or moods. Machine learning algorithms were developed using supervised machine learning methodologies, specifically, Gradient Boosting Decision Tree algorithms. Each algorithm was developed and trained using anywhere from 385 to 5770 features and data from 77,621 to 121,740 app users. RESULTS Algorithms were created to detect the risk of developing preeclampsia, gestational diabetes, and preterm delivery, as well as to identify the presence of existing preeclampsia. The positive predictive value (PPV) was set to 0.75 for all of the models, as this was the threshold where the researchers felt a clinical response—additional screening or testing—would be reasonable, due to the likelihood of a positive outcome. Sensitivity ranged from 24% to 75% across all models. When PPV was adjusted from 0.75 to 0.52, the sensitivity of the preeclampsia prediction algorithm rose from 24% to 85%. When PPV was adjusted from 0.75 to 0.65, the sensitivity of the preeclampsia detection or diagnostic algorithm increased from 37% to 79%. CONCLUSIONS Algorithms based on patient-reported data can predict serious obstetric conditions with accuracy levels sufficient to guide clinical screening by health care providers and health plans. Further research is needed to determine whether such an approach can improve outcomes for at-risk patients and reduce the cost of screening those not at risk. Presenting the results of these models to patients themselves could also provide important insight into otherwise unknown health risks.
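The trade-off between PPV and sensitivity described above comes down to choosing a decision threshold on the model's predicted probabilities. The sketch below, on synthetic data rather than Ovia's, trains a gradient-boosted tree model and picks the threshold whose validation precision (PPV) first reaches a target, then reports the sensitivity (recall) obtained there.

```python
# Threshold selection for a target PPV on a validation set (synthetic data).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20000, n_features=50, weights=[0.9, 0.1],
                           random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, stratify=y, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
scores = model.predict_proba(X_val)[:, 1]

precision, recall, thresholds = precision_recall_curve(y_val, scores)
target_ppv = 0.75
ok = precision[:-1] >= target_ppv        # thresholds align with precision[:-1]
if ok.any():
    idx = np.argmax(ok)                  # first threshold reaching the target PPV
    print(f"threshold={thresholds[idx]:.3f}  PPV={precision[idx]:.2f}  "
          f"sensitivity={recall[idx]:.2f}")
```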


Author(s):  
Syed Jamal Safdar Gardezi ◽  
Mohamed Meselhy Eltoukhy ◽  
Ibrahima Faye

Breast cancer is one of the leading causes of death in women worldwide. Early detection is key to reducing mortality rates. Mammography screening has proven to be one of the effective tools for diagnosis of breast cancer. A computer-aided diagnosis (CAD) system is a fast, reliable, and cost-effective tool that assists radiologists and physicians in diagnosing breast cancer. CAD systems play an increasingly important role in the clinic by providing a second opinion, and clinical trials have shown that they improve the accuracy of breast cancer detection. A typical CAD system involves three major steps: segmentation of suspected lesions, feature extraction, and classification of these regions as normal or abnormal and further as benign or malignant. The diagnostic ability of any CAD system depends on accurate segmentation, feature extraction techniques, and, most importantly, classification tools that have the ability to discriminate normal tissues from abnormal tissues. In this chapter, we discuss the application of machine learning algorithms (e.g., ANN, binary trees, SVM) together with segmentation and feature extraction techniques in the development of a CAD system. Various methods used in the detection and diagnosis of breast lesions in mammography are reviewed. A brief introduction to the machine learning tools used in diagnosis, and their classification performance with various segmentation and feature extraction techniques, is presented.
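A minimal sketch of the classification stage of such a CAD pipeline is shown below, assuming segmentation and feature extraction have already produced a feature matrix (texture, shape, intensity descriptors) for each suspicious region; the data here are synthetic placeholders, not mammography features.

```python
# Classification stage of a CAD pipeline: scale extracted features, then
# classify each region with an SVM and report cross-validated accuracy.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder ROI features (rows = regions, columns = extracted features)
# and labels (0 = benign, 1 = malignant).
X, y = make_classification(n_samples=600, n_features=30, random_state=1)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
print(cross_val_score(clf, X, y, cv=5).mean())   # cross-validated accuracy
```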


Author(s):  
Anitha Elavarasi S. ◽  
Jayanthi J.

Machine learning enables a system to learn automatically, without human intervention, and to improve its performance with the help of previous experience. It can access data and use it for learning by itself. Even though many algorithms have been developed to solve machine learning problems, it is difficult to handle all kinds of input data in order to arrive at accurate decisions. Domain knowledge of statistical science, probability, logic, mathematical optimization, reinforcement learning, and control theory plays a major role in developing machine learning based algorithms. The key considerations in selecting a suitable programming language for implementing machine learning algorithms include performance, concurrency, application development, and learning curve. This chapter deals with a few of the top programming languages used for developing machine learning applications: Python, R, and Java. The top three programming languages preferred by data scientists are (1) Python, used by more than 57%, (2) R, used by more than 31%, and (3) Java, used by 17% of data scientists.


2019 ◽  
Vol 2019 ◽  
pp. 1-26 ◽  
Author(s):  
Mohammad Masoud ◽  
Yousef Jaradat ◽  
Ahmad Manasrah ◽  
Ismael Jannoud

The smart device industry allows developers and designers to embed different sensors, processors, and memories in small-size electronic devices. Sensors are added to enhance the usability of these devices and improve the quality of experience through data collection and analysis. However, in the era of big data and machine learning, sensor data may be processed by different techniques to infer various kinds of hidden information. The extracted information may be beneficial to device users, developers, and designers to enhance the management, operation, and development of these devices. However, it may also be used to compromise the security and privacy of humans in the era of the Internet of Everything (IoE). In this work, we attempt to review the process of inferring meaningful data from the sensors of smart devices, especially smartphones. In addition, different useful machine learning applications based on smartphone sensor data are shown. Moreover, different side channel attacks utilizing the same sensors and the same machine learning algorithms are surveyed.


Author(s):  
Michael McCartney ◽  
Matthias Haeringer ◽  
Wolfgang Polifke

This paper examines and compares commonly used machine learning algorithms in their performance in interpolation and extrapolation of FDFs, based on experimental and simulation data. Algorithm performance is evaluated by interpolating and extrapolating FDFs, and the impact of errors on the limit cycle amplitudes is then evaluated using the xFDF framework. The best algorithms in interpolation and extrapolation were found to be the widely used cubic spline interpolation, as well as the Gaussian Process regressor. The data itself was found to be an important factor in defining the predictive performance of a model; therefore, a method of optimally selecting data points at test time using Gaussian Processes was demonstrated. The aim of this is to allow a minimal number of data points to be collected while still providing enough information to model the FDF accurately. The extrapolation performance was shown to decay very quickly with distance from the domain, so emphasis should be put on selecting measurement points in order to expand the covered domain. Gaussian Processes also give an indication of confidence in their predictions, which is used to carry out uncertainty quantification in order to understand model sensitivities. This was demonstrated through application to the xFDF framework.
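The sketch below, which is not the paper's xFDF setup, shows the mechanism behind the uncertainty quantification mentioned above: a Gaussian Process regressor returns a predictive standard deviation that grows quickly outside the range covered by the training points, mirroring the rapid decay of extrapolation accuracy. The 1D function and point locations are placeholders.

```python
# Gaussian Process regression with predictive uncertainty (interpolation vs.
# extrapolation) on a placeholder 1D problem.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

X_train = np.linspace(0.0, 1.0, 8).reshape(-1, 1)   # covered domain
y_train = np.sin(2 * np.pi * X_train).ravel()        # placeholder response

gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(), normalize_y=True)
gp.fit(X_train, y_train)

X_test = np.array([[0.5], [1.5]])                    # inside vs. outside the domain
mean, std = gp.predict(X_test, return_std=True)
print(std)   # uncertainty is much larger at x = 1.5 than at x = 0.5
```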


2019 ◽  
Vol 6 (1) ◽  
Author(s):  
Su Bin Lim ◽  
Swee Jin Tan ◽  
Wan-Teck Lim ◽  
Chwee Teck Lim

Massive numbers of transcriptome profiles exist in the form of microarrays. The challenge is that they are processed using diverse platforms and preprocessing tools, requiring considerable time and informatics expertise for cross-dataset analyses. If there exists a single, integrated data source, data reuse can be facilitated for discovery, analysis, and validation of biomarker-based clinical strategies. Here, we present merged microarray-acquired datasets (MMDs) across 11 major cancer types, curating 8,386 patient-derived tumor and tumor-free samples from 95 GEO datasets. Using machine learning algorithms, we show that diagnostic models trained on MMDs can be directly applied to RNA-seq-acquired TCGA data with high classification accuracy. Machine-learning-optimized MMDs further help reveal the immune landscape across various carcinomas, which is critically needed in disease management and clinical interventions. This unified data source may serve as an excellent training or test set to apply, develop, and refine machine learning algorithms that can be tapped to better define the genomic landscape of human cancers.
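The sketch below illustrates the cross-platform idea in its simplest form: train a classifier on a microarray-style expression matrix and apply it to an RNA-seq-style matrix after restricting to shared genes and standardizing each dataset separately. Both matrices and their labels are random placeholders, so the reported accuracy is chance level; only the mechanics are shown, not the paper's actual MMD or TCGA data.

```python
# Train on one expression matrix, evaluate on another, after gene alignment
# and per-dataset standardization. All data here are random placeholders.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
genes = [f"GENE{i}" for i in range(200)]

# Placeholder expression matrices standing in for MMD (microarray) and TCGA (RNA-seq).
mmd_expr = pd.DataFrame(rng.normal(size=(300, 200)), columns=genes)
mmd_labels = rng.integers(0, 2, size=300)     # 0 = tumor-free, 1 = tumor
tcga_expr = pd.DataFrame(rng.normal(size=(100, 200)), columns=genes)
tcga_labels = rng.integers(0, 2, size=100)

def align_and_scale(expr: pd.DataFrame, shared: pd.Index) -> pd.DataFrame:
    """Restrict to shared genes and z-score within each dataset separately."""
    return pd.DataFrame(StandardScaler().fit_transform(expr[shared]), columns=shared)

shared = mmd_expr.columns.intersection(tcga_expr.columns)
clf = LogisticRegression(max_iter=1000).fit(align_and_scale(mmd_expr, shared), mmd_labels)
print(clf.score(align_and_scale(tcga_expr, shared), tcga_labels))
```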

