Exact and inexact subsampled Newton methods for optimization

Abstract The paper studies the solution of stochastic optimization problems in which approximations to the gradient and Hessian are obtained through subsampling. We first consider Newton-like methods that employ these approximations and discuss how to coordinate the accuracy in the gradient and Hessian to yield a superlinear rate of convergence in expectation. The second part of the paper analyzes an inexact Newton method that solves linear systems approximately using the conjugate gradient (CG) method, and that samples the Hessian and not the gradient (the gradient is assumed to be exact). We provide a complexity analysis for this method based on the properties of the CG iteration and the quality of the Hessian approximation, and compare it with a method that employs a stochastic gradient iteration instead of the CG method. We report preliminary numerical results that illustrate the performance of inexact subsampled Newton methods on machine learning applications based on logistic regression.

Download Full-text

Implications of properties and quality of indoor sensor data for building machine learning applications: Two case studies in smart campuses

Building and Environment ◽

10.1016/j.buildenv.2021.108529 ◽

2021 ◽

pp. 108529

Author(s):

Miia Lillstrang ◽

Markus Harju ◽

Guillermo del Campo ◽

Gonzalo Calderon ◽

Juha Röning ◽

...

Keyword(s):

Machine Learning ◽

Case Studies ◽

Sensor Data ◽

Machine Learning Applications

Download Full-text

Entropy-Penalized Semidefinite Programming

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/157 ◽

2019 ◽

Cited By ~ 2

Author(s):

Mikhail Krechetov ◽

Jakub Marecek ◽

Yury Maximov ◽

Martin Takac

Keyword(s):

Machine Learning ◽

Time Complexity ◽

Optimization Problems ◽

Linear Time ◽

Broad Class ◽

Low Rank ◽

Learning Problems ◽

Unified Framework ◽

Gradient Computation ◽

Machine Learning Applications

Low-rank methods for semi-definite programming (SDP) have gained a lot of interest recently, especially in machine learning applications. Their analysis often involves determinant-based or Schatten-norm penalties, which are difficult to implement in practice due to high computational efforts. In this paper, we propose Entropy-Penalized Semi-Definite Programming (EP-SDP), which provides a unified framework for a broad class of penalty functions used in practice to promote a low-rank solution. We show that EP-SDP problems admit an efficient numerical algorithm, having (almost) linear time complexity of the gradient computation; this makes it useful for many machine learning and optimization problems. We illustrate the practical efficiency of our approach on several combinatorial optimization and machine learning problems.

Download Full-text

Comparison of synchronous and asynchronous parallelization of extreme surrogate-assisted multi-objective evolutionary algorithm

Natural Computing ◽

10.1007/s11047-020-09806-2 ◽

2020 ◽

Author(s):

Tomohiro Harada ◽

Misaki Kaidan ◽

Ruck Thawonmas

Keyword(s):

Machine Learning ◽

Evolutionary Algorithm ◽

Optimization Problems ◽

Computing Time ◽

The Other ◽

Multi Objective Optimization ◽

Multi Objective ◽

Evaluation Time ◽

Surrogate Function

Abstract This paper investigates the integration of a surrogate-assisted multi-objective evolutionary algorithm (MOEA) and a parallel computation scheme to reduce the computing time until obtaining the optimal solutions in evolutionary algorithms (EAs). A surrogate-assisted MOEA solves multi-objective optimization problems while estimating the evaluation of solutions with a surrogate function. A surrogate function is produced by a machine learning model. This paper uses an extreme learning surrogate-assisted MOEA/D (ELMOEA/D), which utilizes one of the well-known MOEA algorithms, MOEA/D, and a machine learning technique, extreme learning machine (ELM). A parallelization of MOEA, on the other hand, evaluates solutions in parallel on multiple computing nodes to accelerate the optimization process. We consider a synchronous and an asynchronous parallel MOEA as a master-slave parallelization scheme for ELMOEA/D. We carry out an experiment with multi-objective optimization problems to compare the synchronous parallel ELMOEA/D with the asynchronous parallel ELMOEA/D. In the experiment, we simulate two settings of the evaluation time of solutions. One determines the evaluation time of solutions by the normal distribution with different variances. On the other hand, another evaluation time correlates to the objective function value. We compare the quality of solutions obtained by the parallel ELMOEA/D variants within a particular computing time. The experimental results show that the parallelization of ELMOEA/D significantly reduces the computational time. In addition, the integration of ELMOEA/D with the asynchronous parallelization scheme obtains higher quality of solutions quicker than the synchronous parallel ELMOEA/D.

Download Full-text

Sensors of Smart Devices in the Internet of Everything (IoE) Era: Big Opportunities and Massive Doubts

Journal of Sensors ◽

10.1155/2019/6514520 ◽

2019 ◽

Vol 2019 ◽

pp. 1-26 ◽

Cited By ~ 7

Author(s):

Mohammad Masoud ◽

Yousef Jaradat ◽

Ahmad Manasrah ◽

Ismael Jannoud

Keyword(s):

Machine Learning ◽

Electronic Devices ◽

Machine Learning Algorithms ◽

Smart Devices ◽

Smart Device ◽

Internet Of Everything ◽

Data Collection And Analysis ◽

Machine Learning Applications ◽

Management Operation

Smart device industry allows developers and designers to embed different sensors, processors, and memories in small-size electronic devices. Sensors are added to enhance the usability of these devices and improve the quality of experience through data collection and analysis. However, with the era of big data and machine learning, sensors’ data may be processed by different techniques to infer various hidden information. The extracted information may be beneficial to device users, developers, and designers to enhance the management, operation, and development of these devices. However, the extracted information may be used to compromise the security and the privacy of humans in the era of Internet of Everything (IoE). In this work, we attempt to review the process of inferring meaningful data from smart devices’ sensors, especially, smartphones. In addition, different useful machine learning applications based on smartphones’ sensors data are shown. Moreover, different side channel attacks utilizing the same sensors and the same machine learning algorithms are overviewed.

Download Full-text

Testing and Validating Two Morphological Flare Predictors by Logistic Regression Machine Learning

Frontiers in Astronomy and Space Sciences ◽

10.3389/fspas.2020.571186 ◽

2021 ◽

Vol 7 ◽

Author(s):

M. B. Korsós ◽

R. Erdélyi ◽

J. Liu ◽

H. Morgan

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Active Regions ◽

Morphological Parameters ◽

Mixed States ◽

Separation Parameter ◽

Prediction Probability ◽

Machine Learning Applications ◽

Joint Prediction ◽

Two Parameters

Whilst the most dynamic solar active regions (ARs) are known to flare frequently, predicting the occurrence of individual flares and their magnitude, is very much a developing field with strong potentials for machine learning applications. The present work is based on a method which is developed to define numerical measures of the mixed states of ARs with opposite polarities. The method yields compelling evidence for the assumed connection between the level of mixed states of a given AR and the level of the solar eruptive probability of this AR by employing two morphological parameters: 1) the separation parameter Sl−f and 2) the sum of the horizontal magnetic gradient GS. In this work, we study the efficiency of Sl−f and GS as flare predictors on a representative sample of ARs, based on the SOHO/MDI-Debrecen Data (SDD) and the SDO/HMI - Debrecen Data (HMIDD) sunspot catalogues. In particular, we investigate about 1,000 ARs in order to test and validate the joint prediction capabilities of the two morphological parameters by applying the logistic regression machine learning method. Here, we confirm that the two parameters with their threshold values are, when applied together, good complementary predictors. Furthermore, the prediction probability of these predictor parameters is given at least 70% a day before.

Download Full-text

Logistic Regression for Employability Prediction

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.c8170.019320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 2471-2478

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Deep Learning ◽

Weather Forecasting ◽

Previous Knowledge ◽

Recruitment Process ◽

Proposed Model ◽

Unknown Event ◽

The Right

Prediction is a conjecture about something which may happen. Prediction need not be based upon the previous knowledge or experience on the unknown event of interest in the future. But it is a necessity for mankind to foresee and make the right decisions to live better. Every person does predictions but the quality of the predictions differs and that differentiates successful persons and unsuccessful persons. In order to automate the prediction process and to make quality predictions available to every person, machines are trained to make predictions and such field comes under machine learning and later on deep learning algorithms. Various fields such as health care, weather forecasting, natural calamities, and crime prediction are some of the applications of prediction. The researchers have applied the field of prediction to see whether a model can predict the employability of a candidate in a recruitment process. Organizations use human expertise to identify a skilled candidate for employment based on various factors and now these organizations are trying to migrate to automated systems by harnessing the benefits of the exponential growth in the area of machine learning and deep learning. This investigation presents the development of a model to predict the employability by using Logistic Regression. A set of candidates was tested in the proposed model and results are discussed in this paper.

Download Full-text

UNIVERSAL ALGORITHMS FOR PROBABILITY FORECASTING

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213012400155 ◽

2012 ◽

Vol 21 (04) ◽

pp. 1240015

Author(s):

FEDOR ZHDANOV ◽

YURI KALNISHKAN

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Linear Model ◽

Loss Function ◽

Efficient Algorithms ◽

Classification Problems ◽

Computationally Efficient ◽

Multi Class Classification ◽

Computationally Efficient Algorithms

Multi-class classification is one of the most important tasks in machine learning. In this paper we consider two online multi-class classification problems: classification by a linear model and by a kernelized model. The quality of predictions is measured by the Brier loss function. We obtain two computationally efficient algorithms for these problems by applying the Aggregating Algorithms to certain pools of experts and prove theoretical guarantees on the losses of these algorithms. We kernelize one of the algorithms and prove theoretical guarantees on its loss. We perform experiments and compare our algorithms with logistic regression.

Download Full-text

Collective annotation patterns in learning from crowds

Intelligent Data Analysis ◽

10.3233/ida-200009 ◽

2020 ◽

Vol 24 ◽

pp. 63-86

Author(s):

Francisco Mena ◽

Ricardo Ñanculef ◽

Carlos Valle

Keyword(s):

Machine Learning ◽

Large Scale ◽

Ground Truth ◽

Experimental Results ◽

Ground Truth Data ◽

Satisfactory Performance ◽

Machine Learning Applications ◽

Data Points ◽

Confusion Matrices

The lack of annotated data is one of the major barriers facing machine learning applications today. Learning from crowds, i.e. collecting ground-truth data from multiple inexpensive annotators, has become a common method to cope with this issue. It has been recently shown that modeling the varying quality of the annotations obtained in this way, is fundamental to obtain satisfactory performance in tasks where inexpert annotators may represent the majority but not the most trusted group. Unfortunately, existing techniques represent annotation patterns for each annotator individually, making the models difficult to estimate in large-scale scenarios. In this paper, we present two models to address these problems. Both methods are based on the hypothesis that it is possible to learn collective annotation patterns by introducing confusion matrices that involve groups of data point annotations or annotators. The first approach clusters data points with a common annotation pattern, regardless the annotators from which the labels have been obtained. Implicitly, this method attributes annotation mistakes to the complexity of the data itself and not to the variable behavior of the annotators. The second approach explicitly maps annotators to latent groups that are collectively parametrized to learn a common annotation pattern. Our experimental results show that, compared with other methods for learning from crowds, both methods have advantages in scenarios with a large number of annotators and a small number of annotations per annotator.

Download Full-text

Structuring clinical text with AI: old vs. new natural language processing techniques evaluated on eight common cardiovascular diseases

10.1101/2021.01.27.21250477 ◽

2021 ◽

Author(s):

Xianghao Zhan ◽

Marie Humbert-Droz ◽

Pritam Mukherjee ◽

Olivier Gevaert

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Cardiovascular Diseases ◽

Free Text ◽

Diagnostic Code ◽

Word Embeddings ◽

Clinical Notes ◽

Diagnostic Codes ◽

Mimic Iii

AbstractMining the structured data in electronic health records(EHRs) enables many clinical applications while the information in free-text clinical notes often remains untapped. Free-text notes are unstructured data harder to use in machine learning while structured diagnostic codes can be missing or even erroneous. To improve the quality of diagnostic codes, this work extracts structured diagnostic codes from the unstructured notes concerning cardiovascular diseases. Five old and new word embeddings were used to vectorize over 5 million progress notes from Stanford EHR and logistic regression was used to predict eight ICD-10 codes of common cardiovascular diseases. The models were interpreted by the important words in predictions and analyses of false positive cases. Trained on Stanford notes, the model transferability was tested in the prediction of corresponding ICD-9 codes of the MIMIC-III discharge summaries. The word embeddings and logistic regression showed good performance in the diagnostic code extraction with TF-IDF as the best word embedding model showing AU-ROC ranging from 0.9499 to 0.9915 and AUPRC ranging from 0.2956 to 0.8072. The models also showed transferability when tested on MIMIC-III data set with AUROC ranging from 0.7952 to 0.9790 and AUPRC ranging from 0.2353 to 0.8084. Model interpretability was showed by the important words with clinical meanings matching each disease. This study shows the feasibility to accurately extract structured diagnostic codes, impute missing codes and correct erroneous codes from free-text clinical notes with interpretable models for clinicians, which helps improve the data quality of diagnostic codes for information retrieval and downstream machine-learning applications.

Download Full-text

Dealer

Proceedings of the VLDB Endowment ◽

10.14778/3447689.3447700 ◽

2021 ◽

Vol 14 (6) ◽

pp. 957-969

Author(s):

Jinfei Liu ◽

Jian Lou ◽

Junxu Liu ◽

Li Xiong ◽

Jian Pei ◽

...

Keyword(s):

Machine Learning ◽

Differential Privacy ◽

Optimization Problems ◽

Data Driven ◽

Noise Sensitivity ◽

Efficiency And Effectiveness ◽

Machine Learning Applications ◽

Model Training ◽

Price Functions ◽

Synthetic Datasets

Data-driven machine learning has become ubiquitous. A marketplace for machine learning models connects data owners and model buyers, and can dramatically facilitate data-driven machine learning applications. In this paper, we take a formal data marketplace perspective and propose the first en D -to-end mod e l m a rketp l ace with diff e rential p r ivacy ( Dealer ) towards answering the following questions: How to formulate data owners' compensation functions and model buyers' price functions? How can the broker determine prices for a set of models to maximize the revenue with arbitrage-free guarantee, and train a set of models with maximum Shapley coverage given a manufacturing budget to remain competitive ? For the former, we propose compensation function for each data owner based on Shapley value and privacy sensitivity, and price function for each model buyer based on Shapley coverage sensitivity and noise sensitivity. Both privacy sensitivity and noise sensitivity are measured by the level of differential privacy. For the latter, we formulate two optimization problems for model pricing and model training, and propose efficient dynamic programming algorithms. Experiment results on the real chess dataset and synthetic datasets justify the design of Dealer and verify the efficiency and effectiveness of the proposed algorithms.

Download Full-text