Inexact Nonconvex Newton-Type Methods

The paper aims to extend the theory and application of nonconvex Newton-type methods, namely trust region and cubic regularization, to the settings in which, in addition to the solution of subproblems, the gradient and the Hessian of the objective function are approximated. Using certain conditions on such approximations, the paper establishes optimal worst-case iteration complexities as the exact counterparts. This paper is part of a broader research program on designing, analyzing, and implementing efficient second-order optimization methods for large-scale machine learning applications. The authors were based at UC Berkeley when the idea of the project was conceived. The first two authors were PhD students, the third author was a postdoc, all supervised by the fourth author.

Download Full-text

Clinician checklist for assessing suitability of machine learning applications in healthcare

BMJ Health & Care Informatics ◽

10.1136/bmjhci-2020-100251 ◽

2021 ◽

Vol 28 (1) ◽

pp. e100251

Author(s):

Ian Scott ◽

Stacey Carter ◽

Enrico Coiera

Keyword(s):

Machine Learning ◽

Large Scale ◽

Clinical Decision Making ◽

Improve Patient Care ◽

Clinical Decision ◽

Routine Care ◽

Machine Learning Algorithms ◽

Clinical Settings ◽

Machine Learning Applications ◽

Key Issues

Machine learning algorithms are being used to screen and diagnose disease, prognosticate and predict therapeutic responses. Hundreds of new algorithms are being developed, but whether they improve clinical decision making and patient outcomes remains uncertain. If clinicians are to use algorithms, they need to be reassured that key issues relating to their validity, utility, feasibility, safety and ethical use have been addressed. We propose a checklist of 10 questions that clinicians can ask of those advocating for the use of a particular algorithm, but which do not expect clinicians, as non-experts, to demonstrate mastery over what can be highly complex statistical and computational concepts. The questions are: (1) What is the purpose and context of the algorithm? (2) How good were the data used to train the algorithm? (3) Were there sufficient data to train the algorithm? (4) How well does the algorithm perform? (5) Is the algorithm transferable to new clinical settings? (6) Are the outputs of the algorithm clinically intelligible? (7) How will this algorithm fit into and complement current workflows? (8) Has use of the algorithm been shown to improve patient care and outcomes? (9) Could the algorithm cause patient harm? and (10) Does use of the algorithm raise ethical, legal or social concerns? We provide examples where an algorithm may raise concerns and apply the checklist to a recent review of diagnostic imaging applications. This checklist aims to assist clinicians in assessing algorithm readiness for routine care and identify situations where further refinement and evaluation is required prior to large-scale use.

Download Full-text

Straggler-Aware Distributed Learning: Communication–Computation Latency Trade-Off

Entropy ◽

10.3390/e22050544 ◽

2020 ◽

Vol 22 (5) ◽

pp. 544 ◽

Cited By ~ 1

Author(s):

Emre Ozfatura ◽

Sennur Ulukus ◽

Deniz Gündüz

Keyword(s):

Machine Learning ◽

Gradient Descent ◽

Large Scale ◽

Computation Time ◽

Distributed Learning ◽

Trade Off ◽

Communication Latency ◽

Advantages And Disadvantages ◽

Machine Learning Applications ◽

Coded Communication

When gradient descent (GD) is scaled to many parallel workers for large-scale machine learning applications, its per-iteration computation time is limited by straggling workers. Straggling workers can be tolerated by assigning redundant computations and/or coding across data and computations, but in most existing schemes, each non-straggling worker transmits one message per iteration to the parameter server (PS) after completing all its computations. Imposing such a limitation results in two drawbacks: over-computation due to inaccurate prediction of the straggling behavior, and under-utilization due to discarding partial computations carried out by stragglers. To overcome these drawbacks, we consider multi-message communication (MMC) by allowing multiple computations to be conveyed from each worker per iteration, and propose novel straggler avoidance techniques for both coded computation and coded communication with MMC. We analyze how the proposed designs can be employed efficiently to seek a balance between the computation and communication latency. Furthermore, we identify the advantages and disadvantages of these designs in different settings through extensive simulations, both model-based and real implementation on Amazon EC2 servers, and demonstrate that proposed schemes with MMC can help improve upon existing straggler avoidance schemes.

Download Full-text

Essays on machine learning applications in economics : causal inference and prediction

10.32469/10355/85768 ◽

2021 ◽

Author(s):

◽

Yong Bian

Keyword(s):

Machine Learning ◽

Gender Bias ◽

Linear Models ◽

Gender Effects ◽

Classification Model ◽

H Index ◽

Undergraduate Major ◽

The Third ◽

Machine Learning Applications ◽

Text Content

This study includes three chapters related to machine learning applications with focus on different empirical topics. The first chapter talks about a new method and its application. The second chapter focuses on young economics professors salary issues. While the third chapter discusses scientific paper publication values based on text analysis and gender bias. In the first Chapter, I give a discussion of Double/Debiased Machine Learning (DML) which is a causal estimation method recently created by Chernozhukov, Chetverikov, Demirer, Duo, Hansen, Newey, and Robins (2018) and apply it to an education empirical analysis. I explain why DML is practically useful and what it does; I also take a bootstrap procedure to improve the built-in DML standard errors in the curriculum adoption application. As an extension to the existing studies on how curriculum materials affect student achievement, my work compares the results of DML, kernel matching, and ordinary least squares (OLS). In my study, the DML estimators avoid the possible misspecification bias of linear models and obtain statistically significant results that improve upon the kernel matching results. In the second chapter, we analyze the effects of gender, PhD graduation school rank, and undergraduate major on young economics professors' salaries. The dataset used is novel, containing detailed and time-varying research productivity measures and other demographic information of young economics professors from 28 of the top 50 public research universities in the United States. We apply double/debiased machine learning (DML) to obtain consistent estimators under the high-dimensional control variable set. By tracking the first 10 years of their professional work experience, we find that there barely exist effects on young faculties' salaries from the above three factors in most of the experience years. However, the gender effect on salary in experience year 7 is both statistically significant and economically significant (large enough in magnitude to have a practical meaning). In experience years 5 to 7, which are also near most faculties' promotion years, the gender effects are obvious. For both PhD graduation school rank and undergraduate major, the estimates for experience years 7 to 9 are large in magnitude; however they do not possess statistical significance. Overall, the effects tend to expand with years of experience. We also discuss possible economic mechanisms and reasons. In the third chapter, we build machine learning and simple linear models to predict academic paper publication outcomes as measured by journal H-indices, and we discuss the gender bias associated with these outcomes. We use a novel dataset with paper text content and each paper's associated H-index, authors' genders, and other information, collected from recently published economics journals. We apply term frequency-inverse document frequency vectorization and other Natural Language Processing (NLP) tools to transfer text content into numerical values as model inputs. We find that when using paper text content to predict an H-index, the prediction power is around 60 [percent] in our classification model (4 tiers) and the root mean squared error is around 44 in our regression model. Moreover, when controlling for paper text, the gender causal effect hardly exists. As long as the paper contains similar text, gender does not influence the change in H-index. Additionally, we give real-world meanings associated with the models.

Download Full-text

The Trials and Tribulations of Assembling Large Medical Imaging Datasets for Machine Learning Applications

Journal of Digital Imaging ◽

10.1007/s10278-021-00505-7 ◽

2021 ◽

Author(s):

Kirti Magudia ◽

Christopher P. Bridge ◽

Katherine P. Andriole ◽

Michael H. Rosenthal

Keyword(s):

Machine Learning ◽

Medical Imaging ◽

Data Storage ◽

Large Scale ◽

Large Datasets ◽

Abdominal Ct ◽

Institutional Review ◽

Retrospective Clinical Study ◽

Primary Investigator ◽

Machine Learning Applications

AbstractWith vast interest in machine learning applications, more investigators are proposing to assemble large datasets for machine learning applications. We aim to delineate multiple possible roadblocks to exam retrieval that may present themselves and lead to significant time delays. This HIPAA-compliant, institutional review board–approved, retrospective clinical study required identification and retrieval of all outpatient and emergency patients undergoing abdominal and pelvic computed tomography (CT) at three affiliated hospitals in the year 2012. If a patient had multiple abdominal CT exams, the first exam was selected for retrieval (n=23,186). Our experience in attempting to retrieve 23,186 abdominal CT exams yielded 22,852 valid CT abdomen/pelvis exams and identified four major categories of challenges when retrieving large datasets: cohort selection and processing, retrieving DICOM exam files from PACS, data storage, and non-recoverable failures. The retrieval took 3 months of project time and at minimum 300 person-hours of time between the primary investigator (a radiologist), a data scientist, and a software engineer. Exam selection and retrieval may take significantly longer than planned. We share our experience so that other investigators can anticipate and plan for these challenges. We also hope to help institutions better understand the demands that may be placed on their infrastructure by large-scale medical imaging machine learning projects.

Download Full-text

Large scale optimization methods for machine learning

10.14711/thesis-991012757567403412 ◽

2019 ◽

Author(s):

Shuai Zheng

Keyword(s):

Machine Learning ◽

Large Scale ◽

Optimization Methods ◽

Large Scale Optimization ◽

Scale Optimization

Download Full-text

Adaptive cubic regularization methods with dynamic inexact Hessian information and applications to finite-sum minimization

IMA Journal of Numerical Analysis ◽

10.1093/imanum/drz076 ◽

2020 ◽

Cited By ~ 1

Author(s):

Stefania Bellavia ◽

Gianmarco Gurioli ◽

Benedetta Morini

Keyword(s):

Theoretical Analysis ◽

Numerical Experiments ◽

Large Scale ◽

Optimization Problems ◽

Regularization Methods ◽

Worst Case ◽

Nonconvex Optimization Problems ◽

New Variant ◽

Cubic Regularization ◽

Finite Sum

Abstract We consider the adaptive regularization with cubics approach for solving nonconvex optimization problems and propose a new variant based on inexact Hessian information chosen dynamically. The theoretical analysis of the proposed procedure is given. The key property of ARC framework, constituted by optimal worst-case function/derivative evaluation bounds for first- and second-order critical point, is guaranteed. Application to large-scale finite-sum minimization based on subsampled Hessian is discussed and analyzed in both a deterministic and probabilistic manner, and equipped with numerical experiments on synthetic and real datasets.

Download Full-text

Machine learning method for tight-binding Hamiltonian parameterization from ab-initio band structure

npj Computational Materials ◽

10.1038/s41524-020-00490-5 ◽

2021 ◽

Vol 7 (1) ◽

Author(s):

Zifeng Wang ◽

Shizhuo Ye ◽

Hao Wang ◽

Jin He ◽

Qijun Huang ◽

...

Keyword(s):

Machine Learning ◽

Ab Initio ◽

Large Scale ◽

Tight Binding ◽

Real Space ◽

Machine Learning Method ◽

Learning Method ◽

Energy Bands ◽

Physical Problems ◽

Machine Learning Applications

AbstractThe tight-binding (TB) method is an ideal candidate for determining electronic and transport properties for a large-scale system. It describes the system as real-space Hamiltonian matrices expressed on a manageable number of parameters, leading to substantially lower computational costs than the ab-initio methods. Since the whole system is defined by the parameterization scheme, the choice of the TB parameters decides the reliability of the TB calculations. The typical empirical TB method uses the TB parameters directly from the existing parameter sets, which hardly reproduces the desired electronic structures quantitatively without specific optimizations. It is thus not suitable for quantitative studies like the transport property calculations. The ab-initio TB method derives the TB parameters from the ab-initio results through the transformation of basis functions, which achieves much higher numerical accuracy. However, it assumes prior knowledge of the basis and may encompass truncation error. Here, a machine learning method for TB Hamiltonian parameterization is proposed, within which a neural network (NN) is introduced with its neurons acting as the TB matrix elements. This method can construct the empirical TB model that reproduces the given ab-initio energy bands with predefined accuracy, which provides a fast and convenient way for TB model construction and gives insights into machine learning applications in physical problems.

Download Full-text

Collective annotation patterns in learning from crowds

Intelligent Data Analysis ◽

10.3233/ida-200009 ◽

2020 ◽

Vol 24 ◽

pp. 63-86

Author(s):

Francisco Mena ◽

Ricardo Ñanculef ◽

Carlos Valle

Keyword(s):

Machine Learning ◽

Large Scale ◽

Ground Truth ◽

Experimental Results ◽

Ground Truth Data ◽

Satisfactory Performance ◽

Machine Learning Applications ◽

Data Points ◽

Confusion Matrices

The lack of annotated data is one of the major barriers facing machine learning applications today. Learning from crowds, i.e. collecting ground-truth data from multiple inexpensive annotators, has become a common method to cope with this issue. It has been recently shown that modeling the varying quality of the annotations obtained in this way, is fundamental to obtain satisfactory performance in tasks where inexpert annotators may represent the majority but not the most trusted group. Unfortunately, existing techniques represent annotation patterns for each annotator individually, making the models difficult to estimate in large-scale scenarios. In this paper, we present two models to address these problems. Both methods are based on the hypothesis that it is possible to learn collective annotation patterns by introducing confusion matrices that involve groups of data point annotations or annotators. The first approach clusters data points with a common annotation pattern, regardless the annotators from which the labels have been obtained. Implicitly, this method attributes annotation mistakes to the complexity of the data itself and not to the variable behavior of the annotators. The second approach explicitly maps annotators to latent groups that are collectively parametrized to learn a common annotation pattern. Our experimental results show that, compared with other methods for learning from crowds, both methods have advantages in scenarios with a large number of annotators and a small number of annotations per annotator.

Download Full-text

MS2AI: Automated repurposing of public peptide LC-MS data for machine learning applications

10.1101/2021.01.27.428375 ◽

2021 ◽

Author(s):

Tobias Greisager Rehfeldt ◽

Konrad Krawczyk ◽

Mathias Bøgebjerg ◽

Veit Schwämmle ◽

Richard Röttger

Keyword(s):

Machine Learning ◽

Mass Spectrometry ◽

Large Scale ◽

Peptide Identification ◽

Training Data ◽

Supplementary Information ◽

Large Sample Size ◽

Raw Data ◽

Machine Learning Applications ◽

Rich Data

AbstractMotivationLiquid-chromatography mass-spectrometry (LC-MS) is the established standard for analyzing the proteome in biological samples by identification and quantification of thousands of proteins. Machine learning (ML) promises to considerably improve the analysis of the resulting data, however, there is yet to be any tool that mediates the path from raw data to modern ML applications. More specifically, ML applications are currently hampered by three major limitations: (1) absence of balanced training data with large sample size; (2) unclear definition of sufficiently information-rich data representations for e.g. peptide identification; (3) lack of benchmarking of ML methods on specific LC-MS problems.ResultsWe created the MS2AI pipeline that automates the process of gathering vast quantities of mass spectrometry (MS) data for large scale ML applications. The software retrieves raw data from either in-house sources or from the proteomics identifications database, PRIDE. Subsequently, the raw data is stored in a standardized format amenable for ML encompassing MS1/MS2 spectra and peptide identifications. This tool bridges the gap between MS and AI, and to this effect we also present an ML application in the form of a convolutional neural network for the identification of oxidized peptides.AvailabilityAn open source implementation of the software can be found freely available for non-commercial use at https://gitlab.com/roettgerlab/[email protected] informationSupplementary data are available at Bioinformatics online.

Download Full-text

Optimization Methods for Large-Scale Machine Learning

SIAM Review ◽

10.1137/16m1080173 ◽

2018 ◽

Vol 60 (2) ◽

pp. 223-311 ◽

Cited By ~ 352

Author(s):

Léon Bottou ◽

Frank E. Curtis ◽

Jorge Nocedal

Keyword(s):

Machine Learning ◽

Large Scale ◽

Optimization Methods

Download Full-text