Communication-Efficient Modeling with Penalized Quantile Regression for Distributed Data

To handle high-dimensional distributed data, this article develops a novel, communication-efficient approach to sparse, high-dimensional penalized quantile regression. In each round, the proposed method requires only that the master machine solve a sparse penalized quantile regression, which can be done quickly with the proximal alternating direction method of multipliers (ADMM) algorithm, and that the worker machines compute the subgradient on their local data. The advantage of the proximal ADMM algorithm is that every parameter update in the iteration has a closed form even in the high-dimensional case, which greatly speeds up computation. As for communication efficiency, the proposed method does not sacrifice any statistical accuracy and provably improves the estimation error obtained by the centralized method, provided the penalty levels are chosen properly. Moreover, the asymptotic properties of the proposed estimator and the convergence of the algorithm are established. Finally, extensive experiments, covering both numerical simulations and an analysis of HIV drug resistance data, confirm the efficiency of the proposed method for quantile regression on distributed data.
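
The worker-side step described in this abstract, computing a subgradient of the quantile (check) loss on local data and sending it to the master, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `check_loss_subgradient` and `aggregate`, the size-weighted averaging rule, and the choice of subgradient at a zero residual are all assumptions made for illustration, and the master's proximal ADMM solve is omitted.

```python
from typing import List, Sequence


def check_loss_subgradient(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    beta: Sequence[float],
    tau: float,
) -> List[float]:
    """One subgradient of (1/n) * sum_i rho_tau(y_i - x_i' beta) on local data,
    where rho_tau(r) = r * (tau - 1{r < 0}) is the check loss."""
    n, p = len(X), len(beta)
    grad = [0.0] * p
    for xi, yi in zip(X, y):
        residual = yi - sum(a * b for a, b in zip(xi, beta))
        # Derivative of rho_tau in the residual; at residual == 0 we
        # (arbitrarily) take the element with indicator 1{r < 0} = 0.
        weight = tau - (1.0 if residual < 0 else 0.0)
        for j in range(p):
            # Chain rule: d(residual)/d(beta_j) = -x_ij.
            grad[j] -= xi[j] * weight / n
    return grad


def aggregate(
    worker_grads: Sequence[Sequence[float]],
    worker_sizes: Sequence[int],
) -> List[float]:
    """Hypothetical master-side step: average worker subgradients,
    weighting each by its local sample size."""
    total = sum(worker_sizes)
    p = len(worker_grads[0])
    return [
        sum(g[j] * m for g, m in zip(worker_grads, worker_sizes)) / total
        for j in range(p)
    ]
```

Each worker sends only a vector of length p per round, which is the source of the communication savings the abstract refers to.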

Quantile Regression with Generated Regressors

Econometrics ◽

10.3390/econometrics9020016 ◽

2021 ◽

Vol 9 (2) ◽

pp. 16

Author(s):

Liqiong Chen ◽

Antonio F. Galvao ◽

Suyong Song

Keyword(s):

Quantile Regression ◽

Estimation Error ◽

Asymptotic Variance ◽

Asymptotic Properties ◽

Estimation Procedure ◽

Finite Sample ◽

Testing Procedures ◽

Engel Curves ◽

Generated Regressors ◽

Two Step Estimation

This paper studies estimation and inference for linear quantile regression models with generated regressors. We suggest a practical two-step estimation procedure, where the generated regressors are computed in the first step. The asymptotic properties of the two-step estimator, namely, consistency and asymptotic normality are established. We show that the asymptotic variance-covariance matrix needs to be adjusted to account for the first-step estimation error. We propose a general estimator for the asymptotic variance-covariance, establish its consistency, and develop testing procedures for linear hypotheses in these models. Monte Carlo simulations to evaluate the finite-sample performance of the estimation and inference procedures are provided. Finally, we apply the proposed methods to study Engel curves for various commodities using data from the UK Family Expenditure Survey. We document strong heterogeneity in the estimated Engel curves along the conditional distribution of the budget share of each commodity. The empirical application also emphasizes that correctly estimating confidence intervals for the estimated Engel curves by the proposed estimator is of importance for inference.

Inference for High Dimensional Censored Quantile Regression

Journal of the American Statistical Association ◽

10.1080/01621459.2021.1957900 ◽

2021 ◽

pp. 1-37

Author(s):

Zhe Fei ◽

Qi Zheng ◽

Hyokyoung G. Hong ◽

Yi Li

Keyword(s):

Quantile Regression ◽

High Dimensional ◽

Censored Quantile Regression

Multi-round smoothed composite quantile regression for distributed data

Annals of the Institute of Statistical Mathematics ◽

10.1007/s10463-021-00816-0 ◽

2022 ◽

Author(s):

Fengrui Di ◽

Lei Wang

Keyword(s):

Quantile Regression ◽

Distributed Data ◽

Composite Quantile Regression

Fast Algorithms for LS and LAD-Collaborative Regression

Asia Pacific Journal of Operational Research ◽

10.1142/s0217595922500014 ◽

2021 ◽

Author(s):

Jun Sun ◽

Lingchen Kong ◽

Mei Li

Keyword(s):

Numerical Experiments ◽

Modern Science ◽

Alternating Direction Method ◽

Least Square ◽

High Dimensional ◽

Statistical Interpretation ◽

Absolute Deviation ◽

Linear Rate ◽

Alternating Direction ◽

High Dimensional Datasets

With the development of modern science and technology, it is easy to obtain a large number of high-dimensional datasets that are related but different. Classical unimodel analysis is less likely to capture potential links between the different datasets. Recently, a collaborative regression model based on the least squares (LS) method has been proposed for this problem. In this paper, we propose a robust collaborative regression based on the least absolute deviation (LAD). We give the statistical interpretation of the LS-collaborative regression and the LAD-collaborative regression. We then design an efficient symmetric Gauss–Seidel-based alternating direction method of multipliers algorithm to solve the two models, which converges globally at a Q-linear rate. Finally, we report numerical experiments to illustrate the efficiency of the proposed methods.

Functional Linear Regression

10.1093/oxfordhb/9780199568444.013.2 ◽

2018 ◽

Author(s):

Hervé Cardot ◽

Pascal Sarda

Keyword(s):

Linear Regression ◽

Least Squares ◽

Linear Models ◽

Estimation Error ◽

Asymptotic Properties ◽

Principal Component Regression ◽

Principal Component ◽

Penalized Least Squares ◽

Open Problems ◽

Functional Linear Regression

This article presents a selected bibliography on functional linear regression (FLR) and highlights the key contributions from both applied and theoretical points of view. It first defines FLR in the case of a scalar response and shows how the model can be extended to the case of a functional response. It then considers two kinds of estimation procedures for the slope parameter: projection-based estimators, in which regularization is performed through dimension reduction, such as functional principal component regression, and penalized least squares estimators, which solve a penalized least squares minimization problem. The article proceeds by discussing the main asymptotic properties, separating results on mean square prediction error from results on L2 estimation error. It also describes some related models, including generalized functional linear models and FLR on quantiles, and concludes with a complementary bibliography and some open problems.

An Iterative Coordinate Descent Algorithm for High-Dimensional Nonconvex Penalized Quantile Regression

Journal of Computational and Graphical Statistics ◽

10.1080/10618600.2014.913516 ◽

2015 ◽

Vol 24 (3) ◽

pp. 676-694 ◽

Cited By ~ 19

Author(s):

Bo Peng ◽

Lan Wang

Keyword(s):

Quantile Regression ◽

Coordinate Descent ◽

High Dimensional ◽

Descent Algorithm ◽

Coordinate Descent Algorithm

High-dimensional Log-Error-in-Variable Regression with Applications to Microbial Compositional Data Analysis

Biometrika ◽

10.1093/biomet/asab020 ◽

2021 ◽

Author(s):

Pixu Shi ◽

Yuchen Zhou ◽

Anru R Zhang

Keyword(s):

Data Analysis ◽

Compositional Data ◽

Estimation Error ◽

Real Data ◽

Upper And Lower Bounds ◽

High Dimensional ◽

Compositional Data Analysis ◽

Sequencing Data ◽

Contrast Model ◽

Critical Issues

In microbiome and genomic studies, the regression of compositional data has been a crucial tool for identifying microbial taxa or genes that are associated with clinical phenotypes. To account for the variation in sequencing depth, the classic log-contrast model is often used, where read counts are normalized into compositions. However, zero read counts and the randomness in covariates remain critical issues. In this article, we introduce a surprisingly simple, interpretable, and efficient method for the estimation of compositional data regression through the lens of a novel high-dimensional log-error-in-variable regression model. The proposed method corrects for possible overdispersion in sequencing data and at the same time avoids any subjective imputation of zero read counts. We provide theoretical justifications with matching upper and lower bounds for the estimation error. The merit of the procedure is illustrated through real data analysis and simulation studies.

Extensions to Quantile Regression Forests for Very High-Dimensional Data

Advances in Knowledge Discovery and Data Mining - Lecture Notes in Computer Science ◽

10.1007/978-3-319-06605-9_21 ◽

2014 ◽

pp. 247-258 ◽

Cited By ~ 5

Author(s):

Nguyen Thanh Tung ◽

Joshua Zhexue Huang ◽

Imran Khan ◽

Mark Junjie Li ◽

Graham Williams

Keyword(s):

Quantile Regression ◽

High Dimensional Data ◽

High Dimensional ◽

Very High

Distributed Fast-Tracking Alternating Direction Method of Multipliers (ADMM) Algorithm with Optimal Convergence Rate

10.1109/smc52423.2021.9658615 ◽

2021 ◽

Author(s):

Shreyansh Shethia ◽

Akshita Gupta ◽

Omanshu Thapliyal ◽

Inseok Hwang

Keyword(s):

Convergence Rate ◽

Alternating Direction Method ◽

Method Of Multipliers ◽

Optimal Convergence Rate ◽

Optimal Convergence ◽

Fast Tracking ◽

Alternating Direction ◽

ADMM Algorithm

An Inexact Projected Gradient Method for Sparsity-Constrained Quadratic Measurements Regression

Asia Pacific Journal of Operational Research ◽

10.1142/s0217595919400086 ◽

2019 ◽

Vol 36 (02) ◽

pp. 1940008

Author(s):

Jun Fan ◽

Liqun Wang ◽

Ailing Yan

Keyword(s):

Gradient Method ◽

Least Squares Method ◽

Optimal Solution ◽

High Dimensional ◽

Sparse Signals ◽

Projected Gradient Method ◽

Constrained Least Squares ◽

Projected Gradient ◽

High Dimensional Case ◽

Noisy Measurements

In this paper, we employ the sparsity-constrained least squares method to reconstruct sparse signals from noisy measurements in the high-dimensional case, and derive the existence of the optimal solution under certain conditions. We propose an inexact sparse-projected gradient method for numerical computation and discuss its convergence. Moreover, we present numerical results to demonstrate the efficiency of the proposed method.
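
To make the idea of a sparse-projected gradient step concrete, here is a minimal sketch under simplifying assumptions that differ from the paper: it uses linear measurements y ≈ Ax rather than quadratic ones, and an exact projection onto the set of s-sparse vectors (hard thresholding) rather than the inexact projection the abstract mentions. The names `hard_threshold` and `projected_gradient`, the fixed step size, and the iteration count are hypothetical.

```python
from typing import List, Sequence


def hard_threshold(v: Sequence[float], s: int) -> List[float]:
    """Project v onto vectors with at most s nonzeros by keeping
    the s entries of largest magnitude and zeroing the rest."""
    keep = sorted(range(len(v)), key=lambda j: abs(v[j]), reverse=True)[:s]
    out = [0.0] * len(v)
    for j in keep:
        out[j] = v[j]
    return out


def projected_gradient(
    A: Sequence[Sequence[float]],
    y: Sequence[float],
    s: int,
    step: float,
    iters: int = 200,
) -> List[float]:
    """Minimize 0.5 * ||A x - y||^2 subject to x having at most s nonzeros,
    alternating a gradient step with a hard-thresholding projection."""
    p = len(A[0])
    x = [0.0] * p
    for _ in range(iters):
        # Residual A x - y of the current iterate.
        residual = [
            sum(a * b for a, b in zip(row, x)) - yi for row, yi in zip(A, y)
        ]
        # Gradient of the least squares objective: A^T (A x - y).
        grad = [
            sum(A[i][j] * residual[i] for i in range(len(A))) for j in range(p)
        ]
        x = hard_threshold([xj - step * gj for xj, gj in zip(x, grad)], s)
    return x
```

With an identity measurement matrix, `projected_gradient([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], [3.0, 0.1, -2.0], s=2, step=1.0)` keeps the two largest-magnitude entries of `y` and zeroes the third.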
