High-Dimensional Least-Squares with Perfect Positive Correlation

Least squares is a common and important method in linear regression. However, it often overfits in high-dimensional problems, and various regularization schemes that incorporate problem-specific prior information have been studied to remedy this deficiency. Drawing on Kendall's τ from nonparametric statistics, we establish a new model in which ordinary least squares is equipped with a perfect positive correlation constraint, which seeks to preserve the concordance between the rankings of the observations and those of the systematic components. By sorting the observations in ascending order, we reduce the perfect positive correlation constraint to a system of linear inequalities. We show that the resulting linearly constrained least-squares problem, together with its dual problem, is solvable. In particular, we introduce a mild assumption on the observations and the measurement matrix that rules out the zero vector from the optimal solution set, which indicates that the proposed model is statistically meaningful. To handle large-scale instances, we propose an efficient alternating direction method of multipliers (ADMM) that solves the proposed model from the dual perspective. The effectiveness of our model compared with ordinary least squares is evaluated in terms of the rank correlation coefficient between the outputs and the systematic components, and the efficiency of our dual algorithm is demonstrated by comparison with three efficient solvers via CVX in terms of computation time, solution accuracy, and rank correlation coefficient.

Wang and Leng (2016), High‐dimensional ordinary least‐squares projection for screening variables, Journal of the Royal Statistical Society Series B, 78, 589–611

Journal of the Royal Statistical Society Series B (Statistical Methodology) · 10.1111/rssb.12427 · 2021

Author(s): Xiangyu Wang, Chenlei Leng, Tom Boot

Keyword(s): Least Squares; Ordinary Least Squares; High Dimensional; Royal Statistical Society

Using Quantile Regression to Estimate Intervention Effects Beyond the Mean

Educational and Psychological Measurement · 10.1177/0013164419837321 · 2019 · Vol 79 (5) · pp. 883-910 · Cited By ~ 1

Author(s): Spyros Konstantopoulos, Wei Li, Shazia Miller, Arie van der Ploeg

Keyword(s): Quantile Regression; Least Squares; Large Scale; Social Science Research; Empirical Work; Science Research; Ordinary Least Squares; Least Squares Regression; Intervention Effects; Education Data

This study discusses quantile regression methodology and its usefulness in education and social science research. First, quantile regression is defined and its advantages vis-à-vis ordinary least squares regression are illustrated. Second, specific comparisons are made between ordinary least squares and quantile regression methods. Third, the applicability of quantile regression in empirical work for estimating intervention effects is demonstrated using education data from a large-scale experiment. The estimation of quantile treatment effects at various quantiles in the presence of dropouts is also discussed. Quantile regression is especially suitable for examining predictor effects at various locations of the outcome distribution (e.g., the lower and upper tails).
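
Quantile regression replaces the squared error of OLS with the check (pinball) loss ρ_τ(u) = u(τ − 1{u < 0}). With an intercept only, minimizing the summed check loss yields a sample τ-quantile; with an intercept plus a single binary treatment indicator, the problem separates by group, so the treatment coefficient equals the difference between the treated and control τ-quantiles (a quantile treatment effect). The brute-force sketch below illustrates that reduction; the function names are hypothetical, dropout handling is not modeled, and when several points minimize the loss it simply returns the first one encountered.

```python
def pinball_loss(residual, tau):
    """Check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def sample_quantile(values, tau):
    """Minimise the summed check loss over the observed values (O(n^2) brute force).

    A minimiser always exists among the data points because the loss is
    piecewise linear with kinks only at the observations.
    """
    return min(values, key=lambda q: sum(pinball_loss(v - q, tau) for v in values))


def quantile_treatment_effect(treated, control, tau):
    """Treatment coefficient of a saturated QR model: difference in group tau-quantiles."""
    return sample_quantile(treated, tau) - sample_quantile(control, tau)


# Usage: the median effect for two small illustrative groups.
print(quantile_treatment_effect([5, 7, 9, 11, 13], [1, 2, 3, 4, 5], 0.5))
```

Calling the same function with τ = 0.1 or τ = 0.9 gives the effect at the lower or upper tail, which is the kind of comparison the abstract highlights.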

The Development of an Ordinary Least Squares Parametric Model to Estimate the Cost Per Flying Hour of ‘Unknown’ Aircraft Types and a Comparative Application

Aerospace · 10.3390/aerospace5040104 · 2018 · Vol 5 (4) · pp. 104 · Cited By ~ 3

Author(s): Ilias Lappas, Michail Bozoudis

Keyword(s): Least Squares; Large Scale; Ordinary Least Squares; Parametric Model; Development Programs; Cost Estimating; Life Cycle Stages; Wide Range; High Level; The Cost

This paper presents the development of a parametric model for the variable portion of the Cost Per Flying Hour (CPFH) of an ‘unknown’ aircraft platform and its application to diverse types of fixed- and rotary-wing aircraft development programs (F-35A, Su-57, Dassault Rafale, T-X candidates, AW189, and Airbus RACER, among others). The novelty of this paper lies in its use of a diverse sample of aircraft types, with the aim of obtaining a ‘universal’ Cost Estimating Relationship (CER) applicable to a wide range of platforms. Moreover, the model does not produce absolute cost figures but rather analogy ratios relative to the F-16’s CPFH, which broadens its applicability. The model will enable an analyst to carry out timely and reliable Operational and Support (O&S) cost estimates for a wide range of ‘unknown’ aircraft platforms at the early stages of conceptual design, despite the lack of actual data from the utilization and support life cycle stages. The statistical analysis is based on Ordinary Least Squares (OLS) regression, conducted with R software (v3.5.1, released on 2 July 2018). The model’s output is validated against officially published CPFH data for several existing ‘mature’ aircraft platforms, including the F-16C/D, one of the most prolific fighter jet types in the world, which is also used as the reference for comparing CPFH estimates of various next-generation aircraft platforms. Actual CPFH data of the Hellenic Air Force (HAF) were used to develop the parametric model, whose application is expected to significantly inform high-level decision making on aircraft procurement, budgeting, and future force structure planning, including decisions related to large-scale aircraft modifications and upgrades.
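
To show how an OLS-based CER can output analogy ratios instead of absolute costs, here is a minimal sketch. It assumes a hypothetical log-log CER with a single cost driver, log(cost) = b0 + b1·log(driver); the abstract does not list the paper's actual explanatory variables, and the data in the usage lines are synthetic, not HAF figures. The function names are hypothetical.

```python
import numpy as np


def fit_log_log_cer(drivers, cost):
    """OLS fit of log(cost) = b0 + b1 * log(driver) for one hypothetical driver."""
    X = np.column_stack([np.ones(len(drivers)), np.log(drivers)])
    coef, *_ = np.linalg.lstsq(X, np.log(cost), rcond=None)
    return coef


def analogy_ratio(coef, driver_candidate, driver_reference):
    """Predicted cost of a candidate platform divided by that of a reference platform."""
    _, b1 = coef
    # The intercept b0 cancels in the ratio of the two predictions.
    return float(np.exp(b1 * (np.log(driver_candidate) - np.log(driver_reference))))


# Synthetic data generated exactly from cost = 2 * driver**0.5.
drivers = np.array([4.0, 9.0, 16.0, 25.0])
cost = 2.0 * np.sqrt(drivers)
coef = fit_log_log_cer(drivers, cost)
print(analogy_ratio(coef, 16.0, 4.0))
```

Because the intercept cancels, the ratio in this sketch depends only on the fitted elasticity b1, which is one way a relative output can be less sensitive to the absolute cost level of the reference platform.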

Smartphone and Tablet Application (App) Life Cycle Characterization via Apple App Store Rank

Data and Information Management · 10.2478/dim-2020-0002 · 2020 · Vol 4 (1) · pp. 44-67

Author(s): Han Jia, Chun Guo, Xiaozhong Liu

Keyword(s): Life Cycle; Least Squares; Product Life Cycle; Large Scale; Ordinary Least Squares; Heterogeneous Data; Life Cycles; Volume Data; App Store; Management Domain

With the rapid growth of the smartphone and tablet market, the mobile application (App) industry, which provides a variety of functions for these devices, is also growing at a striking pace. Product life cycle (PLC) theory, which has a long history, has been applied to a great number of industries and products and is widely used in the management domain. In this study, we apply classical PLC theory to mobile Apps on Apple smartphone and tablet devices (Apple App Store). Instead of relying on sales or download volume data, which are often unavailable, we use open-access daily App download rankings as an indicator of an App's normalized dynamic market popularity. We also use this ranking information to build an App life cycle model, with which we compare paid and free Apps from 20 different categories. Our results show that Apps in different categories have different kinds of life cycles and exhibit various unique and unpredictable characteristics. Furthermore, as large-scale heterogeneous data (e.g., user App ratings, App hardware/software requirements, or App version updates) become available and are attached to each target App, an important contribution of this paper is an in-depth study of how such data correlate with and affect the App life cycle. Using different regression techniques (i.e., logistic, ordinary least squares, and partial least squares), we build different models to investigate these relationships. The results indicate that some explicit and latent independent variables are more important than others for characterizing the App life cycle. In addition, we find that life cycle analysis for different App categories requires different tailored regression models, confirming that within-category App life cycles are more predictable and comparable than App life cycles across categories.

Activity Recommendation Model Using Rank Correlation for Chronic Stress Management

Applied Sciences · 10.3390/app9204284 · 2019 · Vol 9 (20) · pp. 4284 · Cited By ~ 5

Author(s): Ji-Soo Kang, Dong-Hoon Shin, Ji-Won Baek, Kyungyong Chung

Keyword(s): Stress Management; Correlation Coefficient; Chronic Stress; Rank Correlation; Korean People; Spearman’s Rank Correlation; Spearman’s Rank Correlation Coefficient; Proposed Model; Basic Rank

Korean people are exposed to stress due to the constantly competitive structure caused by rapid industrialization. As a result, there is a need for methods that can effectively manage stress and help improve quality of life. This study therefore proposes an activity recommendation model using rank correlation for chronic stress management. Using Spearman’s rank correlation coefficient, the proposed model finds the correlations between users’ Positive Activity for Stress Management (PASM), Negative Activity for Stress Management (NASM), and Perceived Stress Scale (PSS) scores. The accuracy of the Spearman-based recommendations is improved by assigning a basic rank value to missing values, which addresses the sparsity and cold-start problems. For the performance evaluation of the proposed model, the F-measure is computed from the average precision and recall after five rounds of recommendations for 20 users. The proposed method outperforms the other models because it recommends activities using the correlation between PASM and NASM. The proposed activity recommendation model makes it possible to manage users’ stress effectively by lowering their PSS scores using correlation.
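
Spearman's coefficient is the Pearson correlation of the two rank vectors, with tied values given the average of their positions. The minimal sketch below assumes missing entries are marked `None` and replaced by a "basic rank" after the observed values are ranked; the abstract does not say which basic rank value is used, so the default here (the mean rank of the observed entries) is a hypothetical choice, as are all function names.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def ranks_with_missing(values, basic_rank=None):
    """Rank observed entries; fill None entries with a basic rank (hypothetical default)."""
    observed = [v for v in values if v is not None]
    observed_ranks = iter(average_ranks(observed))
    fill = (len(observed) + 1) / 2 if basic_rank is None else basic_rank
    return [next(observed_ranks) if v is not None else fill for v in values]


def pearson(a, b):
    """Pearson correlation; undefined (ZeroDivisionError) if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)


def spearman(x, y, basic_rank=None):
    """Spearman's rho computed as the Pearson correlation of (filled) ranks."""
    return pearson(ranks_with_missing(x, basic_rank), ranks_with_missing(y, basic_rank))


# Usage: one user's activity scores vs. stress scores, with a missing entry.
print(spearman([3, None, 5, 1], [2.0, 4.0, 1.0, 3.0]))
```

Filling with a neutral mid rank keeps a missing entry from pulling the coefficient strongly in either direction; a different basic rank would shift the result.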

High dimensional ordinary least squares projection for screening variables

Journal of the Royal Statistical Society Series B (Statistical Methodology) · 10.1111/rssb.12127 · 2015 · Vol 78 (3) · pp. 589-611 · Cited By ~ 18

Author(s): Xiangyu Wang, Chenlei Leng

Keyword(s): Least Squares; Ordinary Least Squares; High Dimensional

Lasso-based simulation for high-dimensional multi-period portfolio optimization

IMA Journal of Management Mathematics · 10.1093/imaman/dpz013 · 2019 · Vol 31 (3) · pp. 257-280

Author(s): Zhongyu Li, Ka Ho Tsang, Hoi Ying Wong

Keyword(s): Least Squares; Portfolio Optimization; Optimization Problems; Estimation Error; Sample Path; Ordinary Least Squares; Monte Carlo Algorithm; High Dimensional; Finite Sample; Sample Paths

This paper proposes a regression-based simulation algorithm for multi-period mean-variance portfolio optimization problems with constraints in a high-dimensional setting. For a high-dimensional portfolio, the least squares Monte Carlo algorithm for portfolio optimization can perform unsatisfactorily with a finite number of sample paths because of the estimation error of ordinary least squares (OLS) in the regression steps. Our algorithm resolves this problem and demonstrates significant improvements in numerical performance with finite sample paths and high dimensionality. Specifically, we replace OLS with the least absolute shrinkage and selection operator (lasso). Our major contribution is the proof of the asymptotic convergence of the novel lasso-based simulation in a recursive regression setting. Numerical experiments suggest that our algorithm achieves good stability in both low- and higher-dimensional cases.
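
Swapping OLS for the lasso in a regression step means solving min_b (1/(2n))·||y − Xb||² + λ·||b||₁ instead of an unpenalized least-squares problem. The abstract does not say which lasso solver the authors use, so the sketch below uses cyclic coordinate descent with soft-thresholding as an illustrative, swapped-in choice; the names `soft_threshold`, `lasso_cd`, `lam` and `iters` are hypothetical. With λ = 0 it reduces to OLS, and for λ at or above max|Xᵀy|/n every coefficient is zero.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t*|.|: shrink v toward zero by t."""
    return np.sign(v) * max(abs(v) - t, 0.0)


def lasso_cd(X, y, lam, iters=200):
    """Minimise (1/(2n))*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n  # ||X_j||^2 / n for each column
    r = y - X @ b                      # current residual
    for _ in range(iters):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (b_j added back).
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * (new - b[j])  # keep the residual in sync with b
            b[j] = new
    return b


# Usage: sparse synthetic regression, as in one regression step of a simulation.
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))
y = X @ np.array([1.5, 0.0, -2.0]) + 0.1 * rng.standard_normal(50)
print(lasso_cd(X, y, lam=0.1))
```

In a least-squares Monte Carlo loop, a call like `lasso_cd` would stand where the OLS fit of continuation values on basis functions sits; the penalty level would need tuning, which this sketch does not address.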

An Integrated Multi-Echelon Supply Chain Network Design Considering Stochastic Demand: A Genetic Algorithm Based Solution

PROMET - Traffic&Transportation · 10.7307/ptt.v29i4.2193 · 2017 · Vol 29 (4) · pp. 391-400 · Cited By ~ 6

Author(s): Sara Nakhjirkan, Farimah Mokhatab Rafiei

Keyword(s): Genetic Algorithm; Supply Chain; Large Scale; Human Life; Optimal Solution; Green Supply Chain; Inventory Routing; Model Object; Proposed Model; Practical Tool

The growing consumption of natural resources has caused irreparable damage to the environment. Scientists believe that if environmental degradation continues at its current pace, the future of human life will be shrouded in uncertainty. One of the most effective ways to counter these adverse environmental effects is to implement green supply chains. In this study, a multilevel mathematical model covering the supply, production, distribution, and customer levels is presented for the routing–location–inventory problem in a green supply chain. Vehicle routing between distribution centres and customers is included in the model, and the model selects the locations of distribution centres from a set of potential sites. The distributors use a continuous review (r, Q) policy to control inventory. The objective of the proposed model is to find the supply chain configuration with minimum cost. To validate the proposed model and assess its fit to real-world problems, GAMS IDE/Cplex was used. To measure the efficiency of the proposed model on large-scale problems, a genetic algorithm (GA) was used. The results confirm the efficiency of the proposed model as a practical tool for decision makers solving location–inventory–routing problems in green supply chains. Compared with the exact method, the proposed GA reduced solving time by 85% while reaching, on average, 97% of the optimal solution.
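
Under an (r, Q) policy, a fixed quantity Q is ordered whenever the inventory position (stock on hand plus stock on order) falls to or below the reorder point r. The minimal simulation below approximates continuous review by checking the position after every period's demand, and it assumes a fixed lead time of at least one period and lost sales on stockouts; those modeling choices and the function name `simulate_rq` are hypothetical, not details taken from the paper.

```python
def simulate_rq(demands, r, Q, lead_time, initial_stock):
    """Simulate an (r, Q) policy period by period (hypothetical helper).

    Returns (final on-hand stock, number of orders placed, units of lost demand).
    """
    if lead_time < 1:
        raise ValueError("lead_time must be at least one period")
    if Q <= 0:
        raise ValueError("Q must be positive")
    on_hand = initial_stock
    pipeline = []  # outstanding orders as (arrival_period, quantity)
    orders = 0
    lost = 0
    for t, demand in enumerate(demands):
        # Receive the orders that arrive in this period.
        on_hand += sum(q for arrival, q in pipeline if arrival == t)
        pipeline = [(arrival, q) for arrival, q in pipeline if arrival != t]
        # Serve demand from stock; any shortage is lost.
        served = min(on_hand, demand)
        lost += demand - served
        on_hand -= served
        # Review the inventory position and reorder Q while it is at or below r.
        position = on_hand + sum(q for _, q in pipeline)
        while position <= r:
            pipeline.append((t + lead_time, Q))
            position += Q
            orders += 1
    return on_hand, orders, lost


# Usage: constant demand of 3 units per period.
print(simulate_rq([3] * 10, r=5, Q=10, lead_time=1, initial_stock=10))
```

Raising r buffers more demand during the lead time at the cost of holding more stock; in the paper, r and Q would be part of the optimized cost model rather than fixed inputs as here.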

HB-PLS: An algorithm for identifying biological process or pathway regulators by integrating Huber loss and Berhu penalty with partial least squares regression

10.1101/2020.05.16.089623 · 2020

Author(s): Wenping Deng, Kui Zhang, Zhigang Wei, Lihu Wang, Cheng He, ...

Keyword(s): Least Squares; Partial Least Squares; Regulatory Networks; Biological Process; Ordinary Least Squares; Descent Method; High Dimensional; Gradient Descent Method; Least Squares Regression; Non Gaussian

Gene expression data feature high dimensionality, multicollinearity, and outliers or non-Gaussian noise, all of which make it difficult to identify the true regulatory genes controlling a biological process or pathway. In this study, we embedded Huber-Berhu (HB) regression into the partial least squares (PLS) framework and created a new method, HB-PLS, for predicting biological process or pathway regulators through the construction of regulatory networks. PLS is an alternative to ordinary least squares (OLS) for handling multicollinearity in high-dimensional data. The Huber loss is more robust to outliers than the squared loss, and the Berhu penalty strikes a better balance between the ℓ2 and ℓ1 penalties. HB-PLS therefore inherits the advantages of the Huber loss, the Berhu penalty, and PLS. To solve the Huber-Berhu regression, a fast proximal gradient descent method was developed; this solver runs much faster than CVX, a MATLAB-based modeling system for convex optimization. Applying HB-PLS to real transcriptomic data from Arabidopsis and maize led to the identification of many pathway regulators that had previously been identified experimentally. In terms of its efficiency in identifying positive biological process or pathway regulators, HB-PLS is comparable to sparse partial least squares (SPLS), a very efficient method developed for variable selection and dimension reduction in handling multicollinearity in high-dimensional genomic data. However, HB-PLS identifies some distinct regulators and, in one case, identifies more positive regulators at the top of the output list, which can reduce the burden of experimentally testing the identified candidate targets. Our study suggests that HB-PLS is instrumental for identifying biological process and pathway genes.
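
The two ingredients named above have standard piecewise definitions. The Huber loss is quadratic for residuals with |u| ≤ δ and linear beyond, which limits the influence of outliers. The Berhu ("reverse Huber") penalty is |b| for |b| ≤ c and (b² + c²)/(2c) beyond, so it acts like ℓ1 near zero (sparsity) and like a scaled ℓ2 for large coefficients. The sketch below evaluates both and a generic objective Σ Huber(residual) + λ·Σ Berhu(coefficient); that objective form, the thresholds δ and c, and the function names are assumptions, since the abstract gives neither HB-PLS's exact scaling nor its PLS coupling.

```python
def huber(u, delta):
    """Huber loss: 0.5*u^2 if |u| <= delta, else delta*|u| - 0.5*delta^2."""
    a = abs(u)
    return 0.5 * u * u if a <= delta else delta * a - 0.5 * delta * delta


def berhu(b, c):
    """Berhu (reverse Huber) penalty: |b| if |b| <= c, else (b^2 + c^2) / (2c)."""
    a = abs(b)
    return a if a <= c else (b * b + c * c) / (2.0 * c)


def hb_objective(X, y, beta, delta, c, lam):
    """Generic Huber-loss + Berhu-penalty objective (assumed form, no PLS step)."""
    residuals = [yi - sum(xij * bj for xij, bj in zip(row, beta)) for row, yi in zip(X, y)]
    return sum(huber(r, delta) for r in residuals) + lam * sum(berhu(b, c) for b in beta)


# Usage: both pieces meet continuously at their thresholds.
print(huber(1.0, 1.0), berhu(1.0, 1.0))
```

A proximal gradient method, as the abstract mentions, would take gradient steps on the smooth Huber term and apply the proximal operator of the Berhu penalty; that solver is not sketched here.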

Projective Quadratic Regression for Online Learning

Proceedings of the AAAI Conference on Artificial Intelligence · 10.1609/aaai.v34i04.5951 · 2020 · Vol 34 (04) · pp. 5093-5100

Author(s): Wenye Ma

Keyword(s): Online Learning; Large Scale; Learning Algorithm; Optimal Solution; Streaming Data; Low Rank; High Dimensional; Quadratic Regression; Convex Model; Real World Data

This paper considers online convex optimization (OCO) problems, the paramount framework for designing online learning algorithms. In the OCO setting, the loss function of a learning task is based on streaming data, which makes OCO a powerful tool for modeling large-scale applications such as online recommender systems. Meanwhile, real-world data are usually extremely high-dimensional due to modern feature engineering techniques, which makes quadratic regression impractical. Factorization Machines and their variants are efficient models for capturing feature interactions with a low-rank matrix model, but they cannot satisfy the OCO setting because they are non-convex. In this paper, we propose a projective quadratic regression (PQR) model. First, it can capture important second-order feature information. Second, it is a convex model, so the requirements of OCO are fulfilled and the global optimal solution can be achieved. Moreover, existing modern online optimization methods such as Online Gradient Descent (OGD) or Follow-The-Regularized-Leader (FTRL) can be applied directly. In addition, we show that with a proper choice of hyper-parameter, the model has the same order of space and time complexity as the linear model and can therefore handle high-dimensional data. Experimental results demonstrate the accuracy and efficiency of the proposed PQR model in comparison with state-of-the-art methods.
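
The abstract contrasts full quadratic regression, whose pairwise term costs O(d²) per example, with low-rank interaction models. For Factorization Machines the standard identity Σ_{i<j} ⟨v_i, v_j⟩ x_i x_j = ½ Σ_f [(Σ_i v_{if} x_i)² − Σ_i v_{if}² x_i²] lowers that cost to O(kd) for rank k. The sketch below checks the identity numerically. It illustrates the Factorization Machine term only, not the PQR model, whose formulation the abstract does not give; the function names are hypothetical.

```python
import numpy as np


def pairwise_naive(x, V):
    """Sum over i < j of <V[i], V[j]> * x[i] * x[j]; O(d^2 k) work."""
    d = len(x)
    total = 0.0
    for i in range(d):
        for j in range(i + 1, d):
            total += (V[i] @ V[j]) * x[i] * x[j]
    return total


def pairwise_fast(x, V):
    """Same quantity via the Factorization Machine identity; O(d k) work."""
    s = V.T @ x                  # per-factor sums of v_if * x_i, shape (k,)
    s2 = (V ** 2).T @ (x ** 2)   # per-factor sums of v_if^2 * x_i^2, shape (k,)
    return 0.5 * float(np.sum(s ** 2 - s2))


# Usage: d = 6 features, rank k = 3 factors.
rng = np.random.default_rng(2)
x = rng.standard_normal(6)
V = rng.standard_normal((6, 3))
print(pairwise_naive(x, V), pairwise_fast(x, V))
```

The saving is what makes low-rank interaction models practical in high dimensions; the non-convexity the abstract mentions comes from learning V itself, not from evaluating this term.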