High-Dimensional LASSO-Based Computational Regression Models: Regularization, Shrinkage, and Selection

2019 ◽  
Vol 1 (1) ◽  
pp. 359-383 ◽  
Author(s):  
Frank Emmert-Streib ◽  
Matthias Dehmer

Regression models are a form of supervised learning that is important for machine learning, statistics, and general data science. Although classical ordinary least squares (OLS) regression models have been known for a long time, in recent years there have been many new developments that extend this model significantly. Above all, the least absolute shrinkage and selection operator (LASSO) model has gained considerable interest. In this paper, we review general regression models with a focus on the LASSO and extensions thereof, including the adaptive LASSO, elastic net, and group LASSO. We discuss the regularization terms responsible for inducing coefficient shrinkage and variable selection, leading to improved performance metrics of these regression models. This makes these modern, computational regression models valuable tools for analyzing high-dimensional problems.
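The difference between shrinkage and selection mentioned in the abstract above can be illustrated with a minimal sketch. It assumes an orthonormal design, where the LASSO solution reduces to soft-thresholding each OLS coefficient and the ridge solution to uniform rescaling. The coefficient values and the penalty `lam` are hypothetical and are not taken from the paper.

```python
def soft_threshold(z, lam):
    """LASSO update for one coefficient, assuming an orthonormal design.

    Minimises 0.5 * (z - b)**2 + lam * abs(b) over b.
    """
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # small coefficients are set exactly to zero (selection)


def ridge_shrink(z, lam):
    """Ridge update under the same assumption.

    Minimises 0.5 * (z - b)**2 + 0.5 * lam * b**2 over b.
    """
    return z / (1.0 + lam)  # shrinks, but never reaches exactly zero


# Hypothetical OLS coefficients and penalty strength (illustration only).
ols_coefs = [3.0, -1.5, 0.4, -0.2]
lam = 0.5

lasso_coefs = [soft_threshold(z, lam) for z in ols_coefs]
ridge_coefs = [ridge_shrink(z, lam) for z in ols_coefs]

print("LASSO:", lasso_coefs)
print("ridge:", ridge_coefs)
```

In this sketch the two smallest coefficients are removed by the L1 penalty, while the ridge penalty only scales all four towards zero.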

2020 ◽  
Author(s):  
Haruyo Nakamura ◽  
Floriano Amimo ◽  
Siyan Yi ◽  
Sovannary Tuot ◽  
Tomoya Yoshida ◽  
...  

Background: Financial protection is a key health system objective and an essential dimension of universal health coverage. However, it is a challenge for low- and middle-income countries, where the general tax revenue is limited, and a majority of the population is engaged in the informal economy. This study developed and validated regression models for Cambodia to predict household consumption, which allows the country to collect insurance contributions according to one’s ability to pay. This strategy would maximize the contribution revenue, optimize the government subsidy, and simultaneously ensure equity in healthcare access.
Methods: This study used nationally representative survey data collected annually between 2010 and 2017, involving 38472 households. We developed four alternative prediction models for annual household consumption: ordinary least squares (OLS) method with manually selected predictors, OLS method with stepwise backward variable selection, mixed-effects linear regression, and elastic net regression, which resulted in an adaptive least absolute shrinkage and selection operator (LASSO) regression. Household-level socioeconomic characteristics were also included as the predictors. Subsequently, we performed out-of-sample cross-validation for each model. Finally, we evaluated the prediction performance of the models using mean absolute error, root mean squared error, and mean absolute percentage error (MAPE).
Results: Overall, we found a linearly positive relationship between observed and predicted household consumptions in all four models. While the prediction performance of the four alternative models did not substantially differ, the Stepwise Linear Model showed the best performance with the lowest values in all three statistical measurements, including a MAPE of 1.376%. The use of regularization and the mixed effects in the regression was not particularly effective in this environment. The household consumption was better predicted for those with lower consumption, and the predictive performance declined as the consumption level increased. Although the richer household consumptions were likely to be overestimated, the trend was less noticeable in the Adaptive LASSO Model.
Conclusions: This study suggests the possibility of predicting household consumption at a reasonable level with the existing survey data. Such a prediction would enable the country to raise the secured health insurance revenue equitably. The prediction model should be tested in real settings and continuously improved.
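The three error measures named in the abstract above can be sketched as follows, using their standard textbook definitions. The observed and predicted values are hypothetical, and the abstract does not state the scale on which the study computed these measures, so this is an illustration rather than a reproduction of the authors' calculation.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (requires nonzero y_true)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical observed and predicted household consumption values.
observed = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 360.0]

print(mae(observed, predicted))
print(rmse(observed, predicted))
print(mape(observed, predicted))
```

MAPE divides each error by the observed value, so it weights errors on low-consumption households more heavily than MAE or RMSE do.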


Author(s):  
Jeremy Freese

This article presents a method and program for identifying poorly fitting observations for maximum-likelihood regression models for categorical dependent variables. After estimating a model, the program leastlikely will list the observations that have the lowest predicted probabilities of observing the value of the outcome category that was actually observed. For example, when run after estimating a binary logistic regression model, leastlikely will list the observations with a positive outcome that had the lowest predicted probabilities of a positive outcome and the observations with a negative outcome that had the lowest predicted probabilities of a negative outcome. These can be considered the observations in which the outcome is most surprising given the values of the independent variables and the parameter estimates and, like observations with large residuals in ordinary least squares regression, may warrant individual inspection. Use of the program is illustrated with examples using binary and ordered logistic regression.
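The idea described in the abstract above can be sketched in a few lines for the binary case. This is an illustrative Python sketch, not the author's program: the function name `least_likely`, its arguments, and the data are all hypothetical, and it assumes that predicted probabilities of a positive outcome are already available from a fitted model.

```python
def least_likely(outcomes, p_positive, n=1):
    """For each outcome group, list the n observations whose observed
    outcome had the lowest predicted probability.

    outcomes   -- observed binary outcomes (1 or 0)
    p_positive -- predicted probability of outcome 1 for each observation
    Returns a dict mapping outcome -> list of (index, probability of the
    observed outcome), sorted from most to least surprising.
    """
    rows = []
    for i, (y, p) in enumerate(zip(outcomes, p_positive)):
        # Probability the model assigned to what was actually observed.
        p_observed = p if y == 1 else 1.0 - p
        rows.append((i, y, p_observed))

    result = {}
    for group in (1, 0):
        members = sorted((r for r in rows if r[1] == group), key=lambda r: r[2])
        result[group] = [(i, p) for i, _, p in members[:n]]
    return result


# Hypothetical outcomes and fitted probabilities (illustration only).
outcomes = [1, 0, 1, 0, 1, 0]
p_positive = [0.90, 0.20, 0.15, 0.85, 0.60, 0.40]

surprising = least_likely(outcomes, p_positive, n=1)
print(surprising)
```

Here observation 2 is the most surprising positive outcome and observation 3 the most surprising negative outcome, which are the cases such a listing would flag for individual inspection.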


2021 ◽  
pp. 009385482110067
Author(s):  
Matthew C. Matusiak

Research suggests policing is a highly institutionalized field. Limited attention has been paid, however, to the institutionalization of leaders’ views. Assessing turnover in 71 Texas police organizations between October 2011 and July 2015, this research evaluates whether there is consistency (i.e., institutional homogenization) after turnover in chiefs’ perceptions of their environments and agency priorities. The research is unique in that it assesses the perceptions of two chiefs who have led the same law enforcement agency in successive time periods. Assessments of environment and priorities from former chiefs and those replacing them are evaluated utilizing descriptive, bivariate, and multivariate methods. These assessments are also compared with a control group of chiefs from agencies not experiencing turnover. Bivariate results suggest little variation across current and former chiefs, whereas ordinary least squares (OLS) regression models suggest differing relationships across chief groups between environmental perceptions and agency priorities. Discussion of the findings is framed by institutional theory.


Risks ◽  
2021 ◽  
Vol 9 (1) ◽  
pp. 10
Author(s):  
Valentina Kravchenko ◽  
Tatiana Kudryavtseva ◽  
Yuriy Kuporov

The issue of economic security is becoming an increasingly urgent one. The purpose of this article is to develop a method for assessing threats to the economic security of a Russian region. This method is based on step-by-step actions: first of all, choosing an element of the region’s economic security system and collecting its descriptive indicators; then grouping indicators by admittance-process-result categories and building hypotheses about their influence; testing hypotheses using a statistical package and choosing the most significant connections, which can pose a threat to the economic security of the region; thereafter ranking regions by the level of threats and developing further recommendations. The importance of this method is that, by grouping regions (territories of a country) according to the proposed method, it is possible to develop individual economic security monitoring tools. As a result, the efficiency of that country’s regions can be higher. In this work, the proposed method was tested in the framework of public procurement in Russia. A total of 14 indicators of procurement activity were collected for each region of the Russian Federation for the period from 2014 to 2018. Regression models were built on the basis of the grouped indicators. Ordinary least squares (OLS) estimation was used. As a result of pairwise regression model analysis, we have defined four significant relationships between public procurement indicators. There are positive connections between contracts that require collateral and the percentage of tolerances, between the number of bidders and the number of regular suppliers, between the number of bidders and the average price drop, and between the number of purchases made from a single supplier and the number of contracts concluded without reduction. It was determined that the greatest risks for the system were associated with the connection between competition and budget savings. It was proposed to rank the analyzed regions into four groups: ineffective government procurement, effective government procurement, and government procurement that threatens the system of economic security of the region, that is, high competition with low savings and low competition with high savings. Based on these groups, individual economic security monitoring tools can be developed for each region.


Author(s):  
Hector Donaldo Mata ◽  
Mohammed Hadi ◽  
David Hale

Transportation agencies utilize key performance indicators (KPIs) to measure the performance of their traffic networks and business processes. To make effective decisions based on these KPIs, there is a need to align the KPIs at the strategic, tactical, and operational decision levels and to set targets for these KPIs. However, there has been no known effort to develop methods that ensure this alignment by producing a correlative model that explores the relationships between KPIs and supports the derivation of the KPI targets. Such development will lead to more realistic target setting and effective decisions based on these targets, ensuring that agency goals are met subject to the available resources. This paper presents a methodology in which the KPIs are represented in a tree-like structure that can be used to depict the association between metrics at the strategic, tactical, and operational levels. Utilizing a combination of business intelligence and machine learning tools, this paper demonstrates that it is possible not only to identify such relationships but also to quantify them. The proposed methodology compares the effectiveness and accuracy of multiple machine learning models, including ordinary least squares (OLS) regression, least absolute shrinkage and selection operator (LASSO), and ridge regression, for the identification and quantification of interlevel relationships. The output of the model allows the identification of which metrics have more influence on the upper-level KPI targets. The analysis can be performed at the system, facility, and segment levels, providing important insights on what investments are needed to improve system performance.


Politics ◽  
2018 ◽  
Vol 39 (4) ◽  
pp. 464-479
Author(s):  
Gert-Jan Put ◽  
Jef Smulders ◽  
Bart Maddens

This article investigates the effect of candidates exhibiting local personal vote-earning attributes (PVEA) on the aggregate party vote share at the district level. Previous research has often assumed that packing ballot lists with localized candidates increases the aggregate party vote and seat shares. We present a strict empirical test of this argument by analysing the relative electoral swing of ballot lists at the district level, a measure of change in party vote shares which controls for the national party trend and previous party results in the district. The analysis is based on data of 7527 candidacies during six Belgian regional and federal election cycles between 2003 and 2014, which is aggregated to an original data set of 223 ballot lists. The ordinary least squares (OLS) regression models do not show a significant effect of candidates exhibiting local PVEA on relative electoral swing of ballot lists. However, the results suggest that ballot lists do benefit electorally if candidates with local PVEA are geographically distributed over different municipalities in the district.


Author(s):  
George H. Cheng ◽  
Adel Younis ◽  
Kambiz Haji Hajikolaei ◽  
G. Gary Wang

Mode Pursuing Sampling (MPS) was developed as a global optimization algorithm for optimization problems involving expensive black box functions. MPS has been found to be effective and efficient for problems of low dimensionality, i.e., the number of design variables is less than ten. A previous conference publication integrated the concept of trust regions into the MPS framework to create a new algorithm, TRMPS, which dramatically improved performance and efficiency for high dimensional problems. However, although TRMPS performed better than MPS, it was unproven against other established algorithms such as GA. This paper introduces an improved algorithm, TRMPS2, which incorporates guided sampling and low function value criterion to further improve algorithm performance for high dimensional problems. TRMPS2 is benchmarked against MPS and GA using a suite of test problems. The results show that TRMPS2 performs better than MPS and GA on average for high dimensional, expensive, and black box (HEB) problems.

