A Projection Framework for Testing Shape Restrictions That Form Convex Cones

Econometrica ◽  
2021 ◽  
Vol 89 (5) ◽  
pp. 2439-2458 ◽  
Author(s):  
Zheng Fang ◽  
Juwon Seo

This paper develops a uniformly valid and asymptotically nonconservative test based on projection for a class of shape restrictions. The key insight we exploit is that these restrictions form convex cones, a simple yet elegant structure that has barely been harnessed in the literature. Based on a monotonicity property afforded by such a geometric structure, we construct a bootstrap procedure that, unlike many studies in nonstandard settings, dispenses with estimation of local parameter spaces, and whose critical values are obtained as simply as the test statistic itself. Moreover, by appealing to strong approximations, our framework accommodates nonparametric regression models as well as distributional/density-related and structural settings. Since the test entails a tuning parameter (due to the nonstandard nature of the problem), we propose a data-driven choice and prove its validity. Monte Carlo simulations confirm that our test works well.
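A rough sketch of the projection idea for one canonical cone (monotonicity), assuming a NumPy/scikit-learn setup: the test statistic is the distance of an unrestricted estimate from the cone, computed via isotonic regression, and a simple resampling scheme stands in for the paper's tuning-parameter-based bootstrap recentering. Everything below (data, recentering, sample sizes) is illustrative, not the authors' implementation.

```python
# A rough sketch of a projection test of monotonicity (a convex-cone restriction).
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)

def project_monotone(v):
    """Euclidean projection onto the cone of nondecreasing vectors (via PAVA)."""
    return IsotonicRegression().fit_transform(np.arange(len(v)), v)

# Unrestricted estimates: cell means of a regression function on a grid.
n_cells, n_obs = 20, 50
truth = np.linspace(0.0, 1.0, n_cells)                  # monotone, so H0 holds
data = truth[:, None] + rng.normal(size=(n_cells, n_obs))
theta_hat = data.mean(axis=1)

# Test statistic: scaled distance from the unrestricted estimate to the cone.
stat = np.sqrt(n_obs) * np.linalg.norm(theta_hat - project_monotone(theta_hat))

# Bootstrap critical value: recenter at the projection of theta_hat, add a
# bootstrap draw of the estimation noise, and recompute the same distance.
# (The paper's tuning-parameter-based recentering is replaced by this crude version.)
center = project_monotone(theta_hat)
boot = []
for _ in range(500):
    idx = rng.integers(0, n_obs, n_obs)
    noise = data[:, idx].mean(axis=1) - theta_hat        # bootstrap estimation noise
    v = center + noise
    boot.append(np.sqrt(n_obs) * np.linalg.norm(v - project_monotone(v)))
crit = np.quantile(boot, 0.95)
print(f"statistic = {stat:.3f}, 5% critical value = {crit:.3f}, reject = {stat > crit}")
```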

2021 ◽  
Vol 13 (7) ◽  
pp. 168781402110277
Author(s):  
Yankai Hou ◽  
Zhaosheng Zhang ◽  
Peng Liu ◽  
Chunbao Song ◽  
Zhenpo Wang

Accurate estimation of the degree of battery aging is essential to ensure the safe operation of electric vehicles. In this paper, using real-world vehicles and their operational data, a battery aging estimation method is proposed based on a dual-polarization equivalent circuit (DPEC) model and multiple data-driven models. The DPEC model and the forgetting factor recursive least-squares method are used to determine the battery system's ohmic internal resistance, with outliers being filtered using boxplots. Furthermore, eight common data-driven models are used to describe the relationship between battery degradation and the factors influencing this degradation, and these models are analyzed and compared in terms of both estimation accuracy and computational requirements. The results show that the gradient descent tree regression, XGBoost regression, and LightGBM regression models are more accurate than the other methods, with root mean square errors of less than 6.9 mΩ. The AdaBoost and random forest regression models are regarded as alternatives because of their relative instability. The linear regression, support vector machine regression, and k-nearest neighbor regression models are not recommended because of poor accuracy or excessively high computational requirements. This work can serve as a reference for subsequent battery degradation studies based on real-time operational data.
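The resistance-identification step can be illustrated with a generic forgetting-factor recursive least-squares (FFRLS) routine. The sketch below assumes a simplified linear regressor whose first coefficient plays the role of the ohmic resistance; the actual DPEC regressor construction and the real vehicle data are not reproduced.

```python
# A minimal FFRLS sketch for online parameter identification (illustrative only).
import numpy as np

def ffrls(Phi, y, lam=0.98, delta=1e3):
    """Track theta in y_t = phi_t . theta + noise with exponential forgetting."""
    T, p = Phi.shape
    theta = np.zeros(p)
    P = delta * np.eye(p)
    path = np.empty((T, p))
    for t in range(T):
        phi = Phi[t]
        k = P @ phi / (lam + phi @ P @ phi)       # gain vector
        theta = theta + k * (y[t] - phi @ theta)  # prediction-error update
        P = (P - np.outer(k, phi @ P)) / lam      # covariance update with forgetting
        path[t] = theta
    return path

# Synthetic example: voltage drop = R_ohm * current + offset, with R_ohm drifting
# upward as the cell ages (units arbitrary, data hypothetical).
rng = np.random.default_rng(1)
T = 2000
current = rng.uniform(10, 120, size=T)
r_ohm = 0.020 + 0.000005 * np.arange(T)            # slowly increasing resistance
v_drop = r_ohm * current + 0.5 + rng.normal(scale=0.05, size=T)

Phi = np.column_stack([current, np.ones(T)])
estimates = ffrls(Phi, v_drop, lam=0.995)
print("final estimated R_ohm:", estimates[-1, 0])
```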


1992 ◽  
Vol 8 (4) ◽  
pp. 452-475 ◽  
Author(s):  
Jeffrey M. Wooldridge

A test for neglected nonlinearities in regression models is proposed. The test is of the Davidson-MacKinnon type against an increasingly rich set of non-nested alternatives, and is based on sieve estimation of the alternative model. For the case of a linear parametric model, the test statistic is shown to be asymptotically standard normal under the null, while rejecting with probability going to one if the linear model is misspecified. A small simulation study suggests that the test has adequate finite-sample properties, but one must guard against overfitting the nonparametric alternative.
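In the spirit of such a specification test (though not the paper's exact construction), the sketch below fits a linear null model, fits a polynomial sieve alternative, and runs a Davidson-MacKinnon-style artificial regression in which the alternative's fitted values are added to the null regressors; a large t-statistic signals neglected nonlinearity. It assumes statsmodels and synthetic data.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
x = rng.uniform(-2, 2, size=(n, 1))
y = 1.0 + x[:, 0] + 0.5 * x[:, 0] ** 2 + rng.normal(size=n)   # nonlinear truth

# Null model: linear in x.
X0 = sm.add_constant(x)
null_fit = sm.OLS(y, X0).fit()

# Alternative: sieve (polynomial series) model of growing order.
order = 4
X1 = sm.add_constant(np.column_stack([x[:, 0] ** k for k in range(1, order + 1)]))
alt_fitted = sm.OLS(y, X1).fit().fittedvalues

# Davidson-MacKinnon-style artificial regression: add the alternative's fitted
# values to the null regressors and examine their t-statistic.
Xj = np.column_stack([X0, alt_fitted])
j_fit = sm.OLS(y, Xj).fit()
print("t-statistic on sieve fitted values:", j_fit.tvalues[-1])
```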


2021 ◽  
Author(s):  
Carlos Eduardo Beluzo ◽  
Luciana Correia Alves ◽  
Natália Martins Arruda ◽  
Cátia Sepetauskas ◽  
Everton Silva ◽  
...  

Reduction in child mortality is one of the United Nations Sustainable Development Goals for 2030. In Brazil, despite the reduction in child mortality over recent decades, neonatal mortality remains a persistent problem associated with the quality of prenatal and childbirth care and with social-environmental factors. In a well-functioning health system, the effect of some of these factors could be mitigated by an adequate number of newborn intensive care units, health care units, and neonatal incubators, and even by mothers' level of education, which supports proper care throughout the prenatal period. With the intent of providing knowledge resources for planning public health policies focused on reducing neonatal mortality, we propose a new data-driven machine learning method for neonatal mortality rate (NMR) forecasting called NeMoR, which predicts neonatal mortality rates four months ahead using NeoDeathForecast, a monthly time series dataset composed of these factors and of historical neonatal mortality rates (2006-2016), with 57,816 samples covering all 438 Brazilian administrative health regions. To build the model, Extra-Tree, XGBoost Regressor, Gradient Boosting Regressor, and Lasso machine learning regression models were evaluated, and a hyperparameter search was performed as a fine-tuning step. The method was validated using data for the city of São Paulo, chosen mainly for its data quality. In its best configuration, the method predicted neonatal mortality rates with a mean square error lower than 0.18. The forecasts may also be useful to policy makers as a way to anticipate trends in neonatal mortality rate curves, an important resource for planning public health policies.
Highlights:
- a new data-driven approach for neonatal mortality rate forecasting, which gives policy makers a way to anticipate trends in NMR curves and thus to better plan health policies focused on NMR reduction;
- a method for NMR forecasting with an MSE lower than 0.18;
- an extensive evaluation of different machine learning (ML) regression models, together with a hyperparameter search, which forms the last stage of NeMoR;
- a new time series database for NMR prediction problems;
- a new feature projection space for NMR forecasting, which considerably reduces errors in NMR prediction.
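A hedged sketch of the model-comparison stage: several candidate regressors are tuned by grid search under a time-series cross-validation split and compared on mean squared error. The feature matrix and target below are synthetic placeholders for NeoDeathForecast (not reproduced here), XGBoost is omitted to avoid an extra dependency, and the hyperparameter grids are illustrative rather than the paper's.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, GradientBoostingRegressor
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Hypothetical feature matrix X (health-infrastructure and socio-environmental
# factors per health region and month) and 4-month-ahead NMR target y.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
y = X @ rng.normal(size=8) + rng.normal(scale=0.3, size=1000)

candidates = {
    "extra_trees": (ExtraTreesRegressor(random_state=0), {"n_estimators": [100, 300]}),
    "gradient_boosting": (GradientBoostingRegressor(random_state=0), {"learning_rate": [0.05, 0.1]}),
    "lasso": (Lasso(), {"alpha": [0.01, 0.1, 1.0]}),
}

# Time-series splits respect temporal ordering when scoring each candidate.
cv = TimeSeriesSplit(n_splits=5)
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, cv=cv, scoring="neg_mean_squared_error")
    search.fit(X, y)
    print(name, "best CV MSE:", -search.best_score_)
```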


2016 ◽  
Author(s):  
Geoffrey Fouad ◽  
André Skupin ◽  
Christina L. Tague

Percentile flows are statistics derived from the flow duration curve (FDC) that describe the flow equaled or exceeded for a given percent of time. These statistics provide important information for managing rivers, but are often unavailable since most basins are ungauged. A common approach for predicting percentile flows is to deploy regional regression models based on gauged percentile flows and related independent variables derived from physical and climatic data. The first step of this process identifies groups of basins through a cluster analysis of the independent variables, followed by the development of a regression model for each group. This entire process hinges on the independent variables selected to summarize the physical and climatic state of basins. Distributed physical and climatic datasets now exist for the contiguous United States (US), but it remains unclear how best to represent these data for the development of regional regression models. The study presented here developed regional regression models for the contiguous US and evaluated the effect of different approaches for selecting the initial set of independent variables on the predictive performance of the models. An expert assessment of the dominant controls on the FDC was used to identify a small set of independent variables likely related to percentile flows. A data-driven approach was also applied to evaluate two larger sets of variables that consist of either (1) the averages of data for each basin or (2) both the averages and the statistical distribution of basin data distributed in space and time. The small set of variables from the expert assessment of the FDC and the two larger sets of variables for the data-driven approach were each applied in a regional regression procedure. Differences in predictive performance were evaluated using 184 validation basins withheld from regression model development. The small set of independent variables selected through expert assessment produced similar, if not better, performance than the two larger sets of variables. The parsimonious set of variables consisted only of mean annual precipitation, potential evapotranspiration, and baseflow index; additional variables in the two larger sets added little to no predictive information. Regional regression models based on the parsimonious set of variables were developed using 734 calibration basins and were converted into a tool for predicting 13 percentile flows in the contiguous US. The Supplementary Material for this paper includes an R graphical user interface for predicting the percentile flows of basins within the range of conditions used to calibrate the regression models. The equations and performance statistics of the models are also supplied in tabular form.
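A minimal sketch of the regional regression workflow with the parsimonious variable set: basins are grouped by clustering the independent variables, one regression model is fit per group, and an ungauged basin is assigned to a group before prediction. The basin attributes and the median-flow target below are synthetic, and k-means with ordinary least squares stands in for whatever clustering and regression choices the study actually used.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

# Hypothetical basin attributes: mean annual precipitation, PET, baseflow index.
rng = np.random.default_rng(42)
X = rng.uniform([300, 400, 0.2], [2000, 1400, 0.9], size=(734, 3))
q50 = 0.8 * X[:, 0] - 0.3 * X[:, 1] + 200 * X[:, 2] + rng.normal(scale=50, size=734)

# Step 1: group basins by clustering the independent variables.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

# Step 2: fit one regression model per group.
models = {}
for g in np.unique(kmeans.labels_):
    mask = kmeans.labels_ == g
    models[g] = LinearRegression().fit(X[mask], q50[mask])

# Step 3: predict for a new (ungauged) basin by assigning it to a group first.
x_new = np.array([[900.0, 800.0, 0.5]])
group = kmeans.predict(x_new)[0]
print("predicted median flow:", models[group].predict(x_new)[0])
```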


2021 ◽  
Vol 11 (18) ◽  
pp. 8482
Author(s):  
Jie Li ◽  
Yuejin Tan ◽  
Bingfeng Ge ◽  
Hua Zhao ◽  
Xin Lu

This paper proposes a method for predicting the remaining useful life (RUL) of the concrete piston of a concrete pump truck based on probability statistics and data-driven approaches. First, the average useful life of the concrete piston is determined by fitting a probability distribution to actual life data. Second, based on condition monitoring data from the concrete pump truck, the concept of a life coefficient for the concrete piston is proposed to represent the influence of loading conditions on the actual useful life of individual concrete pistons, and different regression models are established to predict the RUL of the concrete pistons. Finally, based on the prediction results for the concrete piston at different life stages, a replacement warning point is established to support inventory management and replacement planning for the concrete piston.
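A rough sketch of the three steps under stated assumptions: a Weibull fit (one plausible choice; the paper's fitted distribution is not specified here) gives the average useful life, a regression maps condition-monitoring features to an illustrative life coefficient, and the two are combined into a remaining-useful-life estimate. All data, units, and the exact definition of the life coefficient below are hypothetical.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)

# Step 1: average useful life from a distribution fit to historical piston lives
# (e.g. pumped volume until replacement; synthetic data, arbitrary units).
lives = stats.weibull_min.rvs(2.2, scale=30000, size=200, random_state=7)
shape, loc, scale = stats.weibull_min.fit(lives, floc=0)
mean_life = stats.weibull_min.mean(shape, loc=loc, scale=scale)

# Step 2: an illustrative "life coefficient" relating loading conditions to
# individual life, regressed on condition-monitoring features.
features = rng.normal(size=(200, 3))          # e.g. pressure, duty cycle, temperature
life_coef = lives / mean_life                 # ratio of individual to average life
coef_model = LinearRegression().fit(features, life_coef)

# Step 3: RUL estimate for a piston with given usage and monitored conditions.
used = 12000.0
x_now = np.array([[0.3, -0.1, 0.5]])
predicted_total = coef_model.predict(x_now)[0] * mean_life
print("estimated RUL:", max(predicted_total - used, 0.0))
```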


2021 ◽  
Vol 111 ◽  
pp. 611-615
Author(s):  
Yuehao Bai ◽  
Hung Ho ◽  
Guillaume A. Pouliot ◽  
Joshua Shea

We provide large-sample distribution theory for support vector regression (SVR) with the l1-norm, along with error bars for the SVR regression coefficients. Although a classical Wald confidence interval obtains from our theory, its implementation inherently depends on the choice of a tuning parameter that scales the variance estimate and thus the width of the error bars. We address this shortcoming by further proposing an alternative large-sample inference method based on the inversion of a novel test statistic that displays competitive power properties and does not depend on the choice of a tuning parameter.
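The paper's tuning-parameter-free inference method is not reproduced here; as a generic point of comparison, the sketch below fits scikit-learn's LinearSVR (a stand-in that is not l1-penalized) and attaches pairs-bootstrap error bars to its coefficients, which sidesteps the variance-scaling tuning parameter at the cost of resampling.

```python
import numpy as np
from sklearn.svm import LinearSVR

rng = np.random.default_rng(3)
n, p = 400, 3
X = rng.normal(size=(n, p))
beta = np.array([1.0, -0.5, 0.0])
y = X @ beta + rng.standard_t(df=5, size=n)            # heavy-tailed errors

fit = LinearSVR(epsilon=0.5, C=1.0, max_iter=10000).fit(X, y)

# Pairs bootstrap for coefficient error bars (generic, not the paper's method).
B = 200
draws = np.empty((B, p))
for b in range(B):
    idx = rng.integers(0, n, n)
    draws[b] = LinearSVR(epsilon=0.5, C=1.0, max_iter=10000).fit(X[idx], y[idx]).coef_
se = draws.std(axis=0)
for j in range(p):
    lo, hi = fit.coef_[j] - 1.96 * se[j], fit.coef_[j] + 1.96 * se[j]
    print(f"beta_{j}: {fit.coef_[j]:.3f}  [{lo:.3f}, {hi:.3f}]")
```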


2019 ◽  
Vol 11 (24) ◽  
pp. 6976
Author(s):  
Suwon Song ◽  
Chun Gun Park

Change-point regression models are often used to develop building energy baselines that can be used to predict energy use and determine energy savings during a given performance period. However, the reliability of building energy baselines depends on how well the change-point model fits the data measured during the baseline period. This research proposes the use of segmented linear regression models with one or two change points for automatically deriving best-fit building energy baseline models, along with an algorithm that uses a data-driven grid search to find the optimal change point(s) within a given data boundary for the proposed models. The algorithm was programmed and tested with actual measured data (e.g., daily gas and electricity use) for case-study buildings. Graphical and statistical analysis was also performed to validate its reliability, within an acceptable deviation of 1% in the overall coefficient of variation of the root mean squared error (CV(RMSE)), compared with results from the ASHRAE Inverse Model Toolkit (IMT), a public-domain program developed to manually derive change-point models with user-specified parameters. Consequently, it is expected that the algorithm can be applied to automatically derive best-fit building energy baseline models with optimal change point(s) from measured data.
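A minimal sketch of the data-driven grid search for a single change point, assuming a continuous piecewise-linear model y = b0 + b1*x + b2*max(x - cp, 0) fitted by least squares at each candidate change point; the two-change-point case and the comparison against the ASHRAE IMT are not reproduced, and the daily temperature/energy data are synthetic.

```python
import numpy as np

def fit_one_changepoint(x, y, grid):
    """Grid search for one change point in y = b0 + b1*x + b2*max(x - cp, 0)."""
    best = None
    for cp in grid:
        X = np.column_stack([np.ones_like(x), x, np.maximum(x - cp, 0.0)])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rmse = np.sqrt(np.mean((y - X @ beta) ** 2))
        if best is None or rmse < best[0]:
            best = (rmse, cp, beta)
    return best

# Illustrative daily data: outdoor temperature vs. energy use with a balance point.
rng = np.random.default_rng(11)
temp = rng.uniform(-5, 30, size=365)
energy = 40 + 2.5 * np.maximum(18.0 - temp, 0.0) + rng.normal(scale=3, size=365)

grid = np.linspace(temp.min() + 1, temp.max() - 1, 200)
rmse, cp, beta = fit_one_changepoint(temp, energy, grid)
cv_rmse = 100 * rmse / energy.mean()
print(f"change point at {cp:.1f} °C, CV(RMSE) = {cv_rmse:.2f}%")
```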


2005 ◽  
Vol 33 (2) ◽  
pp. 840-870 ◽  
Author(s):  
Emmanuel Guerre ◽  
Pascal Lavergne
