A Projection Framework for Testing Shape Restrictions That Form Convex Cones

Econometrica ◽  
2021 ◽  
Vol 89 (5) ◽  
pp. 2439-2458 ◽  
Author(s):  
Zheng Fang ◽  
Juwon Seo

This paper develops a uniformly valid and asymptotically nonconservative test based on projection for a class of shape restrictions. The key insight we exploit is that these restrictions form convex cones, a simple yet elegant structure that has barely been harnessed in the literature. Based on a monotonicity property afforded by such a geometric structure, we construct a bootstrap procedure that, unlike many studies in nonstandard settings, dispenses with estimation of local parameter spaces, and whose critical values are obtained as simply as the test statistic itself. Moreover, by appealing to strong approximations, our framework accommodates nonparametric regression models as well as distributional/density-related and structural settings. Since the test entails a tuning parameter (due to the nonstandard nature of the problem), we propose a data-driven choice and prove its validity. Monte Carlo simulations confirm that our test works well.
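A rough sketch of the projection idea for one canonical cone (monotonicity), assuming a NumPy/scikit-learn setup: the test statistic is the distance of an unrestricted estimate from the cone, computed via isotonic regression, and a simple resampling scheme stands in for the paper's tuning-parameter-based bootstrap recentering. Everything below (data, recentering, sample sizes) is illustrative, not the authors' implementation.

```python
# A rough sketch of a projection test of monotonicity (a convex-cone restriction).
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)

def project_monotone(v):
    """Euclidean projection onto the cone of nondecreasing vectors (via PAVA)."""
    return IsotonicRegression().fit_transform(np.arange(len(v)), v)

# Unrestricted estimates: cell means of a regression function on a grid.
n_cells, n_obs = 20, 50
truth = np.linspace(0.0, 1.0, n_cells)                  # monotone, so H0 holds
data = truth[:, None] + rng.normal(size=(n_cells, n_obs))
theta_hat = data.mean(axis=1)

# Test statistic: scaled distance from the unrestricted estimate to the cone.
stat = np.sqrt(n_obs) * np.linalg.norm(theta_hat - project_monotone(theta_hat))

# Bootstrap critical value: recenter at the projection of theta_hat, add a
# bootstrap draw of the estimation noise, and recompute the same distance.
# (The paper's tuning-parameter-based recentering is replaced by this crude version.)
center = project_monotone(theta_hat)
boot = []
for _ in range(500):
    idx = rng.integers(0, n_obs, n_obs)
    noise = data[:, idx].mean(axis=1) - theta_hat        # bootstrap estimation noise
    v = center + noise
    boot.append(np.sqrt(n_obs) * np.linalg.norm(v - project_monotone(v)))
crit = np.quantile(boot, 0.95)
print(f"statistic = {stat:.3f}, 5% critical value = {crit:.3f}, reject = {stat > crit}")
```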

2021 ◽  
Vol 13 (7) ◽  
pp. 168781402110277
Author(s):  
Yankai Hou ◽  
Zhaosheng Zhang ◽  
Peng Liu ◽  
Chunbao Song ◽  
Zhenpo Wang

Accurate estimation of the degree of battery aging is essential to ensure the safe operation of electric vehicles. In this paper, using real-world vehicles and their operational data, a battery aging estimation method is proposed based on a dual-polarization equivalent circuit (DPEC) model and multiple data-driven models. The DPEC model and the forgetting factor recursive least-squares method are used to determine the battery system's ohmic internal resistance, with outliers being filtered using boxplots. Furthermore, eight common data-driven models are used to describe the relationship between battery degradation and the factors influencing this degradation, and these models are analyzed and compared in terms of both estimation accuracy and computational requirements. The results show that the gradient descent tree regression, XGBoost regression, and LightGBM regression models are more accurate than the other methods, with root mean square errors of less than 6.9 mΩ. The AdaBoost and random forest regression models are regarded as alternatives because of their relative instability. The linear regression, support vector machine regression, and k-nearest neighbor regression models are not recommended because of poor accuracy or excessively high computational requirements. This work can serve as a reference for subsequent battery degradation studies based on real-time operational data.
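The resistance-identification step can be illustrated with a generic forgetting-factor recursive least-squares (FFRLS) routine. The sketch below assumes a simplified linear regressor whose first coefficient plays the role of the ohmic resistance; the actual DPEC regressor construction and the real vehicle data are not reproduced.

```python
# A minimal FFRLS sketch for online parameter identification (illustrative only).
import numpy as np

def ffrls(Phi, y, lam=0.98, delta=1e3):
    """Track theta in y_t = phi_t . theta + noise with exponential forgetting."""
    T, p = Phi.shape
    theta = np.zeros(p)
    P = delta * np.eye(p)
    path = np.empty((T, p))
    for t in range(T):
        phi = Phi[t]
        k = P @ phi / (lam + phi @ P @ phi)       # gain vector
        theta = theta + k * (y[t] - phi @ theta)  # prediction-error update
        P = (P - np.outer(k, phi @ P)) / lam      # covariance update with forgetting
        path[t] = theta
    return path

# Synthetic example: voltage drop = R_ohm * current + offset, with R_ohm drifting
# upward as the cell ages (units arbitrary, data hypothetical).
rng = np.random.default_rng(1)
T = 2000
current = rng.uniform(10, 120, size=T)
r_ohm = 0.020 + 0.000005 * np.arange(T)            # slowly increasing resistance
v_drop = r_ohm * current + 0.5 + rng.normal(scale=0.05, size=T)

Phi = np.column_stack([current, np.ones(T)])
estimates = ffrls(Phi, v_drop, lam=0.995)
print("final estimated R_ohm:", estimates[-1, 0])
```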


1992 ◽  
Vol 8 (4) ◽  
pp. 452-475 ◽  
Author(s):  
Jeffrey M. Wooldridge

A test for neglected nonlinearities in regression models is proposed. The test is of the Davidson-MacKinnon type against an increasingly rich set of non-nested alternatives, and is based on sieve estimation of the alternative model. For the case of a linear parametric model, the test statistic is shown to be asymptotically standard normal under the null, while rejecting with probability going to one if the linear model is misspecified. A small simulation study suggests that the test has adequate finite-sample properties, but one must guard against overfitting the nonparametric alternative.
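In the spirit of such a specification test (though not the paper's exact construction), the sketch below fits a linear null model, fits a polynomial sieve alternative, and runs a Davidson-MacKinnon-style artificial regression in which the alternative's fitted values are added to the null regressors; a large t-statistic signals neglected nonlinearity. It assumes statsmodels and synthetic data.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
x = rng.uniform(-2, 2, size=(n, 1))
y = 1.0 + x[:, 0] + 0.5 * x[:, 0] ** 2 + rng.normal(size=n)   # nonlinear truth

# Null model: linear in x.
X0 = sm.add_constant(x)
null_fit = sm.OLS(y, X0).fit()

# Alternative: sieve (polynomial series) model of growing order.
order = 4
X1 = sm.add_constant(np.column_stack([x[:, 0] ** k for k in range(1, order + 1)]))
alt_fitted = sm.OLS(y, X1).fit().fittedvalues

# Davidson-MacKinnon-style artificial regression: add the alternative's fitted
# values to the null regressors and examine their t-statistic.
Xj = np.column_stack([X0, alt_fitted])
j_fit = sm.OLS(y, Xj).fit()
print("t-statistic on sieve fitted values:", j_fit.tvalues[-1])
```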


2021 ◽  
Author(s):  
Carlos Eduardo Beluzo ◽  
Luciana Correia Alves ◽  
Natália Martins Arruda ◽  
Cátia Sepetauskas ◽  
Everton Silva ◽  
...  

Reduction in child mortality is one of the United Nations Sustainable Development Goals for 2030. In Brazil, despite the reduction in child mortality over recent decades, neonatal mortality remains a persistent problem associated with the quality of prenatal and childbirth care and with social-environmental factors. In a well-functioning health system, the effect of some of these factors could be mitigated by an adequate number of newborn intensive care units, health care units, and neonatal incubators, and even by mothers' level of education, which supports proper care throughout the prenatal period. With the intent of providing knowledge resources for planning public health policies focused on reducing neonatal mortality, we propose a new data-driven machine learning method for neonatal mortality rate (NMR) forecasting called NeMoR, which predicts neonatal mortality rates four months ahead using NeoDeathForecast, a monthly time series dataset composed of these factors and of historical neonatal mortality rates (2006-2016), with 57,816 samples covering all 438 Brazilian administrative health regions. To build the model, Extra-Tree, XGBoost Regressor, Gradient Boosting Regressor, and Lasso machine learning regression models were evaluated, and a hyperparameter search was performed as a fine-tuning step. The method was validated using data for the city of São Paulo, chosen mainly for its data quality. In its best configuration, the method predicted neonatal mortality rates with a mean square error lower than 0.18. The forecasts may also be useful to policy makers as a way to anticipate trends in neonatal mortality rate curves, an important resource for planning public health policies.
Highlights:
- a new data-driven approach for neonatal mortality rate forecasting, which gives policy makers a way to anticipate trends in NMR curves and thus to better plan health policies focused on NMR reduction;
- a method for NMR forecasting with an MSE lower than 0.18;
- an extensive evaluation of different machine learning (ML) regression models, together with a hyperparameter search, which forms the last stage of NeMoR;
- a new time series database for NMR prediction problems;
- a new feature projection space for NMR forecasting, which considerably reduces errors in NMR prediction.
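A hedged sketch of the model-comparison stage: several candidate regressors are tuned by grid search under a time-series cross-validation split and compared on mean squared error. The feature matrix and target below are synthetic placeholders for NeoDeathForecast (not reproduced here), XGBoost is omitted to avoid an extra dependency, and the hyperparameter grids are illustrative rather than the paper's.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, GradientBoostingRegressor
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Hypothetical feature matrix X (health-infrastructure and socio-environmental
# factors per health region and month) and 4-month-ahead NMR target y.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
y = X @ rng.normal(size=8) + rng.normal(scale=0.3, size=1000)

candidates = {
    "extra_trees": (ExtraTreesRegressor(random_state=0), {"n_estimators": [100, 300]}),
    "gradient_boosting": (GradientBoostingRegressor(random_state=0), {"learning_rate": [0.05, 0.1]}),
    "lasso": (Lasso(), {"alpha": [0.01, 0.1, 1.0]}),
}

# Time-series splits respect temporal ordering when scoring each candidate.
cv = TimeSeriesSplit(n_splits=5)
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, cv=cv, scoring="neg_mean_squared_error")
    search.fit(X, y)
    print(name, "best CV MSE:", -search.best_score_)
```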


2016 ◽  
Author(s):  
Geoffrey Fouad ◽  
André Skupin ◽  
Christina L. Tague

Percentile flows are statistics derived from the flow duration curve (FDC) that describe the flow equaled or exceeded for a given percent of time. These statistics provide important information for managing rivers, but are often unavailable since most basins are ungauged. A common approach for predicting percentile flows is to deploy regional regression models based on gauged percentile flows and related independent variables derived from physical and climatic data. The first step of this process identifies groups of basins through a cluster analysis of the independent variables, followed by the development of a regression model for each group. This entire process hinges on the independent variables selected to summarize the physical and climatic state of basins. Distributed physical and climatic datasets now exist for the contiguous United States (US), but it remains unclear how best to represent these data for the development of regional regression models. The study presented here developed regional regression models for the contiguous US and evaluated the effect of different approaches for selecting the initial set of independent variables on the predictive performance of the models. An expert assessment of the dominant controls on the FDC was used to identify a small set of independent variables likely related to percentile flows. A data-driven approach was also applied to evaluate two larger sets of variables that consist of either (1) the averages of data for each basin or (2) both the averages and the statistical distribution of basin data distributed in space and time. The small set of variables from the expert assessment of the FDC and the two larger sets of variables for the data-driven approach were each applied in a regional regression procedure. Differences in predictive performance were evaluated using 184 validation basins withheld from regression model development. The small set of independent variables selected through expert assessment produced similar, if not better, performance than the two larger sets of variables. The parsimonious set of variables consisted only of mean annual precipitation, potential evapotranspiration, and baseflow index; additional variables in the two larger sets added little to no predictive information. Regional regression models based on the parsimonious set of variables were developed using 734 calibration basins and were converted into a tool for predicting 13 percentile flows in the contiguous US. The Supplementary Material for this paper includes an R graphical user interface for predicting the percentile flows of basins within the range of conditions used to calibrate the regression models. The equations and performance statistics of the models are also supplied in tabular form.
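A minimal sketch of the regional regression workflow with the parsimonious variable set: basins are grouped by clustering the independent variables, one regression model is fit per group, and an ungauged basin is assigned to a group before prediction. The basin attributes and the median-flow target below are synthetic, and k-means with ordinary least squares stands in for whatever clustering and regression choices the study actually used.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

# Hypothetical basin attributes: mean annual precipitation, PET, baseflow index.
rng = np.random.default_rng(42)
X = rng.uniform([300, 400, 0.2], [2000, 1400, 0.9], size=(734, 3))
q50 = 0.8 * X[:, 0] - 0.3 * X[:, 1] + 200 * X[:, 2] + rng.normal(scale=50, size=734)

# Step 1: group basins by clustering the independent variables.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

# Step 2: fit one regression model per group.
models = {}
for g in np.unique(kmeans.labels_):
    mask = kmeans.labels_ == g
    models[g] = LinearRegression().fit(X[mask], q50[mask])

# Step 3: predict for a new (ungauged) basin by assigning it to a group first.
x_new = np.array([[900.0, 800.0, 0.5]])
group = kmeans.predict(x_new)[0]
print("predicted median flow:", models[group].predict(x_new)[0])
```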


2021 ◽  
Vol 11 (18) ◽  
pp. 8482
Author(s):  
Jie Li ◽  
Yuejin Tan ◽  
Bingfeng Ge ◽  
Hua Zhao ◽  
Xin Lu

This paper proposes a method for predicting the remaining useful life (RUL) of the concrete piston of a concrete pump truck based on probability statistics and data-driven approaches. First, the average useful life of the concrete piston is determined by fitting a probability distribution to actual life data. Second, based on condition monitoring data from the concrete pump truck, the concept of a life coefficient for the concrete piston is proposed to represent the influence of loading conditions on the actual useful life of individual concrete pistons, and different regression models are established to predict the RUL of the concrete pistons. Finally, based on the prediction results for the concrete piston at different life stages, a replacement warning point is established to support inventory management and replacement planning for the concrete piston.
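A rough sketch of the three steps under stated assumptions: a Weibull fit (one plausible choice; the paper's fitted distribution is not specified here) gives the average useful life, a regression maps condition-monitoring features to an illustrative life coefficient, and the two are combined into a remaining-useful-life estimate. All data, units, and the exact definition of the life coefficient below are hypothetical.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)

# Step 1: average useful life from a distribution fit to historical piston lives
# (e.g. pumped volume until replacement; synthetic data, arbitrary units).
lives = stats.weibull_min.rvs(2.2, scale=30000, size=200, random_state=7)
shape, loc, scale = stats.weibull_min.fit(lives, floc=0)
mean_life = stats.weibull_min.mean(shape, loc=loc, scale=scale)

# Step 2: an illustrative "life coefficient" relating loading conditions to
# individual life, regressed on condition-monitoring features.
features = rng.normal(size=(200, 3))          # e.g. pressure, duty cycle, temperature
life_coef = lives / mean_life                 # ratio of individual to average life
coef_model = LinearRegression().fit(features, life_coef)

# Step 3: RUL estimate for a piston with given usage and monitored conditions.
used = 12000.0
x_now = np.array([[0.3, -0.1, 0.5]])
predicted_total = coef_model.predict(x_now)[0] * mean_life
print("estimated RUL:", max(predicted_total - used, 0.0))
```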


2021 ◽  
Vol 111 ◽  
pp. 611-615
Author(s):  
Yuehao Bai ◽  
Hung Ho ◽  
Guillaume A. Pouliot ◽  
Joshua Shea

We provide large-sample distribution theory for support vector regression (SVR) with the l1-norm, along with error bars for the SVR regression coefficients. Although a classical Wald confidence interval obtains from our theory, its implementation inherently depends on the choice of a tuning parameter that scales the variance estimate and thus the width of the error bars. We address this shortcoming by further proposing an alternative large-sample inference method based on the inversion of a novel test statistic that displays competitive power properties and does not depend on the choice of a tuning parameter.
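The paper's tuning-parameter-free inference method is not reproduced here; as a generic point of comparison, the sketch below fits scikit-learn's LinearSVR (a stand-in that is not l1-penalized) and attaches pairs-bootstrap error bars to its coefficients, which sidesteps the variance-scaling tuning parameter at the cost of resampling.

```python
import numpy as np
from sklearn.svm import LinearSVR

rng = np.random.default_rng(3)
n, p = 400, 3
X = rng.normal(size=(n, p))
beta = np.array([1.0, -0.5, 0.0])
y = X @ beta + rng.standard_t(df=5, size=n)            # heavy-tailed errors

fit = LinearSVR(epsilon=0.5, C=1.0, max_iter=10000).fit(X, y)

# Pairs bootstrap for coefficient error bars (generic, not the paper's method).
B = 200
draws = np.empty((B, p))
for b in range(B):
    idx = rng.integers(0, n, n)
    draws[b] = LinearSVR(epsilon=0.5, C=1.0, max_iter=10000).fit(X[idx], y[idx]).coef_
se = draws.std(axis=0)
for j in range(p):
    lo, hi = fit.coef_[j] - 1.96 * se[j], fit.coef_[j] + 1.96 * se[j]
    print(f"beta_{j}: {fit.coef_[j]:.3f}  [{lo:.3f}, {hi:.3f}]")
```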


2019 ◽  
Vol 11 (24) ◽  
pp. 6976
Author(s):  
Suwon Song ◽  
Chun Gun Park

Change-point regression models are often used to develop building energy baselines that can be used to predict energy use and determine energy savings during a given performance period. However, the reliability of building energy baselines depends on how well the change-point model fits the data measured during the baseline period. This research proposes the use of segmented linear regression models with one or two change points for automatically deriving best-fit building energy baseline models, along with an algorithm that uses a data-driven grid search to find the optimal change point(s) within a given data boundary for the proposed models. The algorithm was programmed and tested with actual measured data (e.g., daily gas and electricity use) for case-study buildings. Graphical and statistical analysis was also performed to validate its reliability, within an acceptable deviation of 1% in the overall coefficient of variation of the root mean squared error (CV(RMSE)), compared with results from the ASHRAE Inverse Model Toolkit (IMT), a public-domain program developed to manually derive change-point models with user-specified parameters. Consequently, it is expected that the algorithm can be applied to automatically derive best-fit building energy baseline models with optimal change point(s) from measured data.
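A minimal sketch of the data-driven grid search for a single change point, assuming a continuous piecewise-linear model y = b0 + b1*x + b2*max(x - cp, 0) fitted by least squares at each candidate change point; the two-change-point case and the comparison against the ASHRAE IMT are not reproduced, and the daily temperature/energy data are synthetic.

```python
import numpy as np

def fit_one_changepoint(x, y, grid):
    """Grid search for one change point in y = b0 + b1*x + b2*max(x - cp, 0)."""
    best = None
    for cp in grid:
        X = np.column_stack([np.ones_like(x), x, np.maximum(x - cp, 0.0)])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rmse = np.sqrt(np.mean((y - X @ beta) ** 2))
        if best is None or rmse < best[0]:
            best = (rmse, cp, beta)
    return best

# Illustrative daily data: outdoor temperature vs. energy use with a balance point.
rng = np.random.default_rng(11)
temp = rng.uniform(-5, 30, size=365)
energy = 40 + 2.5 * np.maximum(18.0 - temp, 0.0) + rng.normal(scale=3, size=365)

grid = np.linspace(temp.min() + 1, temp.max() - 1, 200)
rmse, cp, beta = fit_one_changepoint(temp, energy, grid)
cv_rmse = 100 * rmse / energy.mean()
print(f"change point at {cp:.1f} °C, CV(RMSE) = {cv_rmse:.2f}%")
```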


2005 ◽  
Vol 33 (2) ◽  
pp. 840-870 ◽  
Author(s):  
Emmanuel Guerre ◽  
Pascal Lavergne
