A DATA-DRIVEN NONPARAMETRIC SPECIFICATION TEST FOR DYNAMIC REGRESSION MODELS

2006 ◽  
Vol 22 (04) ◽  
Author(s):  
Alain Guay ◽  
Emmanuel Guerre

2021 ◽
Vol 13 (7) ◽  
pp. 168781402110277
Author(s):  
Yankai Hou ◽  
Zhaosheng Zhang ◽  
Peng Liu ◽  
Chunbao Song ◽  
Zhenpo Wang

Accurate estimation of the degree of battery aging is essential for the safe operation of electric vehicles. In this paper, a battery aging estimation method based on a dual-polarization equivalent circuit (DPEC) model and multiple data-driven models is proposed and applied to real-world vehicles and their operational data. The DPEC model and the forgetting factor recursive least-squares method are used to determine the ohmic internal resistance of the battery system, and outliers are filtered using boxplots. Eight common data-driven models are then used to describe the relationship between battery degradation and the factors influencing it, and these models are compared in terms of both estimation accuracy and computational requirements. The results show that the gradient descent tree regression, XGBoost regression, and LightGBM regression models are more accurate than the other methods, with root mean square errors of less than 6.9 mΩ. The AdaBoost and random forest regression models are regarded as second-choice alternatives because of their relative instability. The linear regression, support vector machine regression, and k-nearest neighbor regression models are not recommended because of poor accuracy or excessively high computational requirements. This work can serve as a reference for subsequent battery degradation studies based on real-time operational data.
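The resistance-identification step can be sketched as a generic forgetting-factor recursive least-squares (FFRLS) update followed by a boxplot (interquartile-range) outlier filter. This is a minimal sketch under stated assumptions, not the authors' implementation: to keep it short, a simplified ohmic relation U = OCV - R0 * I stands in for the full DPEC model, and the forgetting factor 0.98, the initial covariance 1e4, the 1.5 * IQR fences, and every number in the synthetic demo are assumptions.

```python
import random
import statistics


def ffrls(samples, n_params, lam=0.98, p0=1e4):
    """Forgetting-factor recursive least squares for y = phi . theta.

    samples: iterable of (phi, y) pairs, phi being a list of n_params regressors.
    lam: forgetting factor in (0, 1]; smaller values discount old data faster.
    Returns the final parameter estimate theta as a list.
    """
    theta = [0.0] * n_params
    # Large initial covariance = weak confidence in the zero initial guess.
    P = [[p0 if i == j else 0.0 for j in range(n_params)] for i in range(n_params)]
    for phi, y in samples:
        # P_phi = P @ phi
        P_phi = [sum(P[i][j] * phi[j] for j in range(n_params)) for i in range(n_params)]
        denom = lam + sum(phi[i] * P_phi[i] for i in range(n_params))
        gain = [v / denom for v in P_phi]
        err = y - sum(phi[i] * theta[i] for i in range(n_params))
        theta = [theta[i] + gain[i] * err for i in range(n_params)]
        # P <- (P - gain phi^T P) / lam; phi^T P equals P_phi^T because P is symmetric.
        P = [[(P[i][j] - gain[i] * P_phi[j]) / lam for j in range(n_params)]
             for i in range(n_params)]
    return theta


def iqr_filter(values, k=1.5):
    """Drop values outside the boxplot fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Synthetic demo (assumed values): OCV 3.7 V, R0 = 2 mOhm, 1 mV voltage noise.
random.seed(0)
TRUE_OCV, TRUE_R0 = 3.7, 0.002
samples = []
for _ in range(500):
    current = random.uniform(-50.0, 150.0)  # A, discharge positive
    voltage = TRUE_OCV - TRUE_R0 * current + random.gauss(0.0, 0.001)
    samples.append(([1.0, -current], voltage))  # theta = [OCV, R0]

ocv_hat, r0_hat = ffrls(samples, n_params=2)
print(f"R0 estimate: {r0_hat * 1000:.3f} mOhm")

# Resistance estimates (mOhm) from several hypothetical windows, one spurious.
print(iqr_filter([2.0, 2.1, 1.9, 2.0, 2.05, 9.0]))
```

In the DPEC setting the regressor vector would instead hold lagged terminal-voltage and current terms, with R0 recovered from the identified coefficients; the recursive update itself is unchanged.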


2021 ◽  
Author(s):  
Carlos Eduardo Beluzo ◽  
Luciana Correia Alves ◽  
Natália Martins Arruda ◽  
Cátia Sepetauskas ◽  
Everton Silva ◽  
...  

Reduction in child mortality is one of the United Nations Sustainable Development Goals for 2030. In Brazil, despite the reduction in child mortality over recent decades, neonatal mortality remains a persistent problem, associated with the quality of prenatal and childbirth care and with social-environmental factors. In a well-functioning health system, the effect of some of these factors could be minimized by an appropriate number of newborn intensive care units, health care units, and neonatal incubators, and even by an adequate level of maternal education, which can lead to proper care throughout the prenatal period. With the intent of providing knowledge resources for planning public health policies focused on neonatal mortality reduction, we propose NeMoR, a new data-driven machine learning method for Neonatal Mortality Rate (NMR) forecasting, which predicts neonatal mortality rates 4 months ahead using NeoDeathForecast, a monthly time series dataset composed of these factors and of the neonatal mortality rate history (2006-2016), with 57,816 samples covering all 438 Brazilian administrative health regions. To build the model, Extra-Tree, XGBoost Regressor, Gradient Boosting Regressor and Lasso regression models were evaluated, and a hyperparameter search was performed as a fine-tuning step. The method was validated using São Paulo city data, mainly because of data quality. In the best configuration, the method predicted neonatal mortality rates with a mean squared error lower than 0.18.
Besides that, the forecast results may be useful because they provide a way for policy makers to anticipate trends in neonatal mortality rate curves, an important resource for planning public health policies.

Highlights:
- Proposition of a new data-driven approach for neonatal mortality rate forecasting, which provides a way for policy makers to anticipate trends in neonatal mortality rate curves, making better planning of health policies focused on NMR reduction possible;
- a method for NMR forecasting with an MSE lower than 0.18;
- an extensive evaluation of different Machine Learning (ML) regression models, as well as a hyperparameter search, which constitutes the last stage of NeMoR;
- a new time series database for NMR prediction problems;
- a new feature projection space for NMR forecasting problems, which considerably reduces errors in NMR prediction.
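The 4-months-ahead setup can be illustrated by turning a monthly series into supervised (features, target) pairs, the form that regressors such as Extra-Trees, gradient boosting, or Lasso are trained on. This is a hedged sketch: it uses only lagged values of the series (NeMoR also uses health-system and maternal-education features), and the function names, the lag-window length, the toy series, and the naive persistence baseline are assumptions for illustration. The error metric is the mean squared error reported above.

```python
def make_supervised(series, n_lags, horizon):
    """Build direct multi-step training pairs from a monthly series.

    Each feature row holds the n_lags most recent values up to month t;
    its target is the value at month t + horizon.
    """
    X, y = [], []
    for t in range(n_lags - 1, len(series) - horizon):
        X.append(series[t - n_lags + 1 : t + 1])
        y.append(series[t + horizon])
    return X, y


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


HORIZON = 4  # months ahead, as in the abstract
months = list(range(10))  # stand-in series; replace with observed monthly NMR values
X, y = make_supervised(months, n_lags=3, horizon=HORIZON)

# Naive persistence baseline: predict the last observed value for month t + 4.
baseline = [row[-1] for row in X]
print(len(X), mse(y, baseline))
```

Any trained regressor's predictions on held-out rows can be scored with the same `mse` function and compared against the persistence baseline.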


2016 ◽  
Author(s):  
Geoffrey Fouad ◽  
André Skupin ◽  
Christina L. Tague

Abstract. Percentile flows are statistics derived from the flow duration curve (FDC) that describe the flow equaled or exceeded for a given percent of time. These statistics provide important information for managing rivers, but are often unavailable since most basins are ungauged. A common approach for predicting percentile flows is to deploy regional regression models based on gauged percentile flows and related independent variables derived from physical and climatic data. The first step of this process identifies groups of basins through a cluster analysis of the independent variables, followed by the development of a regression model for each group. This entire process hinges on the independent variables selected to summarize the physical and climatic state of basins. Distributed physical and climatic datasets now exist for the contiguous United States (US). However, it remains unclear how to best represent these data for the development of regional regression models. The study presented here developed regional regression models for the contiguous US, and evaluated the effect of different approaches for selecting the initial set of independent variables on the predictive performance of the regional regression models. An expert assessment of the dominant controls on the FDC was used to identify a small set of independent variables likely related to percentile flows. A data-driven approach was also applied to evaluate two larger sets of variables that consist of either (1) the averages of data for each basin or (2) both the averages and statistical distribution of basin data distributed in space and time. The small set of variables from the expert assessment of the FDC and two larger sets of variables for the data-driven approach were each applied for a regional regression procedure. Differences in predictive performance were evaluated using 184 validation basins withheld from regression model development. 
The small set of independent variables selected through expert assessment performed as well as, if not better than, the two larger sets of variables. The parsimonious set consisted only of mean annual precipitation, potential evapotranspiration, and baseflow index. The additional variables in the two larger sets added little to no predictive information. Regional regression models based on the parsimonious set of variables were developed using 734 calibration basins and were converted into a tool for predicting 13 percentile flows in the contiguous US. The Supplementary Material for this paper includes an R graphical user interface for predicting the percentile flows of basins within the range of conditions used to calibrate the regression models. The equations and performance statistics of the models are also supplied in tabular form.
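Percentile flows can be read directly off an empirical flow duration curve. The sketch below shows one common construction, offered as an assumption rather than the study's procedure: flows are ranked in descending order, the i-th largest of n values is assigned the Weibull plotting position i/(n + 1) as its exceedance probability, and the flow for a requested exceedance percentage is linearly interpolated between ranks. The function name and the toy flow record are hypothetical.

```python
def percentile_flow(flows, p):
    """Flow equaled or exceeded p percent of the time, from an empirical FDC.

    Uses Weibull plotting positions: the i-th largest of n flows has
    exceedance probability i / (n + 1). Requests outside the plotted range
    are clamped to the largest or smallest observed flow.
    """
    if not 0 < p < 100:
        raise ValueError("p must lie strictly between 0 and 100")
    q = sorted(flows, reverse=True)
    n = len(q)
    rank = p / 100 * (n + 1)  # fractional 1-based rank on the descending list
    if rank <= 1:
        return q[0]
    if rank >= n:
        return q[-1]
    i = int(rank)
    frac = rank - i
    return q[i - 1] + frac * (q[i] - q[i - 1])


daily_flows = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # toy record (e.g., m^3/s)
for p in (10, 25, 50, 90):
    print(p, percentile_flow(daily_flows, p))
```

A low p (e.g., 10) returns a high flow that is rarely exceeded, while a high p (e.g., 90) returns a low flow that is exceeded most of the time; a regional regression model would then predict such values for ungauged basins.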


2021 ◽  
Vol 11 (18) ◽  
pp. 8482
Author(s):  
Jie Li ◽  
Yuejin Tan ◽  
Bingfeng Ge ◽  
Hua Zhao ◽  
Xin Lu

This paper proposes a method for predicting the remaining useful life (RUL) of the concrete piston of a concrete pump truck based on probability statistics and data-driven approaches. First, the average useful life of the concrete piston is determined by fitting a probability distribution to actual life data. Second, using condition monitoring data from the concrete pump truck, the concept of a life coefficient is proposed to represent the influence of loading conditions on the actual useful life of individual concrete pistons, and different regression models are established to predict the RUL of the pistons. Finally, based on the prediction results at different life stages, a replacement warning point is established to support inventory management and replacement planning for concrete pistons.
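The first two steps can be sketched as a distribution fit followed by a coefficient-scaled life estimate. The abstract names neither the distribution family nor a formula for the life coefficient, so both are assumptions here: a lognormal fitted by maximum likelihood stands in for the fitted distribution, and the RUL rule (coefficient times average life, minus life already used) is a hypothetical illustration, not the authors' regression models.

```python
import math
import statistics


def fit_lognormal(lives):
    """Maximum-likelihood lognormal fit: returns (mu, sigma) of log-life."""
    logs = [math.log(x) for x in lives]
    mu = statistics.fmean(logs)
    sigma = statistics.pstdev(logs, mu)  # the MLE uses the 1/n variance
    return mu, sigma


def lognormal_mean(mu, sigma):
    """Expected life of a lognormal with log-mean mu and log-sd sigma."""
    return math.exp(mu + sigma ** 2 / 2)


def remaining_life(mean_life, life_coefficient, used_life):
    """Hypothetical RUL rule: scale the fleet-average life, subtract usage."""
    return max(life_coefficient * mean_life - used_life, 0.0)


observed_lives = [math.e ** 1, math.e ** 3]  # toy data in arbitrary life units
mu, sigma = fit_lognormal(observed_lives)
mean_life = lognormal_mean(mu, sigma)  # about exp(2.5) for this toy data
print(mean_life, remaining_life(mean_life, life_coefficient=0.9, used_life=5.0))
```

A coefficient below 1 would represent a piston working under heavier-than-average loading; a replacement warning could then be raised when the predicted RUL falls below a chosen threshold.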


2019 ◽  
Vol 11 (24) ◽  
pp. 6976
Author(s):  
Suwon Song ◽  
Chun Gun Park

Change-point regression models are often used to develop building energy baselines that can be used to predict energy use and determine energy savings during a given performance period. However, the reliability of a building energy baseline can depend on how well the change-point model fits the data measured during the baseline period. This research proposes the use of segmented linear regression models with one or two change points for automatically deriving best-fit building energy baseline models, along with an algorithm that uses a data-driven grid search to find the optimal change point(s) within a given data boundary. The algorithm was programmed and tested with actual measured data (e.g., daily gas and electricity use) for case-study buildings. Graphical and statistical analyses were also performed to validate its reliability: compared with the results derived from the ASHRAE Inverse Model Toolkit (IMT), a public-domain program for manually deriving change-point models with user-specified parameters, the deviations in the overall coefficient of variation of the root mean squared error (CV(RMSE)) were within 1%. The algorithm is therefore expected to be applicable to automatically deriving best-fit building energy baseline models with optimal change point(s) from measured data.
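A grid search over candidate change points can be sketched for the simplest case, a one-change-point cooling model E = b0 + b1 * max(T - cp, 0): for each candidate cp, fit ordinary least squares on the transformed regressor and keep the candidate with the smallest sum of squared errors. This is an illustrative sketch, not the authors' algorithm or the ASHRAE IMT; the function names, grid spacing, and synthetic data are assumptions. CV(RMSE) is computed with n - p degrees of freedom, one common convention that the abstract does not specify.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:  # regressor is constant: only the intercept is identifiable
        return my, 0.0
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - b1 * mx, b1


def fit_cooling_3p(temps, energy, grid):
    """Grid-search the change point of E = b0 + b1 * max(T - cp, 0).

    Returns (sse, cp, b0, b1) for the candidate with the smallest SSE.
    """
    best = None
    for cp in grid:
        z = [max(t - cp, 0.0) for t in temps]
        b0, b1 = fit_line(z, energy)
        sse = sum((e - b0 - b1 * zi) ** 2 for e, zi in zip(energy, z))
        if best is None or sse < best[0]:
            best = (sse, cp, b0, b1)
    return best


def cv_rmse(sse, y, n_params):
    """CV(RMSE) as a fraction of the mean, with n - n_params degrees of freedom."""
    n = len(y)
    return math.sqrt(sse / (n - n_params)) / (sum(y) / n)


temps = [float(t) for t in range(31)]  # daily mean outdoor temperature, toy data
energy = [100.0 + 5.0 * max(t - 18.0, 0.0) for t in temps]  # synthetic kWh/day
grid = [10.0 + 0.5 * k for k in range(31)]  # candidates inside the data range
sse, cp, b0, b1 = fit_cooling_3p(temps, energy, grid)
print(cp, round(b0, 3), round(b1, 3), cv_rmse(sse, energy, n_params=3))
```

Heating and two-change-point variants follow the same pattern with different transformed regressors (e.g., max(cp - T, 0), or one term per segment) and a two-dimensional grid.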


2005 ◽  
Vol 33 (2) ◽  
pp. 840-870 ◽  
Author(s):  
Emmanuel Guerre ◽  
Pascal Lavergne

2021 ◽  
Vol 3 ◽  
Author(s):  
Hans-Jörg Schmid ◽  
Quirin Würschinger ◽  
Sebastian Fischer ◽  
Helmut Küchenhoff

The present study deals with variation in the use of lexico-grammatical patterns and emphasizes the need to embrace individual variation. Targeting the pattern that’s adj (as in that’s right, that’s nice or that’s okay) as a case study, we use a tailor-made Python script to systematically retrieve grammatical and semantic information about all instances of this construction in BNC2014 as well as sociolinguistic information enabling us to study social and individual lexico-grammatical variation among speakers who have used this pattern. The dataset amounts to 4,394 tokens produced by 445 speakers using 159 adjective types in 931 conversations. Using detailed descriptive statistics and mixed-effects regression models, we show that while the choice of some adjectives is partly determined by social variables, situational and especially individual variation is rampant overall. Adopting a cognitive-linguistic perspective and relying on the notion of entrenchment, we interpret these findings as reflecting individual speakers' routines. We argue that computational sociolinguistics is in an ideal position to contribute to the data-driven investigation of individual lexico-grammatical variation and encourage computational sociolinguists to grab this opportunity. For the routines of individual speakers ultimately both underlie and compromise systematic social variation and trigger and steer well-known types of language change including grammaticalization, pragmaticalization and change by invited inference.
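The retrieval step can be sketched as a scan over tokenized, part-of-speech-tagged utterances. The authors' script is not described in detail, so this is a hypothetical stand-in: the input layout, the tokenization of "that's" as "that" + "'s", and adjective tags beginning with "JJ" are assumptions about the corpus format, not a description of the BNC2014 files, and the tags in the toy data are illustrative. It also ignores variants such as "that is right", which a real query would need to handle explicitly.

```python
from collections import Counter, defaultdict


def find_thats_adj(utterances):
    """Yield (speaker, adjective) for every "that 's ADJ" token sequence.

    utterances: iterable of (speaker_id, tokens), tokens being (word, tag) pairs.
    Assumes "that's" is tokenized as "that" + "'s" and that adjective tags
    start with "JJ"; both are assumptions about the input format.
    """
    for speaker, tokens in utterances:
        for (w1, _), (w2, _), (w3, t3) in zip(tokens, tokens[1:], tokens[2:]):
            if w1.lower() == "that" and w2.lower() == "'s" and t3.startswith("JJ"):
                yield speaker, w3.lower()


def adjective_profiles(hits):
    """Count adjective types per speaker, e.g. for individual-variation analyses."""
    profiles = defaultdict(Counter)
    for speaker, adjective in hits:
        profiles[speaker][adjective] += 1
    return profiles


utterances = [
    ("S0001", [("that", "DD1"), ("'s", "VBZ"), ("right", "JJ")]),
    ("S0001", [("That", "DD1"), ("'s", "VBZ"), ("nice", "JJ")]),
    ("S0002", [("that", "DD1"), ("'s", "VBZ"), ("a", "AT1"), ("car", "NN1")]),
]
hits = list(find_thats_adj(utterances))
profiles = adjective_profiles(hits)
print(hits, dict(profiles))
```

Per-speaker counters like these are the raw material for the descriptive statistics and for speaker-level random effects in a mixed-effects model.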


2021 ◽  
Author(s):  
Hyemin Han

Research has examined the association between personality traits and people’s compliance with measures to prevent the spread of COVID-19. However, previous studies used relatively small datasets and employed frequentist analyses that do not allow data-driven model exploration. To address these limitations, a large-scale international dataset, the COVIDiSTRESS Global Survey dataset, was explored with a Bayesian generalized linear model that enables identification of the best regression model. The best regression models predicting participants’ compliance from the Big Five traits were explored. The findings demonstrated, first, that all Big Five traits except extroversion were positively associated with compliance with general measures and distancing. Second, neuroticism, extroversion, and agreeableness were positively associated with the perceived cost of complying with the measures, while conscientiousness showed a negative association. The findings and implications of the present study are discussed.
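Data-driven exploration over candidate regression models can be illustrated by scoring every subset of predictors. The sketch below swaps in a simpler technique than the study's Bayesian generalized linear model: a Gaussian linear model per subset, scored by BIC, with exp(-BIC/2) normalized across subsets as a standard approximation to posterior model probabilities under equal prior odds. The function names and the synthetic predictors x1 to x3 are hypothetical and are not drawn from the COVIDiSTRESS data.

```python
import itertools
import math
import random


def ols_sse(X, y):
    """Fit y ~ X by least squares (normal equations, Gauss-Jordan); return SSE."""
    n, p = len(X), len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(p)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    return sum((y[r] - sum(b * x for b, x in zip(beta, X[r]))) ** 2 for r in range(n))


def rank_models(predictors, y):
    """Score every predictor subset by Gaussian BIC; return models sorted by BIC.

    Each entry is (subset, bic, weight); weight = exp(-BIC/2), normalized over
    all subsets, approximates a posterior model probability under equal priors.
    """
    n = len(y)
    names = sorted(predictors)
    scored = []
    for k in range(len(names) + 1):
        for subset in itertools.combinations(names, k):
            X = [[1.0] + [predictors[name][r] for name in subset] for r in range(n)]
            bic = n * math.log(ols_sse(X, y) / n) + (k + 1) * math.log(n)
            scored.append((subset, bic))
    best = min(bic for _, bic in scored)
    weights = [math.exp(-(bic - best) / 2) for _, bic in scored]
    total = sum(weights)
    ranked = [(s, bic, w / total) for (s, bic), w in zip(scored, weights)]
    return sorted(ranked, key=lambda item: item[1])


# Synthetic data: y depends on x1 and x2 only; x3 is pure noise.
random.seed(1)
n = 200
data = {name: [random.uniform(0, 1) for _ in range(n)] for name in ("x1", "x2", "x3")}
y = [1 + 2 * a - b + random.gauss(0, 0.3) for a, b in zip(data["x1"], data["x2"])]
ranking = rank_models(data, y)
for subset, bic, weight in ranking[:3]:
    print(subset, round(bic, 1), round(weight, 3))
```

Exhaustive enumeration grows as 2^k in the number of predictors, which is manageable for the five Big Five traits (32 models) but calls for sampling-based search with many more candidates.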

