regression coefficients
Recently Published Documents


TOTAL DOCUMENTS: 1217 (FIVE YEARS 278)
H-INDEX: 59 (FIVE YEARS 4)

2022 ◽ Vol 15 (1) ◽ pp. 32
Author(s): Hrishikesh D. Vinod

Quantitative researchers often use Student’s t-test (and its p-values) to claim that a particular regressor is important (statistically significant) for explaining the variation in a response variable. A study is subject to the p-hacking problem when its author relies too much on formal statistical significance while ignoring the size of what is at stake. We suggest reporting estimates using nonlinear kernel regressions and the standardization of all variables to avoid p-hacking. We are filling an essential gap in the literature because p-hacking-related papers do not even mention kernel regressions or standardization. Although our methods have general applicability in all sciences, our illustrations refer to risk management for a cross-section of firms and financial management in macroeconomic time series. We estimate nonlinear, nonparametric kernel regressions for both examples to illustrate the computation of scale-free generalized partial correlation coefficients (GPCCs). We suggest supplementing the usual p-values with the “practical significance” revealed by scale-free GPCCs. We show that GPCCs also yield new pseudo regression coefficients that measure each regressor’s relative (nonlinear) contribution in a kernel regression.
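As a minimal sketch of the two ingredients this abstract combines (not the authors' GPCC computation; the data, bandwidth, and variable names are assumptions), standardizing both variables and fitting a Nadaraya-Watson kernel regression might look like:

```python
import numpy as np

def standardize(v):
    # Scale-free variables: mean 0, standard deviation 1.
    return (v - v.mean()) / v.std()

def nw_kernel_regression(x, y, grid, bandwidth=0.3):
    # Nadaraya-Watson estimator with a Gaussian kernel:
    # m(g) = sum_i K((g - x_i)/h) * y_i / sum_i K((g - x_i)/h)
    w = np.exp(-0.5 * ((grid[:, None] - x[None, :]) / bandwidth) ** 2)
    return (w @ y) / w.sum(axis=1)

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 400)
y = np.sin(x) + rng.normal(0, 0.1, 400)   # nonlinear signal

xs, ys = standardize(x), standardize(y)   # avoid scale-dependent effect sizes
grid = np.linspace(-1.0, 1.0, 7)
fit = nw_kernel_regression(xs, ys, grid)  # nonparametric fit on the grid
```

Because both variables are standardized, the fitted curve is unit-free, which is the property that makes the paper's "practical significance" comparisons across regressors meaningful.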


Risks ◽ 2022 ◽ Vol 10 (1) ◽ pp. 19
Author(s): Albert Pitarque ◽ Montserrat Guillen

Quantile regression provides a way to estimate a driver’s risk of a traffic accident by predicting the percentile of observed distance driven above the legal speed limits over a one-year interval, conditional on given characteristics such as total distance driven, age, gender, percentage of urban-zone driving and night-time driving. This study proposes an approximation of quantile regression coefficients by interpolating only a few quantile levels, which can be chosen carefully from the unconditional empirical distribution function of the response. Choosing the levels before interpolation improves accuracy. This approximation method is convenient for real-time implementation of risky driving identification and provides a fast approximate calculation of a risk score. We illustrate our results with data on 9614 drivers observed over one year.
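The paper interpolates quantile regression coefficient vectors across levels; the interpolation trick itself can be sketched in the intercept-only case, where the "coefficients" reduce to unconditional quantiles (the gamma-distributed response and all numbers here are assumptions, not the paper's telematics data):

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical response: distance driven above speed limits (km) for 9614 drivers.
excess = rng.gamma(shape=2.0, scale=10.0, size=9614)

# A few quantile levels chosen from the unconditional empirical distribution.
levels = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
anchors = np.quantile(excess, levels)

def approx_quantile(level):
    # Fast real-time approximation: linear interpolation between
    # the precomputed anchor quantiles.
    return np.interp(level, levels, anchors)

exact = np.quantile(excess, 0.6)   # "full" computation, for comparison
approx = approx_quantile(0.6)      # interpolated approximation
```

The precomputation happens once offline; scoring a new level at runtime is a single `np.interp` call, which is what makes the approach attractive for real-time risk scores.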


2022
Author(s): Wenwen Guo ◽ Wei Zhong ◽ Sunpeng Duan ◽ Hengjian Cui

Sensors ◽ 2021 ◽ Vol 22 (1) ◽ pp. 68
Author(s): Chunyan Li ◽ Kevin Mershon Boswell

Acoustic Doppler current profilers (ADCP) are quasi-remote sensing instruments widely used in oceanography to measure velocity profiles continuously. One of the applications is the quantification of land–ocean exchange, which plays a key role in the global cycling of water, heat, and materials. This exchange mostly occurs through estuaries, lagoons, and bays. Studies on the subject thus require that observations of total volume or mass transport can be achieved. Alternatively, numerical modeling is needed for the computation of transport, which, however, also requires that the model be validated properly. Since flows across an estuary, lagoon, or bay are usually non-uniform, point measurements are not sufficient; continuous measurements across a transect are desired but cannot be sustained in the long run due to budget constraints. In this paper, we use a combination of short-term transect-based measurements from a vessel-mounted ADCP and relatively long-term point measurements from a moored ADCP at the bottom to obtain regression coefficients between the transport from the vessel-based observations and the depth-averaged velocity from the bottom-based observations. The method is applied to an Arctic lagoon by using an ADCP mounted on a buoyant platform towed by a small inflatable vessel and another ADCP mounted on a bottom-deployed metal frame. The vessel-based measurements were performed continuously for nearly 5 h, which was sufficient to derive a linear regression between the datasets with an R² value of 0.89. The regression coefficients were in turn applied to the entire record of the moored instrument, which is used in the interpretation of the subtidal transport variations.
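The calibration step described above, fitting transport against depth-averaged velocity during the overlap period and then applying the coefficients to the long mooring record, can be sketched as follows (all values are synthetic stand-ins, not the Arctic lagoon data):

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical calibration period: depth-averaged velocity (m/s) from the
# moored ADCP and simultaneous transport (m^3/s) from the vessel transects.
u_mooring = rng.normal(0.0, 0.3, 60)
transport = 5000.0 * u_mooring + 200.0 + rng.normal(0, 500.0, 60)

# Least-squares regression: transport ≈ a * u + b
a, b = np.polyfit(u_mooring, transport, 1)

# Coefficient of determination R^2 for the calibration fit.
pred = a * u_mooring + b
ss_res = np.sum((transport - pred) ** 2)
ss_tot = np.sum((transport - transport.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

# The fitted coefficients are then applied to the full mooring record
# to reconstruct a long transport time series.
u_long = rng.normal(0.0, 0.3, 1000)
transport_long = a * u_long + b
```

This is the generic single-regressor version; the study's value lies in showing that ~5 h of transect data gave a calibration (R² = 0.89) stable enough to extend over the whole deployment.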


Polymers ◽ 2021 ◽ Vol 14 (1) ◽ pp. 13
Author(s): Omar A. El Seoud ◽ Marc Kostag ◽ Shirley Possidonio ◽ Marcella T. Dignani ◽ Paulo A. R. Pires ◽ ...

We studied the dependence of dissolution of silk fibroin (SF) in mixtures of DMSO with ionic liquids (ILs) on the temperature (T = 40 to 80 °C) and DMSO mole fraction (χDMSO = 0.5 to 0.9). The ILs included BuMeImAcO, C3OMeImAcO, AlBzMe2NAcO, and Bu4NAcO; see the names and structures below. We used design of experiments (DOE) to determine the dependence of mass fraction of dissolved SF (SF-m%) on T and χDMSO. We successfully employed a second-order polynomial to fit the biopolymer dissolution data. The resulting regression coefficients showed that the dissolution of SF in BuMeImAcO-DMSO and C3OMeImAcO-DMSO is more sensitive to variation of T than of χDMSO; the inverse is observed for the quaternary ammonium ILs. Using BuMeImAcO, AlBzMe2NAcO, and molecular dynamics simulations, we attribute the difference in IL efficiency to stronger SF-IL hydrogen bonding with the former IL, which is coupled with the difference in the molecular volumes and the rigidity of the phenyl ring of the latter IL. The order of SF dissolution is BuMeImAcO-DMSO > C3OMeImAcO-DMSO; this was attributed to the formation of intramolecular H-bonding between the ether oxygen in the side chain of the latter IL and the relatively acidic hydrogens of the imidazolium cation. Using DOE, we were able to predict values of SF-m%; this is satisfactory and important because it results in economy of labor, time, and material.
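The DOE fit described above, a second-order polynomial in T and χDMSO, can be sketched generically (the design grid, noise level, and coefficient values below are assumptions for illustration, not the paper's measured SF-m% data):

```python
import numpy as np

# Hypothetical full-factorial design over the ranges studied:
# T = 40-80 °C, chi_DMSO = 0.5-0.9.
T = np.repeat(np.array([40.0, 50.0, 60.0, 70.0, 80.0]), 5)
chi = np.tile(np.array([0.5, 0.6, 0.7, 0.8, 0.9]), 5)

# Second-order (quadratic) response surface:
# SF = b0 + b1*T + b2*chi + b3*T^2 + b4*chi^2 + b5*T*chi
X = np.column_stack([np.ones_like(T), T, chi, T**2, chi**2, T * chi])

# Synthetic response with known coefficients (assumed, for the sketch).
rng = np.random.default_rng(3)
beta_true = np.array([2.0, 0.05, -1.0, 0.001, 3.0, 0.02])
sf = X @ beta_true + rng.normal(0, 0.02, T.size)

# Ordinary least squares recovers the regression coefficients, whose relative
# sizes indicate whether dissolution is more sensitive to T or to chi_DMSO.
coef, *_ = np.linalg.lstsq(X, sf, rcond=None)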


2021 ◽ pp. 096228022110654
Author(s): Ashwini Joshi ◽ Angelika Geroldinger ◽ Lena Jiricka ◽ Pralay Senchaudhuri ◽ Christopher Corcoran ◽ ...

Poisson regression can be challenging with sparse data, in particular with certain data constellations where maximum likelihood estimates of regression coefficients do not exist. This paper provides a comprehensive evaluation of methods that give finite regression coefficients when maximum likelihood estimates do not exist, including Firth’s general approach to bias reduction, exact conditional Poisson regression, and a Bayesian estimator using weakly informative priors that can be obtained via data augmentation. Furthermore, we include in our evaluation a new proposal for a modification of Firth’s approach, improving its performance for predictions without compromising its attractive bias-correcting properties for regression coefficients. We illustrate the issue of the nonexistence of maximum likelihood estimates with a dataset arising from the recent outbreak of COVID-19 and an example from implant dentistry. All methods are evaluated in a comprehensive simulation study under a variety of realistic scenarios, evaluating their performance for prediction and estimation. To conclude, while exact conditional Poisson regression may be confined to small data sets only, both the modification of Firth’s approach and the Bayesian estimator are universally applicable solutions with attractive properties for prediction and estimation. While the Bayesian method needs specification of prior variances for the regression coefficients, the modified Firth approach does not require any user input.
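The nonexistence problem and one remedy can be illustrated with a toy example (this sketch uses the penalty/MAP form of a weakly informative normal prior rather than the paper's literal data-augmentation construction, and is not Firth's correction; all data and the prior variance are assumptions):

```python
import numpy as np

def poisson_newton(X, y, n_iter, prior_var=None):
    # Newton-Raphson for a log-link Poisson regression. If prior_var is set,
    # an independent N(0, prior_var) prior on each coefficient is added in
    # penalty (MAP) form, guaranteeing finite estimates.
    p = X.shape[1]
    P = np.zeros((p, p)) if prior_var is None else np.eye(p) / prior_var
    beta = np.zeros(p)
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        grad = X.T @ (y - mu) - P @ beta
        hess = X.T @ (mu[:, None] * X) + P
        beta += np.linalg.solve(hess, grad)
    return beta

# Toy data: all counts are zero whenever the binary covariate equals 1, so
# the MLE of its coefficient does not exist (it drifts to minus infinity).
x = np.array([0.0] * 4 + [1.0] * 4)
y = np.array([3.0, 2.0, 4.0, 1.0, 0.0, 0.0, 0.0, 0.0])
X = np.column_stack([np.ones(8), x])

b4 = poisson_newton(X, y, n_iter=4)   # slope keeps falling ...
b8 = poisson_newton(X, y, n_iter=8)   # ... the longer we iterate
bmap = poisson_newton(X, y, n_iter=50, prior_var=4.0)  # finite MAP estimate
```

Running the unpenalized fit longer only pushes the coefficient further toward minus infinity, while the penalized fit converges to a finite, interpretable value.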


Author(s): J. K. Mhango ◽ W. Hartley ◽ W. E. Harris ◽ J. M. Monaghan

Accurate estimation of tuber size distribution (TSD) parameters in discretely categorized potato (Solanum tuberosum L.) yield samples is desirable for estimating modal tuber sizes, which is fundamental to yield prediction. In the current work, systematic yield digs were conducted on five commercial fields (N = 119) to compare the Weibull, Gamma and Gaussian distribution functions for relative-likelihood-based goodness-of-fit to the observed discrete distributions. Parameters were estimated using maximum likelihood estimation (MLE) for the three distributions but were also derived using the percentiles approach for the Weibull distribution to compare accuracy against the MLE approaches. The relationship between TSD and soil nutrient variability was examined using the best-fitting model's parameters. The percentiles approach had lower overall relative likelihood than the MLE approaches across five locations, but had consistently lower Root Mean Square Error in the marketable tuber size range. Negative relationships were observed between the percentile-based shape parameter and the concentrations of phosphorus and nitrogen, with significant (non-zero-overlapping 95% confidence interval) regression coefficients for P (−0.74 ± 0.33 for distribution of proportional tuber numbers and −1.3 ± 0.62 for tuber weights). Stem density was negatively associated with the scale and mode of tuber number (regression coefficients −0.98 ± 0.63 and −1.08 ± 0.78 respectively) and tuber weight (regression coefficients −0.99 ± 0.78 and −1.04 ± 0.69 respectively) distributions. Phosphorus is negatively related to the scale of the tuber-number-based distribution while positively associating with the tuber weight distribution. The results suggest that excess P application was associated with the increase in small tubers that did not contribute significant weight to the final yield.
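A percentiles approach to Weibull parameter estimation can be sketched generically: invert the Weibull CDF at two observed percentiles to recover the shape and scale (the percentile pair and the tuber-size numbers are assumptions; the paper's exact percentile estimator may differ):

```python
import math

def weibull_from_percentiles(p1, x1, p2, x2):
    # Invert the Weibull CDF F(x) = 1 - exp(-(x/scale)**shape)
    # at two observed percentiles (p1, x1) and (p2, x2).
    u1 = -math.log(1.0 - p1)
    u2 = -math.log(1.0 - p2)
    shape = math.log(u2 / u1) / math.log(x2 / x1)
    scale = x1 / u1 ** (1.0 / shape)
    return shape, scale

# Check on exact quantiles of a known Weibull(shape=2, scale=50 mm)
# tuber-size distribution: the inversion recovers both parameters.
shape0, scale0 = 2.0, 50.0
q = lambda p: scale0 * (-math.log(1.0 - p)) ** (1.0 / shape0)
shape, scale = weibull_from_percentiles(0.25, q(0.25), 0.75, q(0.75))
```

Unlike MLE, this needs only two points of the empirical distribution, which is why it can behave differently (here, better in the marketable size range) when the discretized data are coarse.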


2021
Author(s): Isaac Goldstein ◽ Damon Bayer ◽ Ivan Barilar ◽ Balladiah Kizito ◽ Ogopotse Matsiri ◽ ...

Identifying host factors that influence infectious disease transmission is an important step toward developing interventions to reduce disease incidence. Recent advances in methods for reconstructing infectious disease transmission events using pathogen genomic and epidemiological data open the door for investigation of host factors that affect onward transmission. While most transmission reconstruction methods are designed to work with densely sampled outbreaks, these methods are making their way into surveillance studies, where the fraction of sampled cases with sequenced pathogens could be relatively low. Surveillance studies that use transmission event reconstruction then use the reconstructed events as response variables (i.e., infection source status of each sampled case) and use host characteristics as predictors (e.g., presence of HIV infection) in regression models. We use simulations to study estimation of the effect of a host factor on probability of being an infection source via this multi-step inferential procedure. Using TransPhylo, a widely used method for Bayesian estimation of infectious disease transmission events, together with logistic regression, we find that low sensitivity of identifying infection sources leads to dilution of the signal, biasing logistic regression coefficients toward zero. We show that increasing the proportion of sampled cases improves sensitivity and estimation of logistic regression coefficients. Application of these approaches to real-world data from a population-based TB study in Botswana fails to detect an association between HIV infection and probability of being a TB infection source. We conclude that application of a pipeline, where one first uses TransPhylo and sparsely sampled surveillance data to infer transmission events and then estimates effects of host characteristics on probabilities of these events, should be accompanied by a realistic simulation study to better understand biases stemming from imprecise transmission event inference.
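The attenuation mechanism can be reproduced in a toy simulation: here TransPhylo's reconstruction is replaced by a simple Bernoulli misclassification of source status at a fixed sensitivity, and all names, effect sizes, and the perfect-specificity assumption are illustrative:

```python
import numpy as np

def logit_fit(x, y, n_iter=25):
    # Newton-Raphson for logistic regression with an intercept.
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1 - p)
        beta += np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (y - p))
    return beta

rng = np.random.default_rng(4)
n = 20000
hiv = rng.integers(0, 2, n).astype(float)        # binary host factor
true_beta = np.array([-1.0, 0.8])                # true log-odds effect
p_source = 1.0 / (1.0 + np.exp(-(true_beta[0] + true_beta[1] * hiv)))
source = (rng.random(n) < p_source).astype(float)  # true source status

# Imperfect reconstruction: true sources detected with sensitivity 0.4,
# perfect specificity (assumed, for the sketch).
observed = source * (rng.random(n) < 0.4)

b_true = logit_fit(hiv, source)     # coefficient with perfect labels
b_obs = logit_fit(hiv, observed)    # attenuated coefficient
```

With only 40% of true sources labeled as such, the fitted coefficient on the host factor shrinks toward zero, which is the dilution effect the study documents.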


2021 ◽ Vol 9
Author(s): Kei Hirose

We consider the problem of short- and medium-term electricity demand forecasting by using past demand and daily weather forecast information. Conventionally, many researchers have directly applied regression analysis. However, interpreting the effect of weather on the demand is difficult with the existing methods. In this study, we build a statistical model that resolves this interpretation issue. A varying coefficient model with basis expansion is used to capture the nonlinear structure of the weather effect. This approach results in an interpretable model when the regression coefficients are nonnegative. To estimate the nonnegative regression coefficients, we employ nonnegative least squares. Three real-data analyses show the practicality of our proposed statistical modeling. Two of them demonstrate good forecast accuracy and interpretability of our proposed method. In the third example, we investigate the effect of COVID-19 on electricity demand. Such interpretation would help in designing strategies for energy-saving interventions and demand response.
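The nonnegative least squares step can be sketched with a minimal projected-gradient solver (production code would use a dedicated active-set NNLS routine; the "weather feature" matrix and coefficients below are assumptions):

```python
import numpy as np

def nnls_pg(A, b, n_iter=5000):
    # Projected gradient for min ||A x - b||^2 subject to x >= 0:
    # gradient step, then clip negatives to zero.
    x = np.zeros(A.shape[1])
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant
    for _ in range(n_iter):
        x = np.maximum(0.0, x - step * (A.T @ (A @ x - b)))
    return x

rng = np.random.default_rng(5)
A = rng.random((200, 4))                      # basis-expanded weather features
coef_true = np.array([1.5, 0.0, 2.0, 0.7])    # nonnegative, hence interpretable
b = A @ coef_true + rng.normal(0, 0.05, 200)  # synthetic demand

coef = nnls_pg(A, b)
```

The nonnegativity constraint is what makes each fitted coefficient read directly as a (never sign-flipping) contribution of its weather basis function to demand.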


2021 ◽ pp. 1-23
Author(s): Mikaeel Mokhtari ◽ Tofigh Allahviranloo ◽ Mohammad Hassan Behzadi ◽ Farhad Hoseinzadeh Lotfi

Uncertainty is an important attribute of data and can arise from different sources, including randomness and fuzziness. In uncertain environments, especially in modeling, planning, decision-making, and control under uncertainty, most available data contain some degree of fuzziness, randomness, or both, and some of these data may be anomalous (outliers). In this regard, new fuzzy regression approaches, by creating a functional relationship between response and explanatory variables, can provide efficient tools for the explanation, prediction, and possibly control of randomness, fuzziness, and outliers in data obtained from uncertain environments. In the present study, we propose a new two-stage fuzzy linear regression model based on a new interval type-2 (IT2) fuzzy least absolute deviation (FLAD) method, in which the regression coefficients and dependent variables are trapezoidal IT2 fuzzy numbers and the independent variables are crisp. In the first stage, to estimate the IT2 fuzzy regression coefficients and provide an initial model (from the original dataset), we introduce two new distance measures for comparing IT2 fuzzy numbers and propose a novel framework for solving fuzzy mathematical programming problems. In the second stage, we introduce a new procedure to determine mild and extreme fuzzy outlier cutoffs, apply them to remove outliers, and then provide the final model based on the cleaned dataset. Furthermore, to evaluate the performance of the proposed methodology, we introduce and employ suitable goodness-of-fit indices. Finally, we provide two numerical examples to illustrate the theoretical results of the proposed method, explain how it can be used to derive a regression model from IT2 trapezoidal fuzzy data, and compare the performance of the proposed model with some well-known models using training data designed by Tanaka et al. [55].
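The IT2 fuzzy machinery is specific to the paper, but the least absolute deviation criterion at its core, and its robustness to outliers relative to least squares, can be sketched in the crisp case via iteratively reweighted least squares (the data and the IRLS scheme are illustrative assumptions, not the authors' FLAD method):

```python
import numpy as np

def lad_irls(x, y, n_iter=100, eps=1e-6):
    # Least absolute deviation line fit via iteratively reweighted least
    # squares: weights 1/|residual| progressively downweight outliers.
    X = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # start from OLS
    for _ in range(n_iter):
        w = 1.0 / np.maximum(np.abs(y - X @ beta), eps)
        beta = np.linalg.lstsq(X * w[:, None] ** 0.5,
                               y * w ** 0.5, rcond=None)[0]
    return beta

x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0        # true line: intercept 1, slope 2
y[9] += 50.0             # one extreme outlier

ols = np.linalg.lstsq(np.column_stack([np.ones_like(x), x]), y, rcond=None)[0]
lad = lad_irls(x, y)
```

The OLS slope is dragged far from 2 by the single outlier, while the LAD fit stays on the uncontaminated line, which is the motivation for pairing an absolute-deviation criterion with explicit outlier cutoffs.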

