out of sample
Recently Published Documents


TOTAL DOCUMENTS

1576
(FIVE YEARS 724)

H-INDEX

59
(FIVE YEARS 12)

2022 ◽  
Author(s):  
Cameron I. Cooper

Abstract Nationally, more than one-third of students enrolling in introductory computer science programming courses (CS101) do not succeed. To improve student success rates, this research team used supervised machine learning to identify students who are “at-risk” of not succeeding in CS101 at a two-year public college. The resultant predictive model accurately identifies ≈99% of “at-risk” students in an out-of-sample test data set. The programming instructor piloted the use of the model’s predictive factors as early-alert triggers to intervene with individualized outreach and support across three course sections of CS101 in fall 2020. The outcome of this pilot study was a 23% increase in student success and a 7.3 percentage point decrease in the DFW (D, F, or withdrawal) rate. More importantly, this study identified academic early-alert triggers for CS101: specifically, the first two graded programs are of paramount importance for student success in the course.
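The evaluation idea in this abstract (fit a classifier on one group of students, then measure how many truly at-risk students it flags in a held-out group) can be sketched as below. This is a minimal illustration under stated assumptions, not the study's model: the two features (hypothetical grades on the first two graded programs), the synthetic labels, the logistic-regression learner and names such as `train_logistic` are all invented for the example.

```python
import numpy as np


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit an unregularized logistic regression by batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(at-risk)
        err = p - y                              # gradient of log-loss w.r.t. the logit
        w -= lr * X.T @ err / len(y)
        b -= lr * err.mean()
    return w, b


def recall(y_true, y_pred):
    """Share of truly at-risk students (label 1) that the model flags."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return tp / (tp + fn)


# Hypothetical synthetic data: grades (0-100) on the first two graded programs.
rng = np.random.default_rng(0)
grades = rng.uniform(0, 100, size=(400, 2))
at_risk = (grades.sum(axis=1) < 100).astype(int)

# Out-of-sample split: fit on 300 students, evaluate on the remaining 100.
idx = rng.permutation(len(grades))
train, test = idx[:300], idx[300:]
mu, sd = grades[train].mean(axis=0), grades[train].std(axis=0)  # training statistics only
X_train = (grades[train] - mu) / sd
X_test = (grades[test] - mu) / sd

w, b = train_logistic(X_train, at_risk[train])
flagged = (1.0 / (1.0 + np.exp(-(X_test @ w + b))) >= 0.5).astype(int)
test_recall = recall(at_risk[test], flagged)
print(f"out-of-sample recall: {test_recall:.2f}")
```

Recall (sensitivity) is the metric that matches "share of at-risk students identified"; in practice precision would be reported alongside it, since flagging every student trivially gives a recall of 1.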


2022 ◽  
Author(s):  
Pawan Kumar Singh ◽  
Alok Kumar Pandey ◽  
Anushka Chouhan

Abstract Increases in surface temperature and CO2 emissions are two of the most important issues in climate studies and global warming. The ‘Global Emissions 2021’ report identifies the six biggest contributors to CO2 emissions: China, USA, India, Russia, Japan, and Germany. The current study projects the increase in surface temperature and the CO2 emissions of these six countries through 2028. The EGM(1,1,α,θ) grey model is an even-form grey model based on a first-order differential equation in one variable, with a weighted background value and conformable fractional accumulation. The results show that while CO2 emissions are projected to decline for Japan, Germany, USA and Russia, they are expected to increase in India and remain nearly constant in China by 2028. Surface temperature is projected to increase at a significant rate in all six countries. Compared with the EGM(1,1) grey model, the EGM(1,1,α,θ) model performs better in both in-sample and out-of-sample forecasting. The paper also puts forward policy suggestions to mitigate, manage and reduce increases in surface temperature and CO2 emissions.
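For readers unfamiliar with grey models, the sketch below fits a basic GM(1,1)-type model: accumulate the series, form a weighted background value, estimate the development coefficient a and grey input b by least squares, then forecast and difference back to the original scale. It is a simplified illustration, not the paper's EGM(1,1,α,θ): it assumes ordinary first-order accumulation (no conformable fractional accumulation), and the background weight `theta` (0.5 reproduces the classic model) and the name `gm11_forecast` are hypothetical.

```python
import numpy as np


def gm11_forecast(x0, horizon, theta=0.5):
    """Fit a GM(1,1)-type grey model to series x0 and forecast `horizon` steps ahead.

    theta weights the background value z(k) = theta*x1(k) + (1-theta)*x1(k-1);
    theta = 0.5 is the classic mean-generating background value.
    Returns the fitted in-sample values followed by the forecasts.
    """
    x0 = np.asarray(x0, dtype=float)
    n = len(x0)
    x1 = np.cumsum(x0)                                  # first-order accumulation
    z1 = theta * x1[1:] + (1.0 - theta) * x1[:-1]       # background values, k = 2..n
    # Grey difference equation x0(k) + a*z1(k) = b, solved by least squares.
    B = np.column_stack([-z1, np.ones(n - 1)])
    a, b = np.linalg.lstsq(B, x0[1:], rcond=None)[0]
    # Time-response function of the whitened equation dx1/dt + a*x1 = b.
    k = np.arange(n + horizon)
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a
    # Inverse accumulation (first differences) restores the original scale.
    return np.concatenate([[x0[0]], np.diff(x1_hat)])


series = 2.0 * 1.1 ** np.arange(1, 9)     # synthetic exponential growth, k = 1..8
fitted = gm11_forecast(series, horizon=1)
print(fitted[-1], 2.0 * 1.1 ** 9)         # one-step forecast vs. the true next value
```

On a pure exponential series the one-step forecast lands within a fraction of a percent of the true value, which is the regime grey models are designed for: short, smoothly trending series.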


Mathematics ◽  
2022 ◽  
Vol 10 (2) ◽  
pp. 228
Author(s):  
Pablo Pincheira ◽  
Nicolas Hardy ◽  
Andrea Bentancor

We show that a straightforward modification of a trading-based test for predictability displays interesting advantages over the Excess Profitability (EP) test proposed by Anatolyev and Gerko when testing the Driftless Random Walk Hypothesis. Our statistic is called the Straightforward Excess Profitability (SEP) test, and it avoids the calculation of a term that under the null of no predictability should be zero but in practice may be sizable. In addition, our test does not require the strong assumption of independence used to derive the EP test. We claim that dependence is the rule and not the exception. We show via Monte Carlo simulations that the SEP test outperforms the EP test in terms of size and power. Finally, we illustrate the use of our test in an empirical application within the context of the commodity-currencies literature.
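To make the idea of a trading-based predictability test concrete, the sketch below computes a generic statistic: the average return of a strategy that goes long when the forecast is positive and short when it is negative, studentized with a Newey-West long-run variance so that serial dependence is allowed for. This is an assumption-based illustration of the general approach only; it is not the SEP or EP statistic as defined in the paper, and the name `trading_stat` and the default lag length are hypothetical.

```python
import numpy as np


def trading_stat(forecasts, returns, lags=5):
    """Studentized mean return of a sign-following trading rule.

    The position at t is sign(forecast_t); the realized strategy return is
    sign(forecast_t) * return_t. The variance of the mean uses a Newey-West
    (Bartlett-kernel) long-run variance with `lags` autocovariances.
    """
    r = np.sign(forecasts) * np.asarray(returns, dtype=float)
    T = len(r)
    d = r - r.mean()
    lrv = d @ d / T                                    # lag-0 autocovariance
    for lag in range(1, lags + 1):
        gamma = d[lag:] @ d[:-lag] / T                 # lag-l autocovariance
        lrv += 2.0 * (1.0 - lag / (lags + 1)) * gamma  # Bartlett weight
    return r.mean() / np.sqrt(lrv / T)


rng = np.random.default_rng(1)
y = rng.standard_normal(500)      # hypothetical returns
perfect = trading_stat(y, y)      # forecasts with the correct sign every period
print(perfect)
```

A forecast that always gets the sign right yields a very large statistic; under a driftless random walk with uninformative forecasts the mean strategy return is centred on zero.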


2022 ◽  
Vol 14 (2) ◽  
pp. 798
Author(s):  
Snezhana Gocheva-Ilieva ◽  
Atanas Ivanov ◽  
Maya Stoimenova-Minova

A novel framework for stacked regression based on machine learning was developed to predict the daily average concentrations of particulate matter (PM10), one of Bulgaria’s primary health concerns. The measurements of nine meteorological parameters were introduced as independent variables. The goal was to carefully study a limited number of initial predictors and extract stochastic information from them to build an extended set of data that allowed the creation of highly efficient predictive models. Four base models using random forest, CART ensemble and bagging, and their rotation variants, were built and evaluated. The heterogeneity of these base models was achieved by introducing five types of diversities, including a new simplified selective ensemble algorithm. The predictions from the four base models were then used as predictors in multivariate adaptive regression splines (MARS) models. All models were statistically tested using out-of-bag estimates or 5-fold and 10-fold cross-validation. In addition, a variable importance analysis was conducted. The proposed framework was used for short-term forecasting of out-of-sample data for seven days. It was shown that the stacked models outperformed all single base models. An index of agreement IA = 0.986 and a coefficient of determination of about 95% were achieved.
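The stacking step described above (base-model predictions reused as predictors for a second-level model) can be sketched with out-of-fold predictions, so that the second-level model never learns from in-sample fits. This is a simplified stand-in under stated assumptions: the base learners here are a k-nearest-neighbour regressor and ordinary least squares, and the meta-model is a linear regression, rather than the paper's random forest, CART ensemble, bagging and MARS models; the data and all function names are hypothetical.

```python
import numpy as np


def ols_fit_predict(X_tr, y_tr, X_te):
    """Ordinary least squares with an intercept."""
    A = np.column_stack([np.ones(len(X_tr)), X_tr])
    coef = np.linalg.lstsq(A, y_tr, rcond=None)[0]
    return np.column_stack([np.ones(len(X_te)), X_te]) @ coef


def knn_fit_predict(X_tr, y_tr, X_te, k=10):
    """k-nearest-neighbour regression: mean target of the k closest training rows."""
    dist = ((X_te[:, None, :] - X_tr[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return y_tr[nearest].mean(axis=1)


BASE_MODELS = [ols_fit_predict, knn_fit_predict]


def stack_fit_predict(X_tr, y_tr, X_te, n_folds=5, seed=0):
    """Two-level stacking: out-of-fold base predictions feed a linear meta-model."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X_tr)), n_folds)
    oof = np.zeros((len(X_tr), len(BASE_MODELS)))
    for fold in folds:
        mask = np.ones(len(X_tr), dtype=bool)
        mask[fold] = False                       # fit base models on the other folds only
        for j, model in enumerate(BASE_MODELS):
            oof[fold, j] = model(X_tr[mask], y_tr[mask], X_tr[fold])
    # Base models refit on all training rows produce the meta-features for X_te.
    test_meta = np.column_stack([m(X_tr, y_tr, X_te) for m in BASE_MODELS])
    return ols_fit_predict(oof, y_tr, test_meta)


rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, size=(300, 2))
y = 2 * X[:, 0] + np.sin(3 * X[:, 1]) + 0.1 * rng.standard_normal(300)
pred = stack_fit_predict(X[:240], y[:240], X[240:])
r2 = 1 - np.sum((y[240:] - pred) ** 2) / np.sum((y[240:] - y[240:].mean()) ** 2)
print(f"held-out R^2: {r2:.3f}")
```

Using out-of-fold rather than in-sample base predictions is the design choice that keeps the meta-model from simply rewarding whichever base learner overfits most.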


Complexity ◽  
2022 ◽  
Vol 2022 ◽  
pp. 1-10
Author(s):  
Sara Muhammadullah ◽  
Amena Urooj ◽  
Faridoon Khan ◽  
Mohammed N Alshahrani ◽  
Mohammed Alqawba ◽  
...  

In order to reduce the dimensionality of the parameter space and enhance out-of-sample forecasting performance, this research compares regularization techniques with Autometrics in time-series modeling. We mainly focus on comparing weighted lag adaptive LASSO (WLAdaLASSO) with Autometrics, but as benchmarks we also estimate other popular regularization methods: LASSO, AdaLASSO, SCAD, and MCP. For the analytical comparison, we implement a Monte Carlo simulation and assess the performance of these techniques in terms of out-of-sample root mean square error (RMSE), gauge, and potency. The comparison is made across varying autocorrelation coefficients and sample sizes. The simulation experiment indicates that WLAdaLASSO outperforms Autometrics and the other regularization approaches in covariate selection and forecasting, especially when there is stronger linear dependency between predictors. In contrast, the computational efficiency of Autometrics decreases with strong linear dependency between predictors. However, with large samples and weak linear dependency between predictors, the potency of Autometrics ⟶ 1 and its gauge ⟶ α. LASSO, AdaLASSO, SCAD, and MCP, on the other hand, select more covariates and have higher RMSE than Autometrics and WLAdaLASSO. To compare the considered techniques empirically, we build a General Unrestricted Model (GUM) for covariate selection and out-of-sample forecasting of the trade balance of Pakistan. We train the models on 1985–2015 observations and use 2016–2020 observations as test data for the out-of-sample forecast.
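Gauge and potency, the selection metrics used above, are straightforward to compute once a method has chosen its covariates: potency is the share of truly relevant covariates retained, and gauge is the share of irrelevant ones retained. The sketch below pairs them with a plain LASSO fitted by cyclic coordinate descent, plus an out-of-sample RMSE. It is illustrative only: it is not WLAdaLASSO or Autometrics, the penalty `lam`, the simulated design and the function names are hypothetical, and the metrics come from a single replication rather than an average over Monte Carlo draws.

```python
import numpy as np


def soft_threshold(rho, lam):
    return np.sign(rho) * max(abs(rho) - lam, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """LASSO via cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1.

    Assumes the columns of X are standardized and y is centred (no intercept).
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            partial = y - X @ beta + X[:, j] * beta[j]  # residual excluding covariate j
            rho = X[:, j] @ partial / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta


def potency_gauge(selected, relevant):
    """Potency: share of relevant covariates kept; gauge: share of irrelevant ones kept."""
    selected = np.asarray(selected, dtype=bool)
    relevant = np.asarray(relevant, dtype=bool)
    return selected[relevant].mean(), selected[~relevant].mean()


rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[:3] = [1.5, -1.0, 0.8]                  # 3 relevant, 7 irrelevant covariates
y = X @ true_beta + 0.5 * rng.standard_normal(n)

# Standardize with training-sample statistics only (first 150 rows).
X_tr, X_te, y_tr, y_te = X[:150], X[150:], y[:150], y[150:]
mu, sd, y_mean = X_tr.mean(axis=0), X_tr.std(axis=0), y_tr.mean()
beta_hat = lasso_cd((X_tr - mu) / sd, y_tr - y_mean, lam=0.1)

potency, gauge = potency_gauge(beta_hat != 0, true_beta != 0)
pred = ((X_te - mu) / sd) @ beta_hat + y_mean
rmse = np.sqrt(np.mean((y_te - pred) ** 2))
print(potency, gauge, rmse)
```

In a full Monte Carlo study these three numbers would be averaged over many simulated samples, which is how statements such as "potency ⟶ 1 and gauge ⟶ α" are assessed.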


2022 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Anja Vinzelberg ◽  
Benjamin Rainer Auer

Purpose: Motivated by the recent theoretical rehabilitation of mean-variance analysis, the authors revisit the question of whether minimum variance (MinVar) or maximum Sharpe ratio (MaxSR) investment weights are preferable in practical portfolio formation.
Design/methodology/approach: The authors answer this question with a focus on mainstream investors, who can be modeled by a preference for simple portfolio optimization techniques, a tendency to cling to past asset characteristics and a strong interest in index products. Specifically, in a rolling-window approach, the study compares the out-of-sample performance of MinVar and MaxSR portfolios in two asset universes covering multiple asset classes (via investable indices and their subindices) and for two popular input estimation methods (full covariance and single-index model).
Findings: The authors find that, regardless of the setting, there is no statistically significant difference between MinVar and MaxSR portfolio performance. Thus, the choice of approach does not matter for mainstream investors. In addition, the analysis reveals that, contrary to previous research, using a single-index model does not necessarily improve out-of-sample Sharpe ratios.
Originality/value: The study is the first to provide an in-depth comparison of MinVar and MaxSR returns which considers (1) multiple asset classes, (2) a single-index model and (3) state-of-the-art bootstrap performance tests.
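The two weighting schemes compared in this study have closed forms when portfolios are fully invested, short sales are allowed and no other constraints apply: MinVar weights are proportional to Σ⁻¹1 and MaxSR (tangency) weights to Σ⁻¹μ, where μ holds expected excess returns. The sketch below computes both, optionally from a single-index-model covariance matrix. It assumes unconstrained portfolios; the paper's exact constraints, estimation windows and bootstrap tests are not reproduced, and the function names and data are hypothetical.

```python
import numpy as np


def min_var_weights(cov):
    """Global minimum-variance weights (fully invested, short sales allowed)."""
    raw = np.linalg.solve(cov, np.ones(len(cov)))
    return raw / raw.sum()


def max_sharpe_weights(cov, mu_excess):
    """Tangency (maximum Sharpe ratio) weights; assumes 1' inv(cov) mu_excess > 0."""
    raw = np.linalg.solve(cov, mu_excess)
    return raw / raw.sum()


def single_index_cov(asset_returns, market_returns):
    """Single-index covariance: beta beta' var(m) + diag(residual variances)."""
    m = market_returns - market_returns.mean()
    R = asset_returns - asset_returns.mean(axis=0)
    var_m = m @ m / (len(m) - 1)
    betas = R.T @ m / (len(m) - 1) / var_m
    resid = R - np.outer(m, betas)
    resid_var = (resid ** 2).sum(axis=0) / (len(m) - 1)
    return np.outer(betas, betas) * var_m + np.diag(resid_var)


cov = np.array([[0.04, 0.006], [0.006, 0.09]])
w_mv = min_var_weights(cov)
w_sr = max_sharpe_weights(cov, np.array([0.03, 0.06]))
w_sr_equal = max_sharpe_weights(cov, np.array([0.05, 0.05]))  # equal excess returns

rng = np.random.default_rng(7)
market = rng.normal(0.0, 0.04, size=250)
assets = np.outer(market, [0.8, 1.2]) + rng.normal(0.0, 0.02, size=(250, 2))
cov_si = single_index_cov(assets, market)
w_mv_si = min_var_weights(cov_si)
print(w_mv, w_sr, w_mv_si)
```

When all assets share the same expected excess return, Σ⁻¹μ is proportional to Σ⁻¹1 and the two schemes coincide, so any performance gap between them comes entirely from the estimated means.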


2022 ◽  
Author(s):  
Loic Yengo ◽  
Sailaja Vedantam ◽  
Eirini Marouli ◽  
Julia Sidorenko ◽  
Eric Bartell ◽  
...  

Common SNPs are predicted to collectively explain 40-50% of phenotypic variation in human height, but identifying the specific variants and associated regions requires huge sample sizes. Here we show, using GWAS data from 5.4 million individuals of diverse ancestries, that 12,111 independent SNPs that are significantly associated with height account for nearly all of the common SNP-based heritability. These SNPs are clustered within 7,209 non-overlapping genomic segments with a median size of ~90 kb, covering ~21% of the genome. The density of independent associations varies across the genome and the regions of elevated density are enriched for biologically relevant genes. In out-of-sample estimation and prediction, the 12,111 SNPs account for 40% of phenotypic variance in European ancestry populations but only ~10%-20% in other ancestries. Effect sizes, associated regions, and gene prioritization are similar across ancestries, indicating that reduced prediction accuracy is likely explained by linkage disequilibrium and allele frequency differences within associated regions. Finally, we show that the relevant biological pathways are detectable with smaller sample sizes than needed to implicate causal genes and variants. Overall, this study, the largest GWAS to date, provides an unprecedented saturated map of specific genomic regions containing the vast majority of common height-associated variants.
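Out-of-sample prediction with a fixed set of SNPs usually means building a polygenic score (a weighted sum of allele counts, using effect sizes estimated in a discovery sample) and measuring the variance it explains in held-out individuals. The sketch below does this on simulated SNPs with marginal, one-SNP-at-a-time effect estimates. It is a toy illustration under stated assumptions (independent SNPs with no linkage disequilibrium, a single simulated ancestry, heritability set to 0.5, hypothetical function names), not the study's pipeline.

```python
import numpy as np


def marginal_effects(G, y):
    """GWAS-style per-SNP slopes from simple regressions of y on each SNP."""
    Gc = G - G.mean(axis=0)
    yc = y - y.mean()
    return Gc.T @ yc / (Gc ** 2).sum(axis=0)


def variance_explained(score, y):
    """Out-of-sample R^2 as the squared correlation between score and phenotype."""
    return np.corrcoef(score, y)[0, 1] ** 2


rng = np.random.default_rng(0)
n, m, h2 = 2000, 100, 0.5
freqs = rng.uniform(0.1, 0.5, size=m)
G = rng.binomial(2, freqs, size=(n, m)).astype(float)   # allele counts 0/1/2
beta = rng.standard_normal(m)
genetic = (G - G.mean(axis=0)) @ beta
genetic *= np.sqrt(h2 / genetic.var())                  # scale genetic variance to h2
y = genetic + rng.normal(0.0, np.sqrt(1 - h2), size=n)  # phenotype with variance ~1

train, test = slice(0, 1500), slice(1500, None)
beta_hat = marginal_effects(G[train], y[train])
pgs = G[test] @ beta_hat                                # score in held-out individuals
r2 = variance_explained(pgs, y[test])
print(f"out-of-sample R^2: {r2:.2f}")
```

The held-out R² falls below the simulated heritability because the estimated effects are noisy; with real data, differences in linkage disequilibrium and allele frequencies between discovery and target populations reduce it further.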


Entropy ◽  
2022 ◽  
Vol 24 (1) ◽  
pp. 95
Author(s):  
Pontus Söderbäck ◽  
Jörgen Blomvall ◽  
Martin Singull

Liquid financial markets, such as the options market of the S&P 500 index, create vast amounts of data every day, i.e., so-called intraday data. However, this highly granular data is often reduced to a single point in time when used to estimate financial quantities. This under-utilization of the data may reduce the quality of the estimates. In this paper, we study the impacts on estimation quality when using intraday data to estimate dividends. The methodology is based on earlier linear regression (ordinary least squares) estimates, which have been adapted to intraday data. Further, the method is also generalized in two aspects. First, the dividends are expressed as present values of future dividends rather than dividend yields. Second, to account for heteroscedasticity, the estimation is formulated as a weighted least squares problem, where the weights are determined from the market data. This method is compared with a traditional method on out-of-sample S&P 500 European options market data. The results show that estimations based on intraday data have, with statistical significance, a higher quality than the corresponding single-time estimates. Additionally, the two generalizations of the methodology are shown to improve the estimation quality further.
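One standard way to back out dividends from option prices, and plausibly the kind of linear regression this abstract builds on, is put-call parity for European options: C - P = S0 - PV(D) - K*B(T), where B(T) is the discount factor to maturity. Regressing C - P on the strike K across strikes for one maturity gives an intercept S0 - PV(D) and a slope -B(T). The sketch below solves that regression by weighted least squares. The parity-based setup, the weights and the function names are assumptions, not the paper's specification.

```python
import numpy as np


def wls(X, y, w):
    """Weighted least squares: minimizes sum_i w_i * (y_i - X_i @ coef)^2."""
    sw = np.sqrt(w)
    return np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]


def implied_dividend_pv(calls, puts, strikes, spot, weights):
    """Estimate the PV of dividends and the discount factor from put-call parity.

    Regression: (C - P) = alpha + slope * K, with alpha = S0 - PV(D), slope = -B(T).
    """
    y = np.asarray(calls) - np.asarray(puts)
    X = np.column_stack([np.ones(len(strikes)), strikes])
    alpha, slope = wls(X, y, np.asarray(weights, dtype=float))
    return spot - alpha, -slope   # (PV of dividends, discount factor)


spot, pv_div, disc = 100.0, 1.5, 0.98
strikes = np.arange(80.0, 121.0, 5.0)
puts = np.maximum(strikes - 95.0, 0.0) + 2.0            # hypothetical put prices
calls = puts + spot - pv_div - strikes * disc           # call prices consistent with parity
weights = np.linspace(1.0, 2.0, len(strikes))           # hypothetical, e.g. liquidity-based
pv_div_hat, discount_hat = implied_dividend_pv(calls, puts, strikes, spot, weights)
print(pv_div_hat, discount_hat)
```

With noise-free parity-consistent prices the regression recovers the inputs exactly; with real quotes, down-weighting illiquid strikes is one way to handle the heteroscedasticity the abstract mentions, and pooling intraday snapshots adds observations to the same regression.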


2022 ◽  
Vol 158 (1) ◽  
Author(s):  
Jakob A. Dambon ◽  
Stefan S. Fahrländer ◽  
Saira Karlen ◽  
Manuel Lehner ◽  
Jaron Schlesinger ◽  
...  

Abstract This article examines the spatially varying effect of age on single-family house (SFH) prices. Age has been shown to be a key driver of house depreciation and is usually associated with a negative price effect. In practice, however, there exist deviations from this behavior, which are referred to as vintage effects. We estimate a spatially varying coefficients (SVC) model to investigate the spatial structures of vintage effects on SFH pricing. For SFHs in the Canton of Zurich, Switzerland, we find substantial spatial variation in the age effect. In particular, we find strong local vintage effects, primarily in urban areas, compared with purely depreciative age effects in rural locations. Using cross-validation, we assess the potential improvement in predictive performance from incorporating spatially varying vintage effects in hedonic models. We find a substantial improvement in out-of-sample predictive performance of SVC models over classical spatial hedonic models.
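The cross-validation comparison described here (a model whose age effect may vary across locations versus one with a single global age effect) can be mimicked in a deliberately crude way: let the age slope differ between two hypothetical area types, urban and rural, and compare k-fold out-of-sample RMSE. This is not the paper's SVC model, which lets coefficients vary continuously over space; the simulated log prices, the urban/rural split and all names are hypothetical.

```python
import numpy as np


def design(age, urban, varying):
    """Intercept, urban dummy and age; optionally a separate age slope for urban homes."""
    cols = [np.ones_like(age), urban, age]
    if varying:
        cols.append(age * urban)      # lets the age effect differ in urban areas
    return np.column_stack(cols)


def cv_rmse(age, urban, log_price, varying, k=5, seed=0):
    """k-fold cross-validated RMSE of an OLS hedonic model."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(age)), k)
    sq_err = []
    for fold in folds:
        train = np.setdiff1d(np.arange(len(age)), fold)
        X_tr = design(age[train], urban[train], varying)
        coef = np.linalg.lstsq(X_tr, log_price[train], rcond=None)[0]
        pred = design(age[fold], urban[fold], varying) @ coef
        sq_err.append((log_price[fold] - pred) ** 2)
    return np.sqrt(np.concatenate(sq_err).mean())


rng = np.random.default_rng(3)
n = 600
age = rng.uniform(0, 100, n)
urban = (rng.uniform(size=n) < 0.5).astype(float)
# Hypothetical: depreciation in rural areas, a vintage premium for older urban homes.
slope = np.where(urban == 1, 0.004, -0.006)
log_price = 13.0 + 0.2 * urban + slope * age + rng.normal(0, 0.1, n)

rmse_global = cv_rmse(age, urban, log_price, varying=False)
rmse_varying = cv_rmse(age, urban, log_price, varying=True)
print(rmse_global, rmse_varying)
```

When the true age effect changes sign between locations, a single global slope averages the two effects away, and the cross-validated error exposes the misfit.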


2022 ◽  
Vol 15 (1) ◽  
pp. 45-73
Author(s):  
Andrew Zammit-Mangion ◽  
Michael Bertolacci ◽  
Jenny Fisher ◽  
Ann Stavert ◽  
Matthew Rigby ◽  
...  

Abstract. WOMBAT (the WOllongong Methodology for Bayesian Assimilation of Trace-gases) is a fully Bayesian hierarchical statistical framework for flux inversion of trace gases from flask, in situ, and remotely sensed data. WOMBAT extends the conventional Bayesian synthesis framework through the consideration of a correlated error term, the capacity for online bias correction, and the provision of uncertainty quantification on all unknowns that appear in the Bayesian statistical model. We show, in an observing system simulation experiment (OSSE), that these extensions are crucial when the data are indeed biased and have errors that are spatio-temporally correlated. Using the GEOS-Chem atmospheric transport model, we show that WOMBAT is able to obtain posterior means and variances on non-fossil-fuel CO2 fluxes from Orbiting Carbon Observatory-2 (OCO-2) data that are comparable to those from the Model Intercomparison Project (MIP) reported in Crowell et al. (2019). We also find that WOMBAT's predictions of out-of-sample retrievals obtained from the Total Column Carbon Observing Network (TCCON) are, for the most part, more accurate than those made by the MIP participants.
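At the core of Bayesian synthesis flux inversion is a linear-Gaussian update: observations y relate to flux coefficients x through a transport (Jacobian) matrix H, y = Hx + e, with a Gaussian prior x ~ N(x_a, B) and errors e ~ N(0, R). The sketch below computes the posterior mean and covariance in closed form and then predicts held-out observations, the analogue of predicting out-of-sample retrievals. It is a minimal illustration only: WOMBAT's correlated-error model, online bias correction and uncertainty on hyperparameters are not included, and the names and numbers are hypothetical.

```python
import numpy as np


def gaussian_posterior(H, y, x_prior, B, R):
    """Posterior of x for y = H x + e, x ~ N(x_prior, B), e ~ N(0, R).

    Uses the gain form K = B H^T (H B H^T + R)^{-1}.
    """
    S = H @ B @ H.T + R                         # covariance of y under the prior
    K = np.linalg.solve(S, H @ B).T             # = B H^T S^{-1} since S and B are symmetric
    mean = x_prior + K @ (y - H @ x_prior)
    cov = B - K @ H @ B
    return mean, cov


def predict(H_new, mean, cov, R_new):
    """Predictive mean and covariance for new (held-out) observations."""
    return H_new @ mean, H_new @ cov @ H_new.T + R_new


post_mean, post_cov = gaussian_posterior(
    np.eye(2), np.array([2.0, 4.0]), np.zeros(2), np.eye(2), np.eye(2)
)
pred_mean, pred_cov = predict(np.array([[1.0, 1.0]]), post_mean, post_cov, np.array([[0.1]]))
print(post_mean, post_cov, pred_mean, pred_cov)
```

With equal prior and observation variances, the posterior mean sits halfway between prior and data and the posterior variance halves; the predictive covariance then adds the new observations' own error variance.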

