An alternative approach to conventional soil–site regression modeling

1989 ◽  
Vol 19 (2) ◽  
pp. 179-184 ◽  
Author(s):  
David L. Verbyla ◽  
Richard F. Fisher

The conventional approach in site-quality studies has been to develop a multiple regression site index model with soil–site measurements from randomly selected plots. This approach has several weaknesses: (i) a potential prediction bias associated with most stepwise regression procedures; (ii) low precision of soil–site regression models developed in areas with diverse topography and geologic formations; and (iii) poor representation of rare prime sites by random sampling. An alternative approach, aimed at minimizing these problems, is presented. Prediction bias potential (due to overfitting a model with too many predictor variables) can be reduced by using cross-validation during model development. Models that accurately predict prime sites can be more useful than imprecise soil–site regression models. This can be accomplished by stratified random sampling from prime and nonprime site areas. Classification-tree analysis was used to develop a model that predicts prime ponderosa pine (Pinus ponderosa Laws.) sites on the basis of vegetation and soil variables. Forest habitat type, percent sand content, and soil pH were model predictor variables. Cross-validation was used to estimate the accuracy of the classification tree as 88%. A multiple regression model developed from randomly selected plots consistently underestimated site index when it was applied to plots randomly selected from prime site areas. The conventional regression model was also misleading because it contained a predictor variable that was not significantly different between prime and nonprime sites.
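
As a rough illustration of how cross-validation yields an accuracy estimate like the one reported above, the following minimal sketch holds out each fold in turn and scores a model fitted on the remaining plots. The one-variable midpoint rule, the function names, and the synthetic data are all assumptions made for illustration; the abstract does not describe the authors' software or their cross-validation settings.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validated_accuracy(xs, ys, fit, predict, k=5, seed=0):
    """Fit on k-1 folds, score on the held-out fold, and pool the results."""
    correct = 0
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train_x = [x for i, x in enumerate(xs) if i not in held]
        train_y = [y for i, y in enumerate(ys) if i not in held]
        model = fit(train_x, train_y)
        correct += sum(predict(model, xs[i]) == ys[i] for i in held_out)
    return correct / len(xs)

# Hypothetical classifier: threshold at the midpoint of the two class means.
def fit_midpoint(train_x, train_y):
    prime = [x for x, y in zip(train_x, train_y) if y == 1]
    other = [x for x, y in zip(train_x, train_y) if y == 0]
    return (sum(prime) / len(prime) + sum(other) / len(other)) / 2

def predict_midpoint(threshold, x):
    return 1 if x > threshold else 0

# Synthetic, made-up data: 10 nonprime plots (0) and 10 prime plots (1).
xs = [float(v) for v in range(20)]
ys = [0] * 10 + [1] * 10
accuracy = cross_validated_accuracy(xs, ys, fit_midpoint, predict_midpoint, k=5)
print(f"cross-validated accuracy: {accuracy:.2f}")
```

Because every plot is scored by a model that never saw it during fitting, the pooled accuracy is less optimistic than accuracy computed on the training data.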

2021 ◽  
Author(s):  
AGMAS SISAY ABERA ◽  
HUNACHEW KIBRET YOHANNIS

Abstract Background: The under-five mortality rate (U5MR) is the probability of dying between birth and five years of age, expressed per 1,000 live births. Globally, 16,000 children under five still die every day. In Sub-Saharan Africa in particular, 1 child in 12 dies before his or her fifth birthday. This study aims to identify the determinants of under-five mortality among women of childbearing age in Tach-Armachiho district using count regression models. Methods: A two-stage random sampling technique (simple random sampling in the first stage and systematic random sampling in the second) was used to select women respondents. The sample survey conducted in Tach-Armachiho district considered a total of 3815 households of women aged 15 to 49 years, from which information was collected from 446 selected women through an interviewer-administered questionnaire. Results: Descriptive statistics showed that 16.6% of mothers in the district had experienced at least one under-five death. Poisson, negative binomial, zero-inflated Poisson, and zero-inflated negative binomial regression models were applied to the data and compared using several statistical tests. The zero-inflated Poisson regression model was found to fit the collected data best. Its results showed that education of husband, source of water, mother's occupation, kebele of mother, prenatal care, place of delivery, place of residence, household wealth, average birth interval, and average breastfeeding were statistically significant determinants of under-five mortality.
Conclusions: Average birth interval and average breastfeeding were statistically significant factors for under-five child death in both groups (the not-always-zero and always-zero categories), whereas education of husband, source of water, place of delivery, mother's occupation, and household wealth index had a significant effect on under-five mortality in the not-always-zero group. Place of residence, kebele of mother, and prenatal care had a significant effect on under-five mortality in Tach-Armachiho district in the inflated group.
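
The "always zero" and "not always zero" groups refer to the two components of a zero-inflated Poisson model. A minimal sketch of its probability mass function is given below, assuming the standard formulation in which a count is a structural zero with probability `pi` and otherwise follows a Poisson distribution with rate `lam`. The parameter values are made up and are not estimates from the study.

```python
import math

def zip_pmf(k, lam, pi):
    """P(Y = k) for a zero-inflated Poisson: always-zero with probability pi,
    otherwise Poisson with rate lam."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # Zeros arise from both the always-zero group and the Poisson group.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson

lam, pi = 0.5, 0.6  # made-up illustrative values
p_zero = zip_pmf(0, lam, pi)
total = sum(zip_pmf(k, lam, pi) for k in range(50))
print(f"P(Y = 0) = {p_zero:.4f}, sum over k < 50 = {total:.6f}")
```

In a regression setting, both `lam` and `pi` would depend on covariates, which is why different predictors can be significant in each group.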


Author(s):  
Dhamodharavadhani S. ◽  
Rathipriya R.

A regression model (RM) is an important tool for modeling and analyzing data. It is a popular predictive modeling technique that explores the relationship between a dependent (target) variable and independent (predictor) variables. Variable selection is used to form a good and effective regression model. Many variable selection methods exist for regression models, such as filter, wrapper, embedded, forward selection, backward elimination, and stepwise methods. In this chapter, a computational intelligence-based variable selection method is discussed with respect to regression models in cybersecurity. Generally, these regression models depend on a set of predictor variables. Therefore, variable selection methods are used to select the best subset of predictors from the entire set of variables. A genetic algorithm-based quick-reduct method is proposed to extract the optimal predictor subset from the given data to form an optimal regression model.


1973 ◽  
Vol 33 (3) ◽  
pp. 917-918 ◽  
Author(s):  
Leroy A. Stone ◽  
James D. Brosseau

An already developed multiple-regression model for predicting success of Medex trainees in their training program was cross-validated using a new group of Medex trainees. Six psychological test predictor variables (2 on the MMPI and 4 on the Strong) “held up” upon cross-validation. The results lent credence to the use of multidimensional judgment scaling for establishment of a personnel evaluation-grading criterion measure.


Author(s):  
Carlos Alberto Huaira Contreras ◽  
Carlos Cristiano Hasenclever Borges ◽  
Camila Borelli Zeller ◽  
Amanda Romanelli

The paper proposes a weighted cross-validation (WCV) algorithm to select the linear regression model with a change-point under a scale mixture of normal (SMN) distribution that yields the best prediction results. SMN distributions are used to construct regression models that are robust to the influence of outliers on the parameter estimation process. Thus, we relax the usual normality assumption of regression models and assume that the random errors follow an SMN distribution, specifically the Student-t distribution. In addition, we allow the parameters of the regression model to change from a specific and unknown point, called the change-point. In this context, estimates of the model parameters, including the change-point, are obtained via an EM-type (Expectation-Maximization) algorithm. The WCV method is used to select the model that is most robust and has the smallest prediction error, with the weights taken from the E-step of the EM-type algorithm. Finally, numerical examples with simulated and real data (television audience data) are presented to illustrate the proposed methodology.


1987 ◽  
Vol 17 (9) ◽  
pp. 1150-1152 ◽  
Author(s):  
David L. Verbyla

Classification trees are discriminant models structured as dichotomous keys. A simple classification tree is presented and contrasted with a linear discriminant function. Classification trees have several advantages when compared with linear discriminant analysis. The method is robust with respect to outlier cases. It is nonparametric and can use nominal, ordinal, interval, and ratio scaled predictor variables. Cross-validation is used during tree development to prevent overfitting the tree with too many predictor variables. Missing values are handled by using surrogate splits based on nonmissing predictor variables. Classification trees, like linear discriminant analysis, have potential prediction bias and therefore should be validated before being accepted.
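
Each couplet of such a dichotomous key is a yes/no question on one predictor. The sketch below shows how a single question might be chosen, assuming Gini impurity as the splitting criterion. That criterion is an assumption, since the abstract does not name one, and the variable names and data are invented for illustration.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(values, labels):
    """Return (threshold, weighted_impurity) for the best 'value <= threshold' split.
    The threshold is None if no split improves on the unsplit node."""
    n = len(values)
    best = (None, gini(labels))
    distinct = sorted(set(values))
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2  # candidate midway between adjacent values
        left = [y for v, y in zip(values, labels) if v <= threshold]
        right = [y for v, y in zip(values, labels) if v > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            best = (threshold, impurity)
    return best

# Made-up data: a hypothetical predictor and a binary class label.
sand_percent = [20, 25, 30, 35, 60, 65, 70, 75]
is_prime = [0, 0, 0, 0, 1, 1, 1, 1]
threshold, impurity = best_split(sand_percent, is_prime)
print(f"split at sand_percent <= {threshold}, weighted Gini = {impurity}")
```

A full tree repeats this search within each resulting subset. Without a stopping rule or pruning it keeps splitting until every leaf is pure, which is the overfitting that cross-validation during tree development is meant to prevent.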


2003 ◽  
Vol 33 (6) ◽  
pp. 976-987 ◽  
Author(s):  
Antal Kozak ◽  
Robert Kozak

A detailed study using seven data sets, two standing tree volume estimating models, and a height–diameter model showed that fit statistics and lack of fit statistics calculated directly from a regression model can be well estimated using simulations of cross validation or double cross validation. These results suggest that cross validation by data splitting and double cross validation provide little, if any, additional information in the process of evaluating regression models.


2016 ◽  
Vol 7 (2) ◽  
pp. 75-80
Author(s):  
Adhi Kusnadi ◽  
Risyad Ananda Putra

Indonesia has a relatively large population. Over a five-year period, the government runs an annual procurement program for 1 million FLPP housing units. The program aims to provide decent homes for low-income people. FLPP housing development demands precision and speed from the developer, but it is often held up by the bank process, because the outcome and speed of the bank's data processing are difficult to predict. Knowing consumers' ability to obtain subsidized credit has many advantages: among others, developers can plan cash flow better and can replace consumers who would be rejected before they enter the bank process. For that reason, a system was built to help developers. Many methods can be used to create such an application; one of them is data mining with a classification tree. The application achieved an accuracy of 92% under 10-fold cross-validation. Index Terms: Data Mining, Classification Tree, Housing, FLPP, 10-fold Cross-Validation, Consumer Capability


2020 ◽  
Vol 7 (1) ◽  
Author(s):  
Johannes Schumacher ◽  
Marius Hauglin ◽  
Rasmus Astrup ◽  
Johannes Breidenbach

Abstract Background The age of forest stands is critical information for forest management and conservation, for example for growth modelling, timing of management activities and harvesting, or decisions about protection areas. However, area-wide information about forest stand age often does not exist. In this study, we developed regression models for large-scale area-wide prediction of age in Norwegian forests. For model development we used more than 4800 plots of the Norwegian National Forest Inventory (NFI) distributed over Norway between latitudes 58° and 65° N in an 18.2 Mha study area. Predictor variables were based on airborne laser scanning (ALS), Sentinel-2, and existing public map data. We performed model validation on an independent data set consisting of 63 spruce stands with known age. Results The best modelling strategy was to fit independent linear regression models to each observed site index (SI) level and to use an SI prediction map in the application of the models. The most important predictor variable was an upper percentile of the ALS heights, and root mean squared errors (RMSEs) ranged between 3 and 31 years (6% to 26%) for SI-specific models, and 21 years (25%) on average. Mean deviance (MD) ranged between −1 and 3 years. The models improved with increasing SI and the RMSEs were largest for low SI stands older than 100 years. Using a mapped SI, which is required for practical applications, RMSE and MD on plot level ranged from 19 to 56 years (29% to 53%), and 5 to 37 years (5% to 31%), respectively. For the validation stands, the RMSE and MD were 12 (22%) and 2 years (3%), respectively. Conclusions Tree height estimated from airborne laser scanning and predicted site index were the most important variables in the models describing age. Overall, we obtained good results, especially for stands with high SI. 
The models could be considered for practical applications, although we see considerable potential for improvements if better SI maps were available.
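
For readers unfamiliar with the error measures quoted above, here is a minimal sketch of how RMSE and MD, in absolute and relative terms, are commonly computed. It assumes MD is the mean of observed minus predicted values and that relative values are percentages of the observed mean. The abstract does not state these conventions, and the ages below are invented.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error between paired observations and predictions."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))

def mean_deviation(observed, predicted):
    """Mean of (observed - predicted); the sign convention is an assumption."""
    return sum(o - p for o, p in zip(observed, predicted)) / len(observed)

def relative_percent(value, observed):
    """Express an error measure as a percentage of the observed mean."""
    return 100.0 * value / (sum(observed) / len(observed))

# Made-up stand ages in years.
observed_age = [40.0, 60.0, 80.0, 100.0]
predicted_age = [45.0, 55.0, 85.0, 95.0]

error = rmse(observed_age, predicted_age)
bias = mean_deviation(observed_age, predicted_age)
print(f"RMSE = {error:.1f} years ({relative_percent(error, observed_age):.1f}%)")
print(f"MD = {bias:.1f} years ({relative_percent(bias, observed_age):.1f}%)")
```

RMSE summarizes the typical size of prediction errors, while MD shows whether the predictions are systematically too high or too low.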


2021 ◽  
Vol 11 (4) ◽  
pp. 1776
Author(s):  
Young Seo Kim ◽  
Han Young Joo ◽  
Jae Wook Kim ◽  
So Yun Jeong ◽  
Joo Hyun Moon

This study identified the meteorological variables that significantly impact the power generation of a solar power plant in Samcheonpo, Korea. To this end, multiple regression models were developed to estimate the power generation of the solar power plant under changing weather conditions. The meteorological data for the regression models were daily data from January 2011 to December 2019. The dependent variable was the daily power generation of the solar power plant in kWh, and the independent variables were the insolation intensity during daylight hours (MJ/m²), daylight time (h), average relative humidity (%), minimum relative humidity (%), and quantity of evaporation (mm). A regression model for the entire data set and 12 monthly regression models for the monthly data were constructed using R, a statistical analysis software environment. The 12 monthly regression models estimated the solar power generation better than the model fitted to the entire data set. The variables with the highest influence on solar power generation were the insolation intensity during daylight hours and daylight time.

