A Unifying Classification for Functional Regression Modeling

Author(s):  
Frédéric Ferraty ◽  
Philippe Vieu

This article presents a unifying classification for functional regression modeling, and more specifically for modeling the link between two variables X and Y when the explanatory variable X is of a functional nature. It first provides background on the proposed classification of regression models, focusing on the regression problem, defining parametric, semiparametric, and nonparametric models, and explaining how semiparametric modeling can be interpreted in terms of dimension reduction. It then gives four examples of functional regression models, namely the functional linear regression model, the additive functional regression model, the smooth nonparametric functional model, and the single functional index model. It also considers a number of new models adapted directly to functional variables from the existing standard multivariate literature.
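As a minimal sketch of the first of these models, the functional linear regression model, the code below expands the coefficient function beta(t) in a small polynomial basis so that the functional problem reduces to ordinary least squares. The grid, basis, and simulated curves are illustrative and not taken from the article.

```python
# Minimal sketch of functional linear regression, Y_i = \int X_i(t) beta(t) dt + e_i,
# with beta(t) expanded in a low-dimensional polynomial basis. Illustrative data only.
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 101)          # common evaluation grid for the curves
dt = t[1] - t[0]
n = 200
X = np.array([np.sin(2 * np.pi * (t + rng.uniform()))
              + 0.1 * rng.standard_normal(t.size) for _ in range(n)])
beta_true = np.cos(2 * np.pi * t)
y = X @ beta_true * dt + 0.05 * rng.standard_normal(n)

B = np.vander(t, 5, increasing=True)    # basis functions evaluated on the grid
Z = X @ B * dt                          # scores \int X_i(t) B_k(t) dt (rectangle rule)
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
beta_hat = B @ coef                     # estimated coefficient function on the grid
```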

2021 ◽  
Author(s):  
Gholamreza Hesamian ◽  
Mohammad Ghasem Akbari

Abstract A novel functional regression model was introduced, in which the predictor is a curve linked to a scalar fuzzy response variable. A penalized absolute-error method with a SCAD penalty function was proposed to estimate the unknown components of the model. For this purpose, a concept of fuzzy-valued function was developed and discussed. Then, a fuzzy large number notion was proposed to estimate the fuzzy-valued function. Some common goodness-of-fit criteria were also used to examine the performance of the proposed method. The efficiency of the proposed method was then evaluated through two numerical examples, including a simulation study and an applied example in the scope of watershed management. The proposed method was also compared with several common fuzzy regression models in cases where the functional data were converted to scalar ones.
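For reference, the SCAD penalty mentioned above is the standard smoothly clipped absolute deviation penalty of Fan and Li (2001). A minimal sketch of that penalty function (not the authors' full fuzzy estimation procedure) is:

```python
import numpy as np

def scad_penalty(beta, lam, a=3.7):
    """Elementwise SCAD penalty of Fan and Li (2001) with tuning parameters lam and a."""
    b = np.abs(np.asarray(beta, dtype=float))
    linear = lam * b                                                 # |b| <= lam
    quadratic = -(b**2 - 2 * a * lam * b + lam**2) / (2 * (a - 1))   # lam < |b| <= a*lam
    constant = (a + 1) * lam**2 / 2                                  # |b| > a*lam
    return np.where(b <= lam, linear,
                    np.where(b <= a * lam, quadratic, constant))
```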


2018 ◽  
Vol 27 (3) ◽  
pp. 455-477 ◽  
Author(s):  
Łukasz Smaga ◽  
Hidetoshi Matsui

Abstract The variable selection problem is one of the most important tasks in regression analysis, especially in a high-dimensional setting. In this paper, we study this problem in the context of the scalar-response functional regression model, a linear model with a scalar response and functional regressors. The functional model can be represented by a certain multiple linear regression model via basis expansions of the functional variables. Based on this representation and the random subspace method of Mielniczuk and Teisseyre (Comput Stat Data Anal 71:725–742, 2014), two simple variable selection procedures for the scalar-response functional regression model are proposed. The final functional model is selected using generalized information criteria. Monte Carlo simulation studies and a real data example show very satisfactory performance of the new variable selection methods in finite samples. Moreover, they suggest that the considered procedures outperform solutions found in the literature in terms of correct model selection, false discovery rate control, and prediction error.
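As a hedged illustration of the general workflow (a simplified stand-in, not the authors' exact random subspace procedure), the sketch below reduces each functional regressor to basis scores and selects the subset of regressors that minimizes BIC; the helper names and the polynomial basis are illustrative.

```python
# Simplified sketch: functional regressors -> basis scores -> subset selection by BIC.
import itertools
import numpy as np

def basis_scores(curves, t, n_basis=4):
    """Reduce curves observed on grid t to polynomial-basis scores."""
    B = np.vander(t, n_basis, increasing=True)
    return curves @ B * (t[1] - t[0])

def bic_of_subset(y, score_blocks, subset):
    """BIC of the linear model using the score blocks of the chosen regressors."""
    Z = np.column_stack([np.ones(len(y))] + [score_blocks[j] for j in subset])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    rss = np.sum((y - Z @ coef) ** 2)
    n, k = len(y), Z.shape[1]
    return n * np.log(rss / n) + k * np.log(n)

def select_model(y, score_blocks):
    """Exhaustive search over regressor subsets; feasible for a small number of regressors."""
    p = len(score_blocks)
    candidates = [s for r in range(1, p + 1)
                  for s in itertools.combinations(range(p), r)]
    return min(candidates, key=lambda s: bic_of_subset(y, score_blocks, s))
```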


2021 ◽  
Vol 11 (4) ◽  
pp. 1776
Author(s):  
Young Seo Kim ◽  
Han Young Joo ◽  
Jae Wook Kim ◽  
So Yun Jeong ◽  
Joo Hyun Moon

This study identified the meteorological variables that significantly impact the power generation of a solar power plant in Samcheonpo, Korea. To this end, multiple regression models were developed to estimate the power generation of the solar power plant under changing weather conditions. The meteorological data for the regression models were daily data from January 2011 to December 2019. The dependent variable was the daily power generation of the solar power plant in kWh, and the independent variables were the insolation intensity during daylight hours (MJ/m²), daylight time (h), average relative humidity (%), minimum relative humidity (%), and quantity of evaporation (mm). A regression model for the entire dataset and 12 monthly regression models for the monthly data were constructed using the R statistical software. The 12 monthly regression models estimated the solar power generation better than the model for the entire dataset. The variables with the highest influence on solar power generation were the insolation intensity during daylight hours and the daylight time.
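The study itself used R; as a hedged Python illustration, the sketch below fits a full-period model and 12 monthly models of the kind described. The file name, column names, and the "month" column are hypothetical stand-ins for the daily records.

```python
# Hedged sketch of the multiple regression setup described above.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("samcheonpo_daily.csv")   # hypothetical daily weather/generation data
formula = ("generation_kwh ~ insolation_mj_m2 + daylight_h + avg_rel_humidity"
           " + min_rel_humidity + evaporation_mm")

full_model = smf.ols(formula, data=df).fit()        # model for the entire dataset

# One regression per calendar month, mirroring the 12 monthly models.
monthly_models = {month: smf.ols(formula, data=grp).fit()
                  for month, grp in df.groupby("month")}
print(full_model.summary())
```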


2013 ◽  
Vol 31 (3) ◽  
pp. 306-314 ◽  
Author(s):  
Edson Theodoro dos S. Neto ◽  
Eliana Zandonade ◽  
Adauto Oliveira Emmerich

OBJECTIVE To analyze the factors associated with breastfeeding duration using two statistical models. METHODS A population-based cohort study was conducted with 86 mothers and newborns from two areas primarily covered by the National Health System, with high rates of infant mortality, in Vitória, Espírito Santo, Brazil. Over 30 months, 67 (78%) children and their mothers were visited seven times at home by trained interviewers, who filled out survey forms. Data on feeding and sucking habits and on socioeconomic and maternal characteristics were collected. Variables were analyzed with Cox regression models, with duration of breastfeeding as the dependent variable, and with logistic regression models, in which the dependent variable was the presence of breastfeeding at different postnatal ages. RESULTS In the logistic regression model, pacifier sucking (adjusted odds ratio: 3.4; 95%CI 1.2-9.55) and bottle feeding (adjusted odds ratio: 4.4; 95%CI 1.6-12.1) increased the odds of weaning before one year of age. Variables associated with breastfeeding duration in the Cox regression model were pacifier sucking (adjusted hazard ratio 2.0; 95%CI 1.2-3.3) and bottle feeding (adjusted hazard ratio 2.0; 95%CI 1.2-3.5). However, the protective factors (maternal age and family income) differed between the two models. CONCLUSIONS Risk and protective factors associated with cessation of breastfeeding may be analyzed by different statistical regression models; Cox regression models are adequate for analyzing such factors in longitudinal studies.
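A hedged sketch of the two modeling strategies, with hypothetical variable names (lifelines for the Cox model, statsmodels for the logistic model), might look as follows:

```python
# Hedged sketch of the two strategies; all column names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf
from lifelines import CoxPHFitter

df = pd.read_csv("breastfeeding_cohort.csv")   # hypothetical cohort data

# Cox model: breastfeeding duration as the time variable, weaning as the event.
cox = CoxPHFitter()
cox.fit(df[["bf_duration_months", "weaned", "pacifier", "bottle",
            "maternal_age", "family_income"]],
        duration_col="bf_duration_months", event_col="weaned")

# Logistic model: breastfeeding status at a given postnatal age (here 12 months).
logit = smf.logit("breastfeeding_at_12m ~ pacifier + bottle + maternal_age"
                  " + family_income", data=df).fit()
print(cox.summary)        # hazard ratios
print(logit.summary())    # coefficients on the log-odds scale
```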


2017 ◽  
Vol 47 (5) ◽  
Author(s):  
Priscila Becker Ferreira ◽  
Paulo Roberto Nogara Rorato ◽  
Fernanda Cristina Breda ◽  
Vanessa Tomazetti Michelotti ◽  
Alexandre Pires Rosa ◽  
...  

ABSTRACT: This study aimed to test different genotypic and residual covariance matrix structures in random regression models to model the egg production of Barred Plymouth Rock and White Plymouth Rock hens aged between 5 and 12 months. In addition, we estimated broad-sense heritability and environmental and genotypic correlations. Six random regression models were evaluated, and for each model, 12 genotypic and residual matrix structures were tested. The random regression model with a linear intercept, an unstructured covariance (UN) for the random-effects matrix, and an unstructured correlation (UNR) for the residual matrix adequately modeled the egg production curve of hens of the two breeds studied. Genotypic correlations ranged from 0.15 (between 5 and 12 months of age) to 0.99 (between 10 and 11 months of age) and increased over time. Egg production heritability between 5 and 12 months of age increased with age, varying from 0.15 to 0.51. From 9 months of age onward, heritability was moderate, with estimated genotypic correlations higher than 0.90 at 10, 11, and 12 months of age. These results suggest that selection of hens to improve egg production should commence at the ninth month of age.
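Python's statsmodels does not expose the specific UN/UNR residual covariance structures compared in the article, but a minimal random regression sketch, with a random intercept and slope on age for each hen and hypothetical column names, looks like this:

```python
# Minimal random regression sketch: random intercept and slope on age per hen.
# This only illustrates the general model, not the covariance structures tested
# in the article. File and column names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("egg_records.csv")    # hypothetical monthly egg counts per hen

model = smf.mixedlm(
    "eggs ~ age_months + breed",       # fixed effects: age trend and breed
    data=df,
    groups=df["hen_id"],               # random effects grouped by hen
    re_formula="~age_months",          # random intercept and random slope on age
).fit()
print(model.summary())
```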


2020 ◽  
Vol 11 (1) ◽  
pp. 21
Author(s):  
Zahrotul Aflakhah ◽  
Jajang Jajang ◽  
Agustini Tripena Br. Sb.

This research discusses the ordinary least squares (OLS) method and the robust M-estimation method, comparing the Tukey bisquare and Huber weighting functions in simple linear regression models that contain outliers. Data are generated through simulation with varying percentages of outliers and sample sizes. Each dataset is fitted with a simple linear regression model, and then the percentage of outliers and the RSE and MAD values are calculated. The results show that the RSE and MAD values produced by a simple linear regression model estimated with OLS are influenced by the percentage of outliers. In contrast, the robust M-estimation regression models with sample sizes of 30, 60, 90, 120, and 150 yield RSE values that are unstable as the percentage of outliers changes, while their MAD values are not affected by the percentage of outliers or the sample size. The robust M-estimation method with Tukey bisquare weighting performs as well as with Huber weighting.
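A hedged sketch of such a comparison on simulated contaminated data, using statsmodels' robust linear model with the Huber and Tukey bisquare norms, is:

```python
# Hedged sketch comparing OLS with robust M-estimation under Huber and
# Tukey bisquare weighting, on simulated data with a few injected outliers.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 90
x = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(0, 1, n)
y[:int(0.1 * n)] += 15                      # contaminate 10% of the responses
X = sm.add_constant(x)

ols = sm.OLS(y, X).fit()
huber = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()
tukey = sm.RLM(y, X, M=sm.robust.norms.TukeyBiweight()).fit()
print(ols.params, huber.params, tukey.params)
```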


Author(s):  
Rati WONGSATHAN

The novel coronavirus 2019 (COVID-19) pandemic was declared a global health crisis. An accurate real-time predictive model of the number of infected cases could help inform the government in providing medical assistance and making public health decisions. This work models the ongoing COVID-19 spread in Thailand during the 1st and 2nd phases of the pandemic using a simple but powerful approach based on model-free and time series regression models. Using curve fitting, the model-free method based on the logistic, hyperbolic tangent, and Gaussian functions was applied to predict the number of newly infected patients and the accumulated total number of cases, including the peak and viral cessation (ending) dates. Alternatively, with a significant time lag of historical data as input, the regression model predicts those quantities from 1 day ahead to 1 month ahead. To obtain optimal prediction models, the parameters of the model-free method are fine-tuned through a genetic algorithm, whereas generalized least squares updates the parameters of the regression model. Assuming the future trend continues to follow the past pattern, the expected total number of patients is approximately 2,689 - 3,000 cases. The estimated viral cessation dates are May 2, 2020 (using the Gaussian function), May 4, 2020 (using the hyperbolic tangent function), and June 5, 2020 (using the logistic function), whereas the peak occurred on April 5, 2020. Moreover, the model-free method performs well for long-term prediction, whereas the regression model is suitable for short-term prediction. Furthermore, the regression models yield highly accurate forecasts, with lower RMSE and higher R2, up to 1 week ahead.
HIGHLIGHTS
COVID-19 model for Thailand during the first and second phases of the epidemic
The model-free method, using the logistic, hyperbolic tangent, and Gaussian functions, applied to predict the basic measures of the outbreak
The regression model predicts those measures from one day ahead to one month ahead
The parameters of the model-free method are fine-tuned through a genetic algorithm
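As a hedged illustration of the model-free curve-fitting step described in the abstract above: the article tunes parameters with a genetic algorithm, while plain nonlinear least squares stands in here, and the case series is simulated rather than the Thai data.

```python
# Hedged sketch of the model-free step: fit a logistic growth curve to a
# cumulative case series with nonlinear least squares.
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, K, r, t0):
    """Cumulative cases: final size K, growth rate r, inflection (peak) day t0."""
    return K / (1.0 + np.exp(-r * (t - t0)))

days = np.arange(60)
rng = np.random.default_rng(2)
cum_cases = 2700 / (1 + np.exp(-0.2 * (days - 30))) + rng.normal(0, 20, days.size)

(K, r, t0), _ = curve_fit(logistic, days, cum_cases, p0=[3000.0, 0.1, 30.0])
# K approximates the expected final number of cases; t0 the peak of daily cases.
```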


2016 ◽  
Author(s):  
Geoffrey Fouad ◽  
André Skupin ◽  
Christina L. Tague

Abstract. Percentile flows are statistics derived from the flow duration curve (FDC) that describe the flow equaled or exceeded for a given percent of time. These statistics provide important information for managing rivers, but are often unavailable since most basins are ungauged. A common approach for predicting percentile flows is to deploy regional regression models based on gauged percentile flows and related independent variables derived from physical and climatic data. The first step of this process identifies groups of basins through a cluster analysis of the independent variables, followed by the development of a regression model for each group. This entire process hinges on the independent variables selected to summarize the physical and climatic state of basins. Distributed physical and climatic datasets now exist for the contiguous United States (US). However, it remains unclear how to best represent these data for the development of regional regression models. The study presented here developed regional regression models for the contiguous US and evaluated the effect of different approaches for selecting the initial set of independent variables on the predictive performance of the regional regression models. An expert assessment of the dominant controls on the FDC was used to identify a small set of independent variables likely related to percentile flows. A data-driven approach was also applied to evaluate two larger sets of variables that consist of either (1) the averages of data for each basin or (2) both the averages and statistical distribution of basin data distributed in space and time. The small set of variables from the expert assessment of the FDC and the two larger sets of variables from the data-driven approach were each applied in a regional regression procedure. Differences in predictive performance were evaluated using 184 validation basins withheld from regression model development. The small set of independent variables selected through expert assessment produced performance similar to, if not better than, that of the two larger sets of variables. The parsimonious set of variables consisted only of mean annual precipitation, potential evapotranspiration, and baseflow index. Additional variables in the two larger sets added little to no predictive information. Regional regression models based on the parsimonious set of variables were developed using 734 calibration basins and were converted into a tool for predicting 13 percentile flows in the contiguous US. The Supplementary Material for this paper includes an R graphical user interface for predicting the percentile flows of basins within the range of conditions used to calibrate the regression models. The equations and performance statistics of the models are also supplied in tabular form.
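The paper supplies an R interface; as a hedged Python illustration of the cluster-then-regress workflow on the parsimonious variable set, with a hypothetical file, column names, and cluster count:

```python
# Hedged sketch of the regional regression workflow: cluster basins on the
# parsimonious attribute set, then fit one regression per basin group.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

basins = pd.read_csv("basin_attributes.csv")    # hypothetical gauged-basin table
predictors = ["mean_annual_precip", "potential_et", "baseflow_index"]

km = KMeans(n_clusters=5, random_state=0).fit(basins[predictors])
basins["group"] = km.labels_

# One regression per basin group, here predicting a single percentile flow (q50).
regional_models = {g: LinearRegression().fit(grp[predictors], grp["q50"])
                   for g, grp in basins.groupby("group")}
```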


2021 ◽  
Vol 2021 ◽  
pp. 1-7
Author(s):  
Jin Xu ◽  
Chao Yi

The cluster regression analysis model is an effective approach for reasonable and fair scoring of players in a game. It can roughly predict and evaluate the performance of athletes after a game from limited data and provide scientific predictions of athletes' performance. The purpose of this research is to score players after the match using the cluster regression model. Through the study and analysis of past ball games and the comparison of multiple experiments based on different regression analysis theories, the following conclusions are drawn. Different regression models have different standard errors, but if the data from other model categories are put into the centroid model expression, the standard error stays within 0.3 of that of the original model, so the centroid model can replace the other models for calculation. In post-match player scoring, although the experts' prediction of the result is very accurate, to within an error of about 1 unit, the post-match scoring mechanism based on the cluster regression analysis model is more accurate, with errors within a range of 0.5. It is best to swap the data across the regression models twice to compare the scoring mechanisms produced by the different regression experiments.
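As a hedged sketch of the centroid-model idea, the code below assigns a new player's match statistics to the nearest cluster and scores with that cluster's regression; the features, coefficients, and data are purely illustrative rather than the paper's.

```python
# Hedged sketch: cluster players on match statistics, fit a regression per
# cluster, and score a new player with the model of the nearest centroid.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
stats = rng.normal(size=(300, 4))                       # per-player match statistics
scores = stats @ np.array([2.0, 1.0, 0.5, 1.5]) + rng.normal(0, 0.3, 300)

km = KMeans(n_clusters=3, random_state=0).fit(stats)
models = {c: LinearRegression().fit(stats[km.labels_ == c], scores[km.labels_ == c])
          for c in range(3)}

new_player = rng.normal(size=(1, 4))
cluster = km.predict(new_player)[0]                     # nearest centroid
predicted_score = models[cluster].predict(new_player)[0]
```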

