Regression and correlation analyses.

Author(s):  
Donald Quicke ◽  
Buntika A. Butcher ◽  
Rachel Kruft Welton

Abstract This chapter focuses on regression and correlation analyses. Correlation and regression analyses are used to test whether, and to what degree, variation in one continuous variable is related to variation in another continuous variable. In correlation analysis, there is no control over either variable; they are simply data collected. Indeed, even if two variables are strongly correlated, they may not be influencing one another but may both be affected by a third variable, which perhaps was not measured. The initial assumption of the analysis is that the values of both variables are drawn from a normal distribution. In regression analysis, one of the variables is controlled to see whether changing its value affects the other. The variable being controlled is the explanatory variable (sometimes called the treatment) and the other is the response variable. As the explanatory variables are being controlled, they are probably going to be set at specified values or set increments and are therefore not normally distributed. There may be more than one explanatory variable. If all the explanatory variables are categorical then the regression is called an ANOVA.
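The distinction drawn in this abstract can be illustrated with a minimal sketch. The chapter itself works in R; the Python below is only an illustration, and the data values and the names `dose` and `growth` are hypothetical. It computes a Pearson correlation coefficient (symmetric in the two variables) and a least-squares regression line (which treats one variable as explanatory and the other as the response).

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simple_regression(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical data: the explanatory variable is set at fixed increments,
# so it is not itself a draw from a normal distribution.
dose = [0.0, 1.0, 2.0, 3.0, 4.0]
growth = [1.1, 2.9, 5.2, 6.8, 9.0]

r = pearson_r(dose, growth)
slope, intercept = simple_regression(dose, growth)
print(f"r = {r:.4f}, slope = {slope:.3f}, intercept = {intercept:.3f}")
```

Swapping the arguments leaves `pearson_r` unchanged but gives a different regression line, which reflects the asymmetry between explanatory and response variables described above.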


Author(s):  
Donald Quicke ◽  
Buntika A. Butcher ◽  
Rachel Kruft Welton

Abstract Analysis of variance is used to analyze the differences between group means in a sample, when the response variable is numeric (real numbers) and the explanatory variable(s) are all categorical. Each explanatory variable may have two or more factor levels, but if there is only one explanatory variable and it has only two factor levels, one should use Student's t-test and the result will be identical. Basically an ANOVA fits an intercept and slopes for one or more of the categorical explanatory variables. ANOVA is usually performed using the linear model function lm, or the more specific function aov, but there is a special function oneway.test when there is only a single explanatory variable. For a one-way ANOVA the non-parametric equivalent (if variance assumptions are not met) is the kruskal.test.
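The claim that a two-level one-way ANOVA matches Student's t-test can be checked numerically. The following is a rough sketch in Python rather than the R functions named in the abstract; the group values and the names `control` and `treated` are hypothetical, and the t statistic assumes equal variances in the two groups.

```python
import math

def one_way_anova_f(groups):
    """F statistic: between-group mean square over within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def pooled_t(a, b):
    """Student's t statistic for two groups, assuming equal variances."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    ss = sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)
    pooled_var = ss / (len(a) + len(b) - 2)
    return (ma - mb) / math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))

# Hypothetical measurements for one factor with two levels
control = [4.1, 5.0, 4.6, 5.3]
treated = [6.2, 5.8, 6.9, 6.4]

f_stat = one_way_anova_f([control, treated])
t_stat = pooled_t(control, treated)
print(f"F = {f_stat:.4f}, t^2 = {t_stat ** 2:.4f}")
```

With two groups the F statistic equals the square of the pooled t statistic, so the two tests lead to the same p-value.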


Author(s):  
Donald Quicke ◽  
Buntika A. Butcher ◽  
Rachel Kruft Welton

Abstract This chapter employs generalized linear modelling using the function glm when we know that variances are not constant with one or more explanatory variables and/or we know that the errors cannot be normally distributed, for example, they may be binary data, or count data where negative values are impossible, or proportions which are constrained between 0 and 1. A glm seeks to determine how much of the variation in the response variable can be explained by each explanatory variable, and whether such relationships are statistically significant. The data for generalized linear models take the form of a continuous response variable and a combination of continuous and discrete explanatory variables.
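For count data, one common choice is a Poisson model with a log link. The sketch below fits such a model by iteratively reweighted least squares in plain Python. It is not the implementation behind R's `glm`; the function name `fit_poisson_glm`, the data, and the fixed iteration count are all assumptions made for illustration.

```python
import math

def fit_poisson_glm(x, y, iterations=25):
    """Fit log(mu) = b0 + b1 * x by iteratively reweighted least squares."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Working response and weights for the log link (weights equal mu)
        z = [b0 + b1 * xi + (yi - mi) / mi for xi, yi, mi in zip(x, y, mu)]
        w = mu
        # Weighted least squares of z on x via the 2x2 normal equations
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = sw * swxx - swx ** 2
        b0 = (swxx * swz - swx * swxz) / det
        b1 = (sw * swxz - swx * swz) / det
    return b0, b1

# Hypothetical counts: non-negative, with variance growing with the mean
x_vals = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
counts = [1, 2, 2, 4, 7, 11]

b0, b1 = fit_poisson_glm(x_vals, counts)
fitted = [math.exp(b0 + b1 * xi) for xi in x_vals]
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
```

Because the fitted means are exponentials of the linear predictor, they can never be negative, which is the constraint on count data that the abstract mentions.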


2020 ◽  
Vol 11 (3) ◽  
pp. 835-853
Author(s):  
Yu Huang ◽  
Lichao Yang ◽  
Zuntao Fu

Abstract. Despite the great success of machine learning, its application in climate dynamics has not been well developed. One concern might be how well trained neural networks can learn a dynamical system and what the potential applications of this kind of learning are. In this paper, three machine-learning methods are used: reservoir computer (RC), backpropagation-based (BP) artificial neural network, and long short-term memory (LSTM) neural network. We show that the coupling relations or dynamics among variables in linear or nonlinear systems can be inferred by RC and LSTM, which can be further applied to reconstruct one time series from the other. Specifically, we analyzed climatic toy models to address two questions: (i) what factors significantly influence machine-learning reconstruction and (ii) how do we select suitable explanatory variables for machine-learning reconstruction. The results reveal that both linear and nonlinear coupling relations between variables influence the reconstruction quality of machine learning. If there is a strong linear coupling between two variables, then the reconstruction can be bidirectional, and either variable can serve as an explanatory variable for reconstructing the other. When linear coupling among variables is absent but nonlinear coupling is significant, the machine-learning reconstruction between two variables is direction dependent, and it may be only unidirectional. The convergent cross mapping (CCM) causality index is then proposed to determine which variable can be taken as the reconstructed one and which as the explanatory variable. In a real-world example, the Pearson correlation between the average tropical surface air temperature (TSAT) and the average Northern Hemisphere SAT (NHSAT) is weak (0.08), but the CCM index of NHSAT cross mapped with TSAT is large (0.70). This indicates that TSAT can be well reconstructed from NHSAT through machine learning.
All results shown in this study could provide insights into machine-learning approaches for paleoclimate reconstruction, parameterization schemes, and prediction in related climate research.
Highlights: (i) The coupling dynamics learned by machine learning can be used to reconstruct time series. (ii) Reconstruction quality is direction dependent and variable dependent for nonlinear systems. (iii) The CCM index is a potential indicator to choose reconstructed and explanatory variables. (iv) The tropical average SAT can be well reconstructed from the average Northern Hemisphere SAT.


Animals ◽  
2020 ◽  
Vol 10 (6) ◽  
pp. 1033 ◽  
Author(s):  
Jörg Aurich ◽  
Juliane Kuhl ◽  
Alexander Tichy ◽  
Christine Aurich

Differences in the cryotolerance of spermatozoa exist among stallions, but it remains to be determined to what extent such differences are affected by breed. In this study, post-thaw semen quality in stallions presented for semen cryopreservation was analysed retrospectively (1012 ejaculates from 134 stallions of 5 breeds). The percentage of frozen–thawed ejaculates acceptable for artificial insemination (AI) and the number of insemination doses per cryopreserved ejaculate were calculated. Logistic regression analysis revealed sperm motility in raw semen as the most important explanatory variable for the percentage of cryopreserved ejaculates with a post-thaw quality acceptable for AI. Of the other variables included in the model, stallion age was the most important parameter, with more acceptable ejaculates in younger than in older stallions. Logistic regression also showed more acceptable frozen–thawed ejaculates in Arab stallions versus Warmbloods, Quarter Horses and Icelandic horses. The analysis thus demonstrates differences in the percentage of acceptable cryopreserved ejaculates among horse breeds. Season was a less relevant explanatory variable for the percentage of acceptable cryopreserved ejaculates. Logistic regression revealed total sperm count as the most important variable determining the number of cryopreserved semen doses obtained per acceptable ejaculate. In conclusion, logistic regression analysis revealed stallion age and breed as explanatory variables for the percentage of cryopreserved ejaculates acceptable for AI.


2017 ◽  
Vol 7 (2) ◽  
pp. 124
Author(s):  
Yani Arthayanti ◽  
I Gusti Ayu Made Srinadi ◽  
G.K. Gandhiadi

Linear regression analysis is a statistical method for modeling the relationship between two variables, a response variable and an explanatory variable. Geographically Weighted Regression (GWR) is an extension of linear regression analysis for spatially heterogeneous data. Local multicollinearity is a condition in which the explanatory variables are correlated with one another at individual observation locations. Geographically Weighted Ridge Regression (GWRR) is a method used to model spatial data containing local multicollinearity. The GWRR model was developed from ridge regression by adding weights as additional information. The study aims to model spatial data containing local multicollinearity for the Human Development Index (HDI) in the districts/municipalities of East Java Province in 2015. The results indicate that the average length of schooling is the dominant indicator affecting HDI.
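The idea of combining spatial weights with a ridge penalty can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the Gaussian kernel, the bandwidth, the ridge value, the function name `gw_ridge`, and all data are hypothetical, and both explanatory variables are assumed to be centred already so that no intercept is fitted.

```python
import math

def gw_ridge(coords, x1, x2, y, target, bandwidth, ridge):
    """Local ridge coefficients at `target` for y ~ b1*x1 + b2*x2 (centred data)."""
    # Gaussian kernel: observations closer to the target get larger weights
    w = [math.exp(-((cx - target[0]) ** 2 + (cy - target[1]) ** 2)
                  / (2 * bandwidth ** 2)) for cx, cy in coords]
    # Weighted normal equations with the ridge penalty added on the diagonal
    a11 = sum(wi * a * a for wi, a in zip(w, x1)) + ridge
    a22 = sum(wi * b * b for wi, b in zip(w, x2)) + ridge
    a12 = sum(wi * a * b for wi, a, b in zip(w, x1, x2))
    c1 = sum(wi * a * v for wi, a, v in zip(w, x1, y))
    c2 = sum(wi * b * v for wi, b, v in zip(w, x2, y))
    det = a11 * a22 - a12 ** 2
    return (a22 * c1 - a12 * c2) / det, (a11 * c2 - a12 * c1) / det

# Hypothetical locations and nearly collinear explanatory variables
coords = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 2)]
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-1.9, -1.1, 0.1, 0.9, 2.0]
y = [-4.1, -1.9, 0.2, 2.1, 3.7]

b_plain = gw_ridge(coords, x1, x2, y, target=(0, 0), bandwidth=1.5, ridge=0.0)
b_ridge = gw_ridge(coords, x1, x2, y, target=(0, 0), bandwidth=1.5, ridge=1.0)
print("no penalty:", b_plain)
print("ridge = 1: ", b_ridge)
```

With `ridge=0.0` the function reduces to an ordinary geographically weighted fit at the target location; a positive penalty shrinks the local coefficients, which is what stabilises them when the explanatory variables are locally collinear.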


Author(s):  
Friedrich Liebau ◽  
Ilse Pallas

Abstract The shape of silicate single chains is described by their periodicity (number of tetrahedra in the repeat unit of the chain) and the degree of shrinkage compared with a maximum stretched chain. From a regression analysis of 54 single chain silicates it is concluded that such silicates can be divided into two groups: (1) Silicates with odd-periodic chains (pyroxenoids and pyroxenes) and (2) silicates with even-periodic chains. Although the results of the analysis are not accurate enough to make reliable quantitative predictions about the shape of a silicate chain merely from the chemical composition of the silicate, some general relations could be found. So it turned out that even-periodic chains become less stretched with higher mean electronegativity and higher mean valence of their cations. In contrast, for odd-periodic chain silicates the degree of chain shrinkage is strongly correlated with the mean electronegativity and less so with the mean radius of the cations. On the other hand, the periodicity of the silicate chains is directly correlated with their degree of shrinkage. These results of the regression analysis are explained in terms of simple crystal chemical considerations.


1976 ◽  
Vol 36 (2) ◽  
pp. 399-415 ◽  
Author(s):  
Richard Pomfret

This paper aims to provide an economic explanation of the pace and causes of the diffusion of the mechanical reaper in Ontario, 1850–1870. The analysis is based on Paul David's diffusion model, extended by the introduction of the size distribution of farms. The model is able to capture the reaper's S-shaped diffusion path. The major explanatory variable is improvements in reaper design, followed in importance by increased scale of operations and changes in factor prices. A third finding is that the effect of change in one of the three explanatory variables depends on the level of the other variables.


2020 ◽  
Vol 18 (1, Special Issue) ◽  
pp. 346-354
Author(s):  
Sylvie Berthelot ◽  
Michel Coulmont

The purpose of this study is to determine whether shareholders take directors’ independence, gender, expertise, and reputation into account when voting in directors’ elections. To this end, we regressed the percentage of “in favour” votes cast for each director during annual elections in 2017 on several explanatory variables representing these characteristics, based on a sample of 60 Canadian firms. Among these explanatory variables, we used two measures of directors’ reputation, one measure of their level of education, several measures of their area of expertise, and one measure of their independence. Their reputation was assessed based on their inclusion in the Canadian Who’s Who directory and their membership on another board of directors of a Canadian public company. The other explanatory variables were collected from official company documents, especially the proxy circulars available on the Canadian Securities Administrators website. The accounting and financial variables were drawn from the Research Insights database. The results of the regression analysis indicate that although shareholders do not seem to consider directors’ reputation and expertise when casting their vote, they do take their independence and gender into account.

