Multivariate Multiple Regression Prediction Models: A Euclidean Distance Approach

An extension of a multiple regression prediction model to multiple response variables is presented. An algorithm based on the least sum of Euclidean distances between the multivariate observed and model-predicted response values provides regression coefficients, a measure of effect size, and inferential procedures for evaluating the extended multivariate multiple regression prediction model.
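
The abstract does not give the fitting algorithm itself. As a rough illustration only, the Python sketch below minimizes the sum of Euclidean distances between observed and predicted response vectors by iteratively reweighted least squares (a Weiszfeld-type scheme). This choice of algorithm is an assumption, not necessarily the authors' method. The names `lsed_fit` and `sum_distances`, the OLS starting point, and the tolerances are hypothetical, and the effect-size measure and inferential procedures are not reproduced.

```python
import numpy as np

def lsed_fit(X, Y, n_iter=500, tol=1e-10, eps=1e-12):
    """Hypothetical sketch: find B minimizing sum_i ||Y[i] - X[i] @ B||_2.

    X: (n, p) predictors including an intercept column; Y: (n, r) responses.
    Iteratively reweighted least squares: each row is weighted by
    1 / (its current Euclidean residual distance), floored at eps.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    B = np.linalg.lstsq(X, Y, rcond=None)[0]          # start from ordinary least squares
    for _ in range(n_iter):
        d = np.linalg.norm(Y - X @ B, axis=1)         # Euclidean residual distance per row
        w = 1.0 / np.maximum(d, eps)                  # Weiszfeld-type weights
        Xw = X * w[:, None]
        B_new = np.linalg.solve(X.T @ Xw, Xw.T @ Y)   # weighted normal equations
        converged = np.max(np.abs(B_new - B)) < tol
        B = B_new
        if converged:
            break
    return B

def sum_distances(X, Y, B):
    """Objective value: total Euclidean distance between observed and predicted rows."""
    return float(np.linalg.norm(np.asarray(Y) - np.asarray(X) @ B, axis=1).sum())
```

Because the weights 1/d_i shrink for observations with large residual vectors, such a fit is less sensitive to outlying response vectors than ordinary least squares; and as long as no residual is exactly zero, each reweighting step does not increase the summed distance, so the result is no worse than the OLS start on this criterion.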

Multivariate Multiple Regression Analyses: A Permutation Method for Linear Models

Psychological Reports, Vol 91 (1), pp. 3-9, 2002. DOI: 10.2466/pr0.2002.91.1.3. Cited by ~6.

Author(s): Paul W. Mielke, Kenneth J. Berry

Keyword(s): Multiple Regression, Euclidean Distance, Linear Models, Latin Square, Experimental Designs, Regression Analyses, Permutation Procedure, Multivariate Extension, Multivariate Multiple Regression, Euclidean Distances

A multivariate extension of a univariate procedure for the analysis of experimental designs is presented. A Euclidean-distance permutation procedure is used to evaluate multivariate residuals obtained from a regression algorithm that is also based on Euclidean distances. Applications include various completely randomized and randomized block experimental designs, such as one-way, Latin square, factorial, nested, and split-plot designs, with and without covariates. Unlike parametric procedures, this approach requires only one assumption: the randomization of subjects to treatments.
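
A minimal sketch of how a Euclidean-distance permutation test on multivariate residuals can work for a one-way completely randomized design. The statistic used here, a group-size-weighted mean of within-group pairwise Euclidean distances, resembles the multi-response permutation procedure (MRPP) statistic but is an assumption, not necessarily the exact statistic of the paper. Treatment labels are permuted, which relies only on the randomization of subjects to treatments. The function names, residual input, and permutation count are hypothetical.

```python
import numpy as np
from itertools import combinations

def within_group_delta(R, labels):
    """Group-size-weighted mean of average pairwise Euclidean distances within groups."""
    R = np.asarray(R, dtype=float)
    labels = np.asarray(labels)
    n = len(labels)
    delta = 0.0
    for g in np.unique(labels):
        Rg = R[labels == g]
        if len(Rg) < 2:
            continue  # a singleton group has no within-group pairs
        dists = [np.linalg.norm(a - b) for a, b in combinations(Rg, 2)]
        delta += (len(Rg) / n) * np.mean(dists)
    return delta

def permutation_pvalue(R, labels, n_perm=2000, seed=0):
    """Share of label permutations whose delta is as small as the observed one."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    observed = within_group_delta(R, labels)
    count = 1  # include the observed labeling itself
    for _ in range(n_perm):
        # small tolerance so floating-point ties count as "as extreme"
        if within_group_delta(R, rng.permutation(labels)) <= observed + 1e-12:
            count += 1
    return observed, count / (n_perm + 1)
```

Small values of the statistic mean residual vectors cluster by treatment, so the p-value counts permutations with a delta at least as small as the observed one.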

Dynamic Modulus Prediction of a High-Modulus Asphalt Mixture

Advances in Civil Engineering, Vol 2021, pp. 1-10, 2021. DOI: 10.1155/2021/9944415.

Author(s): Chaohui Wang, Songyuan Tan, Qian Chen, Jiguo Han, Liang Song, ...

Keyword(s): Neural Network, Prediction Model, Multiple Regression, Dynamic Modulus, Prediction Error, Prediction Models, Asphalt Mixture, High Modulus, Output Stability, SVM Model

Dynamic modulus is a key evaluation index for high-modulus asphalt mixtures, but its data are relatively difficult to obtain through testing. The aim of this study is to predict the dynamic modulus of high-modulus asphalt mixtures accurately and thereby further optimize their design process. Five high-temperature performance indexes of the high-modulus asphalt and its mixture were selected, and the correlation between these five indexes and the dynamic modulus of the mixture was analyzed. On this basis, dynamic modulus prediction models based on small-sample data were established using multiple regression, a general regression neural network (GRNN), and a support vector machine (SVM) neural network. Through parameter tuning and cross-validation, the output stability and accuracy of the different prediction models were compared and evaluated, and the most effective model was recommended. The results show that the SVM model has higher prediction accuracy and output stability than both the multiple regression model and the GRNN model, with a prediction error of 0.98–9.71%. Compared with the other two models, the prediction error of the SVM model was lower by 0.50–11.96% and 3.76–13.44%. The SVM neural network is therefore recommended as the dynamic modulus prediction model for high-modulus asphalt mixtures.
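
To make the cross-validated error comparison concrete for the multiple regression baseline, here is a minimal sketch. It assumes leave-one-out cross-validation and absolute percentage error, since the abstract does not state the exact scheme. The GRNN and SVM models are not shown, and the function name and any data passed to it are hypothetical placeholders, not the paper's five performance indexes.

```python
import numpy as np

def loo_percent_errors(X, y):
    """Leave-one-out absolute percentage errors of an OLS multiple regression.

    X: (n, k) predictor matrix (an intercept column is added here); y: (n,) nonzero targets.
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    errors = []
    for i in range(len(y)):
        train = np.arange(len(y)) != i                       # hold out sample i
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        pred = X[i] @ beta                                   # predict the held-out sample
        errors.append(abs(pred - y[i]) / abs(y[i]) * 100.0)  # percent error
    return np.array(errors)
```

The reported error range of a model would then correspond to `errors.min()` and `errors.max()`. With small samples, leave-one-out keeps as much data as possible in each training fold, at the cost of one refit per observation.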

Correlation: Simple correlation; Measurement of a correlation; Correlation coefficients (Pearson product-moment, Spearman’s rho); Significance and correlation coefficients; Variance estimates; SPSS procedures for correlation; What you can’t assume with a correlation; Categorical variables; Common uses of correlation in psychology; Regression and multiple regression; Multiple predictions; Partial and semi-partial correlation; Regression coefficients; Effect size and power; Conducting a regression analysis in SPSS

Research Methods and Statistics in Psychology, pp. 438-484, 2013. DOI: 10.4324/9780203769669-20.

Keyword(s): Regression Analysis, Multiple Regression, Effect Size, Partial Correlation, Correlation Coefficients, Correlation Measurement, Regression Coefficients, Categorical Variables, Simple Correlation, Variance Estimates

Proposed Distance-Based Test for Testing Multivariate Multiple Regression Coefficients under Restricted Alternatives

International Journal of Statistics and Probability, Vol 7 (2), pp. 56, 2018. DOI: 10.5539/ijsp.v7n2p56.

Author(s): M. J. Hossain, A. K. Majumder

Keyword(s): Hypothesis Testing, Multiple Regression, Multiple Regression Model, Regression Coefficients, Small Samples, Test Statistic, Chi Square, Testing Procedures, Power Of The Test, Multivariate Multiple Regression

In constructing estimation and hypothesis-testing procedures, it is important to use all available information, such as the sign of a parameter, in order to maximize the power of the test. Prior information about the signs of the regression coefficients (parameters) under test is often known, the best example being that variances cannot be negative. Ignoring information about the signs of regression parameters can lead to a loss of power in small samples. With this problem in mind, this paper is concerned with developing restricted estimation and hypothesis-testing approaches in the context of the multivariate multiple regression model. The main contributions of this paper are a technique for estimating constrained regression coefficients and a test of restricted parameters based on an information-theoretic distance. The statistic of the existing two-sided test follows a central chi-square distribution, whereas the statistic of our proposed distance-based one-sided test follows a weighted mixture of chi-square distributions. Monte Carlo simulation indicates that our newly proposed test performs better than existing tests.
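
The weighted chi-square mixture mentioned in the abstract can be made concrete in the simplest textbook setting. This setting is an assumption here, not the paper's information-theoretic distance test: a p-dimensional statistic Z ~ N(θ, I), with H0: θ = 0 against H1: θ ≥ 0 componentwise. The one-sided statistic is the sum of squared positive parts of Z, and under H0 it follows a mixture of χ²_j distributions with binomial weights C(p, j)/2^p. The helper names are hypothetical.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(chi^2_df >= x) for an integer df >= 0."""
    if df == 0:
        return 1.0 if x <= 0 else 0.0  # chi^2_0 is a point mass at zero
    if x <= 0:
        return 1.0
    t = x / 2.0
    # Start at df = 1 or df = 2, then step up by 2 using the recurrence
    # Q(s + 1, t) = Q(s, t) + t**s * exp(-t) / Gamma(s + 1), with s = df / 2.
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(t)), 1
    else:
        q, k = math.exp(-t), 2
    while k < df:
        s = k / 2.0
        q += math.exp(s * math.log(t) - t - math.lgamma(s + 1.0))
        k += 2
    return q

def chibar2_sf(stat: float, p: int) -> float:
    """P(chi-bar^2 >= stat): binomial(p, 1/2) mixture of chi^2_0 ... chi^2_p."""
    return sum(math.comb(p, j) / 2 ** p * chi2_sf(stat, j) for j in range(p + 1))

def one_sided_stat(z) -> float:
    """Sum of squared positive parts: the one-sided statistic for identity covariance."""
    return sum(max(zi, 0.0) ** 2 for zi in z)
```

For p = 1 the one-sided 5% critical value is 1.645² ≈ 2.706, compared with 3.841 for the two-sided χ²_1 test. The lower cutoff is where the power gain from using sign information comes from when the true effect lies on the hypothesized side.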

Bioactivity Prediction Based on Matched Molecular Pair and Matched Molecular Series Methods

Current Pharmaceutical Design, Vol 26 (33), pp. 4195-4205, 2020. DOI: 10.2174/1381612826666200427111309.

Author(s): Xiaoyu Ding, Chen Cui, Dingyan Wang, Jihui Zhao, Mingyue Zheng, ...

Keyword(s): Prediction Model, Large Scale, Prediction Models, Predictive Accuracy, Lead Optimization, Consensus Method, Molecular Pair, Bioactivity Prediction, Compound Synthesis, Consensus Modeling

Background: Enhancing a compound’s biological activity is the central task of lead optimization in small-molecule drug discovery. However, performing many iterative rounds of compound synthesis and bioactivity testing is laborious. To address this issue, there is strong demand for high-quality in silico bioactivity prediction approaches that prioritize more active compound derivatives and reduce trial and error. Methods: Two kinds of bioactivity prediction models were constructed from a large-scale structure-activity relationship (SAR) database. The first is based on the similarity of substituents and is realized by matched molecular pair analysis, including SA, SA_BR, SR, and SR_BR. The second is based on SAR transferability and is realized by matched molecular series (MMS) analysis, including Single MMS pair, Full MMS series, and Multi single MMS pairs. We also defined the application domain of the models using a distance-based threshold. Results: Among the seven individual models, the Multi single MMS pairs model showed the best performance (R² = 0.828, MAE = 0.406, RMSE = 0.591), and the baseline model (SA) produced the lowest prediction accuracy (R² = 0.798, MAE = 0.446, RMSE = 0.637). Predictive accuracy was further improved by consensus modeling (R² = 0.842, MAE = 0.397, RMSE = 0.563). Conclusion: An accurate bioactivity prediction model was built with a consensus method and was superior to all individual models. Our model should be a valuable tool for lead optimization.
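
The three reported metrics and the consensus step can be sketched as follows. The consensus is assumed here to be a simple unweighted mean of the individual models' predictions, because the abstract does not say how predictions were combined. R² is computed as 1 − SS_res/SS_tot, which may differ from a squared-correlation definition. Function names and the toy numbers in the usage note are hypothetical.

```python
import numpy as np

def metrics(y_true, y_pred):
    """Return (R2, MAE, RMSE) for one set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    ss_res = np.sum(resid ** 2)                   # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    mae = np.mean(np.abs(resid))
    rmse = np.sqrt(np.mean(resid ** 2))
    return r2, mae, rmse

def consensus(predictions):
    """Unweighted mean of several models' predictions (one row per model)."""
    return np.mean(np.asarray(predictions, dtype=float), axis=0)
```

For instance, with observed values [5, 6, 7, 8], a model predicting [5.5, 6, 6.5, 8] has R² = 0.9 and MAE = 0.25. Averaging it with a model whose errors go the opposite way ([4.5, 6, 7.5, 8]) cancels the errors, which illustrates how a consensus can beat every individual model.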

Fire modelling in Tasmanian buttongrass moorlands. III. Dead fuel moisture

International Journal of Wildland Fire, Vol 10 (2), pp. 241, 2001. DOI: 10.1071/wf01025. Cited by ~27.

Author(s): Jon B. Marsden-Smedley, Wendy R. Catchpole

Keyword(s): Prediction Model, Regression Model, Fire Management, Prediction Models, Dew Point, Seasonal Effects, Experimental Program, Fuel Moisture, Fire Behaviour, Fire Modelling

An experimental program was carried out in Tasmanian buttongrass moorlands to develop fire behaviour prediction models for improving fire management. This paper describes the results of the fuel moisture modelling section of this project. A range of previously developed fuel moisture prediction models are examined, and three empirical dead fuel moisture prediction models are developed. McArthur’s grassland fuel moisture model gave predictions as good as those of a linear regression model using humidity and dew-point temperature. The regression model was preferred as a prediction model because it is inherently more robust. A prediction model based on hazard sticks was found to have strong seasonal effects, which need further investigation before hazard sticks can be used operationally.
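
A minimal sketch of the kind of linear regression model described, assuming dead fuel moisture is modelled as a linear function of humidity and dew-point temperature and fitted by ordinary least squares. The abstract gives neither the functional form nor the fitted coefficients, so the form and any coefficients used with these hypothetical helpers are placeholders, not the paper's model.

```python
import numpy as np

def fit_moisture_model(humidity, dew_point, moisture):
    """Fit moisture ≈ b0 + b1 * humidity + b2 * dew_point by ordinary least squares."""
    X = np.column_stack([np.ones(len(moisture)), humidity, dew_point])
    coef, *_ = np.linalg.lstsq(X, np.asarray(moisture, dtype=float), rcond=None)
    return coef  # array [b0, b1, b2]

def predict_moisture(coef, humidity, dew_point):
    """Predicted fuel moisture for new humidity and dew-point observations."""
    return coef[0] + coef[1] * np.asarray(humidity) + coef[2] * np.asarray(dew_point)
```

A model of this shape needs only two routinely measured weather inputs and has a transparent closed-form fit, which is one plausible reading of why a regression model can be the more robust operational choice.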

A Genetic Algorithm Optimized RNN-LSTM Model for Remaining Useful Life Prediction of Turbofan Engine

Electronics, Vol 10 (3), pp. 285, 2021. DOI: 10.3390/electronics10030285.

Author(s): Kwok Tai Chui, Brij B. Gupta, Pandian Vasant

Keyword(s): Genetic Algorithm, Feature Extraction, Prediction Model, Prediction Models, Remaining Useful Life, Prediction Algorithm, Short Term, Turbofan Engine, Term Prediction, Useful Life

Understanding the remaining useful life (RUL) of equipment is crucial for optimal predictive maintenance (PdM). Such understanding addresses the issues of equipment downtime and unnecessary maintenance checks in run-to-failure maintenance and preventive maintenance. Both feature extraction and the prediction algorithm play crucial roles in the performance of RUL prediction models. A benchmark dataset, the Turbofan Engine Degradation Simulation Dataset, was selected for performance analysis and evaluation. The proposed combination of complete ensemble empirical mode decomposition and wavelet packet transform for feature extraction reduced the average root-mean-square error (RMSE) by 5.14–27.15% compared with six other approaches. Regarding the prediction algorithm, the RUL prediction may indicate that the equipment needs to be repaired or replaced within either a shorter or a longer period of time, and incorporating this characteristic could enhance the performance of the RUL prediction model. In this paper, we propose an RUL prediction algorithm that combines a recurrent neural network (RNN) and long short-term memory (LSTM): the former is advantageous for short-term prediction, whereas the latter performs better in long-term prediction. The weights used to combine the RNN and LSTM were designed by the non-dominated sorting genetic algorithm II (NSGA-II). The combined model achieved an average RMSE of 17.2, improving RMSE by 6.07–14.72% compared with the baseline models, stand-alone RNN, and stand-alone LSTM. Compared with existing works, the proposed approach improves RMSE by 12.95–39.32%.
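
The paper tunes the RNN/LSTM combination weights with NSGA-II, a multi-objective genetic algorithm. As a much simpler stand-in, and explicitly not the authors' method, the sketch below combines two prediction series with a single weight w, chooses w by grid search to minimize RMSE, and expresses gains as a relative RMSE reduction in percent, the same form in which the abstract quotes its improvements. All function names and inputs are hypothetical.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def best_weight(y_true, pred_rnn, pred_lstm, grid=np.linspace(0.0, 1.0, 101)):
    """Grid-search w in [0, 1] for w * pred_rnn + (1 - w) * pred_lstm by RMSE.

    A single-objective grid search used here in place of NSGA-II.
    """
    pred_rnn = np.asarray(pred_rnn, dtype=float)
    pred_lstm = np.asarray(pred_lstm, dtype=float)
    scores = [rmse(y_true, w * pred_rnn + (1 - w) * pred_lstm) for w in grid]
    i = int(np.argmin(scores))
    return float(grid[i]), scores[i]

def improvement_pct(baseline_rmse, new_rmse):
    """Relative RMSE reduction in percent."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse
```

For example, if one model overestimates RUL by 2 cycles and the other underestimates it by 2, the best weight is 0.5 and the combined error vanishes. A baseline RMSE of 20.0 reduced to 17.2 corresponds to a 14% improvement. In practice, w should be chosen on validation data rather than on the test set.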

The Role of Board Independence and Ownership Structure in Improving the Efficacy of Corporate Financial Distress Prediction Model: Evidence from India

Journal of Risk and Financial Management, Vol 14 (7), pp. 333, 2021. DOI: 10.3390/jrfm14070333.

Author(s): Shilpa H. Shetty, Theresa Nithila Vincent

Keyword(s): Prediction Model, Ownership Structure, Financial Distress, Prediction Models, Receiver Operating Curve, Financial Measures, Financial Variables, Financial Distress Prediction, Distress Prediction

The study aimed to investigate the role of non-financial measures in predicting corporate financial distress in the Indian industrial sector. Two non-financial measures, the proportion of independent directors on the board and the proportion of the promoters’ share in the ownership structure of the business, were analysed along with ten financial measures. The sample consisted of 82 companies that had filed for bankruptcy under the Insolvency and Bankruptcy Code (IBC) and an equal number of matched, financially sound companies, for a total sample of 164 companies. Data for the five years immediately preceding the bankruptcy filing were collected for the sample companies. Data from 120 companies, drawn evenly from the two groups, were used to develop the model, and data from the remaining 44 companies were used to validate it. Two binary logistic regression models were developed: M1, formulated with both financial and non-financial variables, and M2, with only financial variables as predictors. The diagnostic ability of the models was tested using the receiver operating characteristic (ROC) curve, the area under the curve (AUC), sensitivity, specificity and annual accuracy. The results show that including the two non-financial variables improved the efficacy of the financial distress prediction model. This study made a unique attempt to provide empirical evidence on the role played by non-financial variables in improving the efficiency of corporate distress prediction models.
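
Sensitivity, specificity and accuracy come from a confusion matrix once the logistic model's predicted probabilities are cut at some threshold. A minimal sketch, assuming distressed = 1, sound = 0 and a 0.5 cutoff (a common default, not necessarily the one the study used); the function name and any inputs are hypothetical.

```python
import numpy as np

def diagnostics(y_true, prob, cutoff=0.5):
    """Sensitivity, specificity and accuracy of a binary classifier at a probability cutoff.

    y_true: 1 = financially distressed, 0 = financially sound.
    prob: predicted probability of distress (e.g. from a logistic regression).
    """
    y_true = np.asarray(y_true).astype(int)
    y_hat = (np.asarray(prob, dtype=float) >= cutoff).astype(int)
    tp = int(np.sum((y_hat == 1) & (y_true == 1)))
    tn = int(np.sum((y_hat == 0) & (y_true == 0)))
    fp = int(np.sum((y_hat == 1) & (y_true == 0)))
    fn = int(np.sum((y_hat == 0) & (y_true == 1)))
    sensitivity = tp / (tp + fn)  # share of distressed firms correctly flagged
    specificity = tn / (tn + fp)  # share of sound firms correctly cleared
    accuracy = (tp + tn) / len(y_true)
    return sensitivity, specificity, accuracy
```

Moving the cutoff trades sensitivity against specificity. The ROC curve traces that trade-off over all cutoffs, and the AUC summarizes it in one number.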

Predicting Change Prone Classes in Open Source Software

International Journal of Information Retrieval Research, Vol 8 (4), pp. 1-23, 2018. DOI: 10.4018/ijirr.2018100101. Cited by ~2.

Author(s): Deepa Godara, Amit Choudhary, Rakesh Kumar Singh

Keyword(s): Prediction Model, Open Source, Open Source Software, Prediction Models, New Technology, Modern Technology, Time Frequency, Rigorous Testing, Technology Changes, Sensitivity Specificity

In today's world, software is at the heart of modern technology, and to keep pace with new technology, changes in software are inevitable. This article investigates the association between changes and object-oriented metrics using different versions of open source software. Change prediction models can detect the probability of change in a class earlier in the software life cycle, which enables better effort allocation, more rigorous testing and easier maintenance of any software. Previous researchers have used various techniques, such as statistical methods, to predict change-prone classes. In this article, new metrics such as execution time, frequency, run-time information, popularity and class dependency are proposed to help predict change-prone classes. To evaluate the performance of the prediction model, the authors used sensitivity, specificity, and the ROC curve. Higher AUC values indicate that the prediction model gives more accurate results. The proposed metrics contribute to the accurate prediction of change-prone classes.
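
To make the ROC/AUC evaluation concrete, one can sweep a threshold over the predicted change-proneness scores, record (1 − specificity, sensitivity) at each step, and integrate with the trapezoidal rule. This is a minimal sketch. It assumes numeric scores where higher means more likely to change and labels containing both classes. Tied scores are moved across the threshold together. The function names and inputs are illustrative.

```python
import numpy as np

def roc_points(y_true, scores):
    """Return (fpr, tpr) arrays, sweeping thresholds from high to low."""
    y_true = np.asarray(y_true).astype(int)
    scores = np.asarray(scores, dtype=float)
    pos = y_true.sum()
    neg = len(y_true) - pos
    fpr, tpr = [0.0], [0.0]                    # threshold above every score
    for t in np.unique(scores)[::-1]:          # distinct thresholds, high to low
        flagged = scores >= t                  # predict "change-prone" at or above t
        tpr.append(np.sum(flagged & (y_true == 1)) / pos)  # sensitivity
        fpr.append(np.sum(flagged & (y_true == 0)) / neg)  # 1 - specificity
    return np.array(fpr), np.array(tpr)

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
```

An AUC of 1.0 means every change-prone class is scored above every stable one, while 0.5 corresponds to ranking no better than chance. The trapezoidal area equals the fraction of (changed, unchanged) pairs that the scores order correctly, counting ties as one half.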

Building prediction models for coronary heart disease by synthesizing multiple longitudinal research findings

European Journal of Cardiovascular Prevention & Rehabilitation, Vol 12 (5), pp. 459-464, 2005. DOI: 10.1097/01.hjr.0000173109.14228.71. Cited by ~4.

Author(s): Guizhou Hu, Martin M. Root

Keyword(s): Coronary Heart Disease, Heart Disease, Prediction Model, Empirical Model, Complex Disease, Prediction Models, Longitudinal Research, Study Data, Individual Risk, Data Set

Background: No methodology is currently available for combining, in a multivariate fashion, individual risk factor information derived from different longitudinal studies of a chronic disease. This paper introduces such a methodology, named Synthesis Analysis, which is essentially a multivariate meta-analytic technique. Design: The construction and validation of statistical models using available data sets. Methods and results: Two analyses are presented. (1) With the same data and the same risk variables, Synthesis Analysis produced a prediction model similar to that of the conventional regression approach, and it produced better prediction models when additional risk variables were added. (2) A four-variable empirical logistic model for death from coronary heart disease was developed with data from the Framingham Heart Study. A synthesized prediction model, with five new variables added to this empirical model, was developed using Synthesis Analysis and literature information. This model was then compared with the four-variable empirical model using the data set of the first National Health and Nutrition Examination Survey (NHANES I) Epidemiologic Follow-up Study. The synthesized model had significantly improved predictive power (χ² = 43.8, P < 0.00001). Conclusions: Synthesis Analysis provides a new means of developing predictive models for complex diseases from the medical literature.
