Multivariate Multiple Regression Prediction Models: A Euclidean Distance Approach

2003 ◽  
Vol 92 (3) ◽  
pp. 763-769 ◽  
Author(s):  
Paul W. Mielke ◽  
Kenneth J. Berry

An extension of a multiple regression prediction model to multiple response variables is presented. An algorithm using least sum of Euclidean distances between the multivariate observed and model-predicted response values provides regression coefficients, a measure of effect size, and inferential procedures for evaluating the extended multivariate multiple regression prediction model.
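
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the least-sum-of-Euclidean-distances criterion in its simplest special case: an intercept-only model, where the fitted value is the geometric median of the multivariate responses. The Weiszfeld iteration used here is a swapped-in standard technique, not the authors' algorithm, and the function names and data are hypothetical.

```python
import math

def euclid(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def sum_of_distances(points, center):
    """Criterion being minimized: sum of Euclidean distances to the fit."""
    return sum(euclid(p, center) for p in points)

def geometric_median(points, iterations=200, tol=1e-10):
    """Weiszfeld iteration for the intercept-only least-distance fit."""
    dim = len(points[0])
    # Start from the least squares answer, the coordinate-wise mean.
    current = [sum(p[j] for p in points) / len(points) for j in range(dim)]
    for _ in range(iterations):
        weights = []
        for p in points:
            d = euclid(p, current)
            if d < tol:
                # The iterate sits on an observation; stop to avoid dividing by zero.
                return current
            weights.append(1.0 / d)
        total = sum(weights)
        updated = [
            sum(w * p[j] for w, p in zip(weights, points)) / total
            for j in range(dim)
        ]
        if euclid(updated, current) < tol:
            return updated
        current = updated
    return current

# Hypothetical bivariate responses with one outlying observation.
observations = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (20.0, 20.0)]
mean = [sum(p[j] for p in observations) / len(observations) for j in range(2)]
median = geometric_median(observations)
print("mean fit:", mean, "criterion:", sum_of_distances(observations, mean))
print("least-distance fit:", median, "criterion:", sum_of_distances(observations, median))
```

In this sketch the least-distance fit is pulled toward the outlier far less than the mean is, which is the usual motivation for distance-based rather than squared-distance criteria.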

2002 ◽  
Vol 91 (1) ◽  
pp. 3-9 ◽  
Author(s):  
Paul W. Mielke ◽  
Kenneth J. Berry

A multivariate extension of a univariate procedure for the analysis of experimental designs is presented. A Euclidean-distance permutation procedure is used to evaluate multivariate residuals obtained from a regression algorithm, also based on Euclidean distances. Applications include various completely randomized and randomized block experimental designs such as one-way, Latin square, factorial, nested, and split-plot designs, with and without covariates. Unlike parametric procedures, the only required assumption is the randomization of subjects to treatments.
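
As a rough illustration of a Euclidean-distance permutation procedure, the sketch below computes an exact permutation P-value for two groups of multivariate residuals, using the weighted average within-group pairwise distance as the test statistic. The statistic, the group weights, and the data are assumptions chosen for illustration; the abstract does not give the authors' exact formulation.

```python
import itertools
import math

def euclid(p, q):
    """Euclidean distance between two residual vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def mean_pairwise_distance(points):
    """Average Euclidean distance over all pairs within one group."""
    pairs = list(itertools.combinations(points, 2))
    return sum(euclid(p, q) for p, q in pairs) / len(pairs)

def delta(group_a, group_b):
    """Group-size-weighted average of the within-group mean distances."""
    n_a, n_b = len(group_a), len(group_b)
    total = n_a + n_b
    return (n_a / total) * mean_pairwise_distance(group_a) + \
           (n_b / total) * mean_pairwise_distance(group_b)

def exact_permutation_p_value(group_a, group_b):
    """Share of all allocations whose statistic is as small as the observed one."""
    pooled = group_a + group_b
    observed = delta(group_a, group_b)
    indices = range(len(pooled))
    count = 0
    total = 0
    for chosen in itertools.combinations(indices, len(group_a)):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        # Small tolerance guards against floating-point ties.
        if delta(a, b) <= observed + 1e-12:
            count += 1
    return count / total

# Hypothetical bivariate residuals for two treatments (each group needs >= 2 points).
treatment_1 = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
treatment_2 = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
p_value = exact_permutation_p_value(treatment_1, treatment_2)
print("exact permutation P-value:", p_value)
```

Full enumeration is only practical for small designs; for larger ones a random sample of allocations is the usual substitute.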


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Chaohui Wang ◽  
Songyuan Tan ◽  
Qian Chen ◽  
Jiguo Han ◽  
Liang Song ◽  
...  

Dynamic modulus is a key evaluation index of the high-modulus asphalt mixture, but it is relatively difficult to test and its data are hard to collect. This study aims to predict the dynamic modulus of the high-modulus asphalt mixture accurately and to further optimize the mixture design process. Five high-temperature performance indexes of high-modulus asphalt and its mixture were selected, and the correlation between these five indexes and the dynamic modulus of the high-modulus asphalt mixture was analyzed. On this basis, dynamic modulus prediction models for the high-modulus asphalt mixture based on small sample data were established by multiple regression, general regression neural network (GRNN), and support vector machine (SVM) neural network. Through parameter adjustment and cross-validation, the output stability and accuracy of the different prediction models were compared and evaluated, and the most effective prediction model was recommended. The results show that the SVM model has better prediction accuracy and output stability than the multiple regression model and the GRNN model. Its prediction error was 0.98–9.71%. Compared with the other two models, the prediction error of the SVM model declined by 0.50–11.96% and 3.76–13.44%. The SVM neural network was recommended as the dynamic modulus prediction model of the high-modulus asphalt mixture.
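
To make the cross-validation step concrete for small samples, here is a minimal leave-one-out sketch that reports the percentage prediction error for each held-out observation. It uses a single-predictor least squares line as a stand-in for the paper's models (multiple regression, GRNN, SVM), and the index values and moduli are invented for illustration only.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def leave_one_out_errors(xs, ys):
    """Percentage prediction error for each observation when it is held out."""
    errors = []
    for i in range(len(xs)):
        # Train on every observation except the i-th one.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        predicted = slope * xs[i] + intercept
        errors.append(abs(predicted - ys[i]) / abs(ys[i]) * 100.0)
    return errors

# Hypothetical performance index values and measured dynamic moduli.
index_values = [1.0, 2.0, 3.0, 4.0, 5.0]
moduli = [2.1, 3.9, 6.2, 7.8, 10.1]
errors = leave_one_out_errors(index_values, moduli)
print("error range: %.2f%% to %.2f%%" % (min(errors), max(errors)))
```

Reporting the minimum and maximum held-out error mirrors the way the abstract states error ranges, although the paper's exact error definition is not given there.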


2018 ◽  
Vol 7 (2) ◽  
pp. 56
Author(s):  
M. J. Hossain ◽  
A. K. Majumder

In constructing estimation and hypothesis testing procedures, it is important to use all available information, such as the sign of a parameter, in order to maximize the power of the test. Prior information is often available about the sign of the regression coefficients (parameters) under test, the best example being that variances cannot be negative. Ignoring information about the signs of regression parameters can lead to a loss of power in small samples. With this problem in mind, this paper is concerned with developing a restricted estimation and hypothesis testing approach in the context of the multivariate multiple regression model. The main contribution of this paper is a technique for estimating constrained regression coefficients and testing restricted parameters with the aid of an information-theoretic distance. The existing two-sided test statistic follows a central chi-square distribution, whereas the test statistic of the proposed distance-based one-sided test follows a weighted mixture of chi-square distributions. Monte Carlo simulation indicates that the newly proposed test performs better than existing tests.
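
For orientation, a weighted mixture of chi-square distributions (often called a chi-bar-square distribution) has a tail probability of the general form below. The symbols are assumptions introduced here, not taken from the paper: $T$ is the one-sided test statistic, $k$ the number of restricted parameters, and $w_i$ nonnegative weights that sum to one, with $\chi^2_0$ denoting a point mass at zero.

```latex
\Pr(T \ge c) \;=\; \sum_{i=0}^{k} w_i \,\Pr\!\left(\chi^2_i \ge c\right),
\qquad w_i \ge 0, \quad \sum_{i=0}^{k} w_i = 1 .
```

By contrast, the two-sided statistic would be referred to a single central chi-square distribution, which is the distinction the abstract draws.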


2020 ◽  
Vol 26 (33) ◽  
pp. 4195-4205
Author(s):  
Xiaoyu Ding ◽  
Chen Cui ◽  
Dingyan Wang ◽  
Jihui Zhao ◽  
Mingyue Zheng ◽  
...  

Background: Enhancing a compound’s biological activity is the central task of lead optimization in small-molecule drug discovery. However, performing many iterative rounds of compound synthesis and bioactivity testing is laborious. To address this issue, high-quality in silico bioactivity prediction approaches are needed to prioritize more active compound derivatives and reduce trial and error. Methods: Two kinds of bioactivity prediction models based on a large-scale structure-activity relationship (SAR) database were constructed. The first is based on the similarity of substituents and realized by matched molecular pair analysis, including SA, SA_BR, SR, and SR_BR. The second is based on SAR transferability and realized by matched molecular series analysis, including Single MMS pair, Full MMS series, and Multi single MMS pairs. Moreover, we defined the application domain of the models by using a distance-based threshold. Results: Among the seven individual models, the Multi single MMS pairs bioactivity prediction model showed the best performance (R² = 0.828, MAE = 0.406, RMSE = 0.591), and the baseline model (SA) produced the lowest prediction accuracy (R² = 0.798, MAE = 0.446, RMSE = 0.637). The predictive accuracy could be further improved by consensus modeling (R² = 0.842, MAE = 0.397, and RMSE = 0.563). Conclusion: An accurate prediction model for bioactivity was built with a consensus method, which was superior to all individual models. Our model should be a valuable tool for lead optimization.
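
The three reported metrics can be computed as in the sketch below. It assumes R2 means the coefficient of determination (the abstract does not say whether it is that or a squared correlation) and that consensus modeling is a plain average of the individual models' predictions, which is also an assumption. All values are made up.

```python
import math

def regression_metrics(observed, predicted):
    """Return R2 (coefficient of determination), MAE, and RMSE."""
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    mae = sum(abs(r) for r in residuals) / n
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    mean_observed = sum(observed) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return {"R2": 1.0 - ss_res / ss_tot, "MAE": mae, "RMSE": rmse}

def consensus(prediction_sets):
    """Average the predictions of several models, compound by compound."""
    return [sum(values) / len(values) for values in zip(*prediction_sets)]

# Hypothetical observed activities and predictions from two models.
observed = [5.0, 6.0, 7.0, 8.0]
model_a = [5.4, 6.3, 6.8, 8.5]
model_b = [4.7, 5.8, 7.3, 7.6]
for name, predictions in [("A", model_a), ("B", model_b),
                          ("consensus", consensus([model_a, model_b]))]:
    print(name, regression_metrics(observed, predictions))
```

Averaging can cancel errors of opposite sign across models, which is one reason a consensus can outperform each individual model.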


2001 ◽  
Vol 10 (2) ◽  
pp. 241 ◽  
Author(s):  
Jon B. Marsden-Smedley ◽  
Wendy R. Catchpole

An experimental program was carried out in Tasmanian buttongrass moorlands to develop fire behaviour prediction models for improving fire management. This paper describes the results of the fuel moisture modelling section of this project. A range of previously developed fuel moisture prediction models is examined and three empirical dead fuel moisture prediction models are developed. McArthur’s grassland fuel moisture model gave predictions as good as those of a linear regression model using humidity and dew-point temperature. The regression model was preferred as a prediction model because it is inherently more robust. A prediction model based on hazard sticks was found to have strong seasonal effects, which need further investigation before hazard sticks can be used operationally.


Electronics ◽  
2021 ◽  
Vol 10 (3) ◽  
pp. 285
Author(s):  
Kwok Tai Chui ◽  
Brij B. Gupta ◽  
Pandian Vasant

Understanding the remaining useful life (RUL) of equipment is crucial for optimal predictive maintenance (PdM). This addresses the issues of equipment downtime and unnecessary maintenance checks in run-to-failure maintenance and preventive maintenance. Both the feature extraction and the prediction algorithm play crucial roles in the performance of RUL prediction models. A benchmark dataset, the Turbofan Engine Degradation Simulation Dataset, was selected for performance analysis and evaluation. The proposed combination of complete ensemble empirical mode decomposition and wavelet packet transform for feature extraction could reduce the average root-mean-square error (RMSE) by 5.14–27.15% compared with six approaches. As for the prediction algorithm, an RUL prediction model may indicate that the equipment needs to be repaired or replaced within a shorter or a longer period of time. Incorporating this characteristic could enhance the performance of the RUL prediction model. In this paper, we propose an RUL prediction algorithm that combines a recurrent neural network (RNN) and long short-term memory (LSTM). The former has the advantage in short-term prediction, whereas the latter performs better in long-term prediction. The weights used to combine RNN and LSTM were designed by the non-dominated sorting genetic algorithm II (NSGA-II). The algorithm achieved an average RMSE of 17.2 and improved the RMSE by 6.07–14.72% compared with baseline models, stand-alone RNN, and stand-alone LSTM. Compared with existing works, the RMSE improvement of the proposed work is 12.95–39.32%.
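
The idea of weighting two predictors can be sketched as follows. This example blends two prediction series with a single weight and picks that weight by a simple grid search on RMSE; the grid search is a deliberately simple stand-in for NSGA-II, the single-weight form is an assumption, and the RUL values are invented.

```python
import math

def rmse(truth, predicted):
    """Root-mean-square error between two equal-length series."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, predicted)) / len(truth))

def blend(short_term, long_term, weight):
    """Convex combination: weight on the first model, 1 - weight on the second."""
    return [weight * s + (1.0 - weight) * l for s, l in zip(short_term, long_term)]

def best_weight(truth, short_term, long_term, steps=100):
    """Grid search over weights in [0, 1]; a stand-in for NSGA-II."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda w: rmse(truth, blend(short_term, long_term, w)))

# Hypothetical true RUL values and predictions from two stand-alone models.
true_rul = [10.0, 20.0, 30.0]
rnn_prediction = [12.0, 22.0, 32.0]   # overestimates by 2 cycles
lstm_prediction = [9.0, 19.0, 29.0]   # underestimates by 1 cycle
weight = best_weight(true_rul, rnn_prediction, lstm_prediction)
combined_rmse = rmse(true_rul, blend(rnn_prediction, lstm_prediction, weight))
print("weight on RNN:", weight, "combined RMSE:", round(combined_rmse, 4))
```

In practice the weight would be chosen on validation data rather than on the test targets used here for brevity.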


2021 ◽  
Vol 14 (7) ◽  
pp. 333
Author(s):  
Shilpa H. Shetty ◽  
Theresa Nithila Vincent

The study aimed to investigate the role of non-financial measures in predicting corporate financial distress in the Indian industrial sector. The proportion of independent directors on the board and the proportion of the promoters’ share in the ownership structure of the business were the non-financial measures that were analysed, along with ten financial measures. The sample consisted of 82 companies that had filed for bankruptcy under the Insolvency and Bankruptcy Code (IBC) and an equal number of matching financially sound companies, for a total sample size of 164 companies. Data for the five years immediately preceding the bankruptcy filing were collected for the sample companies. The data of 120 companies evenly drawn from the two groups were used for developing the model, and the remaining data were used for validating the developed model. Two binary logistic regression models, M1 and M2, were developed: M1 was formulated with both financial and non-financial variables, and M2 had only financial variables as predictors. The diagnostic ability of the models was tested with the aid of the receiver operating characteristic (ROC) curve, area under the curve (AUC), sensitivity, specificity and annual accuracy. The results of the study show that inclusion of the two non-financial variables improved the efficacy of the financial distress prediction model. This study made a unique attempt to provide empirical evidence on the role played by non-financial variables in improving the efficiency of corporate distress prediction models.
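
The diagnostic measures named above can be computed from a model's predicted probabilities as in this sketch. The labels and scores are hypothetical, the 0.5 cut-off is an assumption, and the AUC is obtained with the pairwise (Mann-Whitney) formulation rather than by tracing the ROC curve.

```python
def sensitivity_specificity(labels, scores, threshold=0.5):
    """Sensitivity and specificity when scores >= threshold are classed as distressed."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def auc(labels, scores):
    """Probability that a distressed firm scores above a sound one (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

# Hypothetical validation data: 1 = distressed, 0 = financially sound.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
sens, spec = sensitivity_specificity(labels, scores)
print("sensitivity:", sens, "specificity:", spec, "AUC:", auc(labels, scores))
```

Comparing these figures for M1 and M2 on the same validation companies is one way to check whether the added predictors help.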


2018 ◽  
Vol 8 (4) ◽  
pp. 1-23 ◽  
Author(s):  
Deepa Godara ◽  
Amit Choudhary ◽  
Rakesh Kumar Singh

In today's world, software is at the heart of modern technology. To keep pace with new technology, changes in software are inevitable. This article examines the association between changes and object-oriented metrics using different versions of open source software. Change prediction models can detect the probability of change in a class early in the software life cycle, which results in better effort allocation, more rigorous testing, and easier maintenance of any software. Previously, researchers have used various techniques, such as statistical methods, to predict change-prone classes. In this article, some new metrics, such as execution time, frequency, run time information, popularity, and class dependency, are proposed to help predict change-prone classes. To evaluate the performance of the prediction model, the authors used sensitivity, specificity, and the ROC curve. Higher values of AUC indicate that the prediction model gives more accurate results. The proposed metrics contribute to the accurate prediction of change-prone classes.


Author(s):  
Guizhou Hu ◽  
Martin M. Root

Background: No methodology is currently available to allow the combining of individual risk factor information derived from different longitudinal studies for a chronic disease in a multivariate fashion. This paper introduces such a methodology, named Synthesis Analysis, which is essentially a multivariate meta-analytic technique. Design: The construction and validation of statistical models using available data sets. Methods and results: Two analyses are presented. (1) With the same data, Synthesis Analysis produced a similar prediction model to the conventional regression approach when using the same risk variables. Synthesis Analysis produced better prediction models when additional risk variables were added. (2) A four-variable empirical logistic model for death from coronary heart disease was developed with data from the Framingham Heart Study. A synthesized prediction model with five new variables added to this empirical model was developed using Synthesis Analysis and literature information. This model was then compared with the four-variable empirical model using the first National Health and Nutrition Examination Survey (NHANES I) Epidemiologic Follow-up Study data set. The synthesized model had significantly improved predictive power (χ² = 43.8, P < 0.00001). Conclusions: Synthesis Analysis provides a new means of developing complex disease predictive models from the medical literature.

