Predicting the Performance of NBA Players by Divided Regression Analysis

2019 ◽  
Vol 15 (3) ◽  
pp. 441-446
Author(s):  
Yann Ling Goh ◽  
Yeh Huann Goh ◽  
Ling Leh Bin Raymond ◽  
Weng Hoong Chee

A divided regression model is built to predict the performance of players in the National Basketball Association (NBA) from 1997 to 2017. The whole data set is divided into five sub data sets, and a multiple linear regression model is fitted to each of them. In addition, the relationships among the independent variables are checked using the variance inflation factor (VIF) to identify the risk of multicollinearity in the data. Moreover, non-linearity of the regression model, non-constancy of the error variance and non-normality of the error terms are investigated by plotting residual plots and quantile-quantile plots. Finally, the divided regression model is built by combining the results obtained from the sub data sets, and its performance is verified.
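A minimal sketch of the divided-regression workflow described above, in Python: split the data into five sub data sets, fit an ordinary least squares model to each, and compute VIFs. The column names and synthetic data below are placeholders, not the paper's data.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(500, 4)),
                  columns=["minutes", "field_goals", "assists", "points"])

# Divide the full data set into five sub data sets.
shuffled = df.sample(frac=1, random_state=0).reset_index(drop=True)
groups = [shuffled.iloc[i::5] for i in range(5)]

for i, sub in enumerate(groups, start=1):
    X = sm.add_constant(sub[["minutes", "field_goals", "assists"]])
    fit = sm.OLS(sub["points"], X).fit()
    # VIF > 10 is a common rule of thumb for multicollinearity risk.
    vifs = [variance_inflation_factor(X.values, j) for j in range(1, X.shape[1])]
    print(f"group {i}: R2={fit.rsquared:.3f}, VIFs={np.round(vifs, 2)}")
```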

2018 ◽  
Vol 11 (7) ◽  
pp. 4239-4260 ◽  
Author(s):  
Richard Anthes ◽  
Therese Rieckh

Abstract. In this paper we show how multiple data sets, including observations and models, can be combined using the “three-cornered hat” (3CH) method to estimate vertical profiles of the errors of each system. Using data from 2007, we estimate the error variances of radio occultation (RO), radiosondes, ERA-Interim, and Global Forecast System (GFS) model data sets at four radiosonde locations in the tropics and subtropics. A key assumption is the neglect of error covariances among the different data sets, and we examine the consequences of this assumption on the resulting error estimates. Our results show that different combinations of the four data sets yield similar relative and specific humidity, temperature, and refractivity error variance profiles at the four stations, and these estimates are consistent with previous estimates where available. These results thus indicate that the correlations of the errors among all data sets are small and the 3CH method yields realistic error variance profiles. The estimated error variances of the ERA-Interim data set are smallest, a reasonable result considering the excellent model and data assimilation system and assimilation of high-quality observations. For the four locations studied, RO has smaller error variances than radiosondes, in agreement with previous studies. Part of the larger error variance of the radiosondes is associated with representativeness differences because radiosondes are point measurements, while the other data sets represent horizontal averages over scales of ∼ 100 km.
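The core of the 3CH method can be written down compactly: for three collocated data sets A, B and C with mutually uncorrelated errors, the error variance of A follows from the variances of the pairwise differences. A minimal sketch with synthetic profiles (the 0.5/1.0/1.5 error levels are illustrative, not the paper's values):

```python
import numpy as np

rng = np.random.default_rng(1)
truth = np.linspace(300, 200, 50)            # e.g. a temperature profile (K)
a = truth + rng.normal(0, 0.5, 50)           # data set A, error sd 0.5
b = truth + rng.normal(0, 1.0, 50)           # data set B, error sd 1.0
c = truth + rng.normal(0, 1.5, 50)           # data set C, error sd 1.5

def three_cornered_hat(a, b, c):
    """Error variance of `a` from the pairwise difference variances,
    assuming the errors of the three data sets are uncorrelated."""
    return 0.5 * (np.var(a - b) + np.var(a - c) - np.var(b - c))

print(three_cornered_hat(a, b, c))  # ~0.25 = 0.5**2
print(three_cornered_hat(b, a, c))  # ~1.0
print(three_cornered_hat(c, a, b))  # ~2.25
```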


2015 ◽  
Vol 2015 ◽  
pp. 1-12
Author(s):  
Mohammed Alguraibawi ◽  
Habshah Midi ◽  
A. H. M. Rahmatullah Imon

Identification of high leverage points is crucial because they are responsible for inaccurate predictions and invalid inferential statements, as they have a larger impact on the computed values of various estimates. It is essential to classify high leverage points into good and bad leverage points, because only the bad leverage points have an undue effect on the parameter estimates. It is now evident that when a group of high leverage points is present in a data set, the existing robust diagnostic plot fails to classify them correctly. This problem is due to the masking and swamping effects. In this paper, we propose a new robust diagnostic plot that correctly classifies good and bad leverage points by reducing both masking and swamping effects. The formulation of the proposed plot is based on the Modified Generalized Studentized Residuals. We investigate the performance of our proposed method using a Monte Carlo simulation study and some well-known data sets. The results indicate that the proposed method improves the rate of detection of bad leverage points and reduces the swamping and masking effects.
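The Modified Generalized Studentized Residuals themselves are not spelled out in this abstract, so the sketch below shows only the generic idea behind such diagnostic plots: robust distances in the predictor space flag high leverage points, and robustly standardized residuals separate bad leverage (off the regression surface) from good leverage (on it). The cutoffs and estimators used here are common defaults, not the authors' choices.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet
from sklearn.linear_model import TheilSenRegressor

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -1.0]) + rng.normal(0, 0.5, 200)
X[:5] += 8                                  # good leverage: far out in X...
y[:5] = X[:5] @ np.array([2.0, -1.0])       # ...but exactly on the line
X[5:10] += 8                                # bad leverage: far out in X...
y[5:10] += 15                               # ...and off the line

# Robust (MCD-based) distances in predictor space flag high leverage.
rd = np.sqrt(MinCovDet(random_state=0).fit(X).mahalanobis(X))
high_lev = rd > np.sqrt(chi2.ppf(0.975, df=2))

# Robustly standardized residuals separate bad from good leverage.
resid = y - TheilSenRegressor(random_state=0).fit(X, y).predict(X)
std_resid = resid / np.median(np.abs(resid)) * 0.6745

bad = high_lev & (np.abs(std_resid) > 3)
good = high_lev & ~bad
print(f"{good.sum()} good and {bad.sum()} bad leverage points flagged")
```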


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Qingqi Zhang

In this paper, the author first analyzes the major factors affecting housing prices using the Spearman correlation coefficient, selects the factors with a significant influence on general housing prices, and combines them in the analysis. The author then establishes a multiple linear regression model for housing price prediction and tests the method on the Boston real estate price data set. The analysis and tests in this paper show that the multiple linear regression model can, to some extent, effectively predict and analyze housing prices, while the approach could still be improved with more advanced machine learning methods.
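A sketch of the described pipeline under stated assumptions: the file name boston_housing.csv, the MEDV target column and the 0.4 correlation threshold are placeholders for whatever the author actually used.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

df = pd.read_csv("boston_housing.csv")           # assumed local copy
target = "MEDV"                                  # median home value

# Spearman correlation of each feature with the target.
corr = df.corr(method="spearman")[target].drop(target)
selected = corr[corr.abs() > 0.4].index.tolist() # assumed threshold

X_train, X_test, y_train, y_test = train_test_split(
    df[selected], df[target], test_size=0.2, random_state=0)
model = LinearRegression().fit(X_train, y_train)
print(f"test R2 = {r2_score(y_test, model.predict(X_test)):.3f}")
```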


2019 ◽  
Vol 14 (2) ◽  
Author(s):  
Senthilkumar T ◽  
Venkatesh R ◽  
Sam Charles J ◽  
Senthil P ◽  
Praveen kumar V

Energy consumption forecasting is vitally important for the deregulated electricity industry in India, particularly in the state of Tamil Nadu. A large variety of mathematical methods have been developed for energy forecasting. In this study, a historical data set including population (POP), gross state domestic product (GSDP), yearly peak demand (YPD) and per capita income (PCI) was considered for the years 2005 to 2011. First, a multiple linear regression model (MLRM) was developed. The regression model outputs were then optimized using a neural network method.
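A rough sketch of the two-stage idea, assuming the simplest possible reading (an OLS fit on the four predictors, then a small neural network on the same inputs); the synthetic numbers stand in for the 2005-2011 data, which are not reproduced here.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
X = rng.normal(size=(7, 4))          # 7 years (2005-2011) x POP, GSDP, YPD, PCI
y = X @ np.array([1.5, 0.8, 2.0, 0.5]) + rng.normal(0, 0.1, 7)

# Stage 1: multiple linear regression model (MLRM).
mlr = sm.OLS(y, sm.add_constant(X)).fit()
print(mlr.params)

# Stage 2: a small neural network on the standardized predictors.
Xs = StandardScaler().fit_transform(X)
nn = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000,
                  random_state=0).fit(Xs, y)
print(nn.predict(Xs[:2]))
```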


2021 ◽  
Vol 2 (2) ◽  
pp. 40-47
Author(s):  
Sunil Kumar ◽  
Vaibhav Bhatnagar

Machine learning is one of the most active fields and technologies for realizing artificial intelligence (AI). The complexity of machine learning algorithms makes it difficult to predict which algorithm will perform best. Among the many algorithms available for finding regression trends, determining the appropriate method for establishing the correlation between variables is very difficult, so this paper reviews the different types of regression used in machine learning. There are six main types of regression model: linear, logistic, polynomial, ridge, Bayesian linear and lasso. This paper gives an overview of each of these models and compares their suitability for machine learning. Data analysis requires establishing associations among the many variables in a data set; such associations are essential for forecasting and for the exploration of data, and regression analysis is a procedure for establishing them. The work in this paper therefore focuses on the various regression analysis models and how they are put to use in the context of different data sets in machine learning. Selecting the right model for exploration is the most challenging task, and hence these models are considered thoroughly in this study. Used in the right way and with an appropriate data set, these models can give highly accurate results in data exploration and forecasting.
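For concreteness, the six regression families listed above can be instantiated in scikit-learn as follows (logistic regression is strictly a classifier, and polynomial regression is linear regression on expanded features):

```python
from sklearn.linear_model import (LinearRegression, LogisticRegression,
                                  Ridge, Lasso, BayesianRidge)
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

models = {
    "linear":     LinearRegression(),
    "logistic":   LogisticRegression(max_iter=1000),
    "polynomial": make_pipeline(PolynomialFeatures(degree=2),
                                LinearRegression()),
    "ridge":      Ridge(alpha=1.0),     # L2 penalty shrinks coefficients
    "bayesian":   BayesianRidge(),      # priors on weights and noise
    "lasso":      Lasso(alpha=0.1),     # L1 penalty zeroes some weights
}
for name, m in models.items():
    print(name, m)
```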


2021 ◽  
Author(s):  
Nazrina Aziz

This thesis investigates three research problems which arise in multivariate data and censored regression. The first is the identification of outliers in multivariate data. The second is a dissimilarity measure for clustering purposes. The third is diagnostic analysis for the Buckley-James method in censored regression.

Outliers can be defined simply as an observation (or a subset of observations) that is isolated from the other observations in the data set. There are two main reasons that motivate people to find outliers. The first is the researcher's intention. The second is the effect of outliers on analyses: their presence will affect means, variances and regression coefficients; they will cause bias or distortion of estimates; likewise, they will inflate the sums of squares, and hence false conclusions are likely to be drawn. Sometimes the identification of outliers is the main objective of the analysis, together with the decision whether to remove them or down-weight them prior to fitting a non-robust model. This thesis does not differentiate between the various justifications for outlier detection; the aim is to advise the analyst of observations that are considerably different from the majority. Note that the techniques for the identification of outliers introduced in this thesis are applicable to a wide variety of settings and are performed on both large and small data sets. In this thesis, observations that are located far away from the remaining data are considered to be outliers.

Additionally, some techniques for the identification of outliers can be adapted to finding clusters. There are two major challenges in clustering. The first is that identifying clusters in high-dimensional data sets is difficult because of the curse of dimensionality. The second is that a new dissimilarity measure is needed, as some traditional distance functions cannot capture the pattern dissimilarity among objects. This thesis deals with the latter challenge. It introduces the Influence Angle Cluster Approach (iaca), which may be used as a dissimilarity matrix, and the author shows that iaca successfully develops clusters when used in partitioning clustering, even if the data set has mixed variables, i.e. interval and categorical variables. The iaca is developed based on the influence eigenstructure.

The first two problems in this thesis deal with complete data sets. It is also interesting to study incomplete, i.e. censored, data sets. The term 'censored' is mostly used in biological science areas such as survival analysis. Researchers are often interested in comparing the survival distributions of two samples. Even though this can be done using the logrank test, that method cannot examine the effects of more than one variable at a time. This difficulty can easily be overcome by using a survival regression model. Examples of survival regression models are the Cox model, Miller's model, the Buckley-James model and the Koul-Susarla-Van Ryzin model. The Buckley-James model's performance is comparable with the Cox model, and it performs best when compared both to the Miller model and to the Koul-Susarla-Van Ryzin model. Previous comparison studies proved that the Buckley-James estimator is more stable and easier to explain to non-statisticians than the Cox model. Nevertheless, researchers today tend to use the Cox model instead of the Buckley-James model, because of the lack of support for the Buckley-James model in computer software and the limited choice of diagnostic analyses; currently, only a few diagnostic analyses for the Buckley-James model exist.

Therefore, this thesis proposes two new diagnostic analyses for the Buckley-James model. The first is called the renovated Cook's distance. This method produces results comparable with previous findings; nevertheless, it cannot identify influential observations from the censored group, only from the uncensored group. This issue needs further investigation because of the possibility of censored points becoming influential cases in censored regression. Secondly, a local influence approach for the Buckley-James model is proposed. This thesis presents local influence diagnostics of the Buckley-James model consisting of variance perturbation, response variable perturbation, censoring status perturbation and independent variable perturbation. The proposed diagnostics improve on, and also challenge, the findings of previous ones by allowing both censored and uncensored observations to be identified as influential.
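As background for the "renovated Cook's distance", the classical Cook's distance for an uncensored OLS fit looks like this in Python; the renovated statistic for the Buckley-James model is not reproduced here.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.normal(size=100)
y = 2 * x + rng.normal(0, 1, 100)
y[0] += 10                                    # one planted outlier

fit = sm.OLS(y, sm.add_constant(x)).fit()
cooks_d = fit.get_influence().cooks_distance[0]
print(np.argsort(cooks_d)[-3:])               # indices with the largest D_i
```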


Author(s):  
V. G. Jemilohun

This study investigates the impact of violating the assumptions of the hierarchical linear model when a level-1 covariate is collinear with variables in the correct functional-form model and in an omitted-variable model. This was carried out via Monte Carlo simulation; to achieve it, omitted-variable bias was introduced. The study considers the multicollinearity effects both when the models are in the correct form and when they are not. A multicollinearity test was also carried out on the data set, using the variance inflation factor (VIF), to find out whether multicollinearity is present. The results show that an omitted variable has a substantial impact on the hierarchical linear model.
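A minimal sketch of the VIF check mentioned above: VIF_j = 1/(1 − R_j²), where R_j² comes from regressing predictor j on the remaining predictors, with values above roughly 10 commonly read as a multicollinearity warning. The collinear pair below is synthetic.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools import add_constant

rng = np.random.default_rng(5)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(0, 0.1, 200)   # deliberately collinear with x1
x3 = rng.normal(size=200)
X = add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))

for j, col in enumerate(X.columns):
    if col != "const":
        print(col, round(variance_inflation_factor(X.values, j), 1))
```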


2019 ◽  
Vol 54 (3) ◽  
pp. 259-273
Author(s):  
Zirui Jia ◽  
Zengli Wang

Purpose
Frequent itemset mining (FIM) is a basic topic in data mining. Most FIM methods build an itemset database containing all possible itemsets and use predefined thresholds to determine whether an itemset is frequent. However, such algorithms have some deficiencies: they are better suited to discrete data than to ordinal/continuous data, which may result in computational redundancy, and some of the results are difficult to interpret. The purpose of this paper is to shed light on this gap by proposing a new data mining method.

Design/methodology/approach
A regression pattern (RP) model is introduced, in which the regression model and the FIM method are combined to solve the existing problems. Using survey data from a computer technology and software professional qualification examination, the multiple linear regression model is selected to mine associations between items.

Findings
Some interesting associations are mined by the proposed algorithm, and the results show that the proposed method can be applied to ordinal/continuous data mining. The experiments with the RP model show that, compared to FIM, computational redundancy decreases and the results contain more information.

Research limitations/implications
The proposed algorithm is designed for ordinal/continuous data and is expected to provide inspiration for data stream mining and unstructured data mining.

Practical implications
Compared to FIM, which mines associations between discrete items, the RP model can mine associations between ordinal/continuous data sets. Importantly, the RP model performs well in saving computational resources and mining meaningful associations.

Originality/value
The proposed algorithm provides a novel view of defining and mining associations.
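The paper's exact algorithm is not given in this abstract; the sketch below only illustrates the general FIM-plus-regression combination it describes: enumerate small subsets of continuous items as candidate patterns, fit a linear model for each, and keep those whose fit clears a threshold (playing the role of a support threshold).

```python
import itertools
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(6)
df = pd.DataFrame(rng.normal(size=(300, 4)), columns=list("abcd"))
df["d"] = 0.9 * df["a"] - 0.7 * df["b"] + rng.normal(0, 0.3, 300)

patterns = []
for target in df.columns:
    others = [c for c in df.columns if c != target]
    for k in (1, 2):
        for combo in itertools.combinations(others, k):
            fit = sm.OLS(df[target], sm.add_constant(df[list(combo)])).fit()
            if fit.rsquared_adj > 0.5:        # analogous to a support threshold
                patterns.append((combo, target, round(fit.rsquared_adj, 3)))
print(patterns)
```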


2016 ◽  
Vol 77 (1) ◽  
pp. 165-178 ◽  
Author(s):  
Tenko Raykov ◽  
George A. Marcoulides ◽  
Tenglong Li

The measurement error in principal components extracted from a set of fallible measures is discussed and evaluated. It is shown that as long as one or more measures in a given set of observed variables contains error of measurement, so also does any principal component obtained from the set. The error variance in any principal component is shown to be (a) bounded from below by the smallest error variance in a variable from the analyzed set and (b) bounded from above by the largest error variance in a variable from that set. In the case of a unidimensional set of analyzed measures, it is pointed out that the reliability and criterion validity of any principal component are bounded from above by these respective coefficients of the optimal linear combination with maximal reliability and criterion validity (for a criterion unrelated to the error terms in the individual measures). The discussed psychometric features of principal components are illustrated on a numerical data set.
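The stated bounds follow from a convexity argument: for a unit-norm loading vector w and uncorrelated measurement errors with variances s_j², the error variance of the component w'x is Σ_j w_j² s_j², a convex combination of the s_j² and hence between their minimum and maximum. A small numeric check (with assumed error standard deviations):

```python
import numpy as np

rng = np.random.default_rng(7)
n, err_sd = 5000, np.array([0.3, 0.6, 0.9, 1.2])
true = rng.normal(size=(n, 1)) @ np.ones((1, 4))       # unidimensional truth
x = true + rng.normal(size=(n, 4)) * err_sd            # fallible measures

# Loadings of the first principal component (unit norm, from eigh).
w = np.linalg.eigh(np.cov(x.T))[1][:, -1]
pc_err_var = np.sum(w**2 * err_sd**2)
print(pc_err_var, err_sd.min()**2, err_sd.max()**2)    # min <= pc <= max
```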

