Methods of Estimating Correlation Coefficients in the Presence of Influential Outlier(s)

Correlation methods are indispensable in the study of the linear relationship between two variables. However, many researchers adopt inappropriate correlation methods, which leads to unreliable results. Most commonly, researchers apply the Pearson method to a dataset that contains outliers instead of more appropriate methods such as Spearman, Kendall Tau, Median and Quadrant, which may be more suitable for calculating a correlation coefficient in the presence of influential outliers. Accurate estimation of correlation coefficients under outliers has been a long-standing problem for methodological researchers. The problem stems from limited knowledge of correlation methods and their assumptions, which has led to their inappropriate application in research analysis. Five methods of estimating correlation coefficients in the presence of influential outliers (contaminated data) were considered: Pearson Correlation Coefficient, Spearman Correlation Coefficient, Kendall Tau Correlation Coefficient, Median Correlation Coefficient and Quadrant Correlation Coefficient.
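As a rough illustration of why the choice of method matters, the sketch below compares four of the five coefficients named above on a small, hypothetical dataset before and after one point is replaced by a gross outlier. The data, the function names, and the use of Kendall's tau-a are assumptions made for this example and are not taken from the paper; the median correlation coefficient is omitted because the abstract does not define it.

```python
from itertools import combinations
from statistics import mean, median


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ranks(v):
    """1-based ranks; assumes no ties, which holds for the toy data below."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def spearman(x, y):
    """Pearson correlation applied to the ranks of x and y."""
    return pearson(ranks(x), ranks(y))


def sign(t):
    return (t > 0) - (t < 0)


def kendall_tau(x, y):
    """Tau-a: (concordant - discordant) pairs divided by all pairs."""
    pairs = list(combinations(range(len(x)), 2))
    s = sum(sign(x[i] - x[j]) * sign(y[i] - y[j]) for i, j in pairs)
    return s / len(pairs)


def quadrant(x, y):
    """Mean sign product of deviations from the coordinate-wise medians."""
    mx, my = median(x), median(y)
    return mean(sign(a - mx) * sign(b - my) for a, b in zip(x, y))


# Hypothetical data: a perfect linear relationship, then one contaminated point.
x = list(range(1, 11))
y_clean = [2 * v for v in x]
y_outlier = y_clean[:-1] + [-100]

for name, f in [("pearson", pearson), ("spearman", spearman),
                ("kendall", kendall_tau), ("quadrant", quadrant)]:
    print(name, round(f(x, y_clean), 3), round(f(x, y_outlier), 3))
```

On this toy data every coefficient equals 1 before contamination. After contamination the Pearson coefficient turns negative, while the rank- and sign-based coefficients stay positive.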

Single- versus multiple-item assessment of quality of life in patients with advanced cancer

Journal of Clinical Oncology ◽

10.1200/jco.2009.27.15_suppl.e20528 ◽

2009 ◽

Vol 27 (15_suppl) ◽

pp. e20528-e20528

Author(s):

S. H. Bush ◽

H. A. Parsons ◽

J. L. Palmer ◽

R. Chacko ◽

Z. Li ◽

...

Keyword(s):

Quality Of Life ◽

Regression Analysis ◽

Correlation Coefficient ◽

Pearson Correlation ◽

Correlation Coefficients ◽

Well Being ◽

Pearson Correlation Coefficient ◽

Spearman Correlation ◽

Symptom Intensity

e20528 Background: The main objective of palliative cancer care is to improve quality of life (QOL). Because multiple dimensions contribute to the construct of QOL, multi-dimensional instruments are usually used to measure it. These are time consuming and burdensome for repeated use. Recent authors have suggested that brief single-item global assessments can provide a reliable measure of QOL. We assessed the performance of the Edmonton Symptom Assessment System ‘feeling of well-being’ item (ESAS WB) using the Functional Assessment of Cancer Therapy - General (FACT-G) instrument as a gold standard. Methods: After obtaining IRB approval, we reviewed the data from 213 advanced cancer patients who had participated in six studies from March 2006 to June 2008. Using Spearman correlation coefficients, we determined the level of association between baseline ESAS WB and the FACT-G total score, its subscale domains (Physical (Pwb), Social/Family (Swb), Emotional (Ewb), and Functional (Fwb) Well-Being), and the 9 ESAS symptom intensity scores. We also calculated the change between the baseline (T1) and second (T2) observations of ESAS WB and of FACT-G total score and determined their level of association using a Pearson correlation coefficient. In addition, we used regression analysis to estimate the change in FACT-G predicted by the change in ESAS WB score. Results: Mean age was 60 (SD 12) years and 48% were female. At T1, the Spearman correlation coefficient of ESAS WB and FACT-G was -0.48 (p<0.0001). Spearman correlation coefficients for ESAS WB and FACT-G subscale domains and ESAS symptom intensity scores were also highly significant (p<0.0001) for all physical and emotional symptoms (other than p=0.003 for nausea) except for FACT Swb (p=0.08). The Pearson correlation coefficient for difference between T1 and T2 in ESAS WB and FACT-G for 146 patients was -0.36 (p<0.0001). The regression analysis was highly significant (p<0.0001). The change in ESAS WB corresponding to FACT-G published minimally important difference (MID) was -0.24 for 3, -1.55 for 5, and -2.87 for 7, respectively. Conclusions: ESAS WB is a practical instrument for clinical use and best reflects the Pwb, Ewb and Fwb domains of FACT-G as compared to Swb. No significant financial relationships to disclose.

Assessing the Motor Status Score: A Scale for the Evaluation of Upper Limb Motor Outcomes in Patients after Stroke

Neurorehabilitation and Neural Repair ◽

10.1177/154596830201600306 ◽

2002 ◽

Vol 16 (3) ◽

pp. 283-289 ◽

Cited By ~ 56

Author(s):

Mark Ferraro ◽

Jennifer Hogan Demaio ◽

Jennifer Krol ◽

Chris Trudell ◽

Keren Rannekleiv ◽

...

Keyword(s):

Correlation Coefficient ◽

Upper Limb ◽

Pearson Correlation ◽

Intraclass Correlation ◽

Correlation Coefficients ◽

Pearson Correlation Coefficient ◽

Maximum Score ◽

Intraclass Correlation Coefficients ◽

Upper Extremity Impairment ◽

Motor Outcomes

The Motor Status Scale (MSS) measures shoulder, elbow (maximum score = 40), wrist, hand, and finger movements (maximum score = 42), and expands the measurement of upper extremity impairment and disability provided by the Fugl-Meyer (FM) score. This work examines the interrater reliability and criterion validity of the MSS performed in patients admitted to a rehabilitation hospital 21 ± 4 days after stroke. Using the MSS and the FM, 7 occupational therapists, masked to each other’s judgments, evaluated 12 consecutive patients with stroke. Two therapists evaluated 6 additional patients on consecutive days. Intraclass correlation coefficients were significant for each group of raters for the shoulder/elbow and for the wrist/hand (P < 0.0001); test-retest measures were also significant for the shoulder/elbow (Pearson correlation coefficient r = 0.99, P < 0.004) and for the wrist/hand (Pearson correlation coefficient r = 0.99, P < 0.003). The internal item consistency for the overall MSS was significant (Cronbach alpha = 0.98, P < 0.0001). Finally, the correlation between the MSS and the FM (R² = 0.964) was significant (P < 0.0001). The MSS affords a reliable and valid assessment of upper limb impairment and disability following stroke.

Measurement of flow harmonics correlations with mean transverse momentum in lead–lead and proton–lead collisions at $$\sqrt{s_{\mathrm{NN}}} = 5.02~\hbox {TeV}$$ with the ATLAS detector

The European Physical Journal C ◽

10.1140/epjc/s10052-019-7489-6 ◽

2019 ◽

Vol 79 (12) ◽

Cited By ~ 5

Author(s):

G. Aad ◽

B. Abbott ◽

D. C. Abbott ◽

A. Abed Abud ◽

...

Keyword(s):

Charged Particle ◽

Transverse Momentum ◽

Correlation Coefficient ◽

Pearson Correlation ◽

Correlation Coefficients ◽

Gluon Plasma ◽

Pearson Correlation Coefficient ◽

Order Flow ◽

Momentum Range ◽

Ion Collisions

Abstract: To assess the properties of the quark–gluon plasma formed in ultrarelativistic ion collisions, the ATLAS experiment at the LHC measures a correlation between the mean transverse momentum and the flow harmonics. The analysis uses data samples of lead–lead and proton–lead collisions obtained at the centre-of-mass energy per nucleon pair of 5.02 TeV, corresponding to total integrated luminosities of $$22~\upmu\text{b}^{-1}$$ and $$28~\text{nb}^{-1}$$, respectively. The measurement is performed using a modified Pearson correlation coefficient with the charged-particle tracks on an event-by-event basis. The modified Pearson correlation coefficients for the 2nd-, 3rd-, and 4th-order flow harmonics are measured in the lead–lead collisions as a function of event centrality quantified as the number of charged particles or the number of nucleons participating in the collision. The measurements are performed for several intervals of the charged-particle transverse momentum. The correlation coefficients for all studied harmonics exhibit a strong centrality evolution, which only weakly depends on the charged-particle momentum range. In the proton–lead collisions, the modified Pearson correlation coefficient measured for the 2nd-order flow harmonics shows only weak centrality dependence. The lead–lead data is qualitatively described by the predictions based on the hydrodynamical model.

Breakdown Analysis of Pearson Correlation Coefficient and Robust Correlation Methods

IOP Conference Series Materials Science and Engineering ◽

10.1088/1757-899x/917/1/012065 ◽

2020 ◽

Vol 917 ◽

pp. 012065

Author(s):

Friday Zinzendoff Okwonu ◽

Bolaji Laro Asaju ◽

Festus Irimisose Arunaye

Keyword(s):

Correlation Coefficient ◽

Pearson Correlation ◽

Pearson Correlation Coefficient ◽

Correlation Methods

The effect of education and sporting experience of Iranian premier league football players on their awareness of sports law

Revista Tempos e Espaços em Educação ◽

10.20952/revtee.v13i32.14924 ◽

2020 ◽

Vol 13 (32) ◽

pp. 1-15

Author(s):

Mahdi Azimi ◽

Seyed Amir Reza Hosseinipour Rafsanjani ◽

Mona Torkaman

Keyword(s):

Correlation Coefficient ◽

Pearson Correlation ◽

Percent Level ◽

Football Players ◽

Pearson Correlation Coefficient ◽

Spearman Correlation ◽

Sports Law ◽

Premier League ◽

History Of ◽

Kendall Correlation

The purpose of this study was to investigate the effect of the education and athletic background of Iranian Premier League football players on their awareness of sports law. In this study, a descriptive-analytical method and the Spearman correlation coefficient were used. The 95 percent level was used as the criterion for rejecting the hypothesis. Questionnaires about the variables were used to identify the sample group, and the main questions about players' legal awareness were used for data collection. The results showed that the Pearson correlation coefficient was 0.107, the Spearman correlation coefficient was 0.204 and the Kendall correlation coefficient was 0.139, with a significance value of 0.88. There is no relationship between awareness of sports law and the history of sports in the Premier League. A Pearson correlation coefficient of 0.388, a Spearman correlation coefficient of 0.204 and a Kendall correlation coefficient of 0.139, with a significance value of 0.001, indicated that there is a relationship between awareness of sports law and the education of the Iranian Premier League players.

Real-time Prediction of the Daily Incidence of COVID-19 in 215 Countries and Territories Using Machine Learning: Model Development and Validation (Preprint)

10.2196/preprints.24285 ◽

2020 ◽

Author(s):

Yuanyuan Peng ◽

Xinjian Chen ◽

Yibiao Rong ◽

Chi Pui Pang ◽

Xinjian Chen ◽

...

Keyword(s):

Machine Learning ◽

Real Time ◽

Correlation Coefficient ◽

Pearson Correlation ◽

Google Trends ◽

Pearson Correlation Coefficient ◽

Spearman Correlation ◽

Time Prediction ◽

Search Volume ◽

The Mean

BACKGROUND Advanced prediction of the daily incidence of COVID-19 can aid policy making on the prevention of disease spread, which can profoundly affect people's livelihood. In previous studies, predictions were investigated for single or several countries and territories. OBJECTIVE We aimed to develop models that can be applied for real-time prediction of COVID-19 activity in all individual countries and territories worldwide. METHODS Data of the previous daily incidence and infoveillance data (search volume data via Google Trends) from 215 individual countries and territories were collected. A random forest regression algorithm was used to train models to predict the daily new confirmed cases 7 days ahead. Several methods were used to optimize the models, including clustering the countries and territories, selecting features according to the importance scores, performing multiple-step forecasting, and upgrading the models at regular intervals. The performance of the models was assessed using the mean absolute error (MAE), root mean square error (RMSE), Pearson correlation coefficient, and Spearman correlation coefficient. RESULTS Our models can accurately predict the daily new confirmed cases of COVID-19 in most countries and territories. Of the 215 countries and territories under study, 198 (92.1%) had MAEs <10 and 187 (87.0%) had Pearson correlation coefficients >0.8. For the 215 countries and territories, the mean MAE was 5.42 (range 0.26-15.32), the mean RMSE was 9.27 (range 1.81-24.40), the mean Pearson correlation coefficient was 0.89 (range 0.08-0.99), and the mean Spearman correlation coefficient was 0.84 (range 0.2-1.00). CONCLUSIONS By integrating previous incidence and Google Trends data, our machine learning algorithm was able to predict the incidence of COVID-19 in most individual countries and territories accurately 7 days ahead.
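The evaluation metrics named in METHODS can be sketched as follows. The case counts are hypothetical and the function names are chosen for this example; nothing here reproduces the authors' pipeline. The Spearman coefficient is omitted to keep the sketch short.

```python
from math import sqrt
from statistics import mean


def mae(actual, predicted):
    """Mean absolute error."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def rmse(actual, predicted):
    """Root mean square error."""
    return sqrt(mean((a - p) ** 2 for a, p in zip(actual, predicted)))


def pearson(x, y):
    """Pearson correlation between observed and predicted values."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical daily new cases and 7-day-ahead predictions for one territory.
actual = [10, 12, 15, 20, 18]
predicted = [11, 11, 16, 18, 19]

print("MAE:", mae(actual, predicted))
print("RMSE:", round(rmse(actual, predicted), 3))
print("Pearson r:", round(pearson(actual, predicted), 3))
```

MAE and RMSE measure the size of the errors in the units of the data, whereas the correlation coefficient only measures how closely the predictions track the observed series, so the two kinds of metric can disagree.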

Generalized R-squared for detecting dependence

Biometrika ◽

10.1093/biomet/asw071 ◽

2017 ◽

Vol 104 (1) ◽

pp. 129-139 ◽

Cited By ~ 8

Author(s):

X. Wang ◽

B. Jiang ◽

J. S. Liu

Keyword(s):

Correlation Coefficient ◽

Constant Error ◽

Fundamental Problem ◽

Pearson Correlation ◽

Random Variables ◽

Error Variance ◽

Pearson Correlation Coefficient ◽

Test Statistics ◽

Linear Relationships ◽

Heteroscedastic Errors

SUMMARY Detecting dependence between two random variables is a fundamental problem. Although the Pearson correlation coefficient is effective for capturing linear dependence, it can be entirely powerless for detecting nonlinear and/or heteroscedastic patterns. We introduce a new measure, G-squared, to test whether two univariate random variables are independent and to measure the strength of their relationship. The G-squared statistic is almost identical to the square of the Pearson correlation coefficient, R-squared, for linear relationships with constant error variance, and has the intuitive meaning of the piecewise R-squared between the variables. It is particularly effective in handling nonlinearity and heteroscedastic errors. We propose two estimators of G-squared and show their consistency. Simulations demonstrate that G-squared estimators are among the most powerful test statistics compared with several state-of-the-art methods.
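A small, hypothetical example shows how the Pearson coefficient can miss a purely nonlinear dependence, and why a piecewise view helps. This is not the G-squared estimator from the paper; it only computes the ordinary Pearson coefficient on the full data and on one half of it.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# y is completely determined by x, but the relationship is not linear.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v * v for v in x]

r_all = pearson(x, y)            # computed over the whole range
r_left = pearson(x[:4], y[:4])   # computed over x <= 0 only

print("r over all x:", r_all)
print("r squared over x <= 0:", round(r_left ** 2, 3))
```

Over the whole range the positive and negative halves cancel, so the coefficient is zero even though the dependence is exact. Within a single piece the squared coefficient is large, which is the intuition behind a piecewise R-squared.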

Inter- and intra-observer reliability of quantitative sensory testing performed with the SMall animal ALGOmeter (SMALGO) to evaluate pain associated with feline gingivostomatitis

Journal of Feline Medicine and Surgery ◽

10.1177/1098612x19837343 ◽

2019 ◽

Vol 22 (4) ◽

pp. 271-276 ◽

Cited By ~ 1

Author(s):

Hanna Machin ◽

Serena Pevere ◽

Chiara Adami

Keyword(s):

Correlation Coefficient ◽

Quantitative Sensory Testing ◽

Statistical Tests ◽

Pearson Correlation ◽

Small Animal ◽

Correlation Coefficients ◽

Pearson Correlation Coefficient ◽

Sensory Testing ◽

Observer Reliability ◽

Mechanical Thresholds

Objectives The aim of this study was to evaluate the inter- and intra-observer reliability of quantitative sensory testing performed with the SMall animal ALGOmeter (SMALGO) in healthy cats and in cats with chronic gingivostomatitis (CGS), and to evaluate the SMALGO as a tool to detect and quantify pain in cats with CGS. Methods Thirty cats from a private shelter were included and assigned to one of two groups: group C (healthy cats; n = 15) or group CGS (cats with CGS; n = 15). In all cats the mechanical thresholds were measured with the SMALGO, with the sensor tip applied to the superior lip above the canine root, by two independent investigators (A, experienced; B, inexperienced), on two different occasions (day 1 and day 2) with a 24 h interval. A CGS scale was used in the diseased cats to assess the severity of the condition. For the reliability analysis, intra-class correlation coefficients (ICCs) were calculated. Other statistical tests used were Pearson correlation coefficient and a paired t-test. Results The inter- and intra-observer levels of agreement were fair (ICC = 0.50) and good, respectively (ICC = 0.73 for investigator A; ICC = 0.60 for investigator B). However, the thresholds measured in healthy cats (169 ± 59 g) did not differ from those obtained from diseased cats (156 ± 82 g; P = 0.35). There was no correlation between the scores of the CGS scale and the thresholds measured in diseased cats (Pearson correlation coefficient = 0.047; P = 0.87). Conclusions and relevance Quantitative sensory testing performed with the SMALGO in cats is repeatable and reliable, regardless of the expertise of the investigator. However, the findings of this study suggest that the mechanical thresholds measured with the SMALGO may not be a valuable indicator of pain in cats with CGS.

Identifying the Source of Weathered Petroleum: Matching Infrared Spectra with Correlation Coefficients

Applied Spectroscopy ◽

10.1366/000370277774464138 ◽

1977 ◽

Vol 31 (6) ◽

pp. 524-527 ◽

Cited By ~ 10

Author(s):

Carl D. Baer ◽

Chris W. Brown

Keyword(s):

Correlation Coefficient ◽

Similarity Measure ◽

Infrared Spectra ◽

Nearest Neighbor ◽

Pearson Correlation ◽

Correlation Coefficients ◽

Pearson Correlation Coefficient ◽

K Nearest Neighbor ◽

Data Set ◽

Novel Method

A novel method is presented for identifying the source of weathered petroleum by measuring the similarity between the spectrum of a weathered oil and spectra of artificially weathered samples. The Pearson correlation coefficient is used as the similarity measure, and three separate K-nearest neighbor approaches are tested on the data set.
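The idea of using the Pearson coefficient as a similarity measure in a nearest-neighbor search can be sketched as below. The spectra, the labels, and the `nearest_sources` helper are hypothetical and serve only as illustration; the abstract does not describe the three K-nearest neighbor variants, so this shows a single plain version.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length spectra."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def nearest_sources(query, library, k=1):
    """Return the k library labels whose spectra correlate best with query."""
    scored = sorted(library, key=lambda item: pearson(query, item[1]),
                    reverse=True)
    return [label for label, _ in scored[:k]]


# Hypothetical absorbance values at five wavenumbers for three reference oils.
library = [
    ("oil_A", [0.1, 0.5, 0.9, 0.4, 0.2]),
    ("oil_B", [0.9, 0.4, 0.1, 0.5, 0.8]),
    ("oil_C", [0.2, 0.2, 0.3, 0.8, 0.9]),
]
# Hypothetical spectrum of an unknown weathered sample.
query = [0.15, 0.55, 0.85, 0.45, 0.25]

print(nearest_sources(query, library, k=1))
```

Because the Pearson coefficient is unchanged by shifting or positively rescaling either spectrum, this similarity measure compares the shapes of the spectra rather than their absolute intensities.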

Measuring the Relationship of Bivariate Data Using Hodges-Lehman Estimator

ASM Science Journal ◽

10.32802/asmscj.2020.sm26(1.11) ◽

2020 ◽

pp. 1-5

Author(s):

Suhaida Abdullah ◽

Nur Amira Zakaria ◽

Nor Aishah Ahad ◽

Norhayati Yusof ◽

Sharipah Soaad Syed Yahaya

Keyword(s):

Correlation Coefficient ◽

Pearson Correlation ◽

Correlation Coefficients ◽

Median Absolute Deviation ◽

Poor Correlation ◽

Pearson Correlation Coefficient ◽

Absolute Deviation ◽

Bivariate Data ◽

Relationship Of ◽

The Relationship

The relationship of bivariate data is ordinarily measured using a correlation coefficient. The most commonly used is the Pearson correlation coefficient, which is well known as the best coefficient for interval or ratio bivariate data with a linear relationship. Although this coefficient performs well under that condition, it is very sensitive to small departures from linearity, usually caused by the existence of an outlier. For that reason, this paper provides new robust correlation coefficients that combine elements of the nonparametric Hodges Lehmann estimator with the parametric technique of the Pearson correlation coefficient. The paper also introduces different scale estimators, the median and the median absolute deviation (MADn); the resulting coefficients are denoted rHL(med) and rHL(MADn), respectively. The performance of the proposed correlation coefficients is measured by the coefficient values, which are compared with the Pearson correlation coefficient and several existing robust correlation coefficients. The results show that the Pearson correlation coefficient (r) is very good under perfect data conditions, but with only 10% outliers it not only gives a poor correlation value but also turns the direction of the relationship negative. In contrast, rHL(med) and rHL(MADn) offer the highest coefficient values, and these values are robust to outliers of up to 30%. With very good performance under all data conditions and a simple calculation, rHL(med) and rHL(MADn) are considered good alternatives to r when dealing with outliers. Keywords: correlation coefficient; Hodges Lehmann; median; median absolute deviation (MADn)
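For readers unfamiliar with the Hodges Lehmann estimator, the sketch below shows the standard one-sample location estimator, the median of all pairwise (Walsh) averages, on hypothetical data. It does not reproduce rHL(med) or rHL(MADn), whose formulas are not given in the abstract; it only illustrates why this estimator is a robust replacement for the mean.

```python
from itertools import combinations_with_replacement
from statistics import mean, median


def hodges_lehmann(values):
    """Median of all pairwise averages, pairing each value with itself too."""
    return median((a + b) / 2
                  for a, b in combinations_with_replacement(values, 2))


# Hypothetical sample, then the same sample with one gross outlier.
clean = [1, 2, 3, 4, 5]
contaminated = [1, 2, 3, 4, 50]

print("mean:", mean(clean), "->", mean(contaminated))
print("Hodges-Lehmann:", hodges_lehmann(clean), "->",
      hodges_lehmann(contaminated))
```

In this toy sample the single outlier moves the mean from 3 to 12 but leaves the Hodges Lehmann estimate at 3.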
