Comparison of the power of statistical tests in connection with the discussion about the reproducibility criterion

2018 ◽  
Vol 15 (5) ◽  
pp. 4-14
Author(s):  
V. E. Osipov

The criterion of reproducibility and its functioning in post-non-classical science are discussed in the Russian methodology of science. At the same time, critics avoid statistical calculations in their arguments. This raises the following questions: “What is reproducibility?” and “What is the mathematical formulation of the reproducibility criterion?” A literature review identified five indicators of reproducibility, which were proposed by foreign colleagues. These indicators are being tested and discussed. However, there is no general mathematical formulation of the reproducibility criterion (an integral criterion covering these indicators), and these indicators have not yet become a standard. In the present work, we compare two statistical tests related to one of these five indicators of reproducibility. Purpose of the study. The aim of this paper is to compare the powers of two tests of statistical significance that can be used to reveal an effect under the requirement that research results be reproducible. In this case, reproducibility is estimated by the indicator “significance”. Under the first criterion, the effect is considered to be revealed if the effect size is significant in all studies (i.e. if the significance of the effect size is reproduced in all studies). Under the second criterion, the effect is considered to be revealed if the weighted mean effect size obtained from a meta-analysis is significant (the effect size need not be significant in individual studies). Materials and methods. Methods of mathematical statistics are used to achieve this goal. The powers of the two tests are compared using two estimates. The first estimate is theoretical. The second was obtained in a statistical experiment.
The powers are calculated: 1) for different values of Cohen’s effect size: “small”, “medium” and “large”, 2) for different degrees of heterogeneity: zero (fixed-effect primary studies (from 2 to 8). Results. The power of the first test is less or much less than the power of the second one. The power of the first test decreases as the number of primary studies grows, while the power of the second one increases. Given the conventional power threshold of 80%, the first criterion is unsuitable for the parameter values considered (that is, when a two-tailed t-test with a significance level of 0.05 and two samples of the typical size n=25 is used to determine the significance of the effect size in individual studies), whereas the power of the second test can be increased if necessary by including more primary studies in the meta-analysis. Conclusion. If the criterion of reproducibility, known from the philosophy of science, is intended to confirm the existence of an effect (connection) or, in other words, to reveal the effect, under conditions where the measurement process has a significant random component, it is advisable to apply the second test rather than the first.
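The contrast between the two criteria can be illustrated with a rough sketch. It is not the author's calculation: it assumes a normal approximation to the two-sample t-test, zero heterogeneity, k independent studies of equal size, and a standard error of sqrt(2/n) for the standardized effect. The function names are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def two_sided_power(ncp: float, alpha: float = 0.05) -> float:
    """Power of a two-sided z-test whose statistic is N(ncp, 1) under the alternative."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(ncp - z_crit) + _STD_NORMAL.cdf(-ncp - z_crit)


def power_single_study(d: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of one two-group study with n subjects per group."""
    # Assumption: the standardized effect has standard error sqrt(2 / n).
    return two_sided_power(d * (n / 2) ** 0.5, alpha)


def power_all_significant(d: float, n: int, k: int, alpha: float = 0.05) -> float:
    """First criterion: each of k independent studies must be significant."""
    return power_single_study(d, n, alpha) ** k


def power_meta_analysis(d: float, n: int, k: int, alpha: float = 0.05) -> float:
    """Second criterion: the fixed-effect pooled estimate of k equal studies is significant."""
    # Pooling k equal studies shrinks the standard error to sqrt(2 / (k * n)).
    return two_sided_power(d * (k * n / 2) ** 0.5, alpha)


# "Medium" effect (d = 0.5) and n = 25 per group, as in the abstract's example.
for k in range(2, 9):
    print(k, round(power_all_significant(0.5, 25, k), 3),
          round(power_meta_analysis(0.5, 25, k), 3))
```

Under these assumptions the first column shrinks as k grows while the second rises, which matches the qualitative pattern reported in the abstract.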

F1000Research ◽  
2016 ◽  
Vol 4 ◽  
pp. 1188 ◽  
Author(s):  
Daryl Bem ◽  
Patrizio E. Tressoldi ◽  
Thomas Rabeyron ◽  
Michael Duggan

In 2011, one of the authors (DJB) published a report of nine experiments in the Journal of Personality and Social Psychology purporting to demonstrate that an individual’s cognitive and affective responses can be influenced by randomly selected stimulus events that do not occur until after his or her responses have already been made and recorded, a generalized variant of the phenomenon traditionally denoted by the term precognition. To encourage replications, all materials needed to conduct them were made available on request. We here report a meta-analysis of 90 experiments from 33 laboratories in 14 countries which yielded an overall effect greater than 6 sigma, z = 6.40, p = 1.2 × 10⁻¹⁰, with an effect size (Hedges’ g) of 0.09. A Bayesian analysis yielded a Bayes Factor of 5.1 × 10⁹, greatly exceeding the criterion value of 100 for “decisive evidence” in support of the experimental hypothesis. When DJB’s original experiments are excluded, the combined effect size for replications by independent investigators is 0.06, z = 4.16, p = 1.1 × 10⁻⁵, and the BF value is 3,853, again exceeding the criterion for “decisive evidence.” The number of potentially unretrieved experiments required to reduce the overall effect size of the complete database to a trivial value of 0.01 is 544, and seven of eight additional statistical tests support the conclusion that the database is not significantly compromised by either selection bias or by intense “p-hacking” (the selective suppression of findings or analyses that failed to yield statistical significance). P-curve analysis, a recently introduced statistical technique, estimates the true effect size of the experiments to be 0.20 for the complete database and 0.24 for the independent replications, virtually identical to the effect size of DJB’s original experiments (0.22) and the closely related “presentiment” experiments (0.21).
We discuss the controversial status of precognition and other anomalous effects collectively known as psi.


Nutrients ◽  
2021 ◽  
Vol 13 (2) ◽  
pp. 404
Author(s):  
Emma Altobelli ◽  
Paolo Matteo Angeletti ◽  
Ciro Marziliano ◽  
Marianna Mastrodomenico ◽  
Anna Rita Giuliani ◽  
...  

Diabetes mellitus is an important public health issue, and its prevalence is growing worldwide. In recent years, there has been growing research interest in evidence for the efficacy of curcumin in the regulation of glycemia and lipidaemia. The molecular structure of curcumins allows them to intercept reactive oxygen species (ROS), which are particularly harmful in chronic inflammation and tumorigenesis models. The aim of our study was to perform a systematic review and meta-analysis to evaluate the effect of curcumin on the glycemic and lipid profile in subjects with uncomplicated type 2 diabetes. The papers included in the meta-analysis were sought in the MEDLINE, EMBASE, Scopus, Clinicaltrials.gov, Web of Science, and Cochrane Library databases as of October 2020. The effect sizes were pooled across studies in order to obtain an overall effect size. A random effects model was used to account for different sources of variation among studies. Cohen’s d, with a 95% confidence interval (CI), was used as a measure of the effect size. Heterogeneity was assessed using Q statistics. The ANOVA-Q test was used to evaluate the differences among groups. Publication bias was analyzed and represented by a funnel plot. Curcumin treatment does not show a statistically significant reduction between treated and untreated patients. On the other hand, glycosylated hemoglobin, homeostasis model assessment (HOMA), and low-density lipoprotein (LDL) showed a statistically significant reduction in subjects that were treated with curcumin (p = 0.008, p < 0.001, p = 0.021, respectively). When considering HbA1c, the meta-regressions only showed statistical significance for gender (p = 0.034). Our meta-analysis seems to confirm the benefits on glucose metabolism, with results that appear to be more solid than those for lipid metabolism. However, further studies are needed in order to test the efficacy and safety of curcumin in uncomplicated type 2 diabetes.
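The pooling step described above can be sketched as follows. The abstract does not say which random-effects estimator was used, so the DerSimonian-Laird estimator here is an assumption, as are the function name and the illustrative effect sizes and variances (they are not data from the study).

```python
from math import sqrt


def random_effects_pool(effects, variances):
    """Pool per-study effect sizes with the DerSimonian-Laird random-effects model.

    effects   -- per-study effect sizes (e.g. Cohen's d)
    variances -- per-study sampling variances of those effect sizes
    """
    k = len(effects)
    w = [1.0 / v for v in variances]                      # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]          # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = sqrt(1.0 / sum(w_re))
    return {
        "pooled": pooled,
        "ci95": (pooled - 1.96 * se, pooled + 1.96 * se),
        "Q": q,
        "tau2": tau2,
    }


# Hypothetical inputs, for illustration only.
result = random_effects_pool([-0.40, -0.10, -0.65, -0.25], [0.04, 0.05, 0.06, 0.03])
print(result)
```

When the studies agree closely, Q falls below its degrees of freedom, tau2 is truncated to zero, and the result coincides with the fixed-effect estimate.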


1990 ◽  
Vol 24 (3) ◽  
pp. 405-415 ◽  
Author(s):  
Nathaniel McConaghy

Meta-analysis replaced statistical significance with effect size in the hope of resolving controversy concerning the evaluation of treatment effects. Statistical significance measured the reliability of the effect of treatment, not its efficacy. It was strongly influenced by the number of subjects investigated. Effect size, as originally assessed, eliminated this influence but, by standardizing the size of the treatment effect, could distort it. Meta-analyses which combine the results of studies which employ different subject types, outcome measures, treatment aims, no-treatment rather than placebo controls, or therapists with varying experience can be misleading. To ensure discussion of these variables, meta-analyses should be used as an aid to rather than a substitute for literature review. While meta-analyses produce contradictory findings, it seems unwise to rely on the conclusions of an individual analysis. Their consistent finding that placebo treatments obtain markedly higher effect sizes than no treatment will, it is hoped, render the use of untreated control groups obsolete.


2006 ◽  
Vol 134 (5) ◽  
pp. 1442-1453 ◽  
Author(s):  
Kuan-Man Xu

Abstract A new method is proposed to compare statistical differences between summary histograms, which are the histograms summed over a large ensemble of individual histograms. It consists of choosing a distance statistic for measuring the difference between summary histograms and using a bootstrap procedure to calculate the statistical significance level. Bootstrapping is an approach to statistical inference that makes few assumptions about the underlying probability distribution that describes the data. Three distance statistics are compared in this study. They are the Euclidean distance, the Jeffries–Matusita distance, and the Kuiper distance. The data used in testing the bootstrap method are satellite measurements of cloud systems called “cloud objects.” Each cloud object is defined as a contiguous region/patch composed of individual footprints or fields of view. A histogram of measured values over footprints is generated for each parameter of each cloud object, and then summary histograms are accumulated over all individual histograms in a given cloud-object size category. The results of statistical hypothesis tests using all three distances as test statistics are generally similar, indicating the validity of the proposed method. The Euclidean distance is determined to be most suitable after comparing the statistical tests of several parameters with distinct probability distributions among three cloud-object size categories. Impacts on the statistical significance levels resulting from differences in the total lengths of satellite footprint data between two size categories are also discussed.
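A minimal sketch of the idea, using the Euclidean distance: the resampling scheme below (pooling both ensembles and redrawing individual histograms with replacement to build a null distribution) is an assumption about how such a bootstrap could be set up, not necessarily the exact procedure of the paper, and all names and data are hypothetical.

```python
import random
from math import sqrt


def summary_histogram(histograms):
    """Sum individual histograms bin by bin and normalize to relative frequencies."""
    totals = [sum(column) for column in zip(*histograms)]
    n = sum(totals)
    return [t / n for t in totals]


def euclidean_distance(p, q):
    """Euclidean distance between two normalized histograms with identical bins."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def bootstrap_p_value(group_a, group_b, n_boot=2000, seed=0):
    """Return (observed distance, bootstrap significance level).

    group_a, group_b -- lists of individual histograms (lists of bin counts).
    """
    rng = random.Random(seed)
    observed = euclidean_distance(summary_histogram(group_a),
                                  summary_histogram(group_b))
    pooled = group_a + group_b      # null hypothesis: both groups share one population
    exceed = 0
    for _ in range(n_boot):
        resampled_a = rng.choices(pooled, k=len(group_a))
        resampled_b = rng.choices(pooled, k=len(group_b))
        d = euclidean_distance(summary_histogram(resampled_a),
                               summary_histogram(resampled_b))
        if d >= observed:
            exceed += 1
    return observed, exceed / n_boot


# Hypothetical ensembles of three-bin histograms.
a = [[8, 2, 0], [7, 3, 0], [9, 1, 0], [6, 3, 1]] * 5
b = [[2, 5, 3], [1, 6, 3], [3, 4, 3], [2, 4, 4]] * 5
print(bootstrap_p_value(a, b))
```

Resampling whole histograms rather than individual footprints keeps each cloud object intact as the sampling unit, which is one way to respect the ensemble structure the abstract describes.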


2019 ◽  
Vol 2 (1) ◽  
pp. 64-78
Author(s):  
Susanti Susanti ◽  
Ardian Asyhari ◽  
Rijal Firdaos

Abstract: The purpose of this study is to determine the effectiveness of student worksheets (LKPD) integrated with Islamic values in problem-based learning on students' scientific literacy. The research method was a pre-experimental design of the one-group pretest-posttest type. Data were analyzed using normalized gain (N-Gain) and effect size. Furthermore, the students' pretest and posttest scores were tested statistically with normality, homogeneity, and t-tests (paired-sample test) using the SPSS 18 program. The average N-gain value obtained was 0.45, which falls in the medium category. The statistical tests show a significance value of 0.00, less than α = 0.05 (sig. < 0.05), which means that H0 is rejected and H1 is accepted: there is a difference. The results of this study indicate that LKPD integrated with Islamic values in problem-based learning is able to enhance students' scientific literacy skills in the aspects of competence and knowledge on the topic of environmental pollution at SMP Negeri 1 Kotaagung Timur.
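For readers unfamiliar with the normalized gain, the following sketch shows the usual calculation. It assumes scores on a 0 to 100 scale and the commonly used category thresholds (below 0.3 low, 0.3 to below 0.7 medium, 0.7 and above high); the abstract does not state the scale or how the average was formed, and the example scores are hypothetical.

```python
def normalized_gain(pre: float, post: float, max_score: float = 100.0) -> float:
    """Normalized gain: the achieved improvement as a share of the possible improvement."""
    if pre >= max_score:
        raise ValueError("pretest score must be below the maximum score")
    return (post - pre) / (max_score - pre)


def gain_category(g: float) -> str:
    """Classify a normalized gain with the commonly used 0.3 and 0.7 thresholds."""
    if g >= 0.7:
        return "high"
    if g >= 0.3:
        return "medium"
    return "low"


# Hypothetical class averages: pretest 40, posttest 67.
g = normalized_gain(40, 67)
print(round(g, 2), gain_category(g))
```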


2017 ◽  
Vol 4 (2) ◽  
pp. 160254 ◽  
Author(s):  
Estelle Dumas-Mallet ◽  
Katherine S. Button ◽  
Thomas Boraud ◽  
Francois Gonon ◽  
Marcus R. Munafò

Studies with low statistical power increase the likelihood that a statistically significant finding represents a false positive result. We conducted a review of meta-analyses of studies investigating the association of biological, environmental or cognitive parameters with neurological, psychiatric and somatic diseases, excluding treatment studies, in order to estimate the average statistical power across these domains. Taking the effect size indicated by a meta-analysis as the best estimate of the likely true effect size, and assuming a threshold for declaring statistical significance of 5%, we found that approximately 50% of studies have statistical power in the 0–10% or 11–20% range, well below the minimum of 80% that is often considered conventional. Studies with low statistical power appear to be common in the biomedical sciences, at least in the specific subject areas captured by our search strategy. However, we also observe evidence that this depends in part on research methodology, with candidate gene studies showing very low average power and studies using cognitive/behavioural measures showing high average power. This warrants further investigation.


2020 ◽  
Vol 41 (S1) ◽  
pp. s308-s308
Author(s):  
Ahmad Umar ◽  
Muawiyyah Sufiyan ◽  
Dahiru Tukur ◽  
Mary Onoja-Alexander ◽  
Lawal Amadu ◽  
...  

Background: Adverse events following immunization (AEFI) surveillance largely depends on the ability of the healthcare worker (HCW) to detect and report cases in a timely manner using the correct reporting tools through an appropriate system. AEFI surveillance is carried out regularly during both routine immunization services and supplemental immunization activities in the state. Objective: We assessed knowledge of AEFI reporting tools and the reporting system among primary HCWs in Jigawa state, northwestern Nigeria. Method: A descriptive cross-sectional design was used for this study. A multistage sampling technique was used to select 290 HCWs who had spent at least 6 months in immunization units of primary healthcare centers of Jigawa state. Data were collected using a pretested, self-administered, structured questionnaire with open- and closed-ended questions and were analyzed using IBM SPSS version 20 software. All statistical tests were 2-tailed with P < .05 as the statistical significance level. Results: Most of the primary HCWs (93.2%) had AEFI reporting forms in their health facilities, and 68.9% said that the AEFI reporting form could be obtained from a focal or contact person in the health facility. Up to 96.4% of the primary HCWs were aware of how to report AEFI. Also, ~76.6% of primary HCWs knew the correct AEFI reporting flow, but only 15.8% knew that only serious AEFIs are reported. Furthermore, ~78.8% and 19.4% of HCWs mentioned telephone and filling forms as some of the appropriate methods of AEFI notification, respectively. Conclusions: Most primary HCWs had reporting forms in their health facilities and were aware of how to report an AEFI. Most of the respondents knew the correct AEFI reporting flow. The state, in collaboration with local government authorities, should provide quality training on AEFI reporting and the reporting system. Funding: None. Disclosures: None.


2019 ◽  
Vol 147 (9-10) ◽  
pp. 534-540
Author(s):  
Zorica Popovic ◽  
Mirjana Djurickovic ◽  
Agima Ljaljevic ◽  
Snezana Matijevic ◽  
Kosovka Obradovic-Djuricic

Introduction/Objective. The quality of life of elderly individuals has an active function in oral health; it is of great importance to learn that elders over the age of 65 years demonstrate an increase in seeking dental services. Oral Health Impact Profile-14 (OHIP-14) is especially suitable for use in the elderly. The aim of this study is to examine the reliability and validity of OHIP-14 in the Montenegrin population aged 65 and over and to determine the influence of oral health on the quality of their life. Methods. The research was conducted from September to December 2016 in the central region of Montenegro, at the Medical University in Podgorica and in the nursing homes of the elderly. The study covered 170 individuals, both sexes, with an average age of 72.32 ± 6.85. The research instrument is OHIP-14 index. Standard statistical tests were used. The statistical significance level is 0.05. Results. The OHIP-14 is linguistically and culturally adapted for the Montenegrin population. The value of the Cronbach Alpha Index is 0.892. The relationship between correlations for individual issues and total correlations ranges from 0.21 to 0.69. The value of OHIP-14 is 19.24 ± 7.49. Listed by domains: functional constraints 3.31 ± 1.75; physical pain 4.19 ± 1.31; psychological discomfort 2.52 ± 1.46; physical fitness 4.38 ± 1.40; mental incompetence 1.42 ± 1.23; social incapacity 1.18 ± 1.27 and handicap 2.21 ± 1.32. Conclusion. The OHIP-14 index is reliable and valid and is recommended for use in the Montenegrin-speaking area, for the elderly. There is a significant impact of oral health on the quality of life of the elderly in the central part of Montenegro.
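The reliability figure quoted above is a Cronbach's alpha. As a minimal sketch of how such a coefficient is computed from item-level answers, assuming complete responses and sample variances, with a hypothetical function name and made-up data rather than the study's:

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a table of responses.

    responses -- one row per respondent, one column per questionnaire item.
    """
    k = len(responses[0])                                   # number of items
    item_variances = [variance(column) for column in zip(*responses)]
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers of five respondents to four items scored 0-4.
answers = [
    [2, 3, 2, 1],
    [0, 1, 0, 0],
    [4, 3, 4, 3],
    [1, 1, 2, 1],
    [3, 4, 3, 4],
]
print(round(cronbach_alpha(answers), 3))
```

Alpha approaches 1 when the items rise and fall together across respondents, so the sum of item variances is small relative to the variance of the total score.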


2020 ◽  
Author(s):  
Abay Woday ◽  
Muluken Dessalegn ◽  
Setognal Birara ◽  
Kusse Urmale ◽  
Gebeyaw Biset ◽  
...  

Abstract Background: Birth asphyxia among newborns accounts for nearly fifty percent of neonatal mortality in sub-Saharan African countries. The situation is worst in Ethiopia, where two out of every three deaths among these babies are attributed to birth asphyxia. Moreover, studies conducted in Ethiopia are too variable and inconclusive to provide a pooled estimate of the prevalence and risk factors of birth asphyxia. Objective: This study aims to analyze collectively and systematically the prevalence of birth asphyxia and associated factors among newborns in Ethiopia. Methods: The protocol for this review is registered at PROSPERO with registration number CRD42020158224. Comprehensive online databases (PubMed, HINARI, Scopus, EMBASE, ScienceDirect, and the Cochrane Library), Google Scholar, African Journals Online, and other gray literature and online repositories will be searched using different search engines. In addition, maternity and infant care databases uploaded at Ethiopian Health Development Journal and Ethiopian Journal of Health Sciences will be searched until June 30, 2020. The Newcastle-Ottawa Quality Assessment Scale (NOS) will be used for critical appraisal of studies. Three reviewers will screen all retrieved articles, conduct data extraction, and then critically appraise all identified studies. All identified observational studies reporting the prevalence of birth asphyxia and associated factors among neonates in Ethiopia will be considered. The analysis of data will be done using STATA 11.0 statistical software. We will report pooled estimates and determinants of birth asphyxia with effect sizes and 95% confidence intervals. Heterogeneity among the included studies will be assessed through Cochran’s Q test statistic and the I² test. Publication bias will be checked using a funnel plot and Egger’s test. Finally, statistical significance will be declared at a p-value of less than 0.05.
Discussion: The results of this systematic review will inform and guide health policy planners in investing limited resources in maternal and neonatal health. Furthermore, it will be a stimulus for future cumulative meta-analysis researchers in developing nations.
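The heterogeneity measure named in the Methods can be derived from the Q statistic. The sketch below uses the standard definition of I² as the share of Q that exceeds its degrees of freedom, truncated at zero; the function name and example values are hypothetical, and the protocol itself reports no data.

```python
def i_squared(q: float, n_studies: int) -> float:
    """I² in percent, computed from Cochran's Q and the number of pooled studies."""
    df = n_studies - 1
    if q <= 0:
        return 0.0
    # Share of total variation attributed to between-study heterogeneity.
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical example: Q = 20 across 11 studies.
print(i_squared(20.0, 11))
```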

