Searching for clusters in population data

2021 ◽  
Vol 3 ◽  
Author(s):  
A.M. Pyatnitskiy ◽  
◽  
V.M. Gukasov ◽  
A.S. Smirnov

The article continues a series of publications developing a new, statistically motivated approach to data clustering. The proposed method is applied to searching for clusters of increased or decreased event frequencies in sets of neighboring cells of a two-dimensional tessellation of the plane. Such cells may correspond to administrative regions, counties, etc. The case of simple frequency tables (histograms) with rectangular cells was considered earlier. The observed distribution of event frequencies over the cells can be compared either with an expected one (for instance, uniform) or with the distribution observed at a previous moment in time. Groups of neighboring cells with the same direction of change are combined into clusters, which are then tested for statistical significance with adjustment for multiple comparisons. Each group of cells is characterized by two parameters: its size (the number of cells) and its intensity of change. If the size of a group and/or its intensity is sufficiently pronounced, the group is considered a statistically significant cluster. No a priori assumptions are made concerning the number, size, or shape of potentially existing clusters. The method can be used to cluster any multidimensional array of p-values that are independent and uniformly distributed under the null hypothesis, against the alternative that there are sets of neighboring cells where the p-values are close to 0 or to 1.
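
The cluster search sketched below is only an illustration of the idea described in this abstract, not the authors' algorithm: it assumes a rectangular grid of independent per-cell p-values, groups 4-connected cells whose p-values fall below a threshold, and scores each group by its size and a simple intensity measure (the threshold and the -sum(log p) intensity are assumptions of this sketch).

```python
# Hypothetical sketch of the cluster search described above: given a 2-D array
# of per-cell p-values (independent and uniform under H0), group 4-connected
# neighboring cells whose p-values fall below a threshold and report each
# group's size and intensity.
import numpy as np
from collections import deque

def find_low_p_clusters(pvals, threshold=0.05):
    """Return (size, intensity) for each cluster of 4-connected low-p cells.

    Intensity is taken here as -sum(log p) over the cluster -- one plausible
    choice, not necessarily the authors' definition.
    """
    mask = pvals < threshold
    visited = np.zeros_like(mask, dtype=bool)
    clusters = []
    rows, cols = pvals.shape
    for r in range(rows):
        for c in range(cols):
            if mask[r, c] and not visited[r, c]:
                # Breadth-first search over neighboring low-p cells.
                queue, cells = deque([(r, c)]), []
                visited[r, c] = True
                while queue:
                    i, j = queue.popleft()
                    cells.append((i, j))
                    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ni, nj = i + di, j + dj
                        if 0 <= ni < rows and 0 <= nj < cols \
                                and mask[ni, nj] and not visited[ni, nj]:
                            visited[ni, nj] = True
                            queue.append((ni, nj))
                size = len(cells)
                intensity = -sum(np.log(pvals[i, j]) for i, j in cells)
                clusters.append((size, intensity))
    return clusters

# Example: a 10 x 10 grid of uniform p-values with one planted low-p block.
rng = np.random.default_rng(0)
p = rng.uniform(size=(10, 10))
p[2:5, 3:6] = 0.001
print(find_low_p_clusters(p))
```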

2021 ◽  
Vol 2 ◽  
pp. 7-17
Author(s):  
Alexey Pyatnitskiy ◽  
◽  
V. Gukasov ◽  
Anton Smirnov ◽  
◽  
...  

The article continues a series of publications developing a new general approach to data clustering. Here the proposed method is applied to searching for clusters of increased or decreased frequencies in two-dimensional frequency tables (histograms) of population-based data. The observed frequency table can be compared either with an expected one (for instance, uniform) or with a table corresponding to a previous moment in time. Regions with significantly changed frequencies are revealed, which allows statistically grounded monitoring of the dynamics of an epidemiological process or of ecological data. Groups of neighboring cells of the frequency table with the same direction of change are combined into clusters, which are then tested for statistical significance with adjustment for multiple comparisons. Each group of cells is characterized by two parameters: its size (the number of cells) and its intensity of change. If the size of a group and/or its intensity is sufficiently pronounced, the group is considered a statistically significant cluster. No a priori assumptions are made concerning the number or shape of potentially existing clusters. The method can be generalized to multidimensional tables, does not require Monte Carlo simulations, and can be applied to the comparison of any frequency tables, complementing global nonparametric criteria such as Pearson's chi-square test.
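
As an illustration of how per-cell comparisons of an observed and an expected frequency table might be set up before the clustering step, the following sketch assumes a Poisson model for cell counts; the authors' exact per-cell statistic is not specified in the abstract.

```python
# Illustrative sketch (not the authors' exact procedure): turn an observed
# frequency table and an expected table into a per-cell p-value and a
# direction of change, which a cluster search could then operate on.
# The Poisson model for cell counts is an assumption made for this example.
import numpy as np
from scipy.stats import poisson

def cellwise_comparison(observed, expected):
    """Return (p_low, direction) arrays for each cell.

    p_low is the lower-tail Poisson probability P(X <= observed | expected);
    values near 0 indicate a deficit of events, values near 1 an excess.
    """
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    p_low = poisson.cdf(observed, expected)      # small -> deficit of events
    direction = np.sign(observed - expected)     # +1 excess, -1 deficit
    return p_low, direction

obs = np.array([[12, 5], [4, 30]])
exp = np.full((2, 2), obs.sum() / 4)             # e.g. a uniform expectation
print(cellwise_comparison(obs, exp))
```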


Econometrics ◽  
2019 ◽  
Vol 7 (2) ◽  
pp. 26 ◽  
Author(s):  
David Trafimow

There has been much debate about null hypothesis significance testing, p-values without null hypothesis significance testing, and confidence intervals. The first major section of the present article addresses some of the main reasons these procedures are problematic. The conclusion is that none of them are satisfactory. However, there is a new procedure, termed the a priori procedure (APP), that validly aids researchers in obtaining sample statistics that have acceptable probabilities of being close to their corresponding population parameters. The second major section provides a description and review of APP advances. Not only does the APP avoid the problems that plague other inferential statistical procedures, but it is easy to perform too. Although the APP can be performed in conjunction with other procedures, the present recommendation is that it be used alone.
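
As one concrete (and simplified) instance of the APP, the sketch below computes the sample size needed so that a sample mean has a chosen probability of falling within a chosen fraction of a standard deviation of the population mean; the n = (z/f)^2 form applies only to this single-mean normal case and does not cover the APP advances reviewed in the article.

```python
# A minimal sketch of the a priori procedure (APP) for the simplest case: how
# many observations are needed so that the sample mean has probability c of
# falling within f population standard deviations of the population mean?
# For a normal population this reduces to n = (z / f)^2, with z the two-sided
# critical value for confidence c.
import math
from scipy.stats import norm

def app_sample_size(f, c=0.95):
    """Required n so that P(|xbar - mu| <= f*sigma) >= c."""
    z = norm.ppf((1 + c) / 2)        # two-sided critical value
    return math.ceil((z / f) ** 2)

# Example: to be 95% confident the sample mean is within 0.1 sigma of mu,
# roughly (1.96 / 0.1)^2 ~ 385 observations are needed.
print(app_sample_size(0.1, 0.95))    # -> 385
```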


PEDIATRICS ◽  
1989 ◽  
Vol 84 (6) ◽  
pp. A30-A30
Author(s):  
Student

Often investigators report many P values in the same study. The expected number of P values smaller than 0.05 is 1 in 20 tests of true null hypotheses; therefore the probability that at least one P value will be smaller than 0.05 increases with the number of tests, even when the null hypothesis is correct for each test. This increase is known as the "multiple-comparisons" problem... One reasonable way to correct for multiplicity is simply to multiply the P value by the number of tests. Thus, with five tests, an original 0.05 level for each is increased, perhaps to a value as high as 0.25 for the set. To achieve a level of not more than 0.05 for the set, we need to choose a level of 0.05/5 = 0.01 for the individual tests. This adjustment is conservative. We know only that the probability does not exceed 0.05 for the set.
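
The correction described above can be written out directly; the sketch below shows both equivalent forms (multiplying each P value by the number of tests, or comparing each P value to alpha divided by the number of tests).

```python
# Sketch of the Bonferroni-style correction described above: with k tests,
# either multiply each P value by k (capping at 1) or compare each P value to
# alpha / k. Both keep the family-wise error rate at most alpha.
def bonferroni_adjust(p_values, alpha=0.05):
    k = len(p_values)
    adjusted = [min(1.0, p * k) for p in p_values]   # adjusted P values
    threshold = alpha / k                            # per-test level
    rejected = [p < threshold for p in p_values]
    return adjusted, threshold, rejected

# Five tests: the per-test level becomes 0.05 / 5 = 0.01, as in the text.
print(bonferroni_adjust([0.04, 0.008, 0.20, 0.03, 0.70]))
```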


Genes ◽  
2021 ◽  
Vol 12 (8) ◽  
pp. 1160
Author(s):  
Atsuko Okazaki ◽  
Sukanya Horpaopan ◽  
Qingrun Zhang ◽  
Matthew Randesi ◽  
Jurg Ott

Some genetic diseases (“digenic traits”) are due to the interaction between two DNA variants, which presumably reflects biochemical interactions. For example, certain forms of Retinitis Pigmentosa, a type of blindness, occur in the presence of two mutant variants, one each in the ROM1 and RDS genes, while the occurrence of only one such variant results in a normal phenotype. Detecting variant pairs underlying digenic traits by standard genetic methods is difficult and is downright impossible when individual variants alone have minimal effects. Frequent pattern mining (FPM) methods are known to detect patterns of items. We make use of FPM approaches to find pairs of genotypes (from different variants) that can discriminate between cases and controls. Our method is based on genotype patterns of length two, and permutation testing allows assigning p-values to genotype patterns, where the null hypothesis refers to equal pattern frequencies in cases and controls. We compare different interaction search approaches and their properties on the basis of published datasets. Our implementation of FPM to case-control studies is freely available.
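
The following is a schematic permutation test in the spirit of the procedure described above, not the authors' FPM implementation: for one candidate two-variant genotype pattern, case/control labels are shuffled to obtain a p-value for the difference in pattern frequency.

```python
# Schematic permutation test (illustration only): given, for each individual,
# whether a particular two-variant genotype pattern is present, test whether
# the pattern frequency differs between cases and controls by shuffling
# case/control labels.
import numpy as np

def permutation_p_value(pattern_present, is_case, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for a difference in pattern frequency."""
    rng = np.random.default_rng(seed)
    pattern_present = np.asarray(pattern_present, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    observed = pattern_present[is_case].mean() - pattern_present[~is_case].mean()
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(is_case)            # shuffled labels
        diff = pattern_present[perm].mean() - pattern_present[~perm].mean()
        if abs(diff) >= abs(observed):
            count += 1
    return (count + 1) / (n_perm + 1)

# Toy data: 100 cases, 100 controls, pattern enriched in cases.
rng = np.random.default_rng(1)
labels = np.array([True] * 100 + [False] * 100)
pattern = np.concatenate([rng.random(100) < 0.30, rng.random(100) < 0.10])
print(permutation_p_value(pattern, labels))
```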


2021 ◽  
pp. 1-10
Author(s):  
Mansour H. Al-Askar ◽  
Fahad A. Abdullatif ◽  
Abdulmonem A. Alshihri ◽  
Asma Ahmed ◽  
Darshan Devang Divakar ◽  
...  

BACKGROUND AND OBJECTIVE: The aim of this study was to compare the efficacy of photobiomodulation therapy (PBMT) and photodynamic therapy (PDT) as adjuncts to mechanical debridement (MD) for the treatment of peri-implantitis. The present study is based on the null hypothesis that there is no difference in peri-implant inflammatory parameters (modified plaque index [mPI], modified gingival index [mGI], probing depth [PD]) and crestal bone loss (CBL) following MD with either PBMT or PDT in patients with peri-implantitis. METHODS: Forty-nine patients with peri-implantitis were randomly categorized into three groups. In Groups 1 and 2, patients underwent MD with adjunct PBMT and PDT, respectively. In Group 3, patients underwent MD alone (controls). Peri-implant inflammatory parameters were measured at baseline and at 3-month follow-up. P-values < 0.01 were considered statistically significant. RESULTS: At baseline, peri-implant clinicoradiographic parameters were comparable in all groups. Compared with baseline, there was a significant reduction in mPI (P < 0.001), mGI (P < 0.001) and PD (P < 0.001) in Groups 1 and 2 at 3-month follow-up. In Group 3, there was no difference in the scores of mPI, mGI and PD at follow-up. At 3-month follow-up, there was no difference in mPI, mGI and PD between Groups 1 and 2. The mPI (P < 0.001), mGI (P < 0.001) and PD (P < 0.001) were significantly higher in Group 3 than in Groups 1 and 2. The CBL was comparable in all groups at follow-up. CONCLUSION: PBMT and PDT seem to be useful adjuncts to MD for the treatment of peri-implant soft-tissue inflammation among patients with peri-implantitis.


2019 ◽  
Vol 18 (1) ◽  
pp. 46-62
Author(s):  
NOELLE M. CROOKS ◽  
ANNA N. BARTEL ◽  
MARTHA W. ALIBALI

In recent years, there have been calls for researchers to report and interpret confidence intervals (CIs) rather than relying solely on p-values. Such reforms, however, may be hindered by a general lack of understanding of CIs and how to interpret them. In this study, we assessed conceptual knowledge of CIs in undergraduate and graduate psychology students. CIs were difficult and prone to misconceptions for both groups. Connecting CIs to estimation and sample mean concepts was associated with greater conceptual knowledge of CIs. Connecting CIs to null hypothesis significance testing, however, was not associated with conceptual knowledge of CIs. It may therefore be beneficial to focus on estimation and sample mean concepts in instruction about CIs. First published May 2019 at Statistics Education Research Journal Archives


2014 ◽  
Vol 13 (1) ◽  
pp. 53-65 ◽  
Author(s):  
ROBYN REABURN

This study aimed to gain knowledge of students’ beliefs and difficulties in understanding p-values, and to use this knowledge to develop improved teaching programs. This study took place over four consecutive teaching semesters of a one-semester tertiary statistics unit. The study was cyclical, in that the results of each semester were used to inform the instructional design for the following semester. Over the semesters, the following instructional techniques were introduced: computer simulation, the introduction of hypothetical probabilistic reasoning using a familiar context, and the use of alternative representations. The students were also encouraged to write about their work. As the interventions progressed, a higher proportion of students successfully defined and used p-values in Null Hypothesis Testing procedures. First published May 2014 at Statistics Education Research Journal Archives
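
As an example of the kind of computer simulation such a course might use (the article does not specify its simulations), the sketch below draws both samples from the same population and shows that the resulting p-values behave as expected under a true null hypothesis, with roughly 5% falling below 0.05.

```python
# Simple classroom-style simulation (illustrative only): when the null
# hypothesis is true, p-values from a two-sample t-test are uniformly
# distributed, so about 5% fall below 0.05 purely by chance.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)
pvals = []
for _ in range(5_000):
    a = rng.normal(0, 1, size=30)     # both samples drawn from the same
    b = rng.normal(0, 1, size=30)     # population, so H0 is true
    pvals.append(ttest_ind(a, b).pvalue)

pvals = np.array(pvals)
print(f"fraction of p < 0.05 under H0: {np.mean(pvals < 0.05):.3f}")  # ~0.05
```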


Author(s):  
Valentin Amrhein ◽  
Fränzi Korner-Nievergelt ◽  
Tobias Roth

The widespread use of 'statistical significance' as a license for making a claim of a scientific finding leads to considerable distortion of the scientific process (American Statistical Association, Wasserstein & Lazar 2016). We review why degrading p-values into 'significant' and 'nonsignificant' contributes to making studies irreproducible, or to making them seem irreproducible. A major problem is that we tend to take small p-values at face value, but mistrust results with larger p-values. In either case, p-values can tell little about the reliability of research, because they are hardly replicable even if an alternative hypothesis is true. Significance itself (p≤0.05) is also hardly replicable: at a realistic statistical power of 40%, given that there is a true effect, only one in six studies will significantly replicate the significant result of another study. Even at a good power of 80%, results from two studies will be conflicting, in terms of significance, in one third of the cases if there is a true effect. This means that a replication cannot be interpreted as having failed only because it is nonsignificant. Many apparent replication failures may thus reflect faulty judgement based on significance thresholds rather than a crisis of unreplicable research. Reliable conclusions on replicability and practical importance of a finding can only be drawn using cumulative evidence from multiple independent studies. However, applying significance thresholds makes cumulative knowledge unreliable. One reason is that with anything but ideal statistical power, significant effect sizes will be biased upwards. Interpreting inflated significant results while ignoring nonsignificant results will thus lead to wrong conclusions. But current incentives to hunt for significance lead to publication bias against nonsignificant findings. Data dredging, p-hacking and publication bias should be addressed by removing fixed significance thresholds. Consistent with the recommendations of the late Ronald Fisher, p-values should be interpreted as graded measures of the strength of evidence against the null hypothesis. Larger p-values also offer some evidence against the null hypothesis; they cannot be interpreted as supporting the null hypothesis or as justifying the false conclusion that 'there is no effect'. Information on possible true effect sizes that are compatible with the data must be obtained from the observed effect size, e.g., from a sample average, and from a measure of uncertainty, such as a confidence interval. We review how confusion about interpretation of larger p-values can be traced back to historical disputes among the founders of modern statistics. We further discuss potential arguments against removing significance thresholds, such as 'we need more stringent decision rules', 'sample sizes will decrease' or 'we need to get rid of p-values'.
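
The replication figures quoted above follow from simple arithmetic on two independent studies with equal power, as the sketch below shows: the 'one in six' figure corresponds to both studies being significant at 40% power, and the 'one third' figure to the two studies disagreeing at 80% power.

```python
# The replication arithmetic behind the numbers quoted above: for two
# independent studies of a true effect with the same power, the chance that
# both are significant is power**2, and the chance that they conflict (one
# significant, one not) is 2 * power * (1 - power).
def both_significant(power):
    return power ** 2

def conflicting(power):
    return 2 * power * (1 - power)

print(both_significant(0.4))   # 0.16 ~ one in six, as quoted for 40% power
print(conflicting(0.8))        # 0.32 ~ one third conflicting despite a true effect
```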


2018 ◽  
Vol 7 (3) ◽  
pp. 63-69
Author(s):  
Suzanne L. Havstad ◽  
George W. Divine

ABSTRACT In this first of a two-part series on introductory biostatistics, we briefly describe common designs. The advantages and disadvantages of six design types are highlighted. The randomized clinical trial is the gold standard to which other designs are compared. We present the benefits of randomization and discuss the importance of power and sample size. Sample size and power calculations for any design need to be based on meaningful effects of interest. We give examples of how the effect of interest and the sample size interrelate. We also define concepts helpful to the statistical inference process. When drawing conclusions from a completed study, P values, point estimates, and confidence intervals will all assist the researcher. Finally, the issue of multiple comparisons is briefly explored. The second paper in this series will describe basic analytical techniques and discuss some common mistakes in the interpretation of data.
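
As one concrete example of the sample-size reasoning described above, the sketch below uses a standard textbook formula for comparing two means; it is not taken from this article, and the effect size, standard deviation, alpha, and power shown are illustrative assumptions.

```python
# A standard two-sample sample-size calculation (textbook formula, not from
# this article): per-group n for detecting a difference delta between two
# means with common standard deviation sigma, at two-sided level alpha and
# the given power.
import math
from scipy.stats import norm

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    return math.ceil(2 * (sigma * (z_alpha + z_beta) / delta) ** 2)

# Detecting a 5-unit difference when sigma = 10 needs about 63 per group.
print(n_per_group(delta=5, sigma=10))
```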

