Significance Measures
Recently Published Documents

Total documents: 28 (five years: 7)
H-index: 8 (five years: 1)

PLoS ONE ◽ 2021 ◽ Vol 16 (6) ◽ pp. e0252991
Author(s): Werner A. Stahel

The p-value has been debated intensely in recent decades, drawing fierce critique but also finding some advocates. The fundamental issue with its misleading interpretation stems from its common use for testing the unrealistic null hypothesis of an effect that is precisely zero. A more meaningful question is whether the effect is relevant, which makes it unavoidable to choose a threshold for relevance. Considerations that can lead to agreeable conventions for this choice are presented for several commonly used statistical situations. Based on the threshold, a simple quantitative measure of relevance emerges naturally. Statistical inference for the effect should be based on the confidence interval for the relevance measure. A classification of results that goes beyond a simple distinction like “significant / non-significant” is proposed. If desired, a single number called the “secured relevance” may summarize the result, as the p-value does, but with a scientifically meaningful interpretation.
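To make the proposal concrete, here is a minimal sketch of a relevance-based summary for a one-sample mean, under assumptions of my own: the effect and its confidence interval are rescaled by a chosen relevance threshold, the lower confidence bound plays the role of the "secured relevance", and the three-way verdict stands in for the paper's finer classification. Names and thresholds are illustrative, not taken from the paper.

```python
# Hedged sketch of a relevance-based summary for a one-sample mean.
# Variable names and the three-way classification are illustrative.
import numpy as np
from scipy import stats

def relevance_summary(x, threshold, conf=0.95):
    """Scale an effect and its confidence interval by a relevance threshold."""
    n = len(x)
    est = np.mean(x)
    se = np.std(x, ddof=1) / np.sqrt(n)
    tcrit = stats.t.ppf(1 - (1 - conf) / 2, df=n - 1)
    lo, hi = est - tcrit * se, est + tcrit * se
    # Relevance measure: effect expressed in units of the relevance threshold.
    rel, rlo, rhi = est / threshold, lo / threshold, hi / threshold
    # "Secured relevance": the lower confidence bound on the relevance scale.
    secured = rlo
    if rlo > 1:
        verdict = "relevant"        # whole CI above the threshold
    elif rhi < 1:
        verdict = "not relevant"    # whole CI below the threshold
    else:
        verdict = "ambiguous"       # CI straddles the threshold
    return rel, (rlo, rhi), secured, verdict

# Example: 40 observations, relevance threshold 0.5 (illustrative values).
rng = np.random.default_rng(1)
print(relevance_summary(rng.normal(0.8, 1.0, 40), threshold=0.5))
```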


Mathematics ◽ 2021 ◽ Vol 9 (9) ◽ pp. 958
Author(s): Maike Tormählen ◽ Galiya Klinkova ◽ Michael Grabinski

Statistical significance measures the reliability of a result obtained from a random experiment. We investigate the number of repetitions needed for a statistical result to reach a given significance. In the first step, we consider binomially distributed variables in the example of medication testing with fixed placebo efficacy, asking how many experiments are needed to achieve a significance of 95%. In the next step, we take the probability distribution of the placebo efficacy into account, which, to the best of our knowledge, has not been done before. Depending on the specifics, we show that obtaining identical significance may require twice as many experiments as in a setting where the placebo distribution is neglected. We proceed by considering more general probability distributions and close with comments on some erroneous assumptions about probability distributions which lead, for instance, to a trivial explanation of the fat tail.
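A minimal sketch of the first step (fixed placebo efficacy), under illustrative assumptions: given a placebo success rate p0 and an observed drug success rate p1, find the smallest number of trials for which a one-sided binomial test is significant at the 95% level. The specific rates are made up, and the paper's second step, integrating over a placebo-efficacy distribution, is not shown.

```python
# Hedged sketch: smallest number of trials n at which an observed success
# rate p1 is significant at the 95% level against placebo efficacy p0.
from scipy.stats import binom

def trials_needed(p0, p1, alpha=0.05, n_max=10_000):
    for n in range(1, n_max):
        k = round(p1 * n)            # expected successes under the drug
        # One-sided p-value: chance of >= k successes under placebo alone.
        p_value = binom.sf(k - 1, n, p0)
        if p_value < alpha:
            return n
    return None

# Illustrative rates; prints the minimal n under these assumptions.
print(trials_needed(p0=0.3, p1=0.5))
```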


2020
Author(s): D.C.L. Handler ◽ P.A. Haynes

Assessment of replicate quality is an important process for any shotgun proteomics experiment. One fundamental question in proteomics data analysis is whether any specific replicates in a set of analyses are biasing the downstream comparative quantitation. In this paper, we present an experimental method to address this concern. PeptideMind uses a series of clustering machine learning algorithms to assess outliers when comparing proteomics data from two states with six replicates each. The program is a JVM-native application written in Kotlin with Python subprocess calls to scikit-learn. By permuting the six replicates of each state into 400 non-redundant triplet pairwise comparisons, PeptideMind determines whether any one replicate is biasing the downstream quantitation of the states. In addition, PeptideMind generates useful visual representations of the spread of the significance measures, giving researchers a rapid, effective way to monitor the quality of those proteins identified as differentially expressed between sample states.
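The permutation scheme is easy to reproduce: choosing 3 of 6 replicates per state gives C(6,3) = 20 triplets each, hence 20 × 20 = 400 triplet-versus-triplet comparisons. A minimal sketch (not PeptideMind's actual code; replicate labels are illustrative):

```python
# Hedged sketch of the 400-comparison permutation scheme described above.
from itertools import combinations

state_a = [f"A{i}" for i in range(1, 7)]   # six replicates of state A
state_b = [f"B{i}" for i in range(1, 7)]   # six replicates of state B

comparisons = [(ta, tb)
               for ta in combinations(state_a, 3)
               for tb in combinations(state_b, 3)]
print(len(comparisons))                    # 20 * 20 = 400

# Counting how often each replicate appears in comparisons flagged as
# outliers downstream reveals whether a single replicate is biasing results.
```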


F1000Research ◽ 2020 ◽ Vol 9 ◽ pp. 100
Author(s): Martin Vogt ◽ Jürgen Bajorath

The ccbmlib Python package is a collection of modules for modeling similarity value distributions based on Tanimoto coefficients for fingerprints available in RDKit. It can be used to assess the statistical significance of Tanimoto coefficients and evaluate how molecular similarity is reflected when different fingerprint representations are used. Significance measures derived from p-values allow a quantitative comparison of similarity scores obtained from different fingerprint representations that might have very different value ranges. Furthermore, the package models conditional distributions of similarity coefficients for a given reference compound. The conditional significance score estimates where a test compound would be ranked in a similarity search. The models are based on the statistical analysis of feature distributions and feature correlations of fingerprints of a reference database. The resulting models have been evaluated for 11 RDKit fingerprints, taking a collection of ChEMBL compounds as a reference data set. For most fingerprints, highly accurate models were obtained, with differences of 1% or less for Tanimoto coefficients indicating high similarity.
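As a rough illustration of the underlying idea, the sketch below estimates the significance of a Tanimoto coefficient empirically from a small reference set, using plain RDKit; ccbmlib itself fits statistical models to feature distributions rather than using a raw empirical distribution, and the SMILES strings here are illustrative.

```python
# Hedged sketch: empirical significance of a Tanimoto coefficient against
# a reference background, using plain RDKit (not the ccbmlib models).
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def morgan_fp(smiles):
    return AllChem.GetMorganFingerprintAsBitVect(
        Chem.MolFromSmiles(smiles), 2, nBits=2048)

reference = [morgan_fp(s) for s in ("CCO", "c1ccccc1", "CC(=O)O", "CCN", "CCC")]

# Background distribution: Tanimoto coefficients over all reference pairs.
background = [DataStructs.TanimotoSimilarity(a, b)
              for i, a in enumerate(reference)
              for b in reference[i + 1:]]

def significance(tc):
    """Empirical p-value: fraction of background pairs at least this similar."""
    return sum(s >= tc for s in background) / len(background)

query, target = morgan_fp("CCO"), morgan_fp("CCCO")
tc = DataStructs.TanimotoSimilarity(query, target)
print(tc, significance(tc))
```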


2019 ◽ Vol 9 (14) ◽ pp. 2841
Author(s): Nan Zhang ◽ Xueyi Gao ◽ Tianyou Yu

Attribute reduction is a challenging problem in rough set theory, which has been applied in many research fields, including knowledge representation, machine learning, and artificial intelligence. The main objective of attribute reduction is to obtain a minimal attribute subset that can retain the same classification or discernibility properties as the original information system. Recently, many attribute reduction algorithms, such as positive region preservation, generalized decision preservation, and distribution preservation, have been proposed. The existing attribute reduction algorithms for generalized decision preservation are mainly based on the discernibility matrix and are, thus, computationally very expensive and hard to use in large-scale and high-dimensional data sets. To overcome this problem, we introduce the similarity degree for generalized decision preservation. On this basis, the inner and outer significance measures are proposed. By using heuristic strategies, we develop two quick reduction algorithms for generalized decision preservation. Finally, theoretical and experimental results show that the proposed heuristic reduction algorithms are effective and efficient.
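As a rough sketch of how a heuristic reduction driven by a significance measure proceeds (using a generic rough-set dependency degree as the quality function, not the paper's similarity-degree measures; data and names are illustrative):

```python
# Hedged sketch of greedy attribute reduction guided by an "outer
# significance" style gain. The dependency degree here is a generic
# rough-set measure, not the paper's similarity-degree construction.
def dependency(data, decisions, attrs):
    """Fraction of objects whose attrs-values determine the decision."""
    groups = {}
    for row, d in zip(data, decisions):
        groups.setdefault(tuple(row[a] for a in attrs), set()).add(d)
    consistent = sum(1 for row, d in zip(data, decisions)
                     if len(groups[tuple(row[a] for a in attrs)]) == 1)
    return consistent / len(data)

def greedy_reduct(data, decisions, all_attrs):
    reduct, target = [], dependency(data, decisions, all_attrs)
    while dependency(data, decisions, reduct) < target:
        # Significance of each candidate: gain in dependency when added.
        best = max((a for a in all_attrs if a not in reduct),
                   key=lambda a: dependency(data, decisions, reduct + [a]))
        reduct.append(best)
    return reduct

# Tiny illustrative decision table: attribute 2 alone determines the class.
rows = [(0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 1, 1)]
decs = ["yes", "no", "yes", "no"]
print(greedy_reduct(rows, decs, [0, 1, 2]))   # -> [2]
```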


Data ◽ 2018 ◽ Vol 3 (4) ◽ pp. 57
Author(s): Olexiy Azarov ◽ Leonid Krupelnitsky ◽ Hanna Rakytyanska

The purpose of this study is to control the ratio of programs of different genres when forming the broadcast grid, in order to increase and maintain the rating of a channel. In the multichannel environment, television rating control consists of selecting content whose ratings are completely restored after advertising. The hybrid approach to rule set refinement based on fuzzy relational calculus simplifies the construction of expert recommendation systems. By analogy with the inverted pendulum control problem, the managerial actions aim to retain the balance between fuzzy demand and supply. The increase or decrease trends of demand and supply are described by primary fuzzy relations. The rule-based solutions of fuzzy relational equations connect significance measures of the primary fuzzy terms. Program set refinement by solving fuzzy relational equations avoids procedures of content-based selective filtering. The generation of the solution set corresponds to the granulation of television time, where each solution represents a time slot and the granulated rating of the content. In automated media planning, generating the weekly TV program in the form of a granular solution reduces the time needed to program the channel's broadcast grid.
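For readers unfamiliar with the machinery, the sketch below solves a small fuzzy relational equation x ∘ R = b under max-min composition, the kind of computation such rule refinement relies on; the relation and target vector are illustrative, not from the paper.

```python
# Hedged sketch: greatest solution of a fuzzy relational equation
# x ∘ R = b under max-min composition (illustrative relation and target).
def greatest_solution(R, b):
    """Greatest x with max-min composition x ∘ R = b, if a solution exists."""
    m, n = len(R), len(R[0])
    # Goedel implication: alpha(a, v) = v if a > v else 1.
    return [min(b[i] if R[j][i] > b[i] else 1.0 for i in range(n))
            for j in range(m)]

def compose(x, R):
    """Max-min composition of vector x with relation R."""
    return [max(min(xj, R[j][i]) for j, xj in enumerate(x))
            for i in range(len(R[0]))]

R = [[0.9, 0.4], [0.6, 0.8]]   # primary-term relation (illustrative)
b = [0.6, 0.7]                 # target significance measures
x = greatest_solution(R, b)
print(x, compose(x, R))        # composing x with R reproduces b
```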


Author(s): Olexiy Azarov ◽ Leonid Krupelnitsky ◽ Hanna Rakytyanska

The purpose of the study is to control the ratio of programs of different genres when forming the broadcast grid, in order to increase and maintain the rating of the channel. In the multichannel environment, television rating control consists of selecting content whose ratings are completely restored after advertising. A hybrid approach is proposed that combines the benefits of semantic training and fuzzy relational equations to simplify the construction of expert recommendation systems. The problem of retaining the television rating can be framed as a fuzzy resource control problem. The increase or decrease trends of demand and supply are described by primary fuzzy relations. The rule-based solutions of fuzzy relational equations connect significance measures of the primary fuzzy terms. Rule refinement by solving fuzzy relational equations avoids labor-intensive procedures for generating and selecting expert rules. The generation of the solution set corresponds to the granulation of television time, where each solution represents a time slot and the granulated rating of the content. In automated media planning, generating the weekly TV program in the form of granular solutions reduces the time needed to program the channel's broadcast grid.

