Probability estimate and the optimal text size

Psihologija ◽  
2008 ◽  
Vol 41 (1) ◽  
pp. 35-51 ◽  
Author(s):  
Aleksandar Kostic ◽  
Svetlana Ilic ◽  
Petar Milin

A reliable language corpus implies a text sample of size n that provides stable probability distributions of linguistic phenomena. The question is what the minimal (i.e. optimal) text size is at which the probabilities of linguistic phenomena become stable. Specifically, we were interested in the probabilities of grammatical forms. We started with the a priori assumption that a text size of 1,000,000 words is sufficient to provide stable probability distributions, and treated a text of this size as a "quasi-population". The probability distribution derived from the "quasi-population" was then correlated with the probability distribution obtained from a minimal sample (32 items) for a given linguistic category (e.g. nouns). The correlation coefficient was treated as a measure of similarity between the two probability distributions. The minimal sample was increased by geometric progression, up to the size at which the correlation between the distribution derived from the quasi-population and the one derived from the increased sample reached its maximum (r = 1). The optimal sample size was established for grammatical forms of nouns, adjectives and verbs. A general formalism is proposed that allows estimation of the optimal sample size from the minimal sample (i.e. 32 items).
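A minimal sketch of the procedure described above, assuming a tokenized corpus with grammatical-form labels; the function names and the near-1 correlation threshold (used instead of exactly r = 1 for numerical reasons) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def form_probabilities(tokens, forms):
    """Relative frequencies of the given grammatical forms in a token sample."""
    counts = np.array([tokens.count(f) for f in forms], dtype=float)
    total = counts.sum()
    return counts / total if total > 0 else counts

def optimal_sample_size(corpus_tokens, forms, start=32, r_target=0.999):
    """Grow the sample geometrically (32, 64, 128, ...) until its form-probability
    distribution correlates (Pearson r) near-perfectly with the quasi-population's."""
    reference = form_probabilities(corpus_tokens, forms)   # "quasi-population" distribution
    n = start
    while n <= len(corpus_tokens):
        sample = corpus_tokens[:n]                          # or a random sample of size n
        r = np.corrcoef(form_probabilities(sample, forms), reference)[0, 1]
        if r >= r_target:
            return n, r
        n *= 2                                              # geometric progression
    # Fall back to the whole corpus, where r = 1 by construction.
    return len(corpus_tokens), 1.0
```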

2020 ◽  
Vol 29 (10) ◽  
pp. 2958-2971 ◽  
Author(s):  
Maria Stark ◽  
Antonia Zapf

Introduction: In a confirmatory diagnostic accuracy study, sensitivity and specificity are considered as co-primary endpoints. For the sample size calculation, the prevalence of the target population must be taken into account to obtain a representative sample. In this context, a general problem arises: with a low or high prevalence, the study may be overpowered in one subpopulation. A further issue is the correct pre-specification of the true prevalence; an incorrect assumption about the prevalence results in an over- or underestimated sample size.
Methods: To obtain the desired power independently of the prevalence, a method for optimal sample size calculation is proposed for comparing an experimental diagnostic test against a pre-specified minimum sensitivity and specificity. To address the problem of an incorrectly pre-specified prevalence, a blinded one-time re-estimation design and a blinded repeated re-estimation design of the sample size based on the prevalence are evaluated in a simulation study. Both designs are compared to a fixed design and, additionally, to each other.
Results: The type I error rates of both blinded re-estimation designs are not inflated. Their empirical overall power equals the desired theoretical power, and both designs yield unbiased estimates of the prevalence. The repeated re-estimation design shows no advantage over the one-time re-estimation design with respect to the mean squared error of the re-estimated prevalence or sample size. The appropriate size of the internal pilot study in the one-time re-estimation design is 50% of the initially calculated sample size.
Conclusions: A one-time re-estimation design of the prevalence based on the optimal sample size calculation is recommended in single-arm diagnostic accuracy studies.
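A minimal sketch of a prevalence-aware sample size calculation for co-primary sensitivity and specificity, assuming a standard normal-approximation formula for a one-sided test of a single proportion against a minimum threshold; the function names, default alpha/beta values and the max-over-subgroups rule are illustrative assumptions, not necessarily the authors' exact formulas.

```python
from math import ceil, sqrt
from scipy.stats import norm

def n_one_proportion(p0, p1, alpha=0.025, beta=0.2):
    """Normal-approximation sample size for a one-sided test of a proportion
    against the threshold p0, assuming the true value is p1 (> p0)."""
    z_a, z_b = norm.ppf(1 - alpha), norm.ppf(1 - beta)
    num = (z_a * sqrt(p0 * (1 - p0)) + z_b * sqrt(p1 * (1 - p1))) ** 2
    return ceil(num / (p1 - p0) ** 2)

def optimal_total_n(se0, se1, sp0, sp1, prevalence, alpha=0.025, beta=0.2):
    """Overall sample size such that both the diseased and the non-diseased
    subgroup reach their required sizes at the assumed prevalence."""
    n_diseased = n_one_proportion(se0, se1, alpha, beta)       # drives sensitivity power
    n_nondiseased = n_one_proportion(sp0, sp1, alpha, beta)    # drives specificity power
    return max(ceil(n_diseased / prevalence), ceil(n_nondiseased / (1 - prevalence)))

# e.g. minimum Se/Sp of 0.80, expected Se/Sp of 0.90, assumed prevalence 0.30
print(optimal_total_n(0.80, 0.90, 0.80, 0.90, prevalence=0.30))
```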


Kybernetes ◽  
2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Yang Liu ◽  
Yi Chen ◽  
Kefan Xie ◽  
Jia Liu

Purpose: This research aims to determine whether the pool testing method for SARS-CoV-2 (COVID-19) is effective and what the optimal sample size in one bunch is. Additionally, since the infection rate was unknown at the beginning of the pandemic, this research proposes a multiple sampling approach that enables the pool testing method to be applied successfully.
Design/methodology/approach: The authors verify, based on probabilistic modeling, that the pool testing method for SARS-CoV-2 is effective under a shortage of nucleic acid detection kits. In this method, several samples are tested together as one bunch. If the test result of the bunch is negative, none of the cases in the bunch has been infected with the novel coronavirus. Conversely, if the test result of the bunch is positive, the samples are tested one by one to confirm which cases are infected.
Findings: If the infection rate is extremely low, then with the same number of detection kits the expected number of cases that can be tested by the pool testing method is far greater than by one-by-one testing. The pool testing method is effective only when the infection rate is less than 0.3078. The higher the infection rate, the smaller the optimal sample size in one bunch. If N samples are tested by the pool testing method with a bunch size of G, the number of detection kits required lies in the interval (N/G, N).
Originality/value: This research proves that the pool testing method is suitable not only for a shortage of detection kits but also for overall or sampling-based detection in a large population. More importantly, it calculates the optimal sample size in one bunch for different infection rates. Additionally, a multiple sampling approach is proposed, in which the whole testing process is divided into several rounds with different bunch sizes, and the actual infection rate is estimated with increasing precision by sampling inspection in each round.
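A minimal sketch of the two-stage pooling scheme the abstract describes (one test per bunch, followed by individual retests when the pool is positive), in the standard Dorfman form; the function names and the search range for G are illustrative assumptions. The threshold near 0.31 and the (N/G, N) kit interval follow from this expected-tests calculation.

```python
def expected_tests_per_person(p, g):
    """Dorfman pooling: 1 pooled test per bunch of g, plus g individual
    retests when the pool is positive (probability 1 - (1 - p)**g)."""
    return 1.0 / g + 1.0 - (1.0 - p) ** g

def optimal_group_size(p, g_max=100):
    """Bunch size minimizing the expected number of tests per person."""
    best_g = min(range(2, g_max + 1), key=lambda g: expected_tests_per_person(p, g))
    return best_g, expected_tests_per_person(p, best_g)

for p in (0.001, 0.01, 0.05, 0.1, 0.3):
    g, e = optimal_group_size(p)
    pooling_pays_off = e < 1.0   # otherwise one-by-one testing needs fewer kits
    print(f"p={p:.3f}  optimal G={g}  tests/person={e:.3f}  effective={pooling_pays_off}")
```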


2019 ◽  
Vol 12 (08) ◽  
pp. 1950086 ◽
Author(s):  
Carlos N. Bouza-Herrera ◽  
Sira M. Allende-Alonso ◽  
Gajendra K. Vishwakarma ◽  
Neha Singh

In many medical research studies, it is necessary to determine the optimal sample size allocation in a heterogeneous population. This paper proposes an algorithm for optimal sample size allocation. We formulate the optimal allocation problem as an optimization problem and obtain the solution using the Bisection, Secant, Regula–Falsi and other numerical methods. The performance of the algorithm with the different numerical methods is analyzed and evaluated in terms of computing time, number of iterations and gain in accuracy due to stratification. The efficacy of the algorithm is evaluated for the response, in terms of body mass index (BMI), to a dietetic supplement in patients with diabetes mellitus, HIV/AIDS and post-operative cancer recovery.
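A minimal sketch of solving a stratified allocation problem by root-finding, assuming Neyman allocation and a target variance for the stratified mean; the strata sizes, standard deviations and variance target are illustrative, and bisection stands in for the paper's broader comparison of numerical methods.

```python
def stratified_variance(n, N, S):
    """Variance of the stratified sample mean under Neyman allocation of a
    total sample size n (with finite-population correction)."""
    pop = float(sum(N))
    total = sum(Nh * Sh for Nh, Sh in zip(N, S))
    var = 0.0
    for Nh, Sh in zip(N, S):
        Wh = Nh / pop
        nh = n * (Nh * Sh) / total        # Neyman share of the total size n
        var += Wh ** 2 * Sh ** 2 * (1.0 / nh - 1.0 / Nh)
    return var

def bisection_sample_size(N, S, target_var, lo=2.0, hi=None, tol=1e-6):
    """Total sample size n with stratified_variance(n) == target_var, found by
    bisection; the variance is monotone decreasing in n, so the root is unique."""
    hi = hi if hi is not None else float(sum(N))
    f = lambda n: stratified_variance(n, N, S) - target_var
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
    return hi

# Illustrative strata sizes and standard deviations (e.g. BMI by patient group).
N, S = [500, 300, 200], [4.0, 6.0, 3.0]
print(round(bisection_sample_size(N, S, target_var=0.05), 1))
```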


2017 ◽  
Vol 60 (1) ◽  
pp. 155-173 ◽  
Author(s):  
Pier Francesco Perri ◽  
María del Mar Rueda García ◽  
Beatriz Cobo Rodríguez
