SMALL SAMPLE SIZE SCIENTIST

PEDIATRICS ◽  
1989 ◽  
Vol 83 (3) ◽  
pp. A72-A72
Author(s):  
Student

The believer in the law of small numbers practices science as follows: 1. He gambles his research hypotheses on small samples without realizing that the odds against him are unreasonably high. He overestimates power. 2. He has undue confidence in early trends (e.g., the data of the first few subjects) and in the stability of observed patterns (e.g., the number and identity of significant results). He overestimates significance. 3. In evaluating replications, his or others', he has unreasonably high expectations about the replicability of significant results. He underestimates the breadth of confidence intervals. 4. He rarely attributes a deviation of results from expectations to sampling variability, because he finds a causal "explanation" for any discrepancy. Thus, he has little opportunity to recognize sampling variation in action. His belief in the law of small numbers, therefore, will forever remain intact.

Author(s):  
J. Mullaert ◽  
M. Bouaziz ◽  
Y. Seeleuthner ◽  
B. Bigio ◽  
J-L. Casanova ◽  
...  

AbstractMany methods for rare variant association studies require permutations to assess the significance of tests. Standard permutations assume that all individuals are exchangeable and do not take population stratification (PS), a known confounding factor in genetic studies, into account. We propose a novel strategy, LocPerm, in which individuals are permuted only with their closest ancestry-based neighbors. We performed a simulation study, focusing on small samples, to evaluate and compare LocPerm with standard permutations and classical adjustment on first principal components. Under the null hypothesis, LocPerm was the only method providing an acceptable type I error, regardless of sample size and level of stratification. The power of LocPerm was similar to that of standard permutation in the absence of PS, and remained stable in different PS scenarios. We conclude that LocPerm is a method of choice for taking PS and/or small sample size into account in rare variant association studies.


2020 ◽  
Author(s):  
Salem Alelyani

Abstract In the medical field, distinguishing genes that are relevant to a specific disease, let's say colon cancer, is crucial to finding a cure and understanding its causes and subsequent complications. Usually, medical datasets are comprised of immensely complex dimensions with considerably small sample size. Thus, for domain experts, such as biologists, the task of identifying these genes have become a very challenging one, to say the least. Feature selection is a technique that aims to select these genes, or features in machine learning field with respect to the disease. However, learning from a medical dataset to identify relevant features suffers from the curse-of-dimensionality. Due to a large number of features with a small sample size, the selection usually returns a different subset each time a new sample is introduced into the dataset. This selection instability is intrinsically related to data variance. We assume that reducing data variance improves selection stability. In this paper, we propose an ensemble approach based on the bagging technique to improve feature selection stability in medical datasets via data variance reduction. We conducted an experiment using four microarray datasets each of which suffers from high dimensionality and relatively small sample size. On each dataset, we applied five well-known feature selection algorithms to select varying number of features. The results of the selection stability and accuracy show the improvement in terms of both the stability and the accuracy with the bagging technique.


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Vahid Ebrahimi ◽  
Zahra Bagheri ◽  
Zahra Shayan ◽  
Peyman Jafari

Assessing differential item functioning (DIF) using the ordinal logistic regression (OLR) model highly depends on the asymptotic sampling distribution of the maximum likelihood (ML) estimators. The ML estimation method, which is often used to estimate the parameters of the OLR model for DIF detection, may be substantially biased with small samples. This study is aimed at proposing a new application of the elastic net regularized OLR model, as a special type of machine learning method, for assessing DIF between two groups with small samples. Accordingly, a simulation study was conducted to compare the powers and type I error rates of the regularized and nonregularized OLR models in detecting DIF under various conditions including moderate and severe magnitudes of DIF ( DIF = 0.4   and   0.8 ), sample size ( N ), sample size ratio ( R ), scale length ( I ), and weighting parameter ( w ). The simulation results revealed that for I = 5 and regardless of R , the elastic net regularized OLR model with w = 0.1 , as compared with the nonregularized OLR model, increased the power of detecting moderate uniform DIF ( DIF = 0.4 ) approximately 35% and 21% for N = 100   and   150 , respectively. Moreover, for I = 10 and severe uniform DIF ( DIF = 0.8 ), the average power of the elastic net regularized OLR model with 0.03 ≤ w ≤ 0.06 , as compared with the nonregularized OLR model, increased approximately 29.3% and 11.2% for N = 100   and   150 , respectively. In these cases, the type I error rates of the regularized and nonregularized OLR models were below or close to the nominal level of 0.05. In general, this simulation study showed that the elastic net regularized OLR model outperformed the nonregularized OLR model especially in extremely small sample size groups. Furthermore, the present research provided a guideline and some recommendations for researchers who conduct DIF studies with small sample sizes.


1967 ◽  
Vol 11 ◽  
pp. 185-190
Author(s):  
George H. Glade

AbstractManufacture of reed switches, critical components in present-day dataprocessing support devices, requires a means of accurate, rapid analysis of elements used in plating tlie levers of the switch. Because of gready reduced feedback time, X-ray spectroscopy has replaced metallographic sectioning and optical measurement as a plating-thickness control method. While 6 hr were required to obtain thickness data for a given sample size by sectioning, X-ray spectroscopy requires only 2 hr, which permits better control of the plating operating. X-ray spectroscopy is now used routinely to control both gold and rhodium plating thicknesses in the 20- to 100μin. (1 × 10−6) thickness range. The large number of samples prevents long count duration, while the small sample size (0.110 by 0.033 in.) reduces the precision of the analysis. However, the precision of the X-ray and optical methods is approximately the same, 8% variance. X-ray accuracy is comparable to that of sectioning since the standards are obtained by sectioning. Simplicity of operation is required since relatively untrained operators are used. An aperture system is used to reduce background. The rhodium thickness measurement is obtained from gross rhodium intensity. Attenuation of gross nickel intensity from the base material was found to be a better measure of gold thickness intensity. Calibration for hoth gold and rhodium is performed by using the same wide detector conditions. The choice of analysis is made by changing only the 2θ angle, thus avoiding the time required for recalibration when changing analysis.


Evolution ◽  
1989 ◽  
Vol 43 (3) ◽  
pp. 678 ◽  
Author(s):  
James W. Archie ◽  
Chris Simon ◽  
Andrew Martin

2020 ◽  
Author(s):  
Salem Alelyani

Abstract In the medical eld, distinguishing genes that are relevant to a specific disease, let's say colon cancer, is crucial to finding a cure and understanding its causes and subsequent complications. Usually, medical datasets are comprised of immensely complex dimensions with considerably small sample size. Thus, for domain experts, such as biologists, the task of identifying these genes have become a very challenging one, to say the least. Feature selection is a technique that aims to select these genes, or features in machine learning eld with respect to the disease. However, learning from a medical dataset to identify relevant features suers from the curse-of-dimensionality. Due to a large number of features with a small sample size, the selection usually returns a different subset each time a new sample is introduced into the dataset. This selection instability is intrinsically related to data variance. We assume that reducing data variance improves selection stability. In this paper, we propose an ensemble approach based on the bagging technique to improve feature selection stability in medical datasets via data variance reduction. We conducted an experiment using four microarray datasets each of which suers from high dimensionality and relatively small sample size. On each dataset, we applied five well-known feature selection algorithms to select varying number of features.The proposed technique shows a significant improvement in selection stability while at least maintaining the classification accuracy. The stability improvement ranges from 20 to 50 percent in all cases. This implies that the likelihood of selecting the same features increased 20 to 50 percent more. This is accompanied with the increase of classification accuracy in most cases, which signifies the stated results of stability.


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Salem Alelyani

AbstractIn the medical field, distinguishing genes that are relevant to a specific disease, let’s say colon cancer, is crucial to finding a cure and understanding its causes and subsequent complications. Usually, medical datasets are comprised of immensely complex dimensions with considerably small sample size. Thus, for domain experts, such as biologists, the task of identifying these genes have become a very challenging one, to say the least. Feature selection is a technique that aims to select these genes, or features in machine learning field with respect to the disease. However, learning from a medical dataset to identify relevant features suffers from the curse-of-dimensionality. Due to a large number of features with a small sample size, the selection usually returns a different subset each time a new sample is introduced into the dataset. This selection instability is intrinsically related to data variance. We assume that reducing data variance improves selection stability. In this paper, we propose an ensemble approach based on the bagging technique to improve feature selection stability in medical datasets via data variance reduction. We conducted an experiment using four microarray datasets each of which suffers from high dimensionality and relatively small sample size. On each dataset, we applied five well-known feature selection algorithms to select varying number of features. The proposed technique shows a significant improvement in selection stability while at least maintaining the classification accuracy. The stability improvement ranges from 20 to 50 percent in all cases. This implies that the likelihood of selecting the same features increased 20 to 50 percent more. This is accompanied with the increase of classification accuracy in most cases, which signifies the stated results of stability.


Sign in / Sign up

Export Citation Format

Share Document