Assessing and improving the stability of chemometric models in small sample size situations

The believer in the law of small numbers practices science as follows: 1. He gambles his research hypotheses on small samples without realizing that the odds against him are unreasonably high. He overestimates power. 2. He has undue confidence in early trends (e.g., the data of the first few subjects) and in the stability of observed patterns (e.g., the number and identity of significant results). He overestimates significance. 3. In evaluating replications, his or others', he has unreasonably high expectations about the replicability of significant results. He underestimates the breadth of confidence intervals. 4. He rarely attributes a deviation of results from expectations to sampling variability, because he finds a causal "explanation" for any discrepancy. Thus, he has little opportunity to recognize sampling variation in action. His belief in the law of small numbers, therefore, will forever remain intact.


Stable Bagging Feature Selection on Medical Data

10.21203/rs.3.rs-50237/v1 ◽

2020 ◽

Author(s):

Salem Alelyani

Keyword(s):

Feature Selection ◽

Sample Size ◽

Variance Reduction ◽

Small Sample Size ◽

Small Sample ◽

Domain Experts ◽

Complex Dimensions ◽

The Stability ◽

Stability And Accuracy ◽

Selection Algorithms

Abstract: In the medical field, distinguishing genes that are relevant to a specific disease, let's say colon cancer, is crucial to finding a cure and understanding its causes and subsequent complications. Usually, medical datasets are comprised of immensely complex dimensions with a considerably small sample size. Thus, for domain experts, such as biologists, the task of identifying these genes has become a very challenging one, to say the least. Feature selection is a technique that aims to select these genes, or features in machine learning terms, with respect to the disease. However, learning from a medical dataset to identify relevant features suffers from the curse of dimensionality. Due to a large number of features with a small sample size, the selection usually returns a different subset each time a new sample is introduced into the dataset. This selection instability is intrinsically related to data variance. We assume that reducing data variance improves selection stability. In this paper, we propose an ensemble approach based on the bagging technique to improve feature selection stability in medical datasets via data variance reduction. We conducted an experiment using four microarray datasets, each of which suffers from high dimensionality and relatively small sample size. On each dataset, we applied five well-known feature selection algorithms to select a varying number of features. The results show that the bagging technique improves both selection stability and accuracy.


Small Sample Size Does Decrease the Stability of Dendrograms Calculated from Allozyme-Frequency Data

Evolution ◽

10.1111/j.1558-5646.1989.tb04265.x ◽

1989 ◽

Vol 43 (3) ◽

pp. 678-683 ◽

Cited By ~ 15

Author(s):

James W. Archie ◽

Chris Simon ◽

Andrew Martin

Keyword(s):

Sample Size ◽

Small Sample Size ◽

Small Sample ◽

Frequency Data ◽

The Stability


Small Sample Size Does Decrease the Stability of Dendrograms Calculated from Allozyme-Frequency Data

Evolution ◽

10.2307/2409072 ◽

1989 ◽

Vol 43 (3) ◽

pp. 678-683 ◽

Cited By ~ 23

Author(s):

James W. Archie ◽

Chris Simon ◽

Andrew Martin

Keyword(s):

Sample Size ◽

Small Sample Size ◽

Small Sample ◽

Frequency Data ◽

The Stability


Stable Bagging Feature Selection on Medical Data

10.21203/rs.3.rs-50237/v2 ◽

2020 ◽

Author(s):

Salem Alelyani

Keyword(s):

Feature Selection ◽

Sample Size ◽

Classification Accuracy ◽

Variance Reduction ◽

Small Sample Size ◽

Small Sample ◽

Domain Experts ◽

Complex Dimensions ◽

The Stability ◽

Selection Algorithms

Abstract: In the medical field, distinguishing genes that are relevant to a specific disease, let's say colon cancer, is crucial to finding a cure and understanding its causes and subsequent complications. Usually, medical datasets are comprised of immensely complex dimensions with a considerably small sample size. Thus, for domain experts, such as biologists, the task of identifying these genes has become a very challenging one, to say the least. Feature selection is a technique that aims to select these genes, or features in machine learning terms, with respect to the disease. However, learning from a medical dataset to identify relevant features suffers from the curse of dimensionality. Due to a large number of features with a small sample size, the selection usually returns a different subset each time a new sample is introduced into the dataset. This selection instability is intrinsically related to data variance. We assume that reducing data variance improves selection stability. In this paper, we propose an ensemble approach based on the bagging technique to improve feature selection stability in medical datasets via data variance reduction. We conducted an experiment using four microarray datasets, each of which suffers from high dimensionality and relatively small sample size. On each dataset, we applied five well-known feature selection algorithms to select a varying number of features. The proposed technique shows a significant improvement in selection stability while at least maintaining the classification accuracy. The stability improvement ranges from 20 to 50 percent in all cases. This implies that the likelihood of selecting the same features increased by 20 to 50 percent. This is accompanied by an increase in classification accuracy in most cases, which supports the stated stability results.


Stable bagging feature selection on medical data

Journal Of Big Data ◽

10.1186/s40537-020-00385-8 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Salem Alelyani

Keyword(s):

Feature Selection ◽

Sample Size ◽

Classification Accuracy ◽

Variance Reduction ◽

Small Sample Size ◽

Small Sample ◽

Domain Experts ◽

Complex Dimensions ◽

The Stability ◽

Selection Algorithms

Abstract: In the medical field, distinguishing genes that are relevant to a specific disease, let’s say colon cancer, is crucial to finding a cure and understanding its causes and subsequent complications. Usually, medical datasets are comprised of immensely complex dimensions with a considerably small sample size. Thus, for domain experts, such as biologists, the task of identifying these genes has become a very challenging one, to say the least. Feature selection is a technique that aims to select these genes, or features in machine learning terms, with respect to the disease. However, learning from a medical dataset to identify relevant features suffers from the curse of dimensionality. Due to a large number of features with a small sample size, the selection usually returns a different subset each time a new sample is introduced into the dataset. This selection instability is intrinsically related to data variance. We assume that reducing data variance improves selection stability. In this paper, we propose an ensemble approach based on the bagging technique to improve feature selection stability in medical datasets via data variance reduction. We conducted an experiment using four microarray datasets, each of which suffers from high dimensionality and relatively small sample size. On each dataset, we applied five well-known feature selection algorithms to select a varying number of features. The proposed technique shows a significant improvement in selection stability while at least maintaining the classification accuracy. The stability improvement ranges from 20 to 50 percent in all cases. This implies that the likelihood of selecting the same features increased by 20 to 50 percent. This is accompanied by an increase in classification accuracy in most cases, which supports the stated stability results.
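
The general idea of bagging a feature selector can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's implementation: the scoring filter (`score_features`), the vote-based aggregation, the toy data generator, and every function name and parameter below are hypothetical choices made for the example. The abstract does not specify which five selection algorithms, which aggregation rule, or which stability measure were used.

```python
import numpy as np

def score_features(X, y):
    """Stand-in filter (assumption): |difference of class means| / pooled std."""
    a, b = X[y == 0], X[y == 1]
    pooled = np.sqrt((a.var(axis=0) + b.var(axis=0)) / 2.0) + 1e-12
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / pooled

def select_top_k(X, y, k):
    """Indices of the k highest-scoring features on a single dataset."""
    return np.argsort(-score_features(X, y), kind="stable")[:k]

def bagged_select(X, y, k, n_bags=50, seed=0):
    """Run the selector on bootstrap resamples and keep the k most-voted features."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    votes = np.zeros(p, dtype=int)
    for _ in range(n_bags):
        idx = rng.integers(0, n, size=n)   # sample n rows with replacement
        if np.unique(y[idx]).size < 2:     # skip bags that lost a class
            continue
        votes[select_top_k(X[idx], y[idx], k)] += 1
    return np.argsort(-votes, kind="stable")[:k]

def jaccard(a, b):
    """Overlap of two non-empty feature subsets; 1.0 means identical selections."""
    a, b = set(map(int, a)), set(map(int, b))
    return len(a & b) / len(a | b)

def make_toy_data(n=40, p=200, n_informative=5, shift=5.0, seed=1):
    """Synthetic 'few samples, many features' data (purely illustrative)."""
    rng = np.random.default_rng(seed)
    y = np.repeat([0, 1], n // 2)
    X = rng.normal(size=(n, p))
    X[y == 1, :n_informative] += shift     # only the first columns carry signal
    return X, y
```

One way to estimate stability with this sketch would be to call `select_top_k` and `bagged_select` on several perturbed copies of a dataset (for example, with a few samples left out) and average the pairwise `jaccard` values for each method. Jaccard overlap is one common choice of stability measure and is assumed here for illustration.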


Faculty Opinions recommendation of Power failure: why small sample size undermines the reliability of neuroscience.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.718002370.793476498 ◽

2013 ◽

Author(s):

Björn Brembs

Keyword(s):

Sample Size ◽

Small Sample Size ◽

Small Sample ◽

Power Failure


Faculty Opinions recommendation of Power failure: why small sample size undermines the reliability of neuroscience.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.718002370.793475357 ◽

2013 ◽

Author(s):

Wayne Hall ◽

Adrian Carter

Keyword(s):

Sample Size ◽

Small Sample Size ◽

Small Sample ◽

Power Failure


Application of Ultrasound Elastography for Assessing Intestinal Fibrosis in Inflammatory Bowel Disease: Fiction or Reality?

Current Drug Targets ◽

10.2174/1389450121666201119142919 ◽

2020 ◽

Vol 21 ◽

Author(s):

Roberto Gabbiadini ◽

Eirini Zacharopoulou ◽

Federica Furfaro ◽

Vincenzo Craviotto ◽

Alessandra Zilli ◽

...

Keyword(s):

Inflammatory Bowel Disease ◽

Sample Size ◽

Bowel Disease ◽

Small Sample Size ◽

Shear Wave Elastography ◽

Small Sample ◽

Ultrasound Elastography ◽

Invasive Technique ◽

Intestinal Fibrosis ◽

Inflammatory Bowel

Background: Intestinal fibrosis and subsequent strictures represent an important burden in inflammatory bowel disease (IBD). The detection and evaluation of the degree of fibrosis in stricturing Crohn’s disease (CD) is important to address the best therapeutic strategy (medical anti-inflammatory therapy, endoscopic dilation, surgery). Ultrasound elastography (USE) is a non-invasive technique that has been proposed in the field of IBD for evaluating intestinal stiffness as a biomarker of intestinal fibrosis. Objective: The aim of this review is to discuss the ability and current role of ultrasound elastography in the assessment of intestinal fibrosis. Results and Conclusion: Data on USE in IBD are provided by pilot and proof-of-concept studies with small sample size. The first type of USE investigated was strain elastography, while shear wave elastography has been introduced lately. Despite the heterogeneity of the methods of the studies, USE has been proven to be able to assess intestinal fibrosis in patients with stricturing CD. However, before introducing this technique in current practice, further studies with larger sample size and homogeneous parameters, testing reproducibility, and identification of validated cut-off values are needed.


Fukunaga-Koontz transform for small sample size problems

IEE Irish Signals and Systems Conference 2005 ◽

10.1049/cp:20050304 ◽

2005 ◽

Cited By ~ 4

Author(s):

A.A. Miranda

Keyword(s):

Sample Size ◽

Small Sample Size ◽

Small Sample
