Assessing and improving the stability of chemometric models in small sample size situations

2008 ◽  
Vol 390 (5) ◽  
pp. 1261-1271 ◽  
Author(s):  
Claudia Beleites ◽  
Reiner Salzer
PEDIATRICS ◽  
1989 ◽  
Vol 83 (3) ◽  
pp. A72-A72
Author(s):  
Student

The believer in the law of small numbers practices science as follows: 1. He gambles his research hypotheses on small samples without realizing that the odds against him are unreasonably high. He overestimates power. 2. He has undue confidence in early trends (e.g., the data of the first few subjects) and in the stability of observed patterns (e.g., the number and identity of significant results). He overestimates significance. 3. In evaluating replications, his or others', he has unreasonably high expectations about the replicability of significant results. He underestimates the breadth of confidence intervals. 4. He rarely attributes a deviation of results from expectations to sampling variability, because he finds a causal "explanation" for any discrepancy. Thus, he has little opportunity to recognize sampling variation in action. His belief in the law of small numbers, therefore, will forever remain intact.


2020 ◽  
Author(s):  
Salem Alelyani

Abstract In the medical field, distinguishing genes that are relevant to a specific disease, let's say colon cancer, is crucial to finding a cure and understanding its causes and subsequent complications. Usually, medical datasets are comprised of immensely complex dimensions with considerably small sample size. Thus, for domain experts, such as biologists, the task of identifying these genes have become a very challenging one, to say the least. Feature selection is a technique that aims to select these genes, or features in machine learning field with respect to the disease. However, learning from a medical dataset to identify relevant features suffers from the curse-of-dimensionality. Due to a large number of features with a small sample size, the selection usually returns a different subset each time a new sample is introduced into the dataset. This selection instability is intrinsically related to data variance. We assume that reducing data variance improves selection stability. In this paper, we propose an ensemble approach based on the bagging technique to improve feature selection stability in medical datasets via data variance reduction. We conducted an experiment using four microarray datasets each of which suffers from high dimensionality and relatively small sample size. On each dataset, we applied five well-known feature selection algorithms to select varying number of features. The results of the selection stability and accuracy show the improvement in terms of both the stability and the accuracy with the bagging technique.


Evolution ◽  
1989 ◽  
Vol 43 (3) ◽  
pp. 678 ◽  
Author(s):  
James W. Archie ◽  
Chris Simon ◽  
Andrew Martin

2020 ◽  
Author(s):  
Salem Alelyani

Abstract In the medical eld, distinguishing genes that are relevant to a specific disease, let's say colon cancer, is crucial to finding a cure and understanding its causes and subsequent complications. Usually, medical datasets are comprised of immensely complex dimensions with considerably small sample size. Thus, for domain experts, such as biologists, the task of identifying these genes have become a very challenging one, to say the least. Feature selection is a technique that aims to select these genes, or features in machine learning eld with respect to the disease. However, learning from a medical dataset to identify relevant features suers from the curse-of-dimensionality. Due to a large number of features with a small sample size, the selection usually returns a different subset each time a new sample is introduced into the dataset. This selection instability is intrinsically related to data variance. We assume that reducing data variance improves selection stability. In this paper, we propose an ensemble approach based on the bagging technique to improve feature selection stability in medical datasets via data variance reduction. We conducted an experiment using four microarray datasets each of which suers from high dimensionality and relatively small sample size. On each dataset, we applied five well-known feature selection algorithms to select varying number of features.The proposed technique shows a significant improvement in selection stability while at least maintaining the classification accuracy. The stability improvement ranges from 20 to 50 percent in all cases. This implies that the likelihood of selecting the same features increased 20 to 50 percent more. This is accompanied with the increase of classification accuracy in most cases, which signifies the stated results of stability.


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Salem Alelyani

AbstractIn the medical field, distinguishing genes that are relevant to a specific disease, let’s say colon cancer, is crucial to finding a cure and understanding its causes and subsequent complications. Usually, medical datasets are comprised of immensely complex dimensions with considerably small sample size. Thus, for domain experts, such as biologists, the task of identifying these genes have become a very challenging one, to say the least. Feature selection is a technique that aims to select these genes, or features in machine learning field with respect to the disease. However, learning from a medical dataset to identify relevant features suffers from the curse-of-dimensionality. Due to a large number of features with a small sample size, the selection usually returns a different subset each time a new sample is introduced into the dataset. This selection instability is intrinsically related to data variance. We assume that reducing data variance improves selection stability. In this paper, we propose an ensemble approach based on the bagging technique to improve feature selection stability in medical datasets via data variance reduction. We conducted an experiment using four microarray datasets each of which suffers from high dimensionality and relatively small sample size. On each dataset, we applied five well-known feature selection algorithms to select varying number of features. The proposed technique shows a significant improvement in selection stability while at least maintaining the classification accuracy. The stability improvement ranges from 20 to 50 percent in all cases. This implies that the likelihood of selecting the same features increased 20 to 50 percent more. This is accompanied with the increase of classification accuracy in most cases, which signifies the stated results of stability.


2020 ◽  
Vol 21 ◽  
Author(s):  
Roberto Gabbiadini ◽  
Eirini Zacharopoulou ◽  
Federica Furfaro ◽  
Vincenzo Craviotto ◽  
Alessandra Zilli ◽  
...  

Background: Intestinal fibrosis and subsequent strictures represent an important burden in inflammatory bowel disease (IBD). The detection and evaluation of the degree of fibrosis in stricturing Crohn’s disease (CD) is important to address the best therapeutic strategy (medical anti-inflammatory therapy, endoscopic dilation, surgery). Ultrasound elastography (USE) is a non-invasive technique that has been proposed in the field of IBD for evaluating intestinal stiffness as a biomarker of intestinal fibrosis. Objective: The aim of this review is to discuss the ability and current role of ultrasound elastography in the assessment of intestinal fibrosis. Results and Conclusion: Data on USE in IBD are provided by pilot and proof-of-concept studies with small sample size. The first type of USE investigated was strain elastography, while shear wave elastography has been introduced lately. Despite the heterogeneity of the methods of the studies, USE has been proven to be able to assess intestinal fibrosis in patients with stricturing CD. However, before introducing this technique in current practice, further studies with larger sample size and homogeneous parameters, testing reproducibility, and identification of validated cut-off values are needed.


Sign in / Sign up

Export Citation Format

Share Document