Variability of Classification Results in Data with High Dimensionality and Small Sample Size

Information Technology and Management Science ◽

10.7250/itms-2021-0007 ◽

2021 ◽

Vol 24 ◽

pp. 45-52

Author(s):

Jana Busa ◽

Inese Polaka

Keyword(s):

Small Sample Size ◽

Antibiotic Use ◽

Small Sample ◽

Biological Data ◽

Classification Model ◽

Intestinal Microbiome ◽

High Dimensionality ◽

Data Sets ◽

Data Set ◽

Before And After

The study focuses on the analysis of biological data containing information on the number of genome sequences of intestinal microbiome bacteria before and after antibiotic use. The data have high dimensionality (bacterial taxa) and a small number of records, which is typical of bioinformatics data. Classification models induced on data sets like this are usually unstable, and their accuracy metrics have high variance. The aim of the study is to create a preprocessing workflow and a classification model that classify the microbiome into before- and after-antibiotic groups as accurately as possible while reducing the variability of the classifier's accuracy measures. Model accuracy was evaluated using the area under the ROC curve and the overall accuracy of the classifier. In the experiments, the authors examined how classification results were affected by feature selection and by increasing the size of the data set.
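
The instability described in this abstract can be illustrated with a minimal sketch. It does not reproduce the authors' workflow: the synthetic data, the nearest-centroid classifier, and every function name below are assumptions chosen only to show how accuracy can vary across repeated train/test splits when features far outnumber records.

```python
import random
import statistics


def make_data(n_per_class, n_features, n_informative, shift, rng):
    """Synthetic two-class data: only the first n_informative features differ."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(shift * label if j < n_informative else 0.0, 1.0)
                   for j in range(n_features)]
            X.append(row)
            y.append(label)
    return X, y


def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    """Fit a nearest-centroid classifier and return its test accuracy."""
    c0 = centroid([x for x, lab in zip(X_train, y_train) if lab == 0])
    c1 = centroid([x for x, lab in zip(X_train, y_train) if lab == 1])

    def sqdist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    hits = sum((sqdist(x, c1) < sqdist(x, c0)) == bool(lab)
               for x, lab in zip(X_test, y_test))
    return hits / len(y_test)


def accuracy_spread(X, y, n_repeats, test_fraction, rng):
    """Mean and standard deviation of accuracy over repeated stratified splits."""
    idx0 = [i for i, lab in enumerate(y) if lab == 0]
    idx1 = [i for i, lab in enumerate(y) if lab == 1]
    scores = []
    for _ in range(n_repeats):
        test = set()
        for idx in (idx0, idx1):
            k = max(1, round(test_fraction * len(idx)))
            test.update(rng.sample(idx, k))
        train = [i for i in range(len(y)) if i not in test]
        test = sorted(test)
        scores.append(nearest_centroid_accuracy(
            [X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test]))
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical setting: 30 records, 200 features, 5 of them informative.
rng = random.Random(0)
X, y = make_data(n_per_class=15, n_features=200, n_informative=5, shift=1.0, rng=rng)
mean_acc, sd_acc = accuracy_spread(X, y, n_repeats=50, test_fraction=0.3, rng=rng)
print(f"accuracy over 50 splits: mean={mean_acc:.3f}, sd={sd_acc:.3f}")
```

Reporting the standard deviation alongside the mean is one simple way to make the variability visible. Rerunning the sketch with more records per class or fewer noise features is a rough analogue of the two factors the abstract says were examined.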

The Support Feature Machine: Classification with the Least Number of Features and Application to Neuroimaging Data

Neural Computation ◽

10.1162/neco_a_00447 ◽

2013 ◽

Vol 25 (6) ◽

pp. 1548-1584 ◽

Cited By ~ 2

Author(s):

Sascha Klement ◽

Silke Anders ◽

Thomas Martinetz

Keyword(s):

Feature Selection ◽

Small Sample Size ◽

Small Sample ◽

Biological Data ◽

Support Vector ◽

Data Sets ◽

Universal Method ◽

Data Set ◽

Separating Hyperplane ◽

New Formulation

By minimizing the zero-norm of the separating hyperplane, the support feature machine (SFM) finds the smallest subspace (the least number of features) of a data set such that within this subspace, two classes are linearly separable without error. This way, the dimensionality of the data is more efficiently reduced than with support vector–based feature selection, which can be shown both theoretically and empirically. In this letter, we first provide a new formulation of the previously introduced concept of the SFM. With this new formulation, classification of unbalanced and nonseparable data is straightforward, which allows using the SFM for feature selection and classification in a large variety of different scenarios. To illustrate how the SFM can be used to identify both the smallest subset of discriminative features and the total number of informative features in biological data sets, we apply repetitive feature selection based on the SFM to a functional magnetic resonance imaging data set. We suggest that these capabilities qualify the SFM as a universal method for feature selection, especially for high-dimensional small-sample-size data sets that often occur in biological and medical applications.
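
The first sentence of this abstract can be written as an optimization problem. The notation is an assumption introduced here for illustration (samples x_i in R^d, labels y_i in {-1, +1}, hyperplane normal w, offset b), and the letter's own formulation, including how it handles nonseparable data, may use different constraints:

```latex
\min_{\mathbf{w},\, b} \; \|\mathbf{w}\|_0
\quad \text{subject to} \quad
y_i \left( \mathbf{w}^{\top} \mathbf{x}_i + b \right) \ge 1, \qquad i = 1, \dots, n
```

Here the zero-norm counts the nonzero components of w, so a minimizer uses the fewest features that still separate the two classes without error.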

FINGERPRINT SILICON REPLICAS: STATIC AND DYNAMIC FEATURES FOR VITALITY DETECTION USING AN OPTICAL CAPTURE DEVICE

International Journal of Image and Graphics ◽

10.1142/s0219467808003209 ◽

2008 ◽

Vol 08 (04) ◽

pp. 495-512 ◽

Cited By ~ 26

Author(s):

PIETRO COLI ◽

GIAN LUCA MARCIALIS ◽

FABIO ROLI

Keyword(s):

Performance Improvement ◽

Small Sample Size ◽

Small Sample ◽

Small Data ◽

Data Sets ◽

Dynamic Features ◽

Data Set ◽

Feature Sets ◽

Small Data Sets ◽

Verification Systems

The automatic vitality detection of a fingerprint has become an important issue in personal verification systems based on this biometric. It has been shown that fake fingerprints made using materials like gelatine or silicon can deceive commonly used sensors. Recently, the extraction of vitality features from fingerprint images has been proposed to address this problem. Static and dynamic features, among others, have so far been studied separately, so their respective merits are not yet clear, especially because reported results were often obtained with different sensors and on small data sets, which could have obscured relative merits owing to potential small-sample-size issues. In this paper, we compare some static and dynamic features by experiments on a larger data set, using the same optical sensor for the extraction of both feature sets. We dealt with fingerprint stamps made using liquid silicon rubber. Reported results show the relative merits of static and dynamic features and the performance improvement achievable by using such features together.

Participant Reactions to Suicide-Focused Research

Crisis ◽

10.1027/0227-5910/a000650 ◽

2020 ◽

Vol 41 (5) ◽

pp. 367-374

Author(s):

Sarah P. Carter ◽

Brooke A. Ammerman ◽

Heather M. Gebhardt ◽

Jonathan Buchholz ◽

Mark A. Reger

Keyword(s):

Medical Records ◽

Research Study ◽

Small Sample Size ◽

Psychiatric Inpatient ◽

Small Sample ◽

Inpatient Unit ◽

Online Laboratory ◽

Psychiatric Inpatient Unit ◽

Before And After ◽

Participant Reactions

Abstract. Background: Concerns exist regarding the perceived risks of conducting suicide-focused research among an acutely distressed population. Aims: The current study assessed changes in participant distress before and after participation in a suicide-focused research study conducted on a psychiatric inpatient unit. Method: Participants included 37 veterans who were receiving treatment on a psychiatric inpatient unit and completed a survey-based research study focused on suicide-related behaviors and experiences. Results: Participants reported no significant changes in self-reported distress. The majority of participants reported unchanged or decreased distress. Reviews of electronic medical records revealed no behavioral dysregulation and minimal use of as-needed medications or changes in mood following participation. Limitations: The study's small sample size and veteran population may limit generalizability. Conclusion: Findings add to research conducted across a variety of settings (i.e., outpatient, online, laboratory), indicating that participating in suicide-focused research is not significantly associated with increased distress or suicide risk.

Bayesian Classifier for Sparsity-Promoting Feature Selection

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001415500226 ◽

2015 ◽

Vol 29 (06) ◽

pp. 1550022 ◽

Cited By ~ 1

Author(s):

Danlei Xu ◽

Lan Du ◽

Hongwei Liu ◽

Penghui Wang

Keyword(s):

Feature Selection ◽

Synthetic Data ◽

Original Data ◽

Radar Data ◽

Bayesian Classifier ◽

Classification Model ◽

Data Sets ◽

Data Set ◽

Classification Boundary ◽

Nonlinear Mappings

A Bayesian classifier for sparsity-promoting feature selection is developed in this paper, where a set of nonlinear mappings for the original data is performed as a pre-processing step. The linear classification model with such mappings from the original input space to a nonlinear transformation space can not only construct the nonlinear classification boundary, but also realize the feature selection for the original data. A zero-mean Gaussian prior with Gamma precision and a finite approximation of the Beta process prior are used to promote sparsity in the utilization of features and nonlinear mappings in our model, respectively. We derive the Variational Bayesian (VB) inference algorithm for the proposed linear classifier. Experimental results based on the synthetic data set, measured radar data set, high-dimensional gene expression data set, and several benchmark data sets demonstrate the aggressive and robust feature selection capability of our method and its comparable classification accuracy compared with some other existing classifiers.

O-225 Effects of SARS-Corona virus 2 on IVF treatment parameters: A cohort study of post COVID-19 patients

Human Reproduction ◽

10.1093/humrep/deab128.049 ◽

2021 ◽

Vol 36 (Supplement_1) ◽

Author(s):

Y Kabalkin ◽

M Gil ◽

E Lifshitz ◽

A Moav ◽

M Kabessa ◽

...

Keyword(s):

Cohort Study ◽

Virus Disease ◽

Small Sample Size ◽

Sperm Concentration ◽

Small Sample ◽

Cycle Performance ◽

Ang II ◽

P Values ◽

Corona Virus ◽

Before And After

Abstract. Study question: Does recovery from SARS–Corona virus 2 (SARS–CoV-2) infection negatively affect IVF cycle parameters? Summary answer: Female IVF treatment parameters were comparable to the pre-Covid-19 infection cycle performance. Sperm concentration and motility demonstrated lower mean counts following Covid-19 infection. What is known already: Corona-virus disease-19 (Covid-19) is a global pandemic caused by SARS–Corona virus 2 (SARS–CoV-2). The virus primarily affects the respiratory system, but other systemic and immune mediated effects have been reported. The spikes of SARS-CoV-2 have strong affinity for the Angiotensin converting enzyme (ACE) 2 receptor, leading to an increased Angiotensin II (Ang II) mediated pro-inflammatory response. ACE2 receptors exist in the human reproductive tract (more in males) and play a regulatory role together with Ang II. So far, reports have been inconsistent regarding testicular effects. Other implications involving fertility and fertility treatment post infection are scarce. Study design, size, duration: In this retrospective cohort study, IVF cycle performance was compared before and after Corona-virus disease-19. Patients were included only in cases where an IVF cycle was initiated within 3 months of Covid-19 recovery, between March 2020 and December 2020. Participants/materials, setting, methods: The study was conducted in a University affiliated IVF unit. Post Covid-19 cycle parameters were compared to previous cycles of the same individual prior to infection. If previous cycles were not available, parameters were compared to non-exposed patients of same age, same treatment and identical indication. Sperm concentration and motility were compared before and after infection. Non exposure was defined by a lack of past Covid-19 diagnosis and a negative PCR throughout the treatment. Main results and the role of chance: Altogether, including the matched cycles, we compared 40 cycles which started within 3 months of recovery: 26 fresh stimulation cycles and 14 frozen thawed transfer cycles. In 28 of these cycles the patient could serve as their own control. Mean age for the female partner was 33.2 years ±6.5 years. Eight male partners presented post infection and provided fresh samples for a cycle involving fertilization. We compared stimulation parameters including maximal Estradiol level, stimulation length, FSH dosage, number of oocytes retrieved, fertilization rates, number of embryos created, high quality embryo number and endometrial thickness. All of these were comparable to non-exposed cycles (generalized estimating equations, p values >0.1). No complications were recorded, specifically no thromboembolic events or respiratory complications. A total of 8 patients conceived: 1 was a chemical pregnancy, 1 extra-uterine pregnancy, 3 miscarriages and 3 ongoing, of those 1 was complicated by early bleeding. Male sperm analyses showed a trend towards lower post disease parameters, not reaching statistical significance: 23 million/ml compared to 13.6 million/ml, and 20.7% progressive motility compared to 12.3% (p values 0.09 and 0.17, respectively). Limitations, reasons for caution: Current results are based on a small sample size, still insufficient for deducing definite conclusions or guidelines. Pregnancy outcome following IVF treatment in Covid-19 recoverees should further be studied. By the time of the conference, the number of cases is expected to be significantly higher. Wider implications of the findings: This study provides preliminary data regarding the effects of SARS-COV-2 infection on IVF treatment outcomes. Despite the small sample size, treatment parameters seem unaffected; however, sperm performance seems to be compromised. Health policy and patients’ decisions regarding whether or not to postpone IVF procedures necessitate additional data. Trial registration number: Not applicable - retrospective

A Pilot Placebo Controlled Randomized Trial of Dexamethasone for Chronic Subdural Hematoma

Canadian Journal of Neurological Sciences / Journal Canadien des Sciences Neurologiques ◽

10.1017/cjn.2015.393 ◽

2016 ◽

Vol 43 (2) ◽

pp. 284-290 ◽

Cited By ~ 23

Author(s):

Michel Prud’homme ◽

François Mathieu ◽

Nicolas Marcotte ◽

Sylvine Cottin

Keyword(s):

Conservative Treatment ◽

Adverse Events ◽

Small Sample Size ◽

Chronic Subdural Hematoma ◽

Symptomatic Patient ◽

Placebo Treatment ◽

Small Sample ◽

Chi Square ◽

Serious Adverse Events ◽

Before And After

Abstract. Background: Current opinions regarding the use of dexamethasone in the treatment of chronic subdural hematomas (CSDH) are only based on observational studies. Moreover, the use of corticosteroids in asymptomatic or minimally symptomatic patients with this condition remains controversial. Here, we present data from a prospective randomized pilot study of CSDH patients treated with dexamethasone or placebo. Methods: Twenty patients with imaging-confirmed CSDH were recruited from a single center and randomized to receive dexamethasone (12 mg/day for 3 weeks followed by tapering) or placebo as a conservative treatment. Patients were followed for 6 months and the rate of success of conservative treatment with dexamethasone versus placebo was measured. Parameters such as hematoma thickness and clinical changes were also compared before and after treatment with chi-square tests. Adverse events and complications were documented. Results: During the 6-month follow-up, one of ten patients treated with corticosteroids had to undergo surgical drainage and three of ten patients were treated surgically after placebo treatment. At the end of the study, all remaining patients had complete radiological resolution. No significant differences were observed in terms of hematoma thickness profile and impression of change; however, patients experienced more severe side effects when treated with steroids as compared with placebo. Dexamethasone contributed to many serious adverse events. Conclusions: Given the small sample size, these preliminary results have not shown a clear beneficial effect of dexamethasone against placebo in our patients. However, the number of secondary effects reported was much greater for corticosteroids, and dexamethasone treatment was responsible for significant complications.
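
The abstract names chi-square tests but gives no computation. As a hedged illustration of how such a test works on a 2x2 table, the sketch below applies an uncorrected Pearson chi-square to the surgery counts reported in the Results (1 of 10 versus 3 of 10). Applying the test to these particular counts is an assumption made here; the authors report using it for hematoma thickness and clinical changes. The function name is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) with one degree of freedom.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Rows: dexamethasone, placebo; columns: surgery, no surgery (counts from the Results).
stat, p = chi_square_2x2(1, 9, 3, 7)
print(f"chi-square = {stat:.2f}, p = {p:.3f}")
```

With only two expected surgeries per arm, the chi-square approximation is rough, which is one concrete way the small sample size noted in the Conclusions limits what can be inferred.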

Abstract 13215: Gender-affirming Hormones Are Associated With Increased Risk of QT Prolongation in Transgender Females

Circulation ◽

10.1161/circ.142.suppl_3.13215 ◽

2020 ◽

Vol 142 (Suppl_3) ◽

Author(s):

Daniel Antwi Amoabeng ◽

Ahmed Hanfy ◽

Munadel Awad ◽

Bryce D Beutler ◽

Amneet Rai ◽

...

Keyword(s):

QT Interval ◽

Small Sample Size ◽

QT Prolongation ◽

Small Sample ◽

Rank Test ◽

QT Interval Prolongation ◽

Northern Nevada ◽

Increased Risk ◽

Before And After ◽

Signed Rank Test

Introduction: Women have a longer QT interval than men. This sex-specific difference is attributed to hormones associated with the biological female sex. Male-to-female transgender individuals often take antiandrogens such as spironolactone or goserelin in addition to estrogens to suppress testosterone effects while increasing feminine features. Effects of gender-affirming hormone therapy (GHT) on the QT interval in these individuals remain to be elucidated. Hypothesis: We assessed the hypothesis that the use of GHT is associated with an increased risk for QT interval prolongation in transgender females. Methods: We identified 46 transgender females through a search of the electronic medical records of a Veterans Administration hospital in Northern Nevada. Patients with a diagnosis of congenital long QT syndrome were excluded. Of these, 13 patients had ECGs before and after initiation of GHT and were included. We adapted the Tisdale score using the auto-calculated corrected QT interval (QTc) to estimate the risk of QT prolongation. Age, QTc, and Tisdale scores before and after GHT initiation were compared using the Wilcoxon signed-rank test. All tests were performed as two-tailed at a 5% level of significance. Results: All 13 study patients were taking estrogens. Of these, 3 (23.1%) were taking goserelin and 9 (69.2%) were taking spironolactone. Mean ± SEM age at ECG acquisition was 45.0 ± 4.4 and 47.7 ± 4.7 years before and after the initiation of GHT, respectively. Mean ± SEM QTc after initiation of GHT was significantly higher compared to the baseline (467.5 ± 12.8 ms vs. 428.2 ± 7.1 ms) (Figure 1A). The average Tisdale score was significantly smaller at baseline than on follow-up (1 point vs. 3 points) (Figure 1B). Conclusions: GHT appears to be associated with increased QTc in transgender women. This needs to be interpreted with caution owing to the very small sample size in this study. Further studies to investigate the strength of this association, if it exists, are warranted.
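
The Wilcoxon signed-rank test named in the Methods can be sketched for paired before/after measurements as follows. This is a minimal illustration and not the authors' analysis: the function name is hypothetical, the QTc values in the usage lines are invented, and the exact p-value is obtained by enumerating all sign assignments, which stays cheap at the study's size of 13 pairs (8192 assignments).

```python
from itertools import product


def wilcoxon_signed_rank(before, after):
    """Return (W_plus, W_minus, exact two-sided p) for paired samples.

    Zero differences are dropped; tied absolute differences get average ranks.
    The p-value enumerates all sign assignments, so keep n small (about 20 or fewer).
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    observed = min(w_plus, w_minus)
    total = sum(ranks)
    extreme = 0
    # Under the null hypothesis every sign pattern is equally likely.
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= observed:
            extreme += 1
    return w_plus, w_minus, extreme / 2 ** n


# Invented QTc values (ms) for six hypothetical patients, not data from the study.
before = [420, 431, 415, 440, 428, 436]
after = [455, 470, 430, 462, 450, 480]
w_plus, w_minus, p = wilcoxon_signed_rank(before, after)
print(f"W+ = {w_plus}, W- = {w_minus}, two-sided p = {p}")
```

Because the test uses only the ranks of the paired differences, it does not assume normally distributed QTc changes, which suits a sample as small as the one described.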

A Multi-Linear Statistical Method for Discriminant Analysis of 2D Frontal Face Images

Cross-Disciplinary Applications of Artificial Intelligence and Pattern Recognition - Advances in Computational Intelligence and Robotics ◽

10.4018/978-1-61350-429-1.ch002 ◽

2012 ◽

pp. 18-33 ◽

Cited By ~ 1

Author(s):

Carlos Eduardo Thomaz ◽

Vagner do Amaral ◽

Gilson Antonio Giraldi ◽

Edson Caoru Kitani ◽

João Ricardo Sato ◽

...

Keyword(s):

Small Sample Size ◽

Face Image ◽

Small Sample ◽

High Dimensional ◽

Data Set ◽

Linear Discriminant ◽

Face Images ◽

Linear Framework ◽

Facial Changes ◽

2D Data

This chapter describes a multi-linear discriminant method of constructing and quantifying statistically significant changes on human identity photographs. The approach is based on a general multivariate two-stage linear framework that addresses the small sample size problem in high-dimensional spaces. Starting with a 2D data set of frontal face images, the authors determine the most characteristic direction of change by organizing the data according to the patterns of interest. Experiments on publicly available face image sets show that the multi-linear approach produces visually plausible results for gender, facial expression and aging facial changes in a simple and efficient way. The authors believe that such an approach could be widely applied for modeling and reconstruction in face recognition and possibly in identifying subjects after a lapse of time.

Descriptive Statistics and Cross-National Comparisons of Arctic Power Projection

Perils of Plenty ◽

10.1093/oso/9780190078249.003.0004 ◽

2020 ◽

pp. 56-80

Author(s):

Jonathan N. Markowitz

Keyword(s):

Descriptive Statistics ◽

The Arctic ◽

Military Force ◽

Data Sets ◽

Data Set ◽

Power Projection ◽

Resource Rents ◽

Before And After ◽

Military Activity ◽

Events Data

Chapter 4 employs data from three new data sets, the Arctic Military Activity Events Data Set, the Arctic Bases Data Set, and the Icebreaker and Ice-Hardened Warships Data Set. These new data enable a systematic comparison of each state’s Arctic military forces and deployments before and after the 2007 climate shock. The data offer a corrective both to sensationalist media accounts that suggest that all states are scrambling to fight over Arctic resources and to accounts that downplay real changes in states’ Arctic military capabilities and presence. Confirming Rent-Addiction Theory’s predictions, the descriptive statistical comparisons reveal that the states that were most economically dependent on resource rents, Norway and Russia, were the most willing to back their claims by projecting military force to disputed areas and investing in Arctic bases, ice-hardened warships, and icebreakers.

Partial Least Square Analyses of Landscape and Surface Water Biota Associations in the Savannah River Basin

ISRN Ecology ◽

10.5402/2011/571749 ◽

2011 ◽

Vol 2011 ◽

pp. 1-11 ◽

Cited By ~ 6

Author(s):

Maliha S. Nash ◽

Deborah J. Chaloud

Keyword(s):

Surface Water ◽

Small Sample Size ◽

Surface Water Quality ◽

Spatial Location ◽

Small Sample ◽

Partial Least Square ◽

Least Square ◽

Data Sets ◽

Savannah River ◽

Remotely Sensed Data

Ecologists are often faced with the problems of small sample size, a large number of correlated predictors, and high noise-to-signal ratios. This necessitates excluding important variables from the model when applying standard multiple or multivariate regression analyses. In this paper, we present the results of applying partial least squares (PLS) to explore relationships among biotic indicators of surface water quality and landscape conditions while accounting for the above problems. Available field sampling and remotely sensed data sets for the Savannah Basin are used. We were able to develop models and compare results for the whole basin and for each ecoregion (Blue Ridge, Piedmont, and Coastal Plain) in spite of the data constraints. The amount of variability in surface water biota explained by each model reflects the scale, spatial location, and the composition of contributing landscape metrics. The landscape-biota model developed for the whole basin using PLS explains 43% and 80% of the variation in water biota and landscape data sets, respectively. Models developed for each of the three ecoregions indicate dominance of landscape variables which reflect the geophysical characteristics of that ecoregion.
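
To show why PLS copes with correlated predictors, here is a minimal sketch of the first component of single-response PLS (often called PLS1). It is not the authors' model: the function names are hypothetical, the data are invented, and only one component and one response are handled. The returned fraction is the share of response variance explained, the same kind of quantity as the percentages quoted in the abstract.

```python
import math


def center(values):
    """Subtract the mean from a list of numbers."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def pls1_first_component(X, y):
    """First PLS1 component: returns (weights, scores, fraction of y variance explained)."""
    cols = [center(col) for col in zip(*X)]  # centred predictors, column-wise
    yc = center(y)
    # Weight vector proportional to the covariance of each predictor with y.
    w = [sum(x * t for x, t in zip(col, yc)) for col in cols]
    norm = math.sqrt(sum(v * v for v in w))
    if norm == 0:
        raise ValueError("no predictor covaries with y")
    w = [v / norm for v in w]
    # Scores: projection of each sample onto the weight vector.
    scores = [sum(wj * col[i] for wj, col in zip(w, cols)) for i in range(len(y))]
    # Regress centred y on the single score vector.
    tt = sum(t * t for t in scores)
    q = sum(t * v for t, v in zip(scores, yc)) / tt
    ss_res = sum((v - q * t) ** 2 for v, t in zip(yc, scores))
    ss_tot = sum(v * v for v in yc)
    return w, scores, 1 - ss_res / ss_tot


# Invented data: two strongly correlated predictors and a response equal to their sum.
X = [[1, 1.1], [2, 1.9], [3, 3.2], [4, 3.9], [5, 5.1]]
y = [2.1, 3.9, 6.2, 7.9, 10.1]
w, scores, r2 = pls1_first_component(X, y)
print(f"weights = {[round(v, 3) for v in w]}, R^2 = {r2:.4f}")
```

The two correlated predictors are combined into a single latent score instead of competing for separate coefficients, which is why no variable has to be dropped beforehand.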
