Variability of Classification Results in Data with High Dimensionality and Small Sample Size

2021 ◽  
Vol 24 ◽  
pp. 45-52
Author(s):  
Jana Busa ◽  
Inese Polaka

The study focuses on the analysis of biological data containing information on the number of genome sequences of intestinal microbiome bacteria before and after antibiotic use. The data have high dimensionality (bacterial taxa) and a small number of records, which is typical of bioinformatics data. Classification models induced on data sets like this usually are not stable and the accuracy metrics have high variance. The aim of the study is to create a preprocessing workflow and a classification model that can perform the most accurate classification of the microbiome into groups before and after the use of antibiotics and lessen the variability of accuracy measures of the classifier. To evaluate the accuracy of the model, measures of the area under the ROC curve and the overall accuracy of the classifier were used. In the experiments, the authors examined how classification results were affected by feature selection and increased size of the data set.
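The two measures named in the abstract, and the way they fluctuate across resamples of a small, wide data set, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' workflow: the synthetic Gaussian data, the nearest-centroid classifier, and all function names (`make_data`, `repeated_holdout`, and so on) are hypothetical stand-ins chosen to keep the example dependency-free.

```python
import random
import statistics

def roc_auc(scores, labels):
    """AUC as the chance that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy(predictions, labels):
    """Overall accuracy: fraction of correct predictions."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

def make_data(n_per_class, n_features, rng, shift=0.5, n_informative=5):
    """Hypothetical synthetic data: only the first n_informative features differ by class."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(shift * label if j < n_informative else 0.0, 1.0)
                      for j in range(n_features)])
            y.append(label)
    return X, y

def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def centroid_scores(X_train, y_train, X_test):
    """Nearest-centroid score: positive when a sample is closer to the class-1 centroid."""
    c0 = centroid([x for x, y in zip(X_train, y_train) if y == 0])
    c1 = centroid([x for x, y in zip(X_train, y_train) if y == 1])
    def sqdist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [sqdist(x, c0) - sqdist(x, c1) for x in X_test]

def repeated_holdout(X, y, repeats, rng, test_fraction=0.3):
    """Collect accuracy and AUC over repeated random train/test splits."""
    accs, aucs = [], []
    idx = list(range(len(y)))
    for _ in range(repeats):
        rng.shuffle(idx)
        n_test = max(2, int(test_fraction * len(idx)))
        test, train = idx[:n_test], idx[n_test:]
        y_test = [y[i] for i in test]
        y_train = [y[i] for i in train]
        if len(set(y_test)) < 2 or len(set(y_train)) < 2:
            continue  # skip splits where a class is missing (AUC or a centroid is undefined)
        scores = centroid_scores([X[i] for i in train], y_train, [X[i] for i in test])
        accs.append(accuracy([int(s > 0) for s in scores], y_test))
        aucs.append(roc_auc(scores, y_test))
    return accs, aucs

rng = random.Random(0)
for n_per_class in (10, 100):  # small vs. enlarged data set, 200 features each
    X, y = make_data(n_per_class, 200, rng)
    accs, aucs = repeated_holdout(X, y, repeats=50, rng=rng)
    print(n_per_class,
          round(statistics.mean(accs), 3), round(statistics.stdev(accs), 3),
          round(statistics.mean(aucs), 3), round(statistics.stdev(aucs), 3))
```

Comparing the standard deviations printed for the two sample sizes gives a rough picture of how much of the spread in the metrics comes from the number of records rather than from the classifier.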

2013 ◽  
Vol 25 (6) ◽  
pp. 1548-1584 ◽  
Author(s):  
Sascha Klement ◽  
Silke Anders ◽  
Thomas Martinetz

By minimizing the zero-norm of the separating hyperplane, the support feature machine (SFM) finds the smallest subspace (the least number of features) of a data set such that within this subspace, two classes are linearly separable without error. This way, the dimensionality of the data is more efficiently reduced than with support vector–based feature selection, which can be shown both theoretically and empirically. In this letter, we first provide a new formulation of the previously introduced concept of the SFM. With this new formulation, classification of unbalanced and nonseparable data is straightforward, which allows using the SFM for feature selection and classification in a large variety of different scenarios. To illustrate how the SFM can be used to identify both the smallest subset of discriminative features and the total number of informative features in biological data sets we apply repetitive feature selection based on the SFM to a functional magnetic resonance imaging data set. We suggest that these capabilities qualify the SFM as a universal method for feature selection, especially for high-dimensional small-sample-size data sets that often occur in biological and medical applications.
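The objective described here, the smallest feature subset in which two classes are linearly separable, can be illustrated with a brute-force sketch. This is not the SFM algorithm, which the abstract says works by minimizing the zero-norm of the hyperplane; the code below simply enumerates subsets by size and uses a perceptron with an epoch cap as a heuristic separability check. The toy data and the function names are assumptions made for illustration.

```python
from itertools import combinations

def perceptron_separates(rows, labels, max_epochs=1000):
    """Return True if a perceptron with bias reaches a mistake-free epoch.

    Heuristic: hitting the epoch cap is treated as "not separable".
    """
    w = [0.0] * len(rows[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(rows, labels):  # labels are +1 / -1
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:
            return True
    return False

def smallest_separating_subset(X, labels):
    """Try feature subsets in order of increasing size; return the first that separates."""
    n_features = len(X[0])
    for size in range(1, n_features + 1):
        for subset in combinations(range(n_features), size):
            projected = [[row[j] for j in subset] for row in X]
            if perceptron_separates(projected, labels):
                return subset
    return None

# Hypothetical toy data: only feature 1 separates the classes on its own.
X = [
    [0.0,  2.0,  0.3],
    [1.0,  1.0, -0.2],
    [0.5, -1.0,  0.1],
    [2.0, -2.0, -0.4],
]
labels = [+1, +1, -1, -1]
print(smallest_separating_subset(X, labels))
```

On this toy input the search returns `(1,)`. Enumeration grows exponentially with the number of features, which is why an optimization-based formulation such as the one in the abstract is needed for real high-dimensional data.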


2008 ◽  
Vol 08 (04) ◽  
pp. 495-512 ◽  
Author(s):  
PIETRO COLI ◽  
GIAN LUCA MARCIALIS ◽  
FABIO ROLI

The automatic vitality detection of a fingerprint has become an important issue in personal verification systems based on this biometric. It has been shown that fake fingerprints made using materials like gelatine or silicon can deceive commonly used sensors. Recently, the extraction of vitality features from fingerprint images has been proposed to address this problem. Among others, static and dynamic features have been separately studied so far, thus their respective merits are not yet clear; especially because reported results were often obtained with different sensors and using small data sets which could have obscured relative merits, due to the potential small sample-size issues. In this paper, we compare some static and dynamic features by experiments on a larger data set and using the same optical sensor for the extraction of both feature sets. We dealt with fingerprint stamps made using liquid silicon rubber. Reported results show the relative merits of static and dynamic features and the performance improvement achievable by using such features together.


Crisis ◽  
2020 ◽  
Vol 41 (5) ◽  
pp. 367-374
Author(s):  
Sarah P. Carter ◽  
Brooke A. Ammerman ◽  
Heather M. Gebhardt ◽  
Jonathan Buchholz ◽  
Mark A. Reger

Abstract. Background: Concerns exist regarding the perceived risks of conducting suicide-focused research among an acutely distressed population. Aims: The current study assessed changes in participant distress before and after participation in a suicide-focused research study conducted on a psychiatric inpatient unit. Method: Participants included 37 veterans who were receiving treatment on a psychiatric inpatient unit and completed a survey-based research study focused on suicide-related behaviors and experiences. Results: Participants reported no significant changes in self-reported distress. The majority of participants reported unchanged or decreased distress. Reviews of electronic medical records revealed no behavioral dysregulation and minimal use of as-needed medications or changes in mood following participation. Limitations: The study's small sample size and veteran population may limit generalizability. Conclusion: Findings add to research conducted across a variety of settings (i.e., outpatient, online, laboratory), indicating that participating in suicide-focused research is not significantly associated with increased distress or suicide risk.


Author(s):  
Danlei Xu ◽  
Lan Du ◽  
Hongwei Liu ◽  
Penghui Wang

A Bayesian classifier for sparsity-promoting feature selection is developed in this paper, where a set of nonlinear mappings for the original data is performed as a pre-processing step. The linear classification model with such mappings from the original input space to a nonlinear transformation space can not only construct the nonlinear classification boundary, but also realize the feature selection for the original data. A zero-mean Gaussian prior with Gamma precision and a finite approximation of Beta process prior are used to promote sparsity in the utilization of features and nonlinear mappings in our model, respectively. We derive the Variational Bayesian (VB) inference algorithm for the proposed linear classifier. Experimental results based on the synthetic data set, measured radar data set, high-dimensional gene expression data set, and several benchmark data sets demonstrate the aggressive and robust feature selection capability of our method and classification accuracy comparable with that of some other existing classifiers.


2021 ◽  
Vol 36 (Supplement_1) ◽  
Author(s):  
Y Kabalkin ◽  
M Gil ◽  
E Lifshitz ◽  
A Moav ◽  
M Kabessa ◽  
...  

Abstract. Study question: Does recovery from SARS–Corona virus 2 (SARS–CoV-2) infection negatively affect IVF cycle parameters? Summary answer: Female IVF treatment parameters were comparable to the pre-Covid-19 infection cycle performance. Sperm concentration and motility demonstrated lower mean counts following Covid-19 infection. What is known already: Corona-virus disease-19 (Covid-19) is a global pandemic caused by SARS–Corona virus 2 (SARS–CoV-2). The virus primarily affects the respiratory system, but other systemic and immune mediated effects have been reported. The spikes of SARS-CoV-2 have strong affinity for the Angiotensin converting enzyme (ACE) 2 receptor, leading to an increased Angiotensin II (Ang II) mediated pro-inflammatory response. ACE2 receptors exist in the human reproductive tract (more in males) and play a regulatory role together with Ang II. So far, reports have been inconsistent regarding testicular effects. Other implications involving fertility and fertility treatment post infection are scarce. Study design, size, duration: In this retrospective cohort study, IVF cycle performance was compared before and after Corona-virus disease-19. Patients were included only in cases where an IVF cycle was initiated within 3 months of Covid-19 recovery, between March 2020 and December 2020. Participants/materials, setting, methods: The study was conducted in a University affiliated IVF unit. Post Covid-19 cycle parameters were compared to previous cycles of the same individual prior to infection. If previous cycles were not available, parameters were compared to non-exposed patients of same age, same treatment and identical indication. Sperm concentration and motility were compared before and after infection. Non exposure was defined by a lack of past Covid-19 diagnosis and a negative PCR throughout the treatment. Main results and the role of chance: Altogether, including the matched cycles, we compared 40 cycles which started within 3 months of recovery: 26 fresh stimulation cycles and 14 frozen thawed transfer cycles. In 28 of these cycles the patient could serve as their own control. Mean age for the female partner was 33.2 years ±6.5 years. Eight male partners presented post infection and provided fresh samples for a cycle involving fertilization. We compared stimulation parameters including maximal Estradiol level, stimulation length, FSH dosage, number of oocytes retrieved, fertilization rates, number of embryos created, high quality embryo number and endometrial thickness. All of these were comparable to non-exposed cycles (generalized estimating equations, p values >0.1). No complications were recorded, specifically no thromboembolic events or respiratory complications. A total of 8 patients conceived: 1 was a chemical pregnancy, 1 extra-uterine pregnancy, 3 miscarriages and 3 ongoing, of those 1 was complicated by early bleeding. Male sperm analyses showed a trend towards lower post disease parameters, not reaching statistical significance: 23 million/ml compared to 13.6 million/ml and 20.7% progressive motility compared to 12.3% (p values 0.09 and 0.17, respectively). Limitations, reasons for caution: Current results are based on a small sample size, still insufficient for deducing definite conclusions or guidelines. Pregnancy outcome following IVF treatment in Covid-19 recoverees should further be studied. By the time of the conference, the number of cases is expected to be significantly higher. Wider implications of the findings: This study provides preliminary data regarding the effects of SARS-CoV-2 infection on IVF treatment outcomes. Despite the small sample size, treatment parameters seem unaffected; however, sperm performance seems to be compromised. Health policy and patients’ decisions regarding whether or not to postpone IVF procedures necessitate additional data. Trial registration number: Not applicable - retrospective


Author(s):  
Michel Prud’homme ◽  
François Mathieu ◽  
Nicolas Marcotte ◽  
Sylvine Cottin

Abstract. Background: Current opinions regarding the use of dexamethasone in the treatment of chronic subdural hematomas (CSDH) are only based on observational studies. Moreover, the use of corticosteroids in asymptomatic or minimally symptomatic patients with this condition remains controversial. Here, we present data from a prospective randomized pilot study of CSDH patients treated with dexamethasone or placebo. Methods: Twenty patients with imaging-confirmed CSDH were recruited from a single center and randomized to receive dexamethasone (12 mg/day for 3 weeks followed by tapering) or placebo as a conservative treatment. Patients were followed for 6 months and the rate of success of conservative treatment with dexamethasone versus placebo was measured. Parameters such as hematoma thickness and clinical changes were also compared before and after treatment with chi-square tests. Adverse events and complications were documented. Results: During the 6-month follow-up, one of ten patients treated with corticosteroids had to undergo surgical drainage and three of ten patients were treated surgically after placebo treatment. At the end of the study, all remaining patients had complete radiological resolution. No significant differences were observed in terms of hematoma thickness profile and impression of change; however, patients experienced more severe side effects when treated with steroids as compared with placebo. Dexamethasone contributed to many serious adverse events. Conclusions: Given the small sample size, these preliminary results have not shown a clear beneficial effect of dexamethasone against placebo in our patients. However, the number of secondary effects reported was much greater for corticosteroids, and dexamethasone treatment was responsible for significant complications.


Circulation ◽  
2020 ◽  
Vol 142 (Suppl_3) ◽  
Author(s):  
Daniel Antwi Amoabeng ◽  
Ahmed Hanfy ◽  
Munadel Awad ◽  
Bryce D Beutler ◽  
Amneet Rai ◽  
...  

Introduction: Women have a longer QT interval than men. This sex-specific difference is attributed to hormones associated with the biological female sex. Male-to-female transgender individuals often take antiandrogens such as spironolactone or goserelin in addition to estrogens to suppress testosterone effects while increasing feminine features. Effects of gender-affirming hormone therapy (GHT) on the QT interval in these individuals remain to be elucidated. Hypothesis: We assessed the hypothesis that the use of GHT is associated with an increased risk for QT interval prolongation in transgender females. Methods: We identified 46 transgender females through a search of the electronic medical records of a Veterans Administration hospital in Northern Nevada. Patients with a diagnosis of congenital long QT syndrome were excluded. Of these, 13 patients had ECGs before and after initiation of GHT and were included. We adapted the Tisdale score using the auto-calculated corrected QT interval (QTc) to estimate the risk of QT prolongation. Age, QTc, and Tisdale scores before and after GHT initiation were compared using the Wilcoxon signed-rank test. All tests were performed as two-tailed at a 5% level of significance. Results: All 13 study patients were taking estrogens. Of these, 3 (23.1%) were taking goserelin and 9 (69.2%) were taking spironolactone. Mean ± SEM age at ECG acquisition was 45.0 ± 4.4 and 47.7 ± 4.7 years before and after the initiation of GHT respectively. Mean ± SEM QTc after initiation of GHT was significantly higher compared to the baseline (467.5 ± 12.8 ms vs. 428.2 ± 7.1 ms) (Figure 1A). The average Tisdale score was significantly lower at baseline than on follow-up (1 point vs. 3 points) (Figure 1B). Conclusions: GHT appears to be associated with increased QTc in transgender women. This needs to be interpreted with caution owing to the very small sample size in this study. Further studies to investigate the strength of this association, if it exists, are warranted.
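For readers unfamiliar with the Wilcoxon signed-rank test used for the before/after comparisons, a minimal dependency-free sketch follows. It is an illustration under stated assumptions rather than the authors' analysis: the paired values are made up, the function names are hypothetical, and the two-sided p-value is computed exactly by enumerating all sign patterns, which is only practical for small samples such as the 13 pairs in this study.

```python
from itertools import product

def average_ranks(values):
    """Rank values from 1, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def wilcoxon_signed_rank(before, after):
    """Return (W+, exact two-sided p-value) for paired samples.

    Zero differences are dropped; the p-value counts every sign pattern whose
    W+ lies at least as far from its null mean as the observed W+.
    """
    diffs = [a - b for a, b in zip(after, before) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    half_total = sum(ranks) / 2
    observed = abs(w_plus - half_total)
    extreme = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - half_total) >= observed - 1e-9:
            extreme += 1
    return w_plus, extreme / 2 ** len(ranks)

# Made-up paired measurements, NOT data from the study.
before = [1, 2, 3, 4, 5]
after = [2, 4, 6, 8, 10]
print(wilcoxon_signed_rank(before, after))
```

With these five made-up pairs every difference is positive, so W+ is 15 and the exact two-sided p-value is 2/32 = 0.0625. Even a perfectly consistent shift cannot reach the 5% level with so few pairs, which illustrates why small samples call for cautious interpretation.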


Author(s):  
Carlos Eduardo Thomaz ◽  
Vagner do Amaral ◽  
Gilson Antonio Giraldi ◽  
Edson Caoru Kitani ◽  
João Ricardo Sato ◽  
...  

This chapter describes a multi-linear discriminant method of constructing and quantifying statistically significant changes on human identity photographs. The approach is based on a general multivariate two-stage linear framework that addresses the small sample size problem in high-dimensional spaces. Starting with a 2D data set of frontal face images, the authors determine a most characteristic direction of change by organizing the data according to the patterns of interest. These experiments on publicly available face image sets show that the multi-linear approach does produce visually plausible results for gender, facial expression and aging facial changes in a simple and efficient way. The authors believe that such an approach could be widely applied for modeling and reconstruction in face recognition and possibly in identifying subjects after a lapse of time.


2020 ◽  
pp. 56-80
Author(s):  
Jonathan N. Markowitz

Chapter 4 employs data from three new data sets, the Arctic Military Activity Events Data Set, the Arctic Bases Data Set, and the Icebreaker and Ice-Hardened Warships Data Set. These new data enable a systematic comparison of each state’s Arctic military forces and deployments before and after the 2007 climate shock. The data offer a corrective to both sensationalist media accounts that suggest that all states are scrambling to fight over Arctic resources and those who downplay real changes in states’ Arctic military capabilities and presence. Confirming Rent-Addiction Theory’s predictions, the descriptive statistical comparisons reveal that the states that were most economically dependent on resource rents, Norway and Russia, were the most willing to back their claims by projecting military force to disputed areas and investing in Arctic bases, ice-hardened warships, and icebreakers.


ISRN Ecology ◽  
2011 ◽  
Vol 2011 ◽  
pp. 1-11 ◽  
Author(s):  
Maliha S. Nash ◽  
Deborah J. Chaloud

Ecologists are often faced with the problem of small sample size, a large number of correlated predictors, and high noise-to-signal relationships. This necessitates excluding important variables from the model when applying standard multiple or multivariate regression analyses. In this paper, we present the results of applying partial least squares (PLS) to explore relationships among biotic indicators of surface water quality and landscape conditions while accounting for the above problems. Available field sampling and remotely sensed data sets for the Savannah Basin are used. We were able to develop models and compare results for the whole basin and for each ecoregion (Blue Ridge, Piedmont, and Coastal Plain) in spite of the data constraints. The amount of variability in surface water biota explained by each model reflects the scale, spatial location, and the composition of contributing landscape metrics. The landscape-biota model developed for the whole basin using PLS explains 43% and 80% of the variation in water biota and landscape data sets, respectively. Models developed for each of the three ecoregions indicate dominance of landscape variables which reflect the geophysical characteristics of that ecoregion.

