Topology and Geometry for Small Sample Sizes: An Application to Research on the Profoundly Gifted

2018 ◽  
Author(s):  
Colleen Molloy Farrelly

This study aims to confirm prior findings on the usefulness of topological data analysis (TDA) in the analysis of small samples, particularly focused on cohorts of profoundly gifted students, as well as to explore the use of TDA-based regression methods for statistical modeling with small samples. A subset of the Gross sample is analyzed through supervised and unsupervised methods, including 16 and 17 individuals, respectively. Unsupervised learning confirmed prior results suggesting that evenly gifted and unevenly gifted subpopulations fundamentally differ. Supervised learning focused on predicting graduate school attendance and awards earned during undergraduate studies, and TDA-based logistic regression models were compared with more traditional machine learning models for logistic regression. Results suggest 1) that TDA-based methods are capable of handling small samples and seem more robust to the issues that arise in small samples than other machine learning methods and 2) that early childhood achievement scores and several factors related to childhood education interventions (such as early entry and radical acceleration) play a role in predicting key educational and professional achievements in adulthood. Possible new directions from this work include the use of TDA-based tools in the analysis of rare cohorts thus far relegated to qualitative analytics or case studies, as well as potential exploration of early educational factors and adult-level achievement in larger populations of the profoundly gifted, particularly within the Study of Exceptional Talent and Talent Identification Program cohorts.
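As an editorial illustration of the 0-dimensional idea underlying TDA-based clustering (linking points that fall within a distance threshold and reading off connected components), here is a minimal union-find sketch. The score data are hypothetical and are not from the Gross cohort:

```python
import numpy as np

def connected_components(points, eps):
    """Link every pair of points closer than eps and count the
    resulting connected components (clusters) via union-find."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Find the root of i's component, with path compression.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(points[i] - points[j]) < eps:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj

    return len({find(i) for i in range(n)})

# Two tight, well-separated groups of hypothetical 2-D score profiles.
group_a = np.array([[0.0, 0.0], [0.3, 0.1], [0.1, 0.4], [0.5, 0.2]])
group_b = group_a + 5.0
pts = np.vstack([group_a, group_b])

print(connected_components(pts, eps=2.0))  # two well-separated groups -> 2
```

Sweeping `eps` from small to large and tracking when components merge is, in essence, the 0-dimensional persistence information TDA methods exploit.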

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Florent Le Borgne ◽  
Arthur Chatton ◽  
Maxime Léger ◽  
Rémi Lenain ◽  
Yohann Foucher

Abstract In clinical research, there is a growing interest in the use of propensity score-based methods to estimate causal effects. G-computation is an alternative because of its high statistical power. Machine learning is also increasingly used because of its possible robustness to model misspecification. In this paper, we aimed to propose an approach that combines machine learning and G-computation when both the outcome and the exposure status are binary and that is able to deal with small samples. We evaluated the performance of several methods, including penalized logistic regressions, a neural network, a support vector machine, boosted classification and regression trees, and a super learner through simulations. We proposed six different scenarios characterised by various sample sizes, numbers of covariates and relationships between covariates, exposure statuses, and outcomes. We have also illustrated the application of these methods, in which they were used to estimate the efficacy of barbiturates prescribed during the first 24 h of an episode of intracranial hypertension. In the context of G-computation, for estimating the individual outcome probabilities in two counterfactual worlds, we reported that the super learner tended to outperform the other approaches in terms of both bias and variance, especially for small sample sizes. The support vector machine performed well, but its mean bias was slightly higher than that of the super learner. In the investigated scenarios, G-computation associated with the super learner was a performant method for drawing causal inferences, even from small sample sizes.
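The G-computation step described here (fit an outcome model, then average its predictions in the two counterfactual worlds) can be sketched as follows. Plain logistic regression stands in for the super learner, and the data are simulated, so this is an outline of the estimator, not the authors' pipeline:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 200                                   # deliberately small sample
L = rng.normal(size=(n, 2))               # baseline confounders
A = rng.binomial(1, 1 / (1 + np.exp(-L[:, 0])))   # exposure depends on L
logit_y = -1.0 + 1.2 * A + 0.8 * L[:, 0] + 0.5 * L[:, 1]
Y = rng.binomial(1, 1 / (1 + np.exp(-logit_y)))   # binary outcome

# Step 1: fit an outcome model on exposure + confounders.
X = np.column_stack([A, L])
outcome_model = LogisticRegression().fit(X, Y)

# Step 2: predict each subject's outcome probability in the two
# counterfactual worlds (everyone exposed vs. everyone unexposed).
X1 = np.column_stack([np.ones(n), L])
X0 = np.column_stack([np.zeros(n), L])
risk1 = outcome_model.predict_proba(X1)[:, 1].mean()
risk0 = outcome_model.predict_proba(X0)[:, 1].mean()

# Step 3: contrast the two marginal risks.
marginal_risk_difference = risk1 - risk0
print(round(marginal_risk_difference, 3))
```

Replacing the logistic regression with a stacked ensemble of learners would recover the G-computation-plus-super-learner combination the abstract evaluates.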


2017 ◽  
Author(s):  
Colleen Molloy Farrelly

Studies of highly and profoundly gifted children typically involve small sample sizes, as the population is relatively rare, and many statistical methods cannot handle these small sample sizes well. However, topological data analysis (TDA) tools are robust, even with very small samples, and can provide useful information as well as robust statistical tests. This study demonstrates these capabilities on data simulated from previous talent search results (small and large samples), as well as a subset of data from Ruf’s cohort of gifted children. TDA methods show strong, robust performance and uncover insight into sample characteristics and subgroups, including the appearance of similar subgroups across assessment populations.
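Robust testing at these sample sizes typically relies on resampling rather than large-sample asymptotics. As a related illustration (not the paper's TDA method), here is a two-sample permutation test on two hypothetical six-student score groups:

```python
import numpy as np

def permutation_test(x, y, n_perm=10000, seed=0):
    """Two-sample permutation test on the absolute difference in means.
    Its validity does not depend on sample size or normality."""
    rng = np.random.default_rng(seed)
    observed = abs(x.mean() - y.mean())
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling of group membership
        diff = abs(pooled[:len(x)].mean() - pooled[len(x):].mean())
        if diff >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)    # add-one-smoothed p-value

# Hypothetical assessment scores for two tiny subgroups.
evenly   = np.array([148.0, 151, 149, 152, 150, 147])
unevenly = np.array([160.0, 164, 158, 163, 161, 162])
p = permutation_test(evenly, unevenly)
print(p)  # small p: the two tiny groups differ
```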


2016 ◽  
Vol 41 (5) ◽  
pp. 472-505 ◽  
Author(s):  
Elizabeth Tipton ◽  
Kelly Hallberg ◽  
Larry V. Hedges ◽  
Wendy Chan

Background: Policy makers and researchers are frequently interested in understanding how effective a particular intervention may be for a specific population. One approach is to assess the degree of similarity between the sample in an experiment and the population. Another approach is to combine information from the experiment and the population to estimate the population average treatment effect (PATE). Method: Several methods for assessing the similarity between a sample and population currently exist, as do methods for estimating the PATE. In this article, we investigate properties of six of these methods and statistics in the small sample sizes common in education research (i.e., 10–70 sites), evaluating the utility of rules of thumb developed from observational studies in the generalization case. Results: In small random samples, large differences between the sample and population can arise simply by chance, and many of the statistics commonly used in generalization are a function of both sample size and the number of covariates being compared. The rules of thumb developed in observational studies (which are commonly applied in generalization) are much too conservative given the small sample sizes found in generalization. Conclusion: This article implies that sharp inferences to large populations from small experiments are difficult even with probability sampling. Features of random samples should be kept in mind when evaluating the extent to which results from experiments conducted on nonrandom samples might generalize.
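Sample-versus-population similarity of this kind is often summarized with per-covariate standardized mean differences (SMDs), the statistic behind the observational-study rules of thumb the article evaluates. A sketch on simulated covariates, showing that even a true random sample of 30 sites produces nonzero SMDs purely by chance:

```python
import numpy as np

def standardized_mean_difference(sample, population):
    """Per-covariate SMD: |difference in means| divided by the
    pooled standard deviation of sample and population."""
    diff = np.abs(sample.mean(axis=0) - population.mean(axis=0))
    pooled_sd = np.sqrt((sample.var(axis=0, ddof=1)
                         + population.var(axis=0, ddof=1)) / 2)
    return diff / pooled_sd

rng = np.random.default_rng(1)
population = rng.normal(size=(5000, 3))   # 3 covariates over 5000 sites
# A genuinely random sample of 30 sites, as in a small experiment.
sample = population[rng.choice(5000, size=30, replace=False)]

smd = standardized_mean_difference(sample, population)
print(smd)   # nonzero by chance alone, despite random sampling
```

Repeating the draw many times traces out the chance distribution of the SMD at a given sample size, which is the comparison underlying the article's critique of fixed cutoffs.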


2021 ◽  
Vol 42 (Supplement_1) ◽  
pp. S33-S34
Author(s):  
Morgan A Taylor ◽  
Randy D Kearns ◽  
Jeffrey E Carter ◽  
Mark H Ebell ◽  
Curt A Harris

Abstract Introduction A nuclear disaster would generate an unprecedented volume of thermal burn patients from the explosion and subsequent mass fires (Figure 1). Prediction models characterizing outcomes for these patients may better equip healthcare providers and other responders to manage large-scale nuclear events. Logistic regression models have traditionally been employed to develop prediction scores for mortality of all burn patients. However, other healthcare disciplines have increasingly transitioned to machine learning (ML) models, which are automatically generated and continually improved, potentially increasing predictive accuracy. Preliminary research suggests ML models can predict burn patient mortality more accurately than commonly used prediction scores. The purpose of this study is to examine the efficacy of various ML methods in assessing thermal burn patient mortality and length of stay in burn centers.
Methods This retrospective study identified patients with fire/flame burn etiologies in the National Burn Repository between 2009 and 2018. Patients were randomly partitioned into a 67%/33% split for training and validation. A random forest model (RF) and an artificial neural network (ANN) were then constructed for each outcome, mortality and length of stay. These models were then compared to logistic regression models and previously developed prediction tools with similar outcomes using a combination of classification and regression metrics.
Results During the study period, 82,404 burn patients with a thermal etiology were included in the analysis. The ANN models are expected to overfit the data, which can be mitigated by stopping model training early or by adding regularization parameters. Further exploration of the advantages and limitations of these models is forthcoming as metric analyses become available.
Conclusions In this proof-of-concept study, we anticipate that at least one ML model will predict the targeted outcomes of thermal burn patient mortality and length of stay as judged by the fidelity with which it matches the logistic regression analysis. These advancements can then help disaster preparedness programs consider resource limitations during catastrophic incidents resulting in burn injuries.
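The comparison described above can be outlined in code: a 67%/33% split, a random forest versus a logistic regression, judged by a classification metric such as AUC. The data here are simulated stand-ins (hypothetical age, burn size, and inhalation-injury variables), not National Burn Repository records:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Simulated cohort: mortality risk rises with age, burn size (TBSA %),
# and inhalation injury, a deliberately simple generative model.
rng = np.random.default_rng(7)
n = 2000
age   = rng.uniform(1, 90, n)
tbsa  = rng.uniform(0, 95, n)
inhal = rng.binomial(1, 0.15, n)
logit = -9 + 0.05 * age + 0.07 * tbsa + 1.5 * inhal
died = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = np.column_stack([age, tbsa, inhal])
# 67%/33% training/validation partition, as in the study design.
X_tr, X_va, y_tr, y_va = train_test_split(X, died, test_size=0.33,
                                          random_state=0)

lr = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

print("LR AUC:", round(roc_auc_score(y_va, lr.predict_proba(X_va)[:, 1]), 3))
print("RF AUC:", round(roc_auc_score(y_va, rf.predict_proba(X_va)[:, 1]), 3))
```

An ANN would slot into the same comparison; the early-stopping and regularization options mentioned in the abstract are standard controls for its tendency to overfit.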


2021 ◽  
Vol 143 (2) ◽  
Author(s):  
Joaquin E. Moran ◽  
Yasser Selima

Abstract Fluidelastic instability (FEI) in tube arrays has been studied extensively experimentally and theoretically for the last 50 years, due to its potential to cause significant damage in short periods. Incidents similar to those observed at San Onofre Nuclear Generating Station indicate that the problem is not yet fully understood, probably due to the large number of factors affecting the phenomenon. In this study, a new approach for the analysis and interpretation of FEI data using machine learning (ML) algorithms is explored. FEI data for both single and two-phase flows have been collected from the literature and utilized for training a machine learning algorithm in order to either provide estimates of the reduced velocity (single and two-phase) or indicate if the bundle is stable or unstable under certain conditions (two-phase). The analysis included the use of logistic regression as a classification algorithm for two-phase flow problems to determine if specific conditions produce a stable or unstable response. The results of this study provide some insight into the capability and potential of logistic regression models to analyze FEI if appropriate quantities of experimental data are available.
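As an illustration of the classification setup described (not the authors' collected dataset), a logistic regression can learn a stability map when instability follows a Connors-type threshold on the reduced velocity. The constant K and the sampled conditions below are hypothetical:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical stability map: the bundle is unstable when the reduced
# velocity exceeds K * sqrt(mass-damping parameter), a Connors-type rule.
rng = np.random.default_rng(3)
n = 500
mass_damping = rng.uniform(0.1, 10.0, n)
reduced_velocity = rng.uniform(0.5, 15.0, n)
K = 3.0  # illustrative constant, not a recommended design value
unstable = (reduced_velocity > K * np.sqrt(mass_damping)).astype(int)

# Work in log space so the power-law stability boundary becomes linear,
# which a logistic regression can represent exactly.
X = np.column_stack([np.log(mass_damping), np.log(reduced_velocity)])
clf = LogisticRegression().fit(X, unstable)
print("training accuracy:", round(clf.score(X, unstable), 3))
```

With experimental rather than synthetic data, the fitted boundary would quantify how well a simple threshold model explains observed stable/unstable responses.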


2019 ◽  
Vol 29 (07) ◽  
pp. 1850058 ◽  
Author(s):  
Juan M. Górriz ◽  
Javier Ramírez ◽  
F. Segovia ◽  
Francisco J. Martínez ◽  
Meng-Chuan Lai ◽  
...  

Although much research has been undertaken, the spatial patterns, developmental course, and sexual dimorphism of brain structure associated with autism remain enigmatic. One of the difficulties in investigating differences between the sexes in autism is the small sample sizes of available imaging datasets with mixed sex. Thus, the majority of the investigations have involved male samples, with females somewhat overlooked. This paper deploys machine learning on partial least squares feature extraction to reveal differences in regional brain structure between individuals with autism and typically developing participants. A four-class classification problem (sex and condition) is specified, with theoretical restrictions based on the evaluation of a novel upper bound in the resubstitution estimate. These conditions were imposed on the classifier complexity and feature space dimension to assure generalizable results from the training set to test samples. Accuracies above [Formula: see text] on gray and white matter tissues estimated from voxel-based morphometry (VBM) features are obtained in a sample of equal-sized high-functioning male and female adults with and without autism ([Formula: see text], [Formula: see text]/group). The proposed learning machine revealed how autism is modulated by biological sex using a low-dimensional feature space extracted from VBM. In addition, a spatial overlap analysis on reference maps partially corroborated predictions of the “extreme male brain” theory of autism, in sexually dimorphic areas.


Paleobiology ◽  
2003 ◽  
Vol 29 (1) ◽  
pp. 52-70 ◽  
Author(s):  
Anna K. Behrensmeyer ◽  
C. Tristan Stayton ◽  
Ralph E. Chapman

Avian skeletal remains occur in many fossil assemblages, and in spite of small sample sizes and incomplete preservation, they may be a source of valuable paleoecological information. In this paper, we examine the taphonomy of a modern avian bone assemblage and test the relationship between ecological data based on avifaunal skeletal remains and known ecological attributes of a living bird community. A total of 54 modern skeletal occurrences and a sample of 126 identifiable bones from Amboseli Park, Kenya, were analyzed for weathering features and skeletal part preservation in order to characterize preservation features and taphonomic biases. Avian remains, with the exception of ostrich, decay more rapidly than adult mammal bones and rarely reach advanced stages of weathering. Breakage and the percentage of anterior limb elements serve as indicators of taphonomic overprinting that may affect paleoecological signals. Using ecomorphic categories including body weight, diet, and habitat, we compared species in the bone assemblage with the living Amboseli avifauna. The documented bone sample is biased toward large body size, representation of open grassland habitats, and grazing or scavenging diets. In spite of this, multidimensional scaling analysis shows that the small faunal sample (16 out of 364 species) in the pre-fossil bone assemblage accurately represents general features of avian ecospace in Amboseli. This provides a measure of the potential fidelity of paleoecological reconstructions based on small samples of avian remains. In the Cenozoic, the utility of avian fossils is enhanced because bird ecomorphology is relatively well known and conservative through time, allowing back-extrapolations of habitat preferences, diet, etc. based on modern taxa.
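Multidimensional scaling of the kind used in this comparison can be illustrated with classical (Torgerson) MDS on a small, hypothetical ecomorphic score table; the species rows and category scores below are invented:

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical (Torgerson) multidimensional scaling: embed points in
    k dimensions from a square matrix of pairwise dissimilarities D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    eigval, eigvec = np.linalg.eigh(B)
    idx = np.argsort(eigval)[::-1][:k]       # top-k eigenpairs
    return eigvec[:, idx] * np.sqrt(np.maximum(eigval[idx], 0))

# Hypothetical ecomorphic scores: body-mass, diet, and habitat categories.
species = np.array([
    [3.0, 1.0, 2.0],   # e.g. large grazer of open grassland
    [2.8, 1.1, 2.1],
    [1.0, 3.0, 0.5],   # e.g. small insectivore of wooded habitat
    [1.1, 2.9, 0.4],
])
D = np.linalg.norm(species[:, None, :] - species[None, :, :], axis=-1)
coords = classical_mds(D, k=2)
print(coords)   # ecologically similar species land near one another
```

Plotting the assemblage species and the full living avifauna in the same embedding is what allows the "ecospace coverage" comparison the study reports.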

