Topology and Geometry for Small Sample Sizes: An Application to Research on the Profoundly Gifted

2018 ◽  
Author(s):  
Colleen Molloy Farrelly

This study aims to confirm prior findings on the usefulness of topological data analysis (TDA) in the analysis of small samples, particularly focused on cohorts of profoundly gifted students, as well as to explore the use of TDA-based regression methods for statistical modeling with small samples. A subset of the Gross sample is analyzed through supervised and unsupervised methods, including 16 and 17 individuals, respectively. Unsupervised learning confirmed prior results suggesting that evenly gifted and unevenly gifted subpopulations fundamentally differ. Supervised learning focused on predicting graduate school attendance and awards earned during undergraduate studies, and TDA-based logistic regression models were compared with more traditional machine learning models for logistic regression. Results suggest 1) that TDA-based methods are capable of handling small samples and seem more robust to the issues that arise in small samples than other machine learning methods and 2) that early childhood achievement scores and several factors related to childhood education interventions (such as early entry and radical acceleration) play a role in predicting key educational and professional achievements in adulthood. Possible new directions from this work include the use of TDA-based tools in the analysis of rare cohorts thus far relegated to qualitative analytics or case studies, as well as potential exploration of early educational factors and adult-level achievement in larger populations of the profoundly gifted, particularly within the Study of Exceptional Talent and Talent Identification Program cohorts.
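As an editorial illustration of the 0-dimensional idea underlying TDA-based clustering (linking points that fall within a distance threshold and reading off connected components), here is a minimal union-find sketch. The score data are hypothetical and are not from the Gross cohort:

```python
import numpy as np

def connected_components(points, eps):
    """Link every pair of points closer than eps and count the
    resulting connected components (clusters) via union-find."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Find the root of i's component, with path compression.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(points[i] - points[j]) < eps:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj

    return len({find(i) for i in range(n)})

# Two tight, well-separated groups of hypothetical 2-D score profiles.
group_a = np.array([[0.0, 0.0], [0.3, 0.1], [0.1, 0.4], [0.5, 0.2]])
group_b = group_a + 5.0
pts = np.vstack([group_a, group_b])

print(connected_components(pts, eps=2.0))  # two well-separated groups -> 2
```

Sweeping `eps` from small to large and tracking when components merge is, in essence, the 0-dimensional persistence information TDA methods exploit.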

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Florent Le Borgne ◽  
Arthur Chatton ◽  
Maxime Léger ◽  
Rémi Lenain ◽  
Yohann Foucher

Abstract In clinical research, there is a growing interest in the use of propensity score-based methods to estimate causal effects. G-computation is an alternative because of its high statistical power. Machine learning is also increasingly used because of its possible robustness to model misspecification. In this paper, we aimed to propose an approach that combines machine learning and G-computation when both the outcome and the exposure status are binary and that is able to deal with small samples. We evaluated the performance of several methods, including penalized logistic regressions, a neural network, a support vector machine, boosted classification and regression trees, and a super learner through simulations. We proposed six different scenarios characterised by various sample sizes, numbers of covariates and relationships between covariates, exposure statuses, and outcomes. We have also illustrated the application of these methods, in which they were used to estimate the efficacy of barbiturates prescribed during the first 24 h of an episode of intracranial hypertension. In the context of G-computation, for estimating the individual outcome probabilities in two counterfactual worlds, we reported that the super learner tended to outperform the other approaches in terms of both bias and variance, especially for small sample sizes. The support vector machine performed well, but its mean bias was slightly higher than that of the super learner. In the investigated scenarios, G-computation associated with the super learner was a performant method for drawing causal inferences, even from small sample sizes.
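The G-computation step described here (fit an outcome model, then average its predictions in the two counterfactual worlds) can be sketched as follows. Plain logistic regression stands in for the super learner, and the data are simulated, so this is an outline of the estimator, not the authors' pipeline:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 200                                   # deliberately small sample
L = rng.normal(size=(n, 2))               # baseline confounders
A = rng.binomial(1, 1 / (1 + np.exp(-L[:, 0])))   # exposure depends on L
logit_y = -1.0 + 1.2 * A + 0.8 * L[:, 0] + 0.5 * L[:, 1]
Y = rng.binomial(1, 1 / (1 + np.exp(-logit_y)))   # binary outcome

# Step 1: fit an outcome model on exposure + confounders.
X = np.column_stack([A, L])
outcome_model = LogisticRegression().fit(X, Y)

# Step 2: predict each subject's outcome probability in the two
# counterfactual worlds (everyone exposed vs. everyone unexposed).
X1 = np.column_stack([np.ones(n), L])
X0 = np.column_stack([np.zeros(n), L])
risk1 = outcome_model.predict_proba(X1)[:, 1].mean()
risk0 = outcome_model.predict_proba(X0)[:, 1].mean()

# Step 3: contrast the two marginal risks.
marginal_risk_difference = risk1 - risk0
print(round(marginal_risk_difference, 3))
```

Replacing the logistic regression with a stacked ensemble of learners would recover the G-computation-plus-super-learner combination the abstract evaluates.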


2017 ◽  
Author(s):  
Colleen Molloy Farrelly

Studies of highly and profoundly gifted children typically involve small sample sizes, as the population is relatively rare, and many statistical methods cannot handle these small sample sizes well. However, topological data analysis (TDA) tools are robust, even with very small samples, and can provide useful information as well as robust statistical tests. This study demonstrates these capabilities on data simulated from previous talent search results (small and large samples), as well as a subset of data from Ruf’s cohort of gifted children. TDA methods show strong, robust performance and uncover insight into sample characteristics and subgroups, including the appearance of similar subgroups across assessment populations.
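Robust testing at these sample sizes typically relies on resampling rather than large-sample asymptotics. As a related illustration (not the paper's TDA method), here is a two-sample permutation test on two hypothetical six-student score groups:

```python
import numpy as np

def permutation_test(x, y, n_perm=10000, seed=0):
    """Two-sample permutation test on the absolute difference in means.
    Its validity does not depend on sample size or normality."""
    rng = np.random.default_rng(seed)
    observed = abs(x.mean() - y.mean())
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling of group membership
        diff = abs(pooled[:len(x)].mean() - pooled[len(x):].mean())
        if diff >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)    # add-one-smoothed p-value

# Hypothetical assessment scores for two tiny subgroups.
evenly   = np.array([148.0, 151, 149, 152, 150, 147])
unevenly = np.array([160.0, 164, 158, 163, 161, 162])
p = permutation_test(evenly, unevenly)
print(p)  # small p: the two tiny groups differ
```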


2016 ◽  
Vol 41 (5) ◽  
pp. 472-505 ◽  
Author(s):  
Elizabeth Tipton ◽  
Kelly Hallberg ◽  
Larry V. Hedges ◽  
Wendy Chan

Background: Policy makers and researchers are frequently interested in understanding how effective a particular intervention may be for a specific population. One approach is to assess the degree of similarity between the sample in an experiment and the population. Another approach is to combine information from the experiment and the population to estimate the population average treatment effect (PATE). Method: Several methods for assessing the similarity between a sample and population currently exist, as do methods for estimating the PATE. In this article, we investigate properties of six of these methods and statistics in the small sample sizes common in education research (i.e., 10–70 sites), evaluating the utility of rules of thumb developed from observational studies in the generalization case. Results: In small random samples, large differences between the sample and population can arise simply by chance, and many of the statistics commonly used in generalization are a function of both sample size and the number of covariates being compared. The rules of thumb developed in observational studies (which are commonly applied in generalization) are much too conservative given the small sample sizes found in generalization. Conclusion: This article implies that sharp inferences to large populations from small experiments are difficult even with probability sampling. Features of random samples should be kept in mind when evaluating the extent to which results from experiments conducted on nonrandom samples might generalize.
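Sample-versus-population similarity of this kind is often summarized with per-covariate standardized mean differences (SMDs), the statistic behind the observational-study rules of thumb the article evaluates. A sketch on simulated covariates, showing that even a true random sample of 30 sites produces nonzero SMDs purely by chance:

```python
import numpy as np

def standardized_mean_difference(sample, population):
    """Per-covariate SMD: |difference in means| divided by the
    pooled standard deviation of sample and population."""
    diff = np.abs(sample.mean(axis=0) - population.mean(axis=0))
    pooled_sd = np.sqrt((sample.var(axis=0, ddof=1)
                         + population.var(axis=0, ddof=1)) / 2)
    return diff / pooled_sd

rng = np.random.default_rng(1)
population = rng.normal(size=(5000, 3))   # 3 covariates over 5000 sites
# A genuinely random sample of 30 sites, as in a small experiment.
sample = population[rng.choice(5000, size=30, replace=False)]

smd = standardized_mean_difference(sample, population)
print(smd)   # nonzero by chance alone, despite random sampling
```

Repeating the draw many times traces out the chance distribution of the SMD at a given sample size, which is the comparison underlying the article's critique of fixed cutoffs.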


2021 ◽  
Vol 42 (Supplement_1) ◽  
pp. S33-S34
Author(s):  
Morgan A Taylor ◽  
Randy D Kearns ◽  
Jeffrey E Carter ◽  
Mark H Ebell ◽  
Curt A Harris

Abstract Introduction A nuclear disaster would generate an unprecedented volume of thermal burn patients from the explosion and subsequent mass fires (Figure 1). Prediction models characterizing outcomes for these patients may better equip healthcare providers and other responders to manage large-scale nuclear events. Logistic regression models have traditionally been employed to develop prediction scores for mortality of all burn patients. However, other healthcare disciplines have increasingly transitioned to machine learning (ML) models, which are automatically generated and continually improved, potentially increasing predictive accuracy. Preliminary research suggests ML models can predict burn patient mortality more accurately than commonly used prediction scores. The purpose of this study is to examine the efficacy of various ML methods in assessing thermal burn patient mortality and length of stay in burn centers.
Methods This retrospective study identified patients with fire/flame burn etiologies in the National Burn Repository between 2009 and 2018. Patients were randomly partitioned into a 67%/33% split for training and validation. A random forest model (RF) and an artificial neural network (ANN) were then constructed for each outcome, mortality and length of stay. These models were then compared to logistic regression models and previously developed prediction tools with similar outcomes using a combination of classification and regression metrics.
Results During the study period, 82,404 burn patients with a thermal etiology were included in the analysis. The ANN models are expected to overfit the data, which can be mitigated by stopping model training early or by adding regularization parameters. Further exploration of the advantages and limitations of these models is forthcoming as metric analyses become available.
Conclusions In this proof-of-concept study, we anticipate that at least one ML model will predict the targeted outcomes of thermal burn patient mortality and length of stay as judged by the fidelity with which it matches the logistic regression analysis. These advancements can then help disaster preparedness programs consider resource limitations during catastrophic incidents resulting in burn injuries.
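The comparison described above can be outlined in code: a 67%/33% split, a random forest versus a logistic regression, judged by a classification metric such as AUC. The data here are simulated stand-ins (hypothetical age, burn size, and inhalation-injury variables), not National Burn Repository records:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Simulated cohort: mortality risk rises with age, burn size (TBSA %),
# and inhalation injury, a deliberately simple generative model.
rng = np.random.default_rng(7)
n = 2000
age   = rng.uniform(1, 90, n)
tbsa  = rng.uniform(0, 95, n)
inhal = rng.binomial(1, 0.15, n)
logit = -9 + 0.05 * age + 0.07 * tbsa + 1.5 * inhal
died = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = np.column_stack([age, tbsa, inhal])
# 67%/33% training/validation partition, as in the study design.
X_tr, X_va, y_tr, y_va = train_test_split(X, died, test_size=0.33,
                                          random_state=0)

lr = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

print("LR AUC:", round(roc_auc_score(y_va, lr.predict_proba(X_va)[:, 1]), 3))
print("RF AUC:", round(roc_auc_score(y_va, rf.predict_proba(X_va)[:, 1]), 3))
```

An ANN would slot into the same comparison; the early-stopping and regularization options mentioned in the abstract are standard controls for its tendency to overfit.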


2021 ◽  
Vol 143 (2) ◽  
Author(s):  
Joaquin E. Moran ◽  
Yasser Selima

Abstract Fluidelastic instability (FEI) in tube arrays has been studied extensively experimentally and theoretically for the last 50 years, due to its potential to cause significant damage in short periods. Incidents similar to those observed at San Onofre Nuclear Generating Station indicate that the problem is not yet fully understood, probably due to the large number of factors affecting the phenomenon. In this study, a new approach for the analysis and interpretation of FEI data using machine learning (ML) algorithms is explored. FEI data for both single and two-phase flows have been collected from the literature and utilized for training a machine learning algorithm in order to either provide estimates of the reduced velocity (single and two-phase) or indicate if the bundle is stable or unstable under certain conditions (two-phase). The analysis included the use of logistic regression as a classification algorithm for two-phase flow problems to determine if specific conditions produce a stable or unstable response. The results of this study provide some insight into the capability and potential of logistic regression models to analyze FEI if appropriate quantities of experimental data are available.
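As an illustration of the classification setup described (not the authors' collected dataset), a logistic regression can learn a stability map when instability follows a Connors-type threshold on the reduced velocity. The constant K and the sampled conditions below are hypothetical:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical stability map: the bundle is unstable when the reduced
# velocity exceeds K * sqrt(mass-damping parameter), a Connors-type rule.
rng = np.random.default_rng(3)
n = 500
mass_damping = rng.uniform(0.1, 10.0, n)
reduced_velocity = rng.uniform(0.5, 15.0, n)
K = 3.0  # illustrative constant, not a recommended design value
unstable = (reduced_velocity > K * np.sqrt(mass_damping)).astype(int)

# Work in log space so the power-law stability boundary becomes linear,
# which a logistic regression can represent exactly.
X = np.column_stack([np.log(mass_damping), np.log(reduced_velocity)])
clf = LogisticRegression().fit(X, unstable)
print("training accuracy:", round(clf.score(X, unstable), 3))
```

With experimental rather than synthetic data, the fitted boundary would quantify how well a simple threshold model explains observed stable/unstable responses.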


2019 ◽  
Vol 29 (07) ◽  
pp. 1850058 ◽  
Author(s):  
Juan M. Górriz ◽  
Javier Ramírez ◽  
F. Segovia ◽  
Francisco J. Martínez ◽  
Meng-Chuan Lai ◽  
...  

Although much research has been undertaken, the spatial patterns, developmental course, and sexual dimorphism of brain structure associated with autism remain enigmatic. One of the difficulties in investigating differences between the sexes in autism is the small sample sizes of available imaging datasets with mixed sex. Thus, the majority of the investigations have involved male samples, with females somewhat overlooked. This paper deploys machine learning on partial least squares feature extraction to reveal differences in regional brain structure between individuals with autism and typically developing participants. A four-class classification problem (sex and condition) is specified, with theoretical restrictions based on the evaluation of a novel upper bound in the resubstitution estimate. These conditions were imposed on the classifier complexity and feature space dimension to assure generalizable results from the training set to test samples. Accuracies above [Formula: see text] on gray and white matter tissues estimated from voxel-based morphometry (VBM) features are obtained in a sample of equal-sized high-functioning male and female adults with and without autism ([Formula: see text], [Formula: see text]/group). The proposed learning machine revealed how autism is modulated by biological sex using a low-dimensional feature space extracted from VBM. In addition, a spatial overlap analysis on reference maps partially corroborated predictions of the “extreme male brain” theory of autism, in sexually dimorphic areas.


Paleobiology ◽  
2003 ◽  
Vol 29 (1) ◽  
pp. 52-70 ◽  
Author(s):  
Anna K. Behrensmeyer ◽  
C. Tristan Stayton ◽  
Ralph E. Chapman

Avian skeletal remains occur in many fossil assemblages, and in spite of small sample sizes and incomplete preservation, they may be a source of valuable paleoecological information. In this paper, we examine the taphonomy of a modern avian bone assemblage and test the relationship between ecological data based on avifaunal skeletal remains and known ecological attributes of a living bird community. A total of 54 modern skeletal occurrences and a sample of 126 identifiable bones from Amboseli Park, Kenya, were analyzed for weathering features and skeletal part preservation in order to characterize preservation features and taphonomic biases. Avian remains, with the exception of ostrich, decay more rapidly than adult mammal bones and rarely reach advanced stages of weathering. Breakage and the percentage of anterior limb elements serve as indicators of taphonomic overprinting that may affect paleoecological signals. Using ecomorphic categories including body weight, diet, and habitat, we compared species in the bone assemblage with the living Amboseli avifauna. The documented bone sample is biased toward large body size, representation of open grassland habitats, and grazing or scavenging diets. In spite of this, multidimensional scaling analysis shows that the small faunal sample (16 out of 364 species) in the pre-fossil bone assemblage accurately represents general features of avian ecospace in Amboseli. This provides a measure of the potential fidelity of paleoecological reconstructions based on small samples of avian remains. In the Cenozoic, the utility of avian fossils is enhanced because bird ecomorphology is relatively well known and conservative through time, allowing back-extrapolations of habitat preferences, diet, etc. based on modern taxa.
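Multidimensional scaling of the kind used in this comparison can be illustrated with classical (Torgerson) MDS on a small, hypothetical ecomorphic score table; the species rows and category scores below are invented:

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical (Torgerson) multidimensional scaling: embed points in
    k dimensions from a square matrix of pairwise dissimilarities D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    eigval, eigvec = np.linalg.eigh(B)
    idx = np.argsort(eigval)[::-1][:k]       # top-k eigenpairs
    return eigvec[:, idx] * np.sqrt(np.maximum(eigval[idx], 0))

# Hypothetical ecomorphic scores: body-mass, diet, and habitat categories.
species = np.array([
    [3.0, 1.0, 2.0],   # e.g. large grazer of open grassland
    [2.8, 1.1, 2.1],
    [1.0, 3.0, 0.5],   # e.g. small insectivore of wooded habitat
    [1.1, 2.9, 0.4],
])
D = np.linalg.norm(species[:, None, :] - species[None, :, :], axis=-1)
coords = classical_mds(D, k=2)
print(coords)   # ecologically similar species land near one another
```

Plotting the assemblage species and the full living avifauna in the same embedding is what allows the "ecospace coverage" comparison the study reports.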

