A Transfer Learning Method for Deep Networks with Small Sample Sizes

Transfer Learning Compensates Limited Data, Batch-Effects, And Technical Heterogeneity In Single-Cell Sequencing

10.1101/2021.07.23.453486 ◽

2021 ◽

Author(s):

Youngjun Park ◽

Anne-Christin Hauschild ◽

Dominik Heider

Keyword(s):

Pattern Recognition ◽

Single Cell ◽

Transfer Learning ◽

Small Sample ◽

Batch Effects ◽

Sample Sizes ◽

Molecular Pattern ◽

Meta Learning ◽

Small Sample Sizes ◽

Cell Data

Tremendous advances in next-generation sequencing technology have enabled the accumulation of large amounts of omics data in various research areas over the past decade. However, study limitations due to small sample sizes, especially in rare disease clinical research, technological heterogeneity, and batch effects limit the applicability of traditional statistics and machine learning analysis. Here, we present a meta-learning approach to transfer knowledge from big data and reduce the search space in data with small sample sizes. Few-shot learning algorithms integrate meta-learning to overcome data scarcity and data heterogeneity by transferring molecular pattern recognition models from datasets of unrelated domains. We explore few-shot learning models with large scale public dataset, TCGA (The Cancer Genome Atlas) and GTEx dataset, and demonstrate their potential as meta-learning dataset in other molecular pattern recognition tasks. Our results show that transfer learning is very effective for datasets with a limited sample size. Furthermore, we show that our approach can transfer knowledge across technological heterogeneity, e.g., from bulk cell to single-cell data. Our approach can overcome study size constraints, batch effects, and technological limitations in analyzing single-cell data by leveraging existing bulk-cell sequencing data.

Download Full-text

Multiple task transfer learning with small sample sizes

Knowledge and Information Systems ◽

10.1007/s10115-015-0821-z ◽

2015 ◽

Vol 46 (2) ◽

pp. 315-342 ◽

Cited By ~ 15

Author(s):

Budhaditya Saha ◽

Sunil Gupta ◽

Dinh Phung ◽

Svetha Venkatesh

Keyword(s):

Transfer Learning ◽

Small Sample ◽

Sample Sizes ◽

Task Transfer ◽

Small Sample Sizes ◽

Multiple Task

Download Full-text

Transfer learning compensates limited data, batch effects and technological heterogeneity in single-cell sequencing

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqab104 ◽

2021 ◽

Vol 3 (4) ◽

Author(s):

Youngjun Park ◽

Anne-Christin Hauschild ◽

Dominik Heider

Keyword(s):

Pattern Recognition ◽

Single Cell ◽

Transfer Learning ◽

Small Sample ◽

The Cancer Genome Atlas ◽

Batch Effects ◽

Sample Sizes ◽

Molecular Pattern ◽

Small Sample Sizes ◽

Cell Data

Abstract Tremendous advances in next-generation sequencing technology have enabled the accumulation of large amounts of omics data in various research areas over the past decade. However, study limitations due to small sample sizes, especially in rare disease clinical research, technological heterogeneity and batch effects limit the applicability of traditional statistics and machine learning analysis. Here, we present a meta-transfer learning approach to transfer knowledge from big data and reduce the search space in data with small sample sizes. Few-shot learning algorithms integrate meta-learning to overcome data scarcity and data heterogeneity by transferring molecular pattern recognition models from datasets of unrelated domains. We explore few-shot learning models with large scale public dataset, TCGA (The Cancer Genome Atlas) and GTEx dataset, and demonstrate their potential as pre-training dataset in other molecular pattern recognition tasks. Our results show that meta-transfer learning is very effective for datasets with a limited sample size. Furthermore, we show that our approach can transfer knowledge across technological heterogeneity, for example, from bulk cell to single-cell data. Our approach can overcome study size constraints, batch effects and technical limitations in analyzing single-cell data by leveraging existing bulk-cell sequencing data.

Download Full-text

Problems with small sample sizes in psychophysiological research

PsycEXTRA Dataset ◽

10.1037/e526132012-267 ◽

1996 ◽

Author(s):

Todd C. Riniolo ◽

Stephen W. Porges

Keyword(s):

Small Sample ◽

Sample Sizes ◽

Psychophysiological Research ◽

Small Sample Sizes

Download Full-text

Bayesian Latent Growth Mixture-Modeling With Small Sample Sizes

PsycEXTRA Dataset ◽

10.1037/e568142014-001 ◽

2014 ◽

Author(s):

Sarah Depaoli

Keyword(s):

Growth Mixture Modeling ◽

Mixture Modeling ◽

Small Sample ◽

Sample Sizes ◽

Latent Growth ◽

Growth Mixture ◽

Latent Growth Mixture Modeling ◽

Small Sample Sizes

Download Full-text

No Evidence that Experiencing Physical Warmth Promotes Interpersonal Warmth: Two Failures to Replicate Williams and Bargh (2008)

10.31234/osf.io/mvn9b ◽

2018 ◽

Cited By ~ 1

Author(s):

Christopher Chabris ◽

Patrick Ryan Heck ◽

Jaclyn Mandart ◽

Daniel Jacob Benjamin ◽

Daniel J. Simons

Keyword(s):

Null Hypothesis ◽

Small Sample ◽

Sample Sizes ◽

Double Blind ◽

Bayesian Analyses ◽

Physical Warmth ◽

Small Sample Sizes ◽

Interpersonal Warmth

Williams and Bargh (2008) reported that holding a hot cup of coffee caused participants to judge a person’s personality as warmer, and that holding a therapeutic heat pad caused participants to choose rewards for other people rather than for themselves. These experiments featured large effects (r = .28 and .31), small sample sizes (41 and 53 participants), and barely statistically significant results. We attempted to replicate both experiments in field settings with more than triple the sample sizes (128 and 177) and double-blind procedures, but found near-zero effects (r = –.03 and .02). In both cases, Bayesian analyses suggest there is substantially more evidence for the null hypothesis of no effect than for the original physical warmth priming hypothesis.

Download Full-text

G-computation and machine learning for estimating the causal effects of binary exposure statuses on binary outcomes

Scientific Reports ◽

10.1038/s41598-021-81110-0 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Florent Le Borgne ◽

Arthur Chatton ◽

Maxime Léger ◽

Rémi Lenain ◽

Yohann Foucher

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Statistical Power ◽

Small Sample ◽

Causal Effects ◽

Small Samples ◽

Support Vector ◽

Sample Sizes ◽

Super Learner ◽

Small Sample Sizes

AbstractIn clinical research, there is a growing interest in the use of propensity score-based methods to estimate causal effects. G-computation is an alternative because of its high statistical power. Machine learning is also increasingly used because of its possible robustness to model misspecification. In this paper, we aimed to propose an approach that combines machine learning and G-computation when both the outcome and the exposure status are binary and is able to deal with small samples. We evaluated the performances of several methods, including penalized logistic regressions, a neural network, a support vector machine, boosted classification and regression trees, and a super learner through simulations. We proposed six different scenarios characterised by various sample sizes, numbers of covariates and relationships between covariates, exposure statuses, and outcomes. We have also illustrated the application of these methods, in which they were used to estimate the efficacy of barbiturates prescribed during the first 24 h of an episode of intracranial hypertension. In the context of GC, for estimating the individual outcome probabilities in two counterfactual worlds, we reported that the super learner tended to outperform the other approaches in terms of both bias and variance, especially for small sample sizes. The support vector machine performed well, but its mean bias was slightly higher than that of the super learner. In the investigated scenarios, G-computation associated with the super learner was a performant method for drawing causal inferences, even from small sample sizes.

Download Full-text

What can we Learn from Studies Based on Small Sample Sizes? Comment on Regan, Lakhanpal, and Anguiano (2012)

Psychological Reports ◽

10.2466/21.02.07.pr0.113x12z8 ◽

2013 ◽

Vol 113 (1) ◽

pp. 221-224 ◽

Cited By ~ 3

Author(s):

David R. Johnson ◽

Lauren K. Bachan

Keyword(s):

Sample Size ◽

Reporting rigorous qualitative results: Moving beyond small sample sizes

Australian Occupational Therapy Journal ◽

10.1111/1440-1630.12475 ◽

2018 ◽

Vol 65 (2) ◽

pp. 77-78

Author(s):

Genevieve Pepin

Keyword(s):

Small Sample ◽

Sample Sizes ◽

Small Sample Sizes

Download Full-text

Implications of Small Samples for Generalization: Adjustments and Rules of Thumb

Evaluation Review ◽

10.1177/0193841x16655665 ◽

2016 ◽

Vol 41 (5) ◽

pp. 472-505 ◽

Cited By ~ 16

Author(s):

Elizabeth Tipton ◽

Kelly Hallberg ◽

Larry V. Hedges ◽

Wendy Chan

Keyword(s):

Observational Studies ◽

Small Sample ◽

Average Treatment Effect ◽

Small Samples ◽

Sample Sizes ◽

Random Samples ◽

Rules Of Thumb ◽

Large Populations ◽

Small Sample Sizes ◽

Combine Information

Background: Policy makers and researchers are frequently interested in understanding how effective a particular intervention may be for a specific population. One approach is to assess the degree of similarity between the sample in an experiment and the population. Another approach is to combine information from the experiment and the population to estimate the population average treatment effect (PATE). Method: Several methods for assessing the similarity between a sample and population currently exist as well as methods estimating the PATE. In this article, we investigate properties of six of these methods and statistics in the small sample sizes common in education research (i.e., 10–70 sites), evaluating the utility of rules of thumb developed from observational studies in the generalization case. Result: In small random samples, large differences between the sample and population can arise simply by chance and many of the statistics commonly used in generalization are a function of both sample size and the number of covariates being compared. The rules of thumb developed in observational studies (which are commonly applied in generalization) are much too conservative given the small sample sizes found in generalization. Conclusion: This article implies that sharp inferences to large populations from small experiments are difficult even with probability sampling. Features of random samples should be kept in mind when evaluating the extent to which results from experiments conducted on nonrandom samples might generalize.

Download Full-text