Applications of single-cell genomics and computational strategies to study common disease and population-level variation

The advent and rapid development of single-cell technologies have made it possible to study cellular heterogeneity at unprecedented resolution and scale. Cellular heterogeneity underlies phenotypic differences among individuals, and studying it is an important step toward understanding the molecular mechanisms of disease. Single-cell technologies offer opportunities to characterize cellular heterogeneity from different angles, but linking cellular heterogeneity to disease phenotypes requires careful computational analysis. In this article, we review current applications of single-cell methods in human disease studies and describe what existing studies have taught us so far about human genetic variation. As single-cell technologies become widely applicable in human disease studies, population-level studies have become a reality. We describe how to design and pursue these studies, particularly how to select study subjects, how many cells to sequence per subject, and what sequencing depth is needed per cell. We also discuss computational strategies for analyzing single-cell data and describe how single-cell data can be integrated with bulk tissue data and with data from genome-wide association studies. Finally, we point out open problems and future research directions.
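
One ingredient of the design question raised above, how many cells to profile per subject, can be illustrated with a simple binomial calculation. This is a minimal sketch under stated assumptions, not a method from the article: it assumes cells are sampled independently and that a cell type of interest makes up a fixed fraction `freq` of each subject's cells, and it ignores doublets, capture bias and quality-control losses. The function names are hypothetical. It returns the smallest number of cells for which at least `k` cells of that type are observed with the requested probability.

```python
import math


def prob_at_least(n_cells: int, freq: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n_cells, freq), summed in log space to avoid overflow."""
    if not 0.0 < freq < 1.0:
        raise ValueError("freq must be strictly between 0 and 1")
    log_p, log_q = math.log(freq), math.log1p(-freq)
    below_k = 0.0
    for i in range(min(k, n_cells + 1)):
        log_pmf = (
            math.lgamma(n_cells + 1)
            - math.lgamma(i + 1)
            - math.lgamma(n_cells - i + 1)
            + i * log_p
            + (n_cells - i) * log_q
        )
        below_k += math.exp(log_pmf)
    return max(0.0, 1.0 - below_k)


def cells_needed(freq: float, k: int, confidence: float = 0.95,
                 max_cells: int = 1_000_000) -> int:
    """Smallest cells-per-subject count that captures >= k cells of a type with
    frequency freq with probability >= confidence (simple linear search)."""
    for n_cells in range(k, max_cells + 1):
        if prob_at_least(n_cells, freq, k) >= confidence:
            return n_cells
    raise ValueError("no solution within max_cells")


if __name__ == "__main__":
    # Hypothetical scenario: a cell type at 2% frequency, 20 cells wanted per subject.
    print(cells_needed(0.02, 20))
```

Because the capture probability increases with the number of cells, the linear search could be replaced by a binary search for very rare cell types; the linear form is kept here for readability.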

Exploiting single-cell quantitative data to map genetic variants having probabilistic effects

10.1101/040113 ◽

2016 ◽

Cited By ~ 1

Author(s):

Florent Chuffart ◽

Magali Richard ◽

Daniel Jost ◽

Helene Duplus-Bottin ◽

Yoshikazu Ohya ◽

...

Keyword(s):

Genetic Mapping ◽

Single Cell ◽

Statistical Power ◽

Association Studies ◽

Mapping Method ◽

Cellular Level ◽

Incomplete Penetrance ◽

Genome Wide Association Studies ◽

Small Contribution ◽

Cell Technologies

Despite recent progress in sequencing technologies, genome-wide association studies (GWAS) remain limited by statistical power: many polymorphisms contribute little to common trait variation and therefore escape detection. This small contribution sometimes corresponds to incomplete penetrance, which may result from probabilistic effects on molecular regulation. In such cases, genetic mapping may benefit from the wealth of data produced by single-cell technologies. Here we present a novel genetic mapping method that scans genomes for single-cell Probabilistic Trait Loci, which modify the statistical properties of cellular-level quantitative traits. Phenotypic values are acquired on thousands of individual cells, and genetic association is obtained from a multivariate analysis of a matrix of Kantorovich distances. No prior assumption is required about the mode of action of the genetic loci involved, and, by exploiting all single-cell values, the method can reveal non-deterministic effects. Using both simulations and yeast experimental datasets, we show that it can detect linkages that are missed by classical genetic mapping. A probabilistic effect of a single SNP on cell shape was detected and validated. The method also detected a novel locus associated with elevated gene expression noise of the yeast galactose regulon. Our results illustrate how single-cell technologies can be exploited to improve the genetic dissection of certain common traits.
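
The core idea of comparing whole distributions of single-cell values between individuals, rather than only their means, can be sketched as follows. This is an illustrative sketch, not the authors' implementation: it assumes one scalar trait per cell, computes the one-dimensional Kantorovich (Wasserstein-1) distance between each pair of individuals' empirical distributions, and then embeds the distance matrix with classical multidimensional scaling (MDS). The choice of classical MDS as the multivariate step and all function names are assumptions, and the association test against genotypes is not shown.

```python
import numpy as np


def kantorovich_1d(u, v) -> float:
    """Wasserstein-1 (Kantorovich) distance between two 1-D empirical samples,
    i.e. the integral of |F_u(x) - F_v(x)| over x for the empirical CDFs."""
    u = np.sort(np.asarray(u, dtype=float))
    v = np.sort(np.asarray(v, dtype=float))
    grid = np.sort(np.concatenate([u, v]))
    widths = np.diff(grid)
    # Empirical CDFs evaluated on each interval [grid[i], grid[i + 1]).
    cdf_u = np.searchsorted(u, grid[:-1], side="right") / u.size
    cdf_v = np.searchsorted(v, grid[:-1], side="right") / v.size
    return float(np.sum(np.abs(cdf_u - cdf_v) * widths))


def distance_matrix(samples) -> np.ndarray:
    """Symmetric matrix of pairwise Kantorovich distances between individuals."""
    n = len(samples)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = kantorovich_1d(samples[i], samples[j])
    return dist


def classical_mds(dist: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Embed a distance matrix in n_components dimensions (Torgerson scaling)."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical data: equal means, but half the individuals have a noisier trait.
    cells = [rng.normal(0.0, 1.0 if i % 2 else 2.0, size=1000) for i in range(10)]
    coords = classical_mds(distance_matrix(cells))
    print(coords.round(2))
```

In this toy example the two groups share a mean, so a per-individual average would not separate them, whereas the distribution-level distance can.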

Data safe havens to combine health and genomic data: benefits and challenges

International Journal for Population Data Science ◽

10.23889/ijpds.v1i1.348 ◽

2017 ◽

Vol 1 (1) ◽

Author(s):

Kerina H Jones ◽

Arron S Lacey ◽

Brian L Perkins ◽

Mark I Rees

Keyword(s):

Association Studies ◽

Genomic Data ◽

Population Level ◽

Data Availability ◽

Genome Wide Association Studies ◽

Related Data ◽

Research Areas ◽

Individual Privacy ◽

Access Controls ◽

Health Related

Objectives: Data safe havens can bring together and combine a rich array of anonymised person-based data for research and policy evaluation within a secure setting. To date, most available datasets have been structured micro-data derived from routine health-related records. Possibilities are opening up for greater reuse of genomic data such as genome-wide association studies (GWAS) and whole exome/genome sequencing (WES or WGS). However, considerable challenges must be addressed if the benefits of using these data in combination with health-related data are to be realized safely.

Approach: We explore the benefits and challenges of using genomic datasets with health-related data. Using the Secure Anonymised Information Linkage (SAIL) system as a case study, we also examine the implications and the way forward for Data Safe Havens that seek to incorporate genomic data for use with health-related data.

Results: The benefits of using GWAS, WES and WGS data in conjunction with health-related data include the potential to explore genetics at a population level and to open up novel research areas. These include the ability to increasingly stratify and personalize how medical indications are detected and treated through precision medicine, by understanding rare conditions and adding socioeconomic and environmental context to genomic data. The challenges include data availability, computing capacity, technical solutions, legal and regulatory frameworks, public perceptions, individual privacy and organizational risk. Many of these challenges are common to person-based data in general, and Data Safe Havens have often been designed to address them. However, some aspects of these challenges, as well as other challenges, are specific to genomic data. These include issues arising from the unknown clinical significance of genomic information, now or in the future, with corresponding risks for privacy and impact on individuals.

Conclusion: Genomic datasets contain vast amounts of valuable information, some of which is currently undefined but may have direct bearing on individual health at some point. Using these data in combination with health-related data has the potential to bring great benefits, including better clinical trial stratification, improved design of epidemiology projects and clinical improvements. It is therefore essential that such data are surrounded by a properly designed, robust governance framework, including technical and procedural access controls that enable the data to be used safely.

VoPo leverages cellular heterogeneity for predictive modeling of single-cell data

Nature Communications ◽

10.1038/s41467-020-17569-8 ◽

2020 ◽

Vol 11 (1) ◽

Cited By ~ 2

Author(s):

Natalie Stanley ◽

Ina A. Stelzer ◽

Amy S. Tsai ◽

Ramin Fallahzadeh ◽

Edward Ganio ◽

...

Keyword(s):

Single Cell ◽

Predictive Modeling ◽

Cellular Heterogeneity ◽

Cell Data

Block HSIC Lasso: model-free biomarker detection for ultra-high dimensional data

Bioinformatics ◽

10.1093/bioinformatics/btz333 ◽

2019 ◽

Vol 35 (14) ◽

pp. i427-i435 ◽

Cited By ~ 3

Author(s):

Héctor Climente-González ◽

Chloé-Agathe Azencott ◽

Samuel Kaski ◽

Makoto Yamada

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

Association Studies ◽

Real Data ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Model Free ◽

Computational Overhead ◽

Single Cell RNA Sequencing ◽

Non Linear

Motivation: Finding non-linear relationships between biomolecules and a biological outcome is computationally expensive and statistically challenging. Existing methods have important drawbacks, including lack of parsimony, non-convexity and computational overhead. Here we propose block HSIC Lasso, a non-linear feature selector that does not have these drawbacks.

Results: We compare block HSIC Lasso to other state-of-the-art feature selection techniques on both synthetic and real data, including experiments on three common types of genomic data: gene-expression microarrays, single-cell RNA sequencing and genome-wide association studies. In all cases, we observe that the features selected by block HSIC Lasso retain more information about the underlying biology than those selected by the other techniques. As a proof of concept, we applied block HSIC Lasso to a single-cell RNA sequencing experiment on mouse hippocampus and found that many genes previously linked to brain development and function are involved in the biological differences between neuron types.

Availability and implementation: Block HSIC Lasso is implemented in the Python 2/3 package pyHSICLasso, available on PyPI. Source code is available on GitHub (https://github.com/riken-aip/pyHSICLasso).

Supplementary information: Supplementary data are available at Bioinformatics online.
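
Block HSIC Lasso builds on the Hilbert-Schmidt Independence Criterion (HSIC), a kernel measure of dependence that, with a characteristic kernel such as the Gaussian kernel, is zero in the population exactly when two variables are independent. The sketch below is not the pyHSICLasso API and implements neither the block estimator nor the Lasso step. It only computes the standard biased empirical HSIC, trace(KHLH)/(n-1)^2, with Gaussian kernels, and uses it to rank features one at a time. That marginal ranking is a simpler filter than HSIC Lasso, which additionally penalizes redundant features. The kernel widths and function names are illustrative assumptions.

```python
import numpy as np


def gaussian_gram(x, sigma: float = 1.0) -> np.ndarray:
    """Gaussian (RBF) kernel matrix for the rows of x."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    sq_norms = np.sum(x ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * x @ x.T
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma ** 2))


def hsic(x, y, sigma_x: float = 1.0, sigma_y: float = 1.0) -> float:
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2."""
    n = len(x)
    k_mat = gaussian_gram(x, sigma_x)
    l_mat = gaussian_gram(y, sigma_y)
    h = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return float(np.trace(k_mat @ h @ l_mat @ h)) / (n - 1) ** 2


def rank_features_by_hsic(features: np.ndarray, outcome) -> np.ndarray:
    """Column indices sorted by decreasing marginal HSIC with the outcome.
    Assumes no feature column is constant (each is standardized first)."""
    z = (features - features.mean(axis=0)) / features.std(axis=0)
    scores = np.array([hsic(z[:, j], outcome) for j in range(z.shape[1])])
    return np.argsort(scores)[::-1]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(300, 5))
    # Hypothetical outcome: a purely non-linear function of column 2 plus noise.
    y = X[:, 2] ** 2 + 0.1 * rng.normal(size=300)
    print(rank_features_by_hsic(X, y))
```

In the toy example the outcome is essentially uncorrelated with column 2 in the Pearson sense, yet a kernel dependence measure can still detect the relationship, which is the kind of non-linear association the abstract targets.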

Inositol 1,4,5-Trisphosphate Receptors in Human Disease: A Comprehensive Update

Journal of Clinical Medicine ◽

10.3390/jcm9041096 ◽

2020 ◽

Vol 9 (4) ◽

pp. 1096

Author(s):

Jessica Gambardella ◽

Angela Lombardi ◽

Marco Bruno Morelli ◽

John Ferrara ◽

Gaetano Santulli

Keyword(s):

Human Disease ◽

Calcium Release ◽

Current Knowledge ◽

Association Studies ◽

Genome Wide Association Studies ◽

Loss Of Function ◽

Genome Wide ◽

Inositol 1,4,5 Trisphosphate ◽

Human Disorders

Inositol 1,4,5-trisphosphate receptors (ITPRs) are intracellular calcium release channels located on the endoplasmic reticulum of virtually every cell. Here, we report an updated systematic summary of current knowledge on the functional role of ITPRs in human disorders. Specifically, we describe the involvement of ITPR loss-of-function and gain-of-function mutations in the pathogenesis of neurological, immunological, cardiovascular, and neoplastic human diseases. Recent results from genome-wide association studies are also discussed.

Single-Cell Microbiology: Tools, Technologies, and Applications

Microbiology and Molecular Biology Reviews ◽

10.1128/mmbr.68.3.538-559.2004 ◽

2004 ◽

Vol 68 (3) ◽

pp. 538-559 ◽

Cited By ~ 333

Author(s):

Byron F. Brehm-Stecher ◽

Eric A. Johnson

Keyword(s):

Single Cell ◽

Population Level ◽

Cellular Heterogeneity ◽

Food Preservatives ◽

Complex Processes ◽

Biocide Resistance ◽

Level Data ◽

Level Information ◽

Or Gene ◽

Tools And Techniques

SUMMARY The field of microbiology has traditionally been concerned with and focused on studies at the population level. Information on how cells respond to their environment, interact with each other, or undergo complex processes such as cellular differentiation or gene expression has been obtained mostly by inference from population-level data. Individual microorganisms, even those in supposedly “clonal” populations, may differ widely from each other in terms of their genetic composition, physiology, biochemistry, or behavior. This genetic and phenotypic heterogeneity has important practical consequences for a number of human interests, including antibiotic or biocide resistance, the productivity and stability of industrial fermentations, the efficacy of food preservatives, and the potential of pathogens to cause disease. New appreciation of the importance of cellular heterogeneity, coupled with recent advances in technology, has driven the development of new tools and techniques for the study of individual microbial cells. Because observations made at the single-cell level are not subject to the “averaging” effects characteristic of bulk-phase, population-level methods, they offer the unique capacity to observe discrete microbiological phenomena unavailable using traditional approaches. As a result, scientists have been able to characterize microorganisms, their activities, and their interactions at unprecedented levels of detail.

What Can Genome-Wide Association Studies Tell Us about the Genetics of Common Disease?

PLoS Genetics ◽

10.1371/journal.pgen.0040033 ◽

2008 ◽

Vol 4 (2) ◽

pp. e33 ◽

Cited By ~ 93

Author(s):

Mark M Iles

Keyword(s):

Association Studies ◽

Genome Wide Association ◽

Common Disease ◽

Genome Wide Association Studies ◽

Genome Wide

Leveraging mouse chromatin data for heritability enrichment informs common disease architecture and reveals cortical layer contributions to schizophrenia

10.1101/427484 ◽

2018 ◽

Cited By ~ 1

Author(s):

Paul W. Hook ◽

Andrew S. McCallion

Keyword(s):

Association Studies ◽

Cortical Layer ◽

Genome Wide Association ◽

Cell Populations ◽

Common Disease ◽

Open Chromatin ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Excitatory Neurons ◽

Cellular Context

Genome-wide association studies have implicated thousands of non-coding variants across human phenotypes. However, they cannot directly indicate the cellular context in which disease-associated variants act. Here, we use open chromatin profiles from discrete mouse cell populations to address this challenge. We applied stratified linkage disequilibrium score regression and evaluated heritability enrichment in 64 genome-wide association studies, with an emphasis on schizophrenia. We provide evidence that mouse-derived human open chromatin profiles can serve as powerful proxies for difficult-to-obtain human cell populations, helping to illuminate common disease heritability enrichment across an array of human phenotypes. We demonstrate that signatures from discrete subpopulations of cortical excitatory and inhibitory neurons are significantly enriched for schizophrenia heritability, with maximal enrichment in discrete cortical layer V excitatory neurons. We also show that differences between schizophrenia and bipolar disorder are concentrated in excitatory neurons in layers II-III, IV and V, as well as in the dentate gyrus. Finally, we use these data to fine-map variants in 177 schizophrenia loci, nominating variants in 104 of the 177 loci, and place them in the cellular context in which they may modulate risk.

Editorial: Integrative Analysis of Genome-Wide Association Studies and Single-Cell Sequencing Studies

Frontiers in Genetics ◽

10.3389/fgene.2021.752057 ◽

2021 ◽

Vol 12 ◽

Author(s):

Shiquan Sun ◽

Sheng Yang

Keyword(s):

Single Cell ◽

Association Studies ◽

Integrative Analysis ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Single Cell Sequencing ◽

Genome Wide ◽

Sequencing Studies

Exome sequencing and genotyping identify a rare variant in NLRP7 gene associated with ulcerative colitis

10.1101/182113 ◽

2017 ◽

Author(s):

Alexandros Onoufriadis ◽

Kristina Stone ◽

Antreas Katsiamides ◽

Ariella Amar ◽

Yasmin Omar ◽

...

Keyword(s):

Ulcerative Colitis ◽

Exome Sequencing ◽

Genetic Variants ◽

Odds Ratio ◽

High Throughput Sequencing ◽

Association Studies ◽

Common Disease ◽

Genome Wide Association Studies ◽

Increased Risk ◽

Coding Variants

Background and aims: Although genome-wide association studies (GWAS) in inflammatory bowel disease (IBD) have identified a large number of common disease susceptibility alleles for both Crohn’s disease (CD) and ulcerative colitis (UC), a substantial fraction of IBD heritability remains unexplained, suggesting that rare coding genetic variants may also have a role in pathogenesis. We used high-throughput sequencing in families with multiple cases of IBD, followed by genotyping of cases and controls, to investigate whether rare protein-altering genetic variants are associated with susceptibility to IBD.

Methods: Whole exome sequencing was carried out in 10 families in which 3 or more individuals were affected with IBD. A stepwise filtering approach was applied to exome variants to identify potential causal variants. Follow-up genotyping was performed in 6,025 IBD cases (2,948 CD; 3,077 UC) and 7,238 controls.

Results: Our exome variant analysis revealed coding variants in the NLRP7 gene that were present in affected individuals in two distinct families. Genotyping of the two variants, p.S361L and p.R801H, in IBD cases and controls showed that the p.S361L variant was significantly associated with an increased risk of ulcerative colitis (odds ratio 4.79, p = 0.0039) and IBD (odds ratio 3.17, p = 0.037). A combined analysis of both variants showed a suggestive association with an increased risk of IBD (odds ratio 2.77, p = 0.018).

Conclusions: The results suggest that NLRP7 signalling and inflammasome formation may be a significant component in the pathogenesis of IBD.
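
The odds ratios reported above compare the odds of carrying a variant in cases with the odds in controls. As a minimal illustration, the sketch below computes an odds ratio and a Wald 95% confidence interval on the log-odds scale from a 2×2 table of carriers and non-carriers. The counts in the example are hypothetical, not the study's data, and the abstract does not state which test produced the reported p-values, so this is not a reproduction of the authors' analysis. The function name is an assumption.

```python
import math


def odds_ratio_wald(case_carriers: int, case_noncarriers: int,
                    control_carriers: int, control_noncarriers: int,
                    z: float = 1.959964) -> tuple[float, float, float]:
    """Odds ratio with a Wald confidence interval computed on the log-odds scale.

    All four cells must be positive; no continuity correction is applied.
    The default z gives an approximate 95% interval.
    """
    a, b = case_carriers, case_noncarriers
    c, d = control_carriers, control_noncarriers
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive")
    log_or = math.log(a * d / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    or_, low, high = odds_ratio_wald(12, 3000, 3, 7200)
    print(f"OR = {or_:.2f} (95% CI {low:.2f}-{high:.2f})")
```

With rare variants the carrier cells are small, so the Wald interval can be wide or unreliable; exact methods such as Fisher's test are a common alternative in that setting.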
