STARCH: Copy number and clone inference from spatial transcriptomics data

Author(s):  
Rebecca Elyanow ◽  
Ron Zeira ◽  
Max Land ◽  
Benjamin J. Raphael

Abstract Tumors are highly heterogeneous, consisting of cell populations with both transcriptional and genetic diversity. These diverse cell populations are spatially organized within a tumor, creating a distinct tumor microenvironment. A new technology called spatial transcriptomics can measure spatial patterns of gene expression within a tissue by sequencing RNA transcripts from a grid of spots, each containing a small number of cells. In tumor cells, these gene expression patterns represent the combined contribution of regulatory mechanisms, which alter the rate at which a gene is transcribed, and genetic diversity, particularly copy number aberrations (CNAs), which alter the number of copies of a gene in the genome. CNAs are common in tumors and often promote cancer growth through upregulation of oncogenes or downregulation of tumor-suppressor genes. We introduce a new method, STARCH (Spatial Transcriptomics Algorithm Reconstructing Copy-number Heterogeneity), to infer CNAs from spatial transcriptomics data. STARCH overcomes challenges in inferring CNAs from RNA-sequencing data by leveraging the observation that cells located nearby in a tumor are likely to share similar CNAs. We find that STARCH outperforms existing methods that infer CNAs from RNA-sequencing data without incorporating spatial information.

2020 ◽  
Author(s):  
Edward Zhao ◽  
Matthew R. Stone ◽  
Xing Ren ◽  
Thomas Pulliam ◽  
Paul Nghiem ◽  
...  

Abstract Recently developed spatial gene expression technologies such as the Spatial Transcriptomics and Visium platforms allow for comprehensive measurement of transcriptomic profiles while retaining spatial context. However, existing methods for analyzing spatial gene expression data often do not efficiently leverage the spatial information and fail to address the limited resolution of the technology. Here, we introduce BayesSpace, a fully Bayesian statistical method for clustering analysis and resolution enhancement of spatial transcriptomics data that seamlessly integrates into current transcriptomics analysis workflows. We show that BayesSpace improves the identification of transcriptionally distinct tissues from spatial transcriptomics samples of the brain, of melanoma, and of squamous cell carcinoma. In particular, BayesSpace’s improved resolution allows the identification of tissue structure that is not detectable at the original resolution and thus not recovered by other methods. Using an in silico dataset constructed from scRNA-seq, we demonstrate that BayesSpace can spatially resolve expression patterns to near single-cell resolution without the need for external single-cell sequencing data. In all, our results illustrate the utility BayesSpace has in facilitating the discovery of biological insights from a variety of spatial transcriptomics datasets.


2021 ◽  
Vol 36 (Supplement_1) ◽  
Author(s):  
Floranne Boulogne ◽  
Laura Claus ◽  
Henry Wiersma ◽  
Roy Oelen ◽  
Floor Schukking ◽  
...  

Abstract Background and Aims Genetic testing in patients with suspected hereditary kidney disease does not always reveal the genetic cause for the patient's disorder. Potentially pathogenic variants can reside in genes that are not known to be involved in kidney disease, which makes it difficult to prioritize and interpret the relevance of these variants. As such, there is a clear need for methods that predict the phenotypic consequences of gene expression in a way that is as unbiased as possible. To help identify candidate genes we have developed KidneyNetwork, in which tissue-specific expression is utilized to predict kidney-specific gene functions. Method We combined gene co-expression in 878 publicly available kidney RNA-sequencing samples with the co-expression of a multi-tissue RNA-sequencing dataset of 31,499 samples to build KidneyNetwork. The expression patterns were used to predict which genes have a kidney-related function, and which (disease) phenotypes might be caused when these genes are mutated. By integrating the information from the HPO database, in which known phenotypic consequences of disease genes are annotated, with the gene co-expression network we obtained prediction scores for each gene per HPO term. As proof of principle, we applied KidneyNetwork to prioritize variants in exome-sequencing data from 13 kidney disease patients without a genetic diagnosis. Results We assessed the prediction performance of KidneyNetwork by comparing it to GeneNetwork, a multi-tissue co-expression network we previously developed. In KidneyNetwork, we observe a significantly improved prediction accuracy of kidney-related HPO-terms, as well as an increase in the total number of significantly predicted kidney-related HPO-terms (figure 1). To examine its clinical utility, we applied KidneyNetwork to 13 patients with a suspected hereditary kidney disease without a genetic diagnosis. 
Based on the HPO terms “Renal cyst” and “Hepatic cysts”, combined with a list of potentially damaging variants in one of the undiagnosed patients with mild ADPKD/PCLD, we identified ALG6 as a new candidate gene. ALG6 bears a high resemblance to other genes implicated in this phenotype in recent years. Through the 100,000 Genomes Project and collaborators we identified three additional patients with kidney and/or liver cysts carrying a suspected deleterious variant in ALG6. Conclusion We present KidneyNetwork, a kidney specific co-expression network that accurately predicts what genes have kidney-specific functions and may result in kidney disease. Gene-phenotype associations of genes unknown for kidney-related phenotypes can be predicted by KidneyNetwork. We show the added value of KidneyNetwork by applying it to exome sequencing data of kidney disease patients without a molecular diagnosis and consequently we propose ALG6 as a promising candidate gene. KidneyNetwork can be applied to clinically unsolved kidney disease cases, but it can also be used by researchers to gain insight into individual genes to better understand kidney physiology and pathophysiology. Acknowledgments This research was made possible through access to the data and findings generated by the 100,000 Genomes Project; http://www.genomicsengland.co.uk.


Blood ◽  
2016 ◽  
Vol 128 (22) ◽  
pp. 374-374 ◽  
Author(s):  
Chase Miller ◽  
Jennifer Yesil ◽  
Mary Derome ◽  
Andrea Donnelly ◽  
Jean Marrian ◽  
...  

Abstract Fluorescent in situ hybridization (FISH) is commonly used in the multiple myeloma field to subtype and risk-stratify patients. There are many benefits to FISH-based assays, which are widely used around the world and represent true single cell assays. However, there are significant discrepancies in the specific assays, utilization of reflex testing strategies, and enumeration requirements between clinical centers. By comparison, next-generation sequencing tests can be designed to simultaneously detect the copy number abnormalities and translocations detected by clinical FISH along with gene mutations that cannot be detected by FISH. As part of the MMRF CoMMpass Study we have compared the results attained using clinical FISH assays with sequencing-based FISH (Seq-FISH) results. Clinical FISH reports from a random subset of 339 CoMMpass patients were extracted by a single individual based on the ISCN result lines of each report. To validate the accuracy of the central data extraction, two independent cross validations of 10% of the cohort were performed, after which our data entry error rate is expected to be less than 0.348%. The Seq-FISH results were extracted from the whole genome sequencing data available from each patient using a rapid and fully automated informatics process and the results were cross-validated using the matching exome sequencing data for copy number abnormalities and by RNA sequencing data for dysregulated immunoglobulin translocation target genes. There were 230 patients with clinical FISH and Seq-FISH results. In this cohort, 151 translocations were identified by Seq-FISH. This includes translocations to MYC, CCND2, MAFA, and those involving IgK and IgL, which are not tested by clinical FISH. After filtering non-tested translocations there are 118 translocations identified by Seq-FISH. 
Only 97 of these translocations had a clinical FISH assay performed, with 89 (91.75%) of these being detected by clinical FISH, yet spiked target gene expression was observed in all 89 cases by RNA sequencing. Conversely, 93 translocations were called by clinical FISH, of which 89 were also called by Seq-FISH (95.7%). Of the 4 translocations only called by clinical FISH, 3 were t(4;14) and 1 was a t(11;14). In two of these t(4;14) cases we did observe spiked target gene expression by RNA sequencing, suggesting these are false negatives by Seq-FISH. However, the remaining two events appear to be false positive clinical FISH results. The t(4;14) event was only observed in 1/200 cells and a co-occurring t(11;14) was also called, which was confirmed by Seq-FISH and spiked gene expression. Similarly, the one t(11;14) was observed in 3/56 cells but a del13q14 was seen in 47/50 cells; unfortunately, RNA sequencing data are not available to cross-validate in this case. Plasma cell enrichment or identification is commonly used to prepare myeloma samples for FISH because even in myeloma, the total plasma cell percentage can be low (median 8.3% in the MMRF CoMMpass Baseline Cohort). Therefore, performing FISH on a sample without performing purification or plasma cell identification will indiscriminately assay non-plasma cells and limit the efficacy of the assay. We looked at the two most common translocations in myeloma, t(4;14) and t(11;14), to test the effect of enrichment on sensitivity. Sensitivity was higher for both sets of translocations in the enriched cohort. There was 1 false negative in the enriched population, yielding sensitivities of 100% (32/32) and 95% (19/20) for CCND1 and WHSC1 respectively. For those reports that did not indicate enrichment was performed the observed sensitivities were 86.36% (19/22) and 92.86% (13/14). 
Seq-FISH identified almost all of the translocations called by clinical FISH and simultaneously identified 30 translocations missed by clinical FISH. The translocations that were not reported by clinical FISH can be attributed to a mixture of the correct assay not being performed and the translocation being missed even though the assay was performed. We believe that Seq-FISH is a viable alternative to clinical FISH, with similar specificity and greater sensitivity. It is important to note that a single Seq-FISH assay is sufficient to investigate all translocations, while each translocation must be investigated separately with clinical FISH. As such, Seq-FISH obviates the concern that a translocation would be missed because the correct assay was not performed. Disclosures McBride: Instat: Employment.
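
The percentages quoted in this abstract follow directly from the reported counts. As a worked check, the short sketch below recomputes them. The labels and the helper name `percent` are illustrative and not part of the study; the abstract does not restate which gene each non-enriched value belongs to, so those two entries are labeled only by their order of appearance.

```python
def percent(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage rounded to two decimals."""
    return round(100.0 * numerator / denominator, 2)

# (detected, total) pairs copied from the abstract; labels are illustrative.
reported_counts = {
    "clinical FISH detection of assayed Seq-FISH calls": (89, 97),
    "Seq-FISH detection of clinical FISH calls": (89, 93),
    "enriched, CCND1": (32, 32),
    "enriched, WHSC1": (19, 20),
    "not enriched, first reported value": (19, 22),
    "not enriched, second reported value": (13, 14),
}

percentages = {
    label: percent(detected, total)
    for label, (detected, total) in reported_counts.items()
}

for label, value in percentages.items():
    print(f"{label}: {value}%")
```

The recomputed values agree with the figures stated in the abstract (91.75%, 95.7%, 100%, 95%, 86.36%, and 92.86%).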


2020 ◽  
Vol 62 (3-4) ◽  
pp. 119-134
Author(s):  
Mona Rams ◽  
Tim Conrad

Abstract Extracting information from large biological datasets is a challenging task, due to the large data size, high-dimensionality, noise, and errors in the data. Gene expression data contains information about which gene products have been formed by a cell, thus representing which genes have been read to activate a particular biological process. Understanding which of these gene products can be related to which processes can for example give insights about how diseases evolve and might give hints about how to fight them. The Next Generation RNA-sequencing method emerged over a decade ago and is nowadays state-of-the-art in the field of gene expression analyses. However, analyzing these large, complex datasets is still a challenging task. Many of the existing methods do not take into account the underlying structure of the data. In this paper, we present a new approach for RNA-sequencing data analysis based on dictionary learning. Dictionary learning is a sparsity enforcing method that has widely been used in many fields, such as image processing, pattern classification, signal denoising and more. We show how for RNA-sequencing data, the atoms in the dictionary matrix can be interpreted as modules of genes that either capture patterns specific to different types, or else represent modules that are reused across different scenarios. We evaluate our approach on four large datasets with samples from multiple types. A Gene Ontology term analysis, which is a standard tool to help understand the functions of genes, shows that the found gene-sets are in agreement with the biological context of the sample types. Further, we find that the sparse representations of samples using the dictionary can be used to identify type-specific differences.


Author(s):  
Hyundoo Jeong ◽  
Zhandong Liu

Abstract Single-cell RNA sequencing technology provides a novel means to analyze the transcriptomic profiles of individual cells. The technique is vulnerable, however, to a type of noise called dropout effects, which lead to zero-inflated distributions in the transcriptome profile and reduce the reliability of the results. Single-cell RNA sequencing data therefore need to be carefully processed before in-depth analysis. Here we describe a novel imputation method that reduces dropout effects in single-cell sequencing. We construct a cell correspondence network and adjust gene expression estimates based on transcriptome profiles for the local community of cells of the same type. We comprehensively evaluated this method, called PRIME (PRobabilistic IMputation to reduce dropout effects in Expression profiles of single cell sequencing), on six datasets and verified that it improves the quality of visualization and accuracy of clustering analysis and can discover gene expression patterns hidden by noise.
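
To make the general idea of neighborhood-based imputation concrete, here is a minimal sketch. It is not the PRIME algorithm: it assumes a deliberately naive rule in which a zero entry is replaced by the mean of the same gene in the k most similar cells, with similarity taken as Euclidean distance over all genes. The function name `impute_zeros`, the parameter `k`, and the toy matrix are all hypothetical.

```python
import math

def impute_zeros(expression, k=2):
    """Replace zero entries with the mean of the same gene in the k nearest cells.

    expression: list of cells, each a list of per-gene values.
    Assumption: neighbors are ranked by Euclidean distance over all genes.
    This is a simplified stand-in, not the PRIME method itself.
    """
    imputed = [row[:] for row in expression]
    for i, cell in enumerate(expression):
        # Rank every other cell by its distance to the current cell.
        others = [j for j in range(len(expression)) if j != i]
        others.sort(key=lambda j: math.dist(cell, expression[j]))
        neighbors = others[:k]
        for g, value in enumerate(cell):
            if value == 0 and neighbors:
                neighbor_values = [expression[j][g] for j in neighbors]
                imputed[i][g] = sum(neighbor_values) / len(neighbor_values)
    return imputed

# Toy data: rows are cells, columns are genes (values are made up).
cells = [
    [5.0, 0.0, 1.0],  # second gene looks like a dropout
    [5.0, 4.0, 1.0],
    [6.0, 4.0, 0.0],  # third gene looks like a dropout
    [0.0, 0.0, 9.0],  # a dissimilar cell
]
result = impute_zeros(cells, k=2)
print(result[0])
```

Note that this naive rule also overwrites the zeros of the fourth, dissimilar cell, because its nearest neighbors are still other cell types. That is one motivation for restricting the adjustment to a local community of cells of the same type, as the abstract describes.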


Leukemia ◽  
2021 ◽  
Author(s):  
Alboukadel Kassambara ◽  
Laurie Herviou ◽  
Sara Ovejero ◽  
Michel Jourdan ◽  
Coraline Thibaut ◽  
...  

Abstract Plasma cells (PCs) play an important role in the adaptive immune system through a continuous production of antibodies. We have demonstrated that PC differentiation can be modeled in vitro using complex multistep culture systems reproducing the sequential differentiation process occurring in vivo. Here we present a comprehensive, temporal program of gene expression data encompassing human PC differentiation (PCD) using RNA sequencing (RNA-seq). Our results reveal 6374 differentially expressed genes classified into four temporal gene expression patterns. A stringent pathway enrichment analysis of these gene clusters highlights known pathways but also pathways largely unknown in PCD, including the heme biosynthesis and the glutathione conjugation pathways. Additionally, our analysis revealed numerous novel transcriptional networks with significant stage-specific overexpression and potential importance in PCD, including BATF2, BHLHA15/MIST1, EZH2, WHSC1/MMSET, and BLM. We have experimentally validated a potent role for BLM in regulating cell survival and proliferation during human PCD. Taken together, this RNA-seq analysis of PCD temporal stages helped identify coexpressed gene modules with associated up/downregulated transcription regulator genes that could represent major regulatory nodes for human PC maturation. These data constitute a unique resource of human PCD gene expression programs in support of future studies for understanding the underlying mechanisms that control PCD.


2018 ◽  
Vol 34 (14) ◽  
pp. 2392-2400 ◽  
Author(s):  
Trung Nghia Vu ◽  
Quin F Wills ◽  
Krishna R Kalari ◽  
Nifang Niu ◽  
Liewei Wang ◽  
...  

2018 ◽  
Author(s):  
Eric Talevich ◽  
A. Hunter Shain

Abstract RNA-sequencing is most commonly used to measure gene expression, but it is possible to extract genotypic information from RNA-sequencing data, too. Point mutations and translocations can be detected when they occur in expressed genes; however, there are few software solutions to infer copy number information from RNA-sequencing data. This is because a gene’s expression is dictated by a number of variables, including, but not limited to, copy number variation. Here, we report new functionalities within the software package CNVkit that enable copy number inference from RNA-sequencing data. First, CNVkit removes technical variation in gene expression associated with GC-content and transcript length. Next, CNVkit assigns a weight, dictated by several variables, to each transcript with the net effect of preferentially inferring copy number from highly and stably expressed genes. We benchmarked our approach on 105 melanomas from The Cancer Genome Atlas project and observed a high degree of concordance (R = 0.739) between our estimates and those from array comparative genomic hybridization (aCGH) on the same samples. After initial configuration, the software requires few inputs, is able to process a batch of up to 100 samples in less than ten minutes, and can be used in conjunction with pre-existing features of CNVkit, including visualization tools. Overall, we present a rapid, user-friendly software solution to infer copy number information from gene expression data.
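
The effect of per-transcript weighting can be illustrated with a weighted mean. The sketch below is only an assumption-laden illustration, not CNVkit's actual weighting formula or API: the function `weighted_log2_ratio`, the log2 ratios, and the weights are all hypothetical. It shows how a down-weighted noisy gene has less influence on a segment-level estimate.

```python
def weighted_log2_ratio(log2_ratios, weights):
    """Weighted mean of per-gene log2 expression ratios for one genomic segment."""
    if len(log2_ratios) != len(weights):
        raise ValueError("log2_ratios and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("the weights must sum to a positive value")
    weighted_sum = sum(r * w for r, w in zip(log2_ratios, weights))
    return weighted_sum / total_weight

# Hypothetical genes on one segment: two stably expressed genes suggest a gain,
# while one noisy gene (given a low weight) suggests a loss.
ratios = [0.5, 0.5, -2.0]
weights = [1.0, 1.0, 0.1]

segment_estimate = weighted_log2_ratio(ratios, weights)
unweighted_estimate = sum(ratios) / len(ratios)
print(segment_estimate, unweighted_estimate)
```

With these made-up numbers the weighted estimate stays positive while the unweighted mean is negative, which is the kind of behavior the phrase "preferentially inferring copy number from highly and stably expressed genes" suggests.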


2016 ◽  
Author(s):  
Trung Nghia Vu ◽  
Quin F Wills ◽  
Krishna R Kalari ◽  
Nifang Niu ◽  
Liewei Wang ◽  
...  

RNA-sequencing of single cells enables characterization of transcriptional heterogeneity in seemingly homogenous cell populations. In this study we propose and apply a novel method, ISOform-Patterns (ISOP), based on mixture modeling, to characterize the expression patterns of pairs of isoforms from the same gene in single-cell isoform-level expression data. We define six principal patterns of isoform expression relationships and introduce the concept of differential pattern analysis. We applied ISOP for analysis of single-cell RNA-sequencing data from a breast cancer cell line, with replication in two independent datasets. In the primary dataset we detected and assigned the pattern type of 16562 isoform pairs from 4929 genes. Our results showed that 78% of the isoform pairs displayed a mutually exclusive expression pattern, 14% of the isoform pairs displayed bimodal isoform preference and 8% of the isoform pairs displayed isoform preference. 26% of the isoform-pair patterns were significant, while the remaining isoform-pair patterns can be understood as effects of transcriptional bursting, drop-out and biological heterogeneity. 32% of genes discovered through differential pattern analysis were novel and not detected by differential expression analysis. ISOP provides a novel approach for characterization of isoform-level expression in single-cell populations. Our results reveal a common occurrence of isoform-level preference, commitment and heterogeneity in single-cell populations.

