Integrated cytometry with machine learning applied to high-content imaging of human kidney tissue for in-situ cell classification and neighborhood analysis

2021 ◽  
Author(s):  
Seth Winfree ◽  
Andrew T McNutt ◽  
Suraj Khochare ◽  
Tyler J Borgard ◽  
Daria Barwinska ◽  
...  

The human kidney is a complex organ with various cell types that are intricately organized to perform key physiological functions and maintain homeostasis. New imaging modalities such as mesoscale and highly multiplexed fluorescence microscopy are increasingly applied to human kidney tissue to create single cell resolution datasets that are both spatially large and multi-dimensional. These single cell resolution high-content imaging datasets have great potential to uncover the complex spatial organization and cellular make-up of the human kidney. Tissue cytometry is a novel approach used for quantitative analysis of imaging data, but the scale and complexity of such datasets pose unique challenges for processing and analysis. We have developed the Volumetric Tissue Exploration and Analysis (VTEA) software, a unique tool that integrates image processing, segmentation and interactive cytometry analysis into a single framework on desktop computers. Supported by an extensible and open-source framework, VTEA's integrated pipeline now includes enhanced analytical tools, such as machine learning, data visualization, and neighborhood analyses for hyperdimensional large-scale imaging datasets. These novel capabilities enable the analysis of mesoscale two- and three-dimensional multiplexed human kidney imaging datasets (such as CODEX and 3D confocal multiplexed fluorescence imaging). We demonstrate the utility of this approach in identifying cell subtypes in the kidney based on labels, spatial association and their microenvironment or neighborhood membership. VTEA provides an integrated and intuitive approach to decipher the cellular and spatial complexity of the human kidney and complements other transcriptomics and epigenetic efforts to define the landscape of kidney cell types.
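To make the idea of neighborhood membership concrete, the following is a minimal sketch of a radius-based neighborhood count. It is not VTEA's implementation (the abstract does not describe one); the function name `neighborhood_composition`, the tuple layout, the radius and the toy coordinates and labels are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def neighborhood_composition(cells, radius):
    """For each cell, count the labels of all other cells within `radius`.

    cells: list of (x, y, z, label) tuples (hypothetical layout).
    Returns one Counter per cell, in input order.
    """
    cells = list(cells)
    result = []
    for i, (xi, yi, zi, _) in enumerate(cells):
        counts = Counter()
        for j, (xj, yj, zj, label) in enumerate(cells):
            # Skip the cell itself; keep neighbours inside the sphere.
            if i != j and math.dist((xi, yi, zi), (xj, yj, zj)) <= radius:
                counts[label] += 1
        result.append(counts)
    return result

# Toy data: three cells close together and one far away.
toy_cells = [
    (0.0, 0.0, 0.0, "podocyte"),
    (1.0, 0.0, 0.0, "endothelial"),
    (0.0, 1.0, 0.0, "endothelial"),
    (10.0, 10.0, 10.0, "proximal_tubule"),
]
toy_neighborhoods = neighborhood_composition(toy_cells, radius=2.0)
```

The brute-force double loop is quadratic in the number of cells, which is fine for a toy example; a dataset at mesoscale would need a spatial index instead.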

2019 ◽  
Vol 21 (4) ◽  
pp. 1209-1223 ◽  
Author(s):  
Raphael Petegrosso ◽  
Zhuliu Li ◽  
Rui Kuang

Abstract: Single-cell RNA sequencing (scRNA-seq) technologies have enabled the large-scale whole-transcriptome profiling of each individual single cell in a cell population. A core analysis of the scRNA-seq transcriptome profiles is to cluster the single cells to reveal cell subtypes and infer cell lineages based on the relations among the cells. This article reviews the machine learning and statistical methods for clustering scRNA-seq transcriptomes developed in the past few years. The review focuses on how conventional clustering techniques such as hierarchical clustering, graph-based clustering, mixture models, $k$-means, ensemble learning, neural networks and density-based clustering are modified or customized to tackle the unique challenges in scRNA-seq data analysis, such as the dropout of low-expression genes, low and uneven read coverage of transcripts, highly variable total mRNAs from single cells and ambiguous cell markers in the presence of technical biases and irrelevant confounding biological variations. We review how cell-specific normalization, the imputation of dropouts and dimension reduction methods can be applied with new statistical or optimization strategies to improve the clustering of single cells. We also introduce more advanced approaches for clustering scRNA-seq transcriptomes in time series data and multiple cell populations and for detecting rare cell types. Several software packages developed to support the cluster analysis of scRNA-seq data are also reviewed and experimentally compared to evaluate their performance and efficiency. Finally, we conclude with useful observations and possible future directions in scRNA-seq data analytics. Availability: All the source code and data are available at https://github.com/kuanglab/single-cell-review.
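As an illustration of why cell-specific normalization matters when total mRNA varies between cells, here is a minimal sketch of one common scheme: scale each cell to a fixed total count, then apply log(1 + x). This is not taken from any particular method in the review; the function name `log_normalize`, the `target_sum` value and the toy counts are assumptions.

```python
import math

def log_normalize(counts, target_sum=10_000):
    """Scale each cell's counts to `target_sum`, then apply log(1 + x).

    counts: list of per-cell lists of raw gene counts (hypothetical layout).
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        if total == 0:
            # An empty cell stays all-zero rather than dividing by zero.
            normalized.append([0.0] * len(cell))
            continue
        scale = target_sum / total
        normalized.append([math.log1p(c * scale) for c in cell])
    return normalized

# Two toy cells with the same expression proportions but a tenfold
# difference in sequencing depth.
raw_counts = [[1, 3, 0], [10, 30, 0]]
normalized_counts = log_normalize(raw_counts)
```

After normalization the two toy cells have the same profile, so a distance-based clustering step would no longer separate them by depth alone.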


Author(s):  
Andrew W. Schroeder ◽  
Swastika Sur ◽  
Priyanka Rashmi ◽  
Izabella Damm ◽  
Arya Zarinsefat ◽  
...  

Abstract. Background: The kidney is a highly complex organ that performs multiple functions necessary to maintain systemic homeostasis, with complex interplay from different kidney sub-structures and the coordinated response of diverse cell types, a few known and likely many others as yet undiscovered. Traditional global sequencing techniques are limited in their ability to identify unique and functionally diverse cell types in complex tissues. Methods: Herein we characterize over 45,000 cells from 10 normal human kidneys using unbiased single-cell RNA sequencing. We also apply, for the first time, an approach of multiplexing kidney samples (Mux-Seq), pooled from different individuals, to save input sample amount and cost. We applied the computational tool Demuxlet to assess differential expression across multiple individuals by pooling human kidney cells for scRNA sequencing, utilizing individual genetic variability to determine the identity of each cell. Results: Multiplexed droplet single-cell RNA sequencing results were highly correlated with the singleplexed sample run data. One hundred distinct cell cluster populations in total were identified across the major cell types of the kidney, with varied functional states. Proximal tubular and collecting duct cells were the most heterogeneous, displaying multiple clusters with unique ontologies. Novel proximal tubular cell subsets were identified with regenerative potential. Trajectory analysis demonstrated evolution of cell states between intercalated and principal cells in the collecting duct. Conclusions: Healthy kidney tissue has been successfully analyzed to detect all known renal cell types, inclusive of resident and infiltrating immune cells in the kidney.
Mux-Seq is a unique method that allows for rapid and cost-effective single-cell, in-depth transcriptional analysis of human kidney tissue. Significance Statement: Use of renal biopsies for single cell transcriptomics is limited by small tissue availability and batch effects. In this study, we have successfully employed Mux-Seq for the first time in kidney. Mux-Seq allows the use of single cell technology in a much more cost-effective manner by pooling samples from multiple individuals for a single sequencing run. This is even more relevant in the case of patient biopsies where the input of tissue is significantly limited. We show that the data from overlapping tissue samples are highly correlated between Mux-Seq and traditional singleplexed RNA-seq. Furthermore, the results from Mux-Seq of 4 pooled samples are highly correlated with singleplexed data from 10 singleplex samples despite the inherent variability among individuals.


2021 ◽  
Author(s):  
Lijun Ma ◽  
Mariana Murea ◽  
Young A Choi ◽  
Ashok K. Hemal ◽  
Alexei V. Mikhailov ◽  
...  

The kidney is composed of multiple cell types, each with specific physiological functions. Single-cell RNA sequencing (scRNA-Seq) is useful for classifying cell-specific gene expression profiles in kidney tissue. Because viable cells are critical in scRNA-Seq analyses, we report an optimized cell dissociation process and the necessity for histological screening of human kidney sections prior to performing scRNA-Seq. We show that glomerular injury can result in loss of select cell types during the cell clustering process. Subsequent fluorescence microscopy confirmed reductions in cell-specific markers among the injured cells seen on kidney sections and these changes need to be considered when interpreting results of scRNA-Seq.


2020 ◽  
Author(s):  
Jens Hansen ◽  
Rachel Sealfon ◽  
Rajasree Menon ◽  
Michael T. Eadon ◽  
Blue B. Lake ◽  
...  

Abstract: The Kidney Precision Medicine Project (KPMP) plans to construct a spatially specified tissue atlas of the human kidney at a cellular resolution with near comprehensive molecular details. The atlas will have maps of healthy, acute kidney injury and chronic kidney disease tissues. To construct such maps, we integrate different data sets that profile mRNAs, proteins and metabolites collected by five KPMP Tissue Interrogation Sites. Here, we describe a set of hierarchical analytical methods to process, combine, and harmonize single-cell, single-nucleus and subsegmental laser microdissection (LMD) transcriptomics with LMD and near single-cell proteomics, 3-D nondestructive and immunofluorescence-based CODEX imaging and spatial metabolomics datasets. We use nephrectomy, healthy living donor and surveillance transplant biopsy tissues to create a harmonized reference tissue map. Our results demonstrate that different assays produce reliable and coherent identification of cell types and tissue subsegments. They further show that the molecular profiles and pathways are partially overlapping yet complementary for cell type-specific and subsegmental physiological processes. Focusing on the proximal tubules, we find that our integrated systems biology-based analyses identify different subtypes of tubular cells with potential for different levels of lipid oxidation and energy generation. Integration of our omics data with pathways from the literature enables us to construct predictive computational models to develop a smart kidney atlas. These integrated models can describe physiological capabilities of the tissues based on the underlying cell types and pathways in health and disease.


2020 ◽  
Author(s):  
Andre Woloshuk ◽  
Suraj Khochare ◽  
Aljohara Fahad Almulhim ◽  
Andrew McNutt ◽  
Dawson Dean ◽  
...  

Abstract: To understand the physiology and pathology of disease, capturing the heterogeneity of cell types within their tissue environment is fundamental. In such an endeavor, the human kidney presents a formidable challenge because its complex organizational structure is tightly linked to key physiological functions. Advances in imaging-based cell classification may be limited by the need to incorporate specific markers that can link classification to function. Multiplex imaging can mitigate these limitations, but requires cumulative incorporation of markers, which may lead to tissue exhaustion. Furthermore, the application of such strategies in large scale 3-dimensional (3D) imaging is challenging. Here, we propose that 3D nuclear signatures from a DNA stain, DAPI, which could be incorporated in most experimental imaging, can be used for classifying cells in intact human kidney tissue. We developed an unsupervised approach that uses 3D tissue cytometry to generate a large training dataset of nuclei images (NephNuc), where each nucleus is associated with a cell type label. We then devised various supervised machine learning approaches for kidney cell classification and demonstrated that a deep learning approach outperforms classical machine learning or shape-based classifiers. Specifically, a custom 3D convolutional neural network (NephNet3D) trained on nuclei image volumes achieved a balanced accuracy of 80.26%. Importantly, integrating NephNet3D classification with tissue cytometry allowed in situ visualization of cell type classifications in kidney tissue. In conclusion, we present a tissue cytometry and deep learning approach for in situ classification of cell types in human kidney tissue using only a DNA stain. This methodology is generalizable to other tissues and has potential advantages on tissue economy and non-exhaustive classification of different cell types.
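The abstract reports a balanced accuracy without defining it. Under the usual definition (the mean of per-class recall), it can be sketched as below; whether the authors used exactly this definition is an assumption, and the function name and toy labels are illustrative only.

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)

# Toy labels: a rare class (2 cells) and a common class (8 cells).
toy_true = ["podocyte"] * 2 + ["proximal_tubule"] * 8
toy_pred = ["podocyte", "proximal_tubule"] + ["proximal_tubule"] * 8

toy_plain = sum(t == p for t, p in zip(toy_true, toy_pred)) / len(toy_true)
toy_balanced = balanced_accuracy(toy_true, toy_pred)
```

In the toy data, plain accuracy is 0.9 while balanced accuracy is 0.75, because the one error falls in the rare class. That gap is why a balanced metric is informative when cell types differ greatly in abundance.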


2019 ◽  
Author(s):  
Michael Hagemann-Jensen ◽  
Christoph Ziegenhain ◽  
Ping Chen ◽  
Daniel Ramsköld ◽  
Gert-Jan Hendriks ◽  
...  

Abstract: Large-scale sequencing of RNAs from individual cells can reveal patterns of gene, isoform and allelic expression across cell types and states [1]. However, current single-cell RNA-sequencing (scRNA-seq) methods have limited ability to count RNAs at allele and isoform resolution, and long-read sequencing techniques lack the depth required for large-scale applications across cells [2,3]. Here, we introduce Smart-seq3, which combines full-length transcriptome coverage with a 5’ unique molecular identifier (UMI) RNA counting strategy that enabled in silico reconstruction of thousands of RNA molecules per cell. Importantly, a large portion of counted and reconstructed RNA molecules could be directly assigned to specific isoforms and allelic origin, and we identified significant transcript isoform regulation in mouse strains and human cell types. Moreover, Smart-seq3 showed a dramatic increase in sensitivity and typically detected thousands more genes per cell than Smart-seq2. Altogether, we developed a short-read sequencing strategy for single-cell RNA counting at isoform and allele resolution applicable to large-scale characterization of cell types and states across tissues and organisms.


2021 ◽  
Author(s):  
Anita Bandrowski ◽  
Jeffrey S. Grethe ◽  
Anna Pilko ◽  
Tom Gillespie ◽  
Gabi Pine ◽  
...  

Abstract: The NIH Common Fund’s Stimulating Peripheral Activity to Relieve Conditions (SPARC) initiative is a large-scale program that seeks to accelerate the development of therapeutic devices that modulate electrical activity in nerves to improve organ function. Integral to the SPARC program are the rich anatomical and functional datasets produced by investigators across the SPARC consortium that provide key details about organ-specific circuitry, including structural and functional connectivity, mapping of cell types and molecular profiling. These datasets are provided to the research community through an open data platform, the SPARC Portal. To ensure SPARC datasets are Findable, Accessible, Interoperable and Reusable (FAIR), they are all submitted to the SPARC Portal following a standard scheme established by the SPARC Curation Team, called the SPARC Data Structure (SDS). Inspired by the Brain Imaging Data Structure (BIDS), the SDS has been designed to capture the large variety of data generated by SPARC investigators, who come from all fields of biomedical research. Here we present the rationale and design of the SDS, including a description of the SPARC curation process and the automated tools for complying with the SDS, including the SDS validator and Software to Organize Data Automatically (SODA) for SPARC. The objective is to provide detailed guidelines for anyone desiring to comply with the SDS. Since the SDS is suitable for any type of biomedical research data, it can be adopted by any group desiring to follow the FAIR data principles for managing their data, even outside of the SPARC consortium. Finally, this manuscript provides a foundational framework that can be used by any organization desiring either to adapt the SDS to suit the specific needs of their data or simply to design their own FAIR data sharing scheme from scratch.


2020 ◽  
Author(s):  
Etienne Becht ◽  
Daniel Tolstrup ◽  
Charles-Antoine Dutertre ◽  
Florent Ginhoux ◽  
Evan W. Newell ◽  
...  

Abstract: Modern immunologic research increasingly requires high-dimensional analyses in order to understand the complex milieu of cell types that comprise the tissue microenvironments of disease. To achieve this, we developed Infinity Flow, which combines hundreds of overlapping flow cytometry panels using machine learning to enable the simultaneous analysis of the co-expression patterns of hundreds of surface-expressed proteins across millions of individual cells. In this study, we demonstrate that this approach allows the comprehensive analysis of the cellular constituency of the steady-state murine lung and the identification of novel cellular heterogeneity in the lungs of melanoma metastasis-bearing mice. We show that by using supervised machine learning, Infinity Flow enhances the accuracy and depth of clustering or dimensionality reduction algorithms. Infinity Flow is a highly scalable, low-cost and accessible solution to single cell proteomics in complex tissues.


Blood ◽  
2020 ◽  
Vol 136 (Supplement 1) ◽  
pp. 45-46
Author(s):  
Christian Pohlkamp ◽  
Kapil Jhalani ◽  
Niroshan Nadarajah ◽  
Inseok Heo ◽  
William Wetton ◽  
...  

Background: Cytomorphology is the gold standard for quick assessment of peripheral blood and bone marrow samples in hematological neoplasms. It is a broadly accepted method for orchestrating more specific diagnostics including immunophenotyping or genetics. Inter-/intra-observer reproducibility of single cell classification is only 75 to 90%. Only a limited number of cells (100-500 cells/smear) is read in a time-consuming procedure. Machine learning (ML) is more reliable where human skills are limited, i.e. in handling large amounts of data or images. We here tested ML to differentiate peripheral blood leukocytes in a high-throughput hematology laboratory. Aim: To establish an ML-based cell classifier capable of identifying healthy and pathologic cells in digitalized peripheral blood smear scans at an accuracy competitive with or outperforming human expert level. Methods: We selected >2,600 smears out of our unique archive of >250,000 peripheral blood smears from hematological neoplasms. Depending on quality, we scanned up to 1,000 single cell images per smear. For image acquisition, a Metafer Scanning System (Zeiss Axio Imager.Z2 microscope, automatic slide feeder and automatic oiling device) from MetaSystems (Altlussheim, GER) was used. Areas of interest were defined by a pre-scan at 10x magnification followed by a high-resolution scan at 40x to generate cell images for analysis. Average capture times for 300/500 cells were 3:43/4:37 min. We set up a supervised ML model using colour images (144x144 pixels) as input, outputting predicted probabilities of 21 predefined classes. We used ImageNet-pretrained Xception as our base model. We trained, evaluated and deployed the model using Amazon SageMaker on a subset of 82,974 images randomly selected from 514,183 cells captured and labelled for this study. 20 different cell types and one garbage class were classified.
We included cell type categories reflecting the critical importance of detecting rare leukemia subtypes (e.g. APL). The number of images per class across the 21 classes ranged from 1,830 to 14,909 (median: 2,945). Minority classes were up-sampled to handle imbalances. Each picture was labelled by highly skilled technicians (median years practicing in this laboratory: 5) and two independent hematologists (median years at microscope: 20). Results: On a separate test set of 8,297 cells, our classifier was able to predict any of the five cell types occurring in the peripheral blood of healthy individuals (PMN, lymphocytes, monocytes, eosinophils, basophils) at very high median accuracy (97.0%). Median prediction accuracy of 15 rare or pathological cell types was 91.3%. For six critical pathological cell forms (myeloblasts, atypical/bilobulated promyelocytes in APL/APLv, hairy cells, lymphoma cells, plasma cells), median accuracy was 93.4% (sensitivity 93.8%). We saw a very high "T98 accuracy" for these cell types (98.5%), which is the accuracy of cell type predictions with prediction probability >0.98 (achieved in 2231/2417 cases), implying that critical cells predicted with probability <0.98 should be flagged for human expert validation with priority. For all 21 classes median accuracy was 91.7%. Accuracy was lower for cells representing consecutive steps of maturation, e.g. promyelo-/myelo-/metamyelocytes, reproducing inconsistencies from the human-built phenotypic classification system (s.Fig.). Conclusions: We demonstrate an automated workflow using automatic microscopic cell capturing and ML-driven cell differentiation in samples of hematologic patients. Reproducibility, accuracy, sensitivity and specificity are above 90%, for many cell types above 98%. By flagging suspicious cells for human validation, this tool can support even experienced hematology professionals, especially in detecting rare cell types.
Given an appropriate scanning speed, it clearly outperforms human investigators in terms of examination time and number of differentiated cells. An ML-based intelligence can make its skills accessible to hematology laboratories on site or after upload of scanned cell images, independent of time/location. A cloud-based infrastructure is available. A prospective head-to-head challenge between the ML-based classifier and human experts, comparing sensitivity and accuracy for detection of all cell classes in peripheral blood, will be conducted to prove suitability for routine use (NCT 4466059). [Figure] Disclosures: Heo: AWS: Current Employment. Wetton: AWS: Current Employment. Drescher: MetaSystems: Current Employment. Hänselmann: MetaSystems: Current Employment. Lörch: MetaSystems: Current equity holder in private company.
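The "T98 accuracy" in the abstract above is the accuracy restricted to predictions whose probability exceeds 0.98, with the remaining cells flagged for expert review. A minimal sketch of that bookkeeping follows; the function name `thresholded_accuracy`, the record layout and the toy records are assumptions, not the study's code or data.

```python
def thresholded_accuracy(records, threshold=0.98):
    """Accuracy among confident predictions, plus the cells to flag.

    records: list of (true_label, predicted_label, probability) tuples
             (hypothetical layout).
    Returns (accuracy, coverage, flagged_indices); accuracy is None when
    no prediction exceeds the threshold.
    """
    records = list(records)
    confident = [(t, p) for t, p, prob in records if prob > threshold]
    # Everything at or below the threshold goes to a human expert.
    flagged = [i for i, (_, _, prob) in enumerate(records) if prob <= threshold]
    coverage = len(confident) / len(records) if records else 0.0
    if not confident:
        return None, coverage, flagged
    accuracy = sum(t == p for t, p in confident) / len(confident)
    return accuracy, coverage, flagged

# Toy predictions for a few critical cell types.
toy_records = [
    ("myeloblast", "myeloblast", 0.995),
    ("hairy_cell", "hairy_cell", 0.990),
    ("plasma_cell", "lymphoma_cell", 0.700),   # uncertain, gets flagged
    ("myeloblast", "myeloblast", 0.985),
    ("lymphoma_cell", "plasma_cell", 0.999),   # confident but wrong
]
toy_accuracy, toy_coverage, toy_flagged = thresholded_accuracy(toy_records)
```

The toy data includes one confident but wrong prediction to show that a high threshold reduces, but does not eliminate, errors among the cells that are not flagged.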


2020 ◽  
Vol 3 (1) ◽  
Author(s):  
Patrick S. Stumpf ◽  
Xin Du ◽  
Haruka Imanishi ◽  
Yuya Kunisaki ◽  
Yuichiro Semba ◽  
...  

Abstract: Biomedical research often involves conducting experiments on model organisms in the anticipation that the biology learnt will transfer to humans. Previous comparative studies of mouse and human tissues were limited by the use of bulk-cell material. Here we show that transfer learning, the branch of machine learning that concerns passing information from one domain to another, can be used to efficiently map bone marrow biology between species, using data obtained from single-cell RNA sequencing. We first trained a multiclass logistic regression model to recognize different cell types in mouse bone marrow, achieving equivalent performance to more complex artificial neural networks. Furthermore, it was able to identify individual human bone marrow cells with 83% overall accuracy. However, some human cell types were not easily identified, indicating important differences in biology. When re-training the mouse classifier using data from human, fewer than 10 human cells of a given type were needed to accurately learn its representation. In some cases, human cell identities could be inferred directly from the mouse classifier via zero-shot learning. These results show how simple machine learning models can be used to reconstruct complex biology from limited data, with broad implications for biomedical research.
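The core of the approach described above, training a multiclass logistic regression in one domain and applying it unchanged in another, can be sketched in a few lines. This is a toy illustration only: the two-feature "expression" values, the class indices, the learning rate and the function names are assumptions, and the plain gradient-descent fit stands in for whatever solver the authors used.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def fit_softmax_regression(X, y, n_classes, lr=0.1, epochs=300):
    """Fit multiclass logistic regression by full-batch gradient descent."""
    n_features = len(X[0])
    # One weight vector plus a bias per class, zero-initialised so the
    # result is deterministic.
    W = [[0.0] * (n_features + 1) for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * (n_features + 1) for _ in range(n_classes)]
        for x, label in zip(X, y):
            xb = list(x) + [1.0]
            probs = softmax([sum(w * v for w, v in zip(Wk, xb)) for Wk in W])
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == label else 0.0)
                for j, v in enumerate(xb):
                    grad[k][j] += err * v
        for k in range(n_classes):
            for j in range(n_features + 1):
                W[k][j] -= lr * grad[k][j] / len(X)
    return W

def predict(W, x):
    """Return the index of the highest-scoring class for one sample."""
    xb = list(x) + [1.0]
    scores = [sum(w * v for w, v in zip(Wk, xb)) for Wk in W]
    return max(range(len(W)), key=scores.__getitem__)

# Hypothetical "mouse" training cells: two marker features, three cell types.
mouse_X = [(2.0, 0.1), (1.8, -0.2), (2.2, 0.0),
           (0.1, 2.0), (-0.2, 1.9), (0.0, 2.2),
           (-2.0, -1.9), (-1.8, -2.1), (-2.1, -2.0)]
mouse_y = [0, 0, 0, 1, 1, 1, 2, 2, 2]

# Hypothetical "human" cells, classified with the mouse-trained model as is.
human_X = [(1.5, 0.3), (0.2, 1.6), (-1.2, -1.7)]

weights = fit_softmax_regression(mouse_X, mouse_y, n_classes=3)
human_pred = [predict(weights, x) for x in human_X]
```

The toy "human" cells are classified correctly only because they sit near the "mouse" clusters. As the abstract notes, real cell types that differ between species would not transfer this cleanly and would need re-training on a few human examples.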

