CytoPy: an autonomous cytometry analysis framework

AbstractCytometry analysis has seen a considerable expansion in in recent years with the expansion in the maximum number of parameters that can be acquired in a single experiment. In response to this technological advance, there has been an increased effort to develop computational methodologies for handling high-dimensional data acquired by flow or mass cytometry. Despite the success of numerous algorithms and published packages to replicate and outperform traditional manual analysis, widespread adoption of these techniques has yet to be realised in the field of cytometry. Here we present CytoPy, a Python framework for automated analysis of high dimensional cytometry data that integrates a document-based database for a data-centric and iterative analytical environment. The capability of supervised classification algorithms in CytoPy to identify cell subsets was successfully confirmed by using the FlowCAP-I competition data. The applicability of the complete analytical pipeline to real world datasets was validated by immunophenotyping the local inflammatory infiltrate in individuals with and without acute bacterial infection. CytoPy is open-source and licensed under the MIT license. Source code is available online at the https://github.com/burtonrj/CytoPy, and software documentation can be found at https://cytopy.readthedocs.io/.

Download Full-text

CytoPy: An autonomous cytometry analysis framework

PLoS Computational Biology ◽

10.1371/journal.pcbi.1009071 ◽

2021 ◽

Vol 17 (6) ◽

pp. e1009071

Author(s):

Ross J. Burton ◽

Raya Ahmed ◽

Simone M. Cuff ◽

Sarah Baker ◽

Andreas Artemiou ◽

...

Keyword(s):

Open Source ◽

Automated Analysis ◽

T Cell Subsets ◽

Analysis Framework ◽

Mass Cytometry ◽

Single Experiment ◽

Software Documentation ◽

Link Type ◽

Manual Analysis ◽

Cell Data

Cytometry analysis has seen a considerable expansion in recent years in the maximum number of parameters that can be acquired in a single experiment. In response to this technological advance there has been an increased effort to develop new computational methodologies for handling high-dimensional single cell data acquired by flow or mass cytometry. Despite the success of numerous algorithms and published packages to replicate and outperform traditional manual analysis, widespread adoption of these techniques has yet to be realised in the field of immunology. Here we present CytoPy, a Python framework for automated analysis of cytometry data that integrates a document-based database for a data-centric and iterative analytical environment. In addition, our algorithm agnostic design provides a platform for open-source cytometry bioinformatics in the Python ecosystem. We demonstrate the ability of CytoPy to phenotype T cell subsets in whole blood samples even in the presence of significant batch effects due to technical and user variation. The complete analytical pipeline was then used to immunophenotype the local inflammatory infiltrate in individuals with and without acute bacterial infection. CytoPy is open-source and licensed under the MIT license. CytoPy is open source and available at https://github.com/burtonrj/CytoPy, with notebooks accompanying this manuscript (https://github.com/burtonrj/CytoPyManuscript) and software documentation at https://cytopy.readthedocs.io/.

Download Full-text

Correction to: High-dimensional single cell mass cytometry analysis of the murine hematopoietic system reveals signatures induced by ageing and physiological pathogen challenges

Immunity & Ageing ◽

10.1186/s12979-021-00235-y ◽

2021 ◽

Vol 18 (1) ◽

Author(s):

Christos Nikolaou ◽

Kerstin Muehle ◽

Stephan Schlickeiser ◽

Alberto Sada Japp ◽

Nadine Matzmohr ◽

...

Keyword(s):

Single Cell ◽

Cell Mass ◽

High Dimensional ◽

Mass Cytometry ◽

Hematopoietic System

An amendment to this paper has been published and can be accessed via the original article.

Download Full-text

Automated analysis of 3D-echocardiography using spatially registered patient-specific CMR meshes

European Heart Journal - Cardiovascular Imaging ◽

10.1093/ehjci/jeaa356.425 ◽

2021 ◽

Vol 22 (Supplement_1) ◽

Author(s):

D Zhao ◽

E Ferdian ◽

GD Maso Talou ◽

GM Quill ◽

K Gilbert ◽

...

Keyword(s):

New Zealand ◽

Interobserver Variability ◽

Ground Truth ◽

Automated Analysis ◽

3D Echocardiography ◽

Training Data ◽

Patient Specific ◽

Manual Analysis ◽

Lv Mass ◽

3D Echo

Abstract Funding Acknowledgements Type of funding sources: Public grant(s) – National budget only. Main funding source(s): National Heart Foundation (NHF) of New Zealand Health Research Council (HRC) of New Zealand Artificial intelligence shows considerable promise for automated analysis and interpretation of medical images, particularly in the domain of cardiovascular imaging. While application to cardiac magnetic resonance (CMR) has demonstrated excellent results, automated analysis of 3D echocardiography (3D-echo) remains challenging, due to the lower signal-to-noise ratio (SNR), signal dropout, and greater interobserver variability in manual annotations. As 3D-echo is becoming increasingly widespread, robust analysis methods will substantially benefit patient evaluation. We sought to leverage the high SNR of CMR to provide training data for a convolutional neural network (CNN) capable of analysing 3D-echo. We imaged 73 participants (53 healthy volunteers, 20 patients with non-ischaemic cardiac disease) under both CMR and 3D-echo (<1 hour between scans). 3D models of the left ventricle (LV) were independently constructed from CMR and 3D-echo, and used to spatially align the image volumes using least squares fitting to a cardiac template. The resultant transformation was used to map the CMR mesh to the 3D-echo image. Alignment of mesh and image was verified through volume slicing and visual inspection (Fig. 1) for 120 paired datasets (including 47 rescans) each at end-diastole and end-systole. 100 datasets (80 for training, 20 for validation) were used to train a shallow CNN for mesh extraction from 3D-echo, optimised with a composite loss function consisting of normalised Euclidian distance (for 290 mesh points) and volume. Data augmentation was applied in the form of rotations and tilts (<15 degrees) about the long axis. The network was tested on the remaining 20 datasets (different participants) of varying image quality (Tab. I). For comparison, corresponding LV measurements from conventional manual analysis of 3D-echo and associated interobserver variability (for two observers) were also estimated. Initial results indicate that the use of embedded CMR meshes as training data for 3D-echo analysis is a promising alternative to manual analysis, with improved accuracy and precision compared with conventional methods. Further optimisations and a larger dataset are expected to improve network performance. (n = 20) LV EDV (ml) LV ESV (ml) LV EF (%) LV mass (g) Ground truth CMR 150.5 ± 29.5 57.9 ± 12.7 61.5 ± 3.4 128.1 ± 29.8 Algorithm error -13.3 ± 15.7 -1.4 ± 7.6 -2.8 ± 5.5 0.1 ± 20.9 Manual error -30.1 ± 21.0 -15.1 ± 12.4 3.0 ± 5.0 Not available Interobserver error 19.1 ± 14.3 14.4 ± 7.6 -6.4 ± 4.8 Not available Tab. 1. LV mass and volume differences (means ± standard deviations) for 20 test cases. Algorithm: CNN – CMR (as ground truth). Abstract Figure. Fig 1. CMR mesh registered to 3D-echo.

Download Full-text

High-dimensional immune phenotyping of blood cells by mass cytometry in patients infected with hepatitis C virus

Clinical Microbiology and Infection ◽

10.1016/j.cmi.2021.08.018 ◽

2021 ◽

Author(s):

Jacobus Herderschee ◽

Tytti Heinonen ◽

Craig Fenwick ◽

Irene T. Schrijver ◽

Khalid Ohmiti ◽

...

Keyword(s):

Hepatitis C Virus ◽

Hepatitis C ◽

Blood Cells ◽

High Dimensional ◽

Mass Cytometry ◽

Immune Phenotyping

Download Full-text

Comparison of Supervised Classification Methods for Protein Profiling in Cancer Diagnosis

Cancer Informatics ◽

10.1177/117693510700300023 ◽

2007 ◽

Vol 3 ◽

pp. 117693510700300 ◽

Cited By ~ 6

Author(s):

Nadège Dossat ◽

Alain Mangé ◽

Jérôme Solassol ◽

William Jacot ◽

Ludovic Lhermitte ◽

...

Keyword(s):

Mass Spectrometry ◽

Discriminant Analysis ◽

Linear Discriminant Analysis ◽

Supervised Classification ◽

Protein Profiling ◽

Clinical Proteomics ◽

High Dimensional ◽

Classification Methods ◽

Linear Discriminant ◽

Supervised Classification Methods

A key challenge in clinical proteomics of cancer is the identification of biomarkers that could allow detection, diagnosis and prognosis of the diseases. Recent advances in mass spectrometry and proteomic instrumentations offer unique chance to rapidly identify these markers. These advances pose considerable challenges, similar to those created by microarray-based investigation, for the discovery of pattern of markers from high-dimensional data, specific to each pathologic state (e.g. normal vs cancer). We propose a three-step strategy to select important markers from high-dimensional mass spectrometry data using surface enhanced laser desorption/ionization (SELDI) technology. The first two steps are the selection of the most discriminating biomarkers with a construction of different classifiers. Finally, we compare and validate their performance and robustness using different supervised classification methods such as Support Vector Machine, Linear Discriminant Analysis, Quadratic Discriminant Analysis, Neural Networks, Classification Trees and Boosting Trees. We show that the proposed method is suitable for analysing high-throughput proteomics data and that the combination of logistic regression and Linear Discriminant Analysis outperform other methods tested.

Download Full-text

High-throughput super-resolution analysis of influenza virus pleomorphism reveals insights into viral spatial organization

10.1101/2021.09.23.461536 ◽

2021 ◽

Author(s):

Andrew McMahon ◽

Rebecca Andrews ◽

Sohail V Ghani ◽

Thorben Cordes ◽

Achillefs N Kapanidis ◽

...

Keyword(s):

Large Scale ◽

Spatial Organization ◽

Structural Information ◽

Virus Assembly ◽

Super Resolution ◽

Automated Analysis ◽

Size Analysis ◽

Analysis Pipeline ◽

Single Experiment ◽

Viral Immunology

Many viruses form highly pleomorphic particles; in influenza, these particles range from spheres of ~ 100 nm in diameter to filaments of several microns in length. Virion structure is of interest, not only in the context of virus assembly, but also because pleomorphic variations may correlate with infectivity and pathogenicity. Detailed images of virus morphology often rely on electron microscopy, which is generally low throughput and limited in molecular identification. We have used fluorescence super-resolution microscopy combined with a rapid automated analysis pipeline to image many thousands of individual influenza virions, gaining information on their size, morphology and the distribution of membrane-embedded and internal proteins. This large-scale analysis revealed that influenza particles can be reliably characterised by length, that no spatial frequency patterning of the surface glycoproteins occurs, and that RNPs are preferentially located towards filament ends within Archetti bodies. Our analysis pipeline is versatile and can be adapted for use on multiple other pathogens, as demonstrated by its application for the size analysis of SARS-CoV-2. The ability to gain nanoscale structural information from many thousands of viruses in just a single experiment is valuable for the study of virus assembly mechanisms, host cell interactions and viral immunology, and should be able to contribute to the development of viral vaccines, anti-viral strategies and diagnostics.

Download Full-text

gprofiler2 -- an R package for gene list functional enrichment analysis and namespace conversion toolset g:Profiler

F1000Research ◽

10.12688/f1000research.24956.2 ◽

2020 ◽

Vol 9 ◽

pp. 709 ◽

Cited By ~ 1

Author(s):

Liis Kolberg ◽

Uku Raudvere ◽

Ivan Kuzmin ◽

Jaak Vilo ◽

Hedi Peterson

Keyword(s):

Gene List ◽

Enrichment Analysis ◽

Functional Enrichment Analysis ◽

Automated Analysis ◽

R Package ◽

Biological Data ◽

Functional Enrichment ◽

Link Type ◽

Functional Profiling ◽

Rest Api

g:Profiler (https://biit.cs.ut.ee/gprofiler) is a widely used gene list functional profiling and namespace conversion toolset that has been contributing to reproducible biological data analysis already since 2007. Here we introduce the accompanying R package, gprofiler2, developed to facilitate programmatic access to g:Profiler computations and databases via REST API. The gprofiler2 package provides an easy-to-use functionality that enables researchers to incorporate functional enrichment analysis into automated analysis pipelines written in R. The package also implements interactive visualisation methods to help to interpret the enrichment results and to illustrate them for publications. In addition, gprofiler2 gives access to the versatile gene/protein identifier conversion functionality in g:Profiler enabling to map between hundreds of different identifier types or orthologous species. The gprofiler2 package is freely available at the CRAN repository.

Download Full-text

GAC: Gene Associations with Clinical, a web based application

F1000Research ◽

10.12688/f1000research.11840.2 ◽

2017 ◽

Vol 6 ◽

pp. 1039

Author(s):

Xinyan Zhang ◽

Manali Rupji ◽

Jeanne Kowalski

Keyword(s):

High Dimensional Data ◽

Interactive Visualization ◽

High Dimensional ◽

Forest Plot ◽

Event Data ◽

Time To Event ◽

Web Based ◽

Link Type ◽

Time To Event Data ◽

Clinical Associations

We present GAC, a shiny R based tool for interactive visualization of clinical associations based on high-dimensional data. The tool provides a web-based suite to perform supervised principal component analysis (SuperPC), an approach that uses both high-dimensional data, such as gene expression, combined with clinical data to infer clinical associations. We extended the approach to address binary outcomes, in addition to continuous and time-to-event data in our package, thereby increasing the use and flexibility of SuperPC. Additionally, the tool provides an interactive visualization for summarizing results based on a forest plot for both binary and time-to-event data. In summary, the GAC suite of tools provide a one stop shop for conducting statistical analysis to identify and visualize the association between a clinical outcome of interest and high-dimensional data types, such as genomic data. Our GAC package has been implemented in R and is available via http://shinygispa.winship.emory.edu/GAC/. The developmental repository is available at https://github.com/manalirupji/GAC.

Download Full-text

Facetto: Combining Unsupervised and Supervised Learning for Hierarchical Phenotype Analysis in Multi-Channel Image Data

10.1101/722918 ◽

2019 ◽

Cited By ~ 1

Author(s):

Robert Krueger ◽

Johanna Beyer ◽

Won-Dong Jang ◽

Nam Wook Kim ◽

Artem Sokolov ◽

...

Keyword(s):

Supervised Learning ◽

Visual Analytics ◽

Cancer Biology ◽

Large Scale ◽

Hierarchical Structures ◽

Image Data ◽

Automated Analysis ◽

Cell Types ◽

High Dimensional ◽

Exploration Process

AbstractFacetto is a scalable visual analytics application that is used to discover single-cell phenotypes in high-dimensional multi-channel microscopy images of human tumors and tissues. Such images represent the cutting edge of digital histology and promise to revolutionize how diseases such as cancer are studied, diagnosed, and treated. Highly multiplexed tissue images are complex, comprising 109or more pixels, 60-plus channels, and millions of individual cells. This makes manual analysis challenging and error-prone. Existing automated approaches are also inadequate, in large part, because they are unable to effectively exploit the deep knowledge of human tissue biology available to anatomic pathologists. To overcome these challenges, Facetto enables a semi-automated analysis of cell types and states. It integrates unsupervised and supervised learning into the image and feature exploration process and offers tools for analytical provenance. Experts can cluster the data to discover new types of cancer and immune cells and use clustering results to train a convolutional neural network that classifies new cells accordingly. Likewise, the output of classifiers can be clustered to discover aggregate patterns and phenotype subsets. We also introduce a new hierarchical approach to keep track of analysis steps and data subsets created by users; this assists in the identification of cell types. Users can build phenotype trees and interact with the resulting hierarchical structures of both high-dimensional feature and image spaces. We report on use-cases in which domain scientists explore various large-scale fluorescence imaging datasets. We demonstrate how Facetto assists users in steering the clustering and classification process, inspecting analysis results, and gaining new scientific insights into cancer biology.

Download Full-text