CytoPy: An autonomous cytometry analysis framework

Cytometry analysis has seen a considerable expansion in recent years in the maximum number of parameters that can be acquired in a single experiment. In response to this technological advance there has been an increased effort to develop new computational methodologies for handling high-dimensional single cell data acquired by flow or mass cytometry. Despite the success of numerous algorithms and published packages to replicate and outperform traditional manual analysis, widespread adoption of these techniques has yet to be realised in the field of immunology. Here we present CytoPy, a Python framework for automated analysis of cytometry data that integrates a document-based database for a data-centric and iterative analytical environment. In addition, our algorithm agnostic design provides a platform for open-source cytometry bioinformatics in the Python ecosystem. We demonstrate the ability of CytoPy to phenotype T cell subsets in whole blood samples even in the presence of significant batch effects due to technical and user variation. The complete analytical pipeline was then used to immunophenotype the local inflammatory infiltrate in individuals with and without acute bacterial infection. CytoPy is open-source and licensed under the MIT license. CytoPy is open source and available at https://github.com/burtonrj/CytoPy, with notebooks accompanying this manuscript (https://github.com/burtonrj/CytoPyManuscript) and software documentation at https://cytopy.readthedocs.io/.

Download Full-text

CytoPy: an autonomous cytometry analysis framework

10.1101/2020.04.08.031898 ◽

2020 ◽

Author(s):

Ross J. Burton ◽

Raya Ahmed ◽

Simone M. Cuff ◽

Andreas Artemiou ◽

Matthias Eberl

Keyword(s):

Supervised Classification ◽

Automated Analysis ◽

High Dimensional ◽

Analysis Framework ◽

Mass Cytometry ◽

Single Experiment ◽

Software Documentation ◽

Link Type ◽

Manual Analysis ◽

Real World Datasets

AbstractCytometry analysis has seen a considerable expansion in in recent years with the expansion in the maximum number of parameters that can be acquired in a single experiment. In response to this technological advance, there has been an increased effort to develop computational methodologies for handling high-dimensional data acquired by flow or mass cytometry. Despite the success of numerous algorithms and published packages to replicate and outperform traditional manual analysis, widespread adoption of these techniques has yet to be realised in the field of cytometry. Here we present CytoPy, a Python framework for automated analysis of high dimensional cytometry data that integrates a document-based database for a data-centric and iterative analytical environment. The capability of supervised classification algorithms in CytoPy to identify cell subsets was successfully confirmed by using the FlowCAP-I competition data. The applicability of the complete analytical pipeline to real world datasets was validated by immunophenotyping the local inflammatory infiltrate in individuals with and without acute bacterial infection. CytoPy is open-source and licensed under the MIT license. Source code is available online at the https://github.com/burtonrj/CytoPy, and software documentation can be found at https://cytopy.readthedocs.io/.

Download Full-text

Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database

10.1101/206573 ◽

2017 ◽

Cited By ~ 2

Author(s):

Luke Zappia ◽

Belinda Phipson ◽

Alicia Oshlack

Keyword(s):

Single Cell ◽

Open Source ◽

Rapid Development ◽

Analysis Tool ◽

Rna Seq ◽

Link Type ◽

Analysis Tools ◽

Rapid Pace ◽

Cell Data ◽

Selection Of

AbstractAs single-cell RNA-sequencing (scRNA-seq) datasets have become more widespread the number of tools designed to analyse these data has dramatically increased. Navigating the vast sea of tools now available is becoming increasingly challenging for researchers. In order to better facilitate selection of appropriate analysis tools we have created the scRNA-tools database (www.scRNA-tools.org) to catalogue and curate analysis tools as they become available. Our database collects a range of information on each scRNA-seq analysis tool and categorises them according to the analysis tasks they perform. Exploration of this database gives insights into the areas of rapid development of analysis methods for scRNA-seq data. We see that many tools perform tasks specific to scRNA-seq analysis, particularly clustering and ordering of cells. We also find that the scRNA-seq community embraces an open-source approach, with most tools available under open-source licenses and preprints being extensively used as a means to describe methods. The scRNA-tools database provides a valuable resource for researchers embarking on scRNA-seq analysis and records of the growth of the field over time.Author summaryIn recent years single-cell RNA-sequeing technologies have emerged that allow scientists to measure the activity of genes in thousands of individual cells simultaneously. This means we can start to look at what each cell in a sample is doing instead of considering an average across all cells in a sample, as was the case with older technologies. However, while access to this kind of data presents a wealth of opportunities it comes with a new set of challenges. Researchers across the world have developed new methods and software tools to make the most of these datasets but the field is moving at such a rapid pace it is difficult to keep up with what is currently available. To make this easier we have developed the scRNA-tools database and website (www.scRNA-tools.org). Our database catalogues analysis tools, recording the tasks they can be used for, where they can be downloaded from and the publications that describe how they work. By looking at this database we can see that developers have focued on methods specific to single-cell data and that they embrace an open-source approach with permissive licensing, sharing of code and preprint publications.

Download Full-text

Automated analysis of 3D-echocardiography using spatially registered patient-specific CMR meshes

European Heart Journal - Cardiovascular Imaging ◽

10.1093/ehjci/jeaa356.425 ◽

2021 ◽

Vol 22 (Supplement_1) ◽

Author(s):

D Zhao ◽

E Ferdian ◽

GD Maso Talou ◽

GM Quill ◽

K Gilbert ◽

...

Keyword(s):

New Zealand ◽

Interobserver Variability ◽

Ground Truth ◽

Automated Analysis ◽

3D Echocardiography ◽

Training Data ◽

Patient Specific ◽

Manual Analysis ◽

Lv Mass ◽

3D Echo

Abstract Funding Acknowledgements Type of funding sources: Public grant(s) – National budget only. Main funding source(s): National Heart Foundation (NHF) of New Zealand Health Research Council (HRC) of New Zealand Artificial intelligence shows considerable promise for automated analysis and interpretation of medical images, particularly in the domain of cardiovascular imaging. While application to cardiac magnetic resonance (CMR) has demonstrated excellent results, automated analysis of 3D echocardiography (3D-echo) remains challenging, due to the lower signal-to-noise ratio (SNR), signal dropout, and greater interobserver variability in manual annotations. As 3D-echo is becoming increasingly widespread, robust analysis methods will substantially benefit patient evaluation. We sought to leverage the high SNR of CMR to provide training data for a convolutional neural network (CNN) capable of analysing 3D-echo. We imaged 73 participants (53 healthy volunteers, 20 patients with non-ischaemic cardiac disease) under both CMR and 3D-echo (<1 hour between scans). 3D models of the left ventricle (LV) were independently constructed from CMR and 3D-echo, and used to spatially align the image volumes using least squares fitting to a cardiac template. The resultant transformation was used to map the CMR mesh to the 3D-echo image. Alignment of mesh and image was verified through volume slicing and visual inspection (Fig. 1) for 120 paired datasets (including 47 rescans) each at end-diastole and end-systole. 100 datasets (80 for training, 20 for validation) were used to train a shallow CNN for mesh extraction from 3D-echo, optimised with a composite loss function consisting of normalised Euclidian distance (for 290 mesh points) and volume. Data augmentation was applied in the form of rotations and tilts (<15 degrees) about the long axis. The network was tested on the remaining 20 datasets (different participants) of varying image quality (Tab. I). For comparison, corresponding LV measurements from conventional manual analysis of 3D-echo and associated interobserver variability (for two observers) were also estimated. Initial results indicate that the use of embedded CMR meshes as training data for 3D-echo analysis is a promising alternative to manual analysis, with improved accuracy and precision compared with conventional methods. Further optimisations and a larger dataset are expected to improve network performance. (n = 20) LV EDV (ml) LV ESV (ml) LV EF (%) LV mass (g) Ground truth CMR 150.5 ± 29.5 57.9 ± 12.7 61.5 ± 3.4 128.1 ± 29.8 Algorithm error -13.3 ± 15.7 -1.4 ± 7.6 -2.8 ± 5.5 0.1 ± 20.9 Manual error -30.1 ± 21.0 -15.1 ± 12.4 3.0 ± 5.0 Not available Interobserver error 19.1 ± 14.3 14.4 ± 7.6 -6.4 ± 4.8 Not available Tab. 1. LV mass and volume differences (means ± standard deviations) for 20 test cases. Algorithm: CNN – CMR (as ground truth). Abstract Figure. Fig 1. CMR mesh registered to 3D-echo.

Download Full-text

Open source tools for geographic analysis in transport planning

Journal of Geographical Systems ◽

10.1007/s10109-020-00342-2 ◽

2021 ◽

Author(s):

Robin Lovelace

Keyword(s):

User Interface ◽

Open Source ◽

Citizen Participation ◽

Simulation Software ◽

First Century ◽

Transport Planning ◽

Geographic Analysis ◽

Link Type ◽

Twenty First Century ◽

Interactive Map

AbstractGeographic analysis has long supported transport plans that are appropriate to local contexts. Many incumbent ‘tools of the trade’ are proprietary and were developed to support growth in motor traffic, limiting their utility for transport planners who have been tasked with twenty-first century objectives such as enabling citizen participation, reducing pollution, and increasing levels of physical activity by getting more people walking and cycling. Geographic techniques—such as route analysis, network editing, localised impact assessment and interactive map visualisation—have great potential to support modern transport planning priorities. The aim of this paper is to explore emerging open source tools for geographic analysis in transport planning, with reference to the literature and a review of open source tools that are already being used. A key finding is that a growing number of options exist, challenging the current landscape of proprietary tools. These can be classified as command-line interface, graphical user interface or web-based user interface tools and by the framework in which they were implemented, with numerous tools released as R, Python and JavaScript packages, and QGIS plugins. The review found a diverse and rapidly evolving ‘ecosystem’ tools, with 25 tools that were designed for geographic analysis to support transport planning outlined in terms of their popularity and functionality based on online documentation. They ranged in size from single-purpose tools such as the QGIS plugin AwaP to sophisticated stand-alone multi-modal traffic simulation software such as MATSim, SUMO and Veins. Building on their ability to re-use the most effective components from other open source projects, developers of open source transport planning tools can avoid ‘reinventing the wheel’ and focus on innovation, the ‘gamified’ A/B Street https://github.com/dabreegster/abstreet/#abstreet simulation software, based on OpenStreetMap, a case in point. The paper, the source code of which can be found at https://github.com/robinlovelace/open-gat, concludes that, although many of the tools reviewed are still evolving and further research is needed to understand their relative strengths and barriers to uptake, open source tools for geographic analysis in transport planning already hold great potential to help generate the strategic visions of change and evidence that is needed by transport planners in the twenty-first century.

Download Full-text

High-throughput super-resolution analysis of influenza virus pleomorphism reveals insights into viral spatial organization

10.1101/2021.09.23.461536 ◽

2021 ◽

Author(s):

Andrew McMahon ◽

Rebecca Andrews ◽

Sohail V Ghani ◽

Thorben Cordes ◽

Achillefs N Kapanidis ◽

...

Keyword(s):

Large Scale ◽

Spatial Organization ◽

Structural Information ◽

Virus Assembly ◽

Super Resolution ◽

Automated Analysis ◽

Size Analysis ◽

Analysis Pipeline ◽

Single Experiment ◽

Viral Immunology

Many viruses form highly pleomorphic particles; in influenza, these particles range from spheres of ~ 100 nm in diameter to filaments of several microns in length. Virion structure is of interest, not only in the context of virus assembly, but also because pleomorphic variations may correlate with infectivity and pathogenicity. Detailed images of virus morphology often rely on electron microscopy, which is generally low throughput and limited in molecular identification. We have used fluorescence super-resolution microscopy combined with a rapid automated analysis pipeline to image many thousands of individual influenza virions, gaining information on their size, morphology and the distribution of membrane-embedded and internal proteins. This large-scale analysis revealed that influenza particles can be reliably characterised by length, that no spatial frequency patterning of the surface glycoproteins occurs, and that RNPs are preferentially located towards filament ends within Archetti bodies. Our analysis pipeline is versatile and can be adapted for use on multiple other pathogens, as demonstrated by its application for the size analysis of SARS-CoV-2. The ability to gain nanoscale structural information from many thousands of viruses in just a single experiment is valuable for the study of virus assembly mechanisms, host cell interactions and viral immunology, and should be able to contribute to the development of viral vaccines, anti-viral strategies and diagnostics.

Download Full-text

Open Source Digitally Replicable Lab-Grade Scales

10.20944/preprints202004.0472.v1 ◽

2020 ◽

Author(s):

Benjamin R. Hubbard ◽

Joshua M. Pearce

Keyword(s):

Open Source ◽

Low Cost ◽

Cost Savings ◽

Load Cell ◽

Significant Cost ◽

Load Cells ◽

The Cost ◽

Cell Data ◽

Precision Balance ◽

Multiple Load

This study provides designs for a low-cost, easily replicable open source lab-grade digital scale that can be used as a precision balance. The design is such that it can be manufactured for use in most labs throughout the world with open source RepRap-class material extrusion-based 3-D printers for the mechanical components and readily available open source electronics including the Arduino Nano. Several versions of the design were fabricated and tested for precision and accuracy for a range of load cells. The results showed the open source scale was found to be repeatable within 0.1g with multiple load cells, with even better precision (0.01g) depending on load cell range and style. The scale tracks linearly with proprietary lab-grade scales, meeting the performance specified in the load cell data sheets, indicating that it is accurate across the range of the load cell installed. The smallest loadcell tested(100g) offers precision on the order of a commercial digital mass balance. The scale can be produced at significant cost savings compared to scales of comparable range and precision when serial capability is present. The cost savings increase significantly as the range of the scale increases and are particularly well-suited for resource-constrained medical and scientific facilities.

Download Full-text

chewBBACA: A complete suite for gene-by-gene schema creation and strain identification

10.1101/173146 ◽

2017 ◽

Cited By ~ 5

Author(s):

Mickael Silva ◽

Miguel Machado ◽

Diogo N. Silva ◽

Mirko Rossi ◽

Jacob Moran-Gilad ◽

...

Keyword(s):

Open Source ◽

Core Genome ◽

Bacterial Species ◽

Outbreak Detection ◽

Strain Identification ◽

List Type ◽

Whole Genome ◽

Link Type ◽

The Creation ◽

Allele Calling

ABSTRACTGene-by-gene approaches are becoming increasingly popular in bacterial genomic epidemiology and outbreak detection. However, there is a lack of open-source scalable software for schema definition and allele calling for these methodologies. The chewBBACA suite was designed to assist users in the creation and evaluation of novel whole-genome or core-genome gene-by-gene typing schemas and subsequent allele calling in bacterial strains of interest. The software can run in a laptop or in high performance clusters making it useful for both small laboratories and large reference centers. ChewBBACA is available athttps://github.com/B-UMMI/chewBBACAor as a docker image athttps://hub.docker.com/r/ummidock/chewbbaca/.DATA SUMMARYAssembled genomes used for the tutorial were downloaded from NCBI in August 2016 by selecting those submitted asStreptococcus agalactiaetaxon or sub-taxa. All the assemblies have been deposited as a zip file in FigShare (https://figshare.com/s/9cbe1d422805db54cd52), where a file with the original ftp link for each NCBI directory is also available.Code for the chewBBACA suite is available athttps://github.com/B-UMMI/chewBBACAwhile the tutorial example is found athttps://github.com/B-UMMI/chewBBACA_tutorial.I/We confirm all supporting data, code and protocols have been provided within the article or through supplementary data files. ⊠IMPACT STATEMENTThe chewBBACA software offers a computational solution for the creation, evaluation and use of whole genome (wg) and core genome (cg) multilocus sequence typing (MLST) schemas. It allows researchers to develop wg/cgMLST schemes for any bacterial species from a set of genomes of interest. The alleles identified by chewBBACA correspond to potential coding sequences, possibly offering insights into the correspondence between the genetic variability identified and phenotypic variability. The software performs allele calling in a matter of seconds to minutes per strain in a laptop but is easily scalable for the analysis of large datasets of hundreds of thousands of strains using multiprocessing options. The chewBBACA software thus provides an efficient and freely available open source solution for gene-by-gene methods. Moreover, the ability to perform these tasks locally is desirable when the submission of raw data to a central repository or web services is hindered by data protection policies or ethical or legal concerns.

Download Full-text

The Popgen Pipeline Platform: A Software Platform for Facilitating Population Genomic Analyses

10.1101/785774 ◽

2019 ◽

Author(s):

Andrew Webb ◽

Jared Knoblauch ◽

Nitesh Sabankar ◽

Apeksha Sukesh Kallur ◽

Jody Hey ◽

...

Keyword(s):

Open Source ◽

Development Time ◽

End Users ◽

File Format ◽

Software Platform ◽

Format Conversion ◽

Link Type ◽

Population Genomic ◽

Genomic Analyses ◽

File Format Conversion

AbstractHere we present the Pop-Gen Pipeline Platform (PPP), a software platform with the goal of reducing the computational expertise required for conducting population genomic analyses. The PPP was designed as a collection of scripts that facilitate common population genomic workflows in a consistent and standardized Python environment. Functions were developed to encompass entire workflows, including: input preparation, file format conversion, various population genomic analyses, output generation, and visualization. By facilitating entire workflows, the PPP offers several benefits to prospective end users - it reduces the need of redundant in-house software and scripts that would require development time and may be error-prone, or incorrect. The platform has also been developed with reproducibility and extensibility of analyses in mind. The PPP is an open-source package that is available for download and use at https://ppp.readthedocs.io/en/latest/PPP_pages/install.html

Download Full-text

gprofiler2 -- an R package for gene list functional enrichment analysis and namespace conversion toolset g:Profiler

F1000Research ◽

10.12688/f1000research.24956.2 ◽

2020 ◽

Vol 9 ◽

pp. 709 ◽

Cited By ~ 1

Author(s):

Liis Kolberg ◽

Uku Raudvere ◽

Ivan Kuzmin ◽

Jaak Vilo ◽

Hedi Peterson

Keyword(s):

Gene List ◽

Enrichment Analysis ◽

Functional Enrichment Analysis ◽

Automated Analysis ◽

R Package ◽

Biological Data ◽

Functional Enrichment ◽

Link Type ◽

Functional Profiling ◽

Rest Api

g:Profiler (https://biit.cs.ut.ee/gprofiler) is a widely used gene list functional profiling and namespace conversion toolset that has been contributing to reproducible biological data analysis already since 2007. Here we introduce the accompanying R package, gprofiler2, developed to facilitate programmatic access to g:Profiler computations and databases via REST API. The gprofiler2 package provides an easy-to-use functionality that enables researchers to incorporate functional enrichment analysis into automated analysis pipelines written in R. The package also implements interactive visualisation methods to help to interpret the enrichment results and to illustrate them for publications. In addition, gprofiler2 gives access to the versatile gene/protein identifier conversion functionality in g:Profiler enabling to map between hundreds of different identifier types or orthologous species. The gprofiler2 package is freely available at the CRAN repository.

Download Full-text