software pipeline
Recently Published Documents


TOTAL DOCUMENTS: 112 (FIVE YEARS: 40)
H-INDEX: 19 (FIVE YEARS: 4)

2022 ◽ Vol 924 (2) ◽ pp. 85
Author(s): James E. Aguirre, Steven G. Murray, Robert Pascua, Zachary E. Martinot, Jacob Burba, ...

Abstract We describe the validation of the HERA Phase I software pipeline by a series of modular tests, building up to an end-to-end simulation. The philosophy of this approach is to validate the software and algorithms used in the Phase I upper-limit analysis on wholly synthetic data satisfying the assumptions of that analysis, not addressing whether the actual data meet these assumptions. We discuss the organization of this validation approach, the specific modular tests performed, and the construction of the end-to-end simulations. We explicitly discuss the limitations in scope of the current simulation effort. With mock visibility data generated from a known analytic power spectrum and a wide range of realistic instrumental effects and foregrounds, we demonstrate that the current pipeline produces power spectrum estimates that are consistent with known analytic inputs to within thermal noise levels (at the 2σ level) for k > 0.2 h Mpc⁻¹ for both bands and fields considered. Our input spectrum is intentionally amplified to enable a strong “detection” at k ∼ 0.2 h Mpc⁻¹ (at the level of ∼25σ), with foregrounds dominating on larger scales and thermal noise dominating at smaller scales. Our pipeline is able to detect this amplified input signal after suppressing foregrounds with a dynamic range (foreground-to-noise ratio) of ≳10⁷. Our validation test suite uncovered several sources of scale-independent signal loss throughout the pipeline, whose amplitude is well characterized and accounted for in the final estimates. We conclude with a discussion of the steps required for the next round of data analysis.
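
For readers who want to see the shape of the consistency criterion quoted above (agreement with the analytic input to within 2σ of thermal noise for k > 0.2 h Mpc⁻¹), here is a minimal Python sketch; the function and variable names are ours, and this is not the HERA Phase I code.

```python
import numpy as np

def consistent_within_noise(k, p_recovered, p_input, sigma_noise,
                            k_min=0.2, n_sigma=2.0):
    """Count how many Fourier modes with k > k_min (h/Mpc) have recovered
    power-spectrum estimates within n_sigma thermal-noise uncertainties of
    the known analytic input. A schematic stand-in for the consistency
    criterion described in the abstract, not HERA pipeline code."""
    mask = np.asarray(k) > k_min
    residual = np.abs(np.asarray(p_recovered)[mask] - np.asarray(p_input)[mask])
    n_pass = int(np.sum(residual <= n_sigma * np.asarray(sigma_noise)[mask]))
    return n_pass, int(mask.sum())   # (modes passing, modes tested)

# Purely illustrative numbers: a power-law input plus noise at the 5% level.
k = np.linspace(0.05, 1.0, 50)
p_true = 1e5 * k ** -2.7
sigma = 0.05 * p_true
p_est = p_true + np.random.normal(0.0, sigma)
print(consistent_within_noise(k, p_est, p_true, sigma))
```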


2021
Author(s): Gayathri Mahalingam, Russel Torres, Daniel Kapner, Eric T. Trautman, Tim Fliss, ...

Serial-section electron microscopy (ssEM) can produce high-throughput imaging of large biological specimen volumes. The high-resolution images are necessary to reconstruct dense neural wiring diagrams in the brain, so-called connectomes. A high-fidelity volume assembly is required to correctly reconstruct neural anatomy and synaptic connections. It involves seamless 2D stitching of the images within a serial section followed by 3D alignment of the stitched sections. The high throughput of ssEM necessitates that 2D stitching be done at the pace of imaging, which currently produces tens of terabytes per day. To achieve this, we present ASAP (Assembly Stitching and Alignment Pipeline), a modular volume-assembly software pipeline that is scalable and parallelized to work with distributed systems. The pipeline is built on top of the Render [18] services used in the volume assembly of the brain of adult Drosophila melanogaster [2]. It achieves high throughput by operating on the metadata and transformations of each image stored in a database, thus eliminating the need to render intermediate output. The modularity of ASAP allows easy adaptation to new algorithms without significant changes to the workflow. The software pipeline includes a complete set of tools for stitching, automated quality control, 3D section alignment, and rendering of the assembled volume to disk. We also implemented a workflow engine that executes the volume-assembly workflow automatically, triggered by the transfer of raw data. ASAP has been successfully used for continuous processing of several large-scale datasets of the mouse visual cortex and human brain samples, including one cubic millimeter of mouse visual cortex [1, 25]. The pipeline also has multi-channel processing capabilities and can be applied to fluorescence and multi-modal datasets such as array tomography.
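
The throughput argument — update stored transformations instead of rendering intermediate images — can be sketched as follows. The class and field names are hypothetical simplifications, not the actual ASAP/Render data model.

```python
import numpy as np

class TileRecord:
    """Metadata for one acquired image tile: no pixels are loaded or warped
    until explicitly requested (hypothetical, simplified model)."""
    def __init__(self, tile_id, path, transform=None):
        self.tile_id = tile_id
        self.path = path                                  # raw pixels stay on disk
        self.transform = np.eye(3) if transform is None else transform  # 3x3 affine

    def compose(self, extra_transform):
        # Stitching/alignment steps only update the stored transform;
        # the expensive rendering of warped pixels is deferred.
        self.transform = extra_transform @ self.transform

def apply_section_alignment(tiles, section_transform):
    """Apply a per-section alignment transform to every tile's metadata."""
    for t in tiles:
        t.compose(section_transform)
    return tiles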


Author(s): Jonathan Lee, Gary Hoang, Chia-Shang Liu, Mark Shiroishi, Alexander Lerner, ...

Aim: To develop a modular software pipeline for robustly extracting 3D brain-surface models from MRIs for visualization or printing. No other end-to-end pipeline specialized for neuroimaging does this directly with an interchangeable combination of methods. Materials & methods: A software application was developed to dynamically generate Nipype workflows using interfaces from the Analysis of Functional NeuroImages, Advanced Normalization Tools, FreeSurfer, BrainSuite, Nighres and the FMRIB Software Library suites. The application was deployed for public use via the LONI pipeline environment. Results: In a small, head-to-head comparison test, a pipeline using FreeSurfer for both the skull stripping and cortical-mesh extraction stages earned the highest subjective quality scores. Conclusion: We have deployed a publicly available and modular software tool for extracting 3D models from brain MRIs to use in medical education.
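
A minimal sketch of the interchangeable-stage idea described in the Aim follows. The registry and function names are hypothetical, and the published tool generates Nipype workflows rather than calling plain Python functions.

```python
# Hypothetical stage registries: each pipeline stage (skull stripping,
# cortical-mesh extraction) can be swapped for any registered implementation.
SKULL_STRIPPERS = {}
MESH_EXTRACTORS = {}

def register(registry, name):
    def wrap(fn):
        registry[name] = fn
        return fn
    return wrap

@register(SKULL_STRIPPERS, "freesurfer")
def strip_with_freesurfer(t1_path):
    ...  # placeholder: would invoke FreeSurfer skull stripping on the T1 image

@register(MESH_EXTRACTORS, "freesurfer")
def mesh_with_freesurfer(brain_path):
    ...  # placeholder: would extract the cortical surface mesh

def build_pipeline(stripper="freesurfer", mesher="freesurfer"):
    """Assemble a two-stage pipeline from any registered implementations."""
    strip, mesh = SKULL_STRIPPERS[stripper], MESH_EXTRACTORS[mesher]
    return lambda t1_path: mesh(strip(t1_path))
```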


2021 ◽ Vol 5 (11) ◽ pp. 267
Author(s): Maria Chernyavskaya, Mario Jurić, Joachim Moeyens, Siegfried Eggl, Lynne Jones

Abstract We present a tool for the comparison and validation of integration packages suitable for Solar System dynamics. iCompare, written in Python, compares the ephemeris prediction accuracy of a suite of commonly used integration packages (at present JPL/HORIZONS, OpenOrb and OrbFit). It integrates a set of test particles with orbits picked to explore both usual and unusual regions of Solar System phase space and compares the computed ephemerides with reference ephemerides. The results are visualized in an intuitive dashboard. This allows the assessment of integrator suitability as a function of population, as well as monitoring of performance from version to version (a capability needed for the Rubin Observatory’s software pipeline construction efforts). We provide the code on GitHub with a readily runnable version in Binder (https://github.com/dirac-institute/iCompare).
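
A hedged sketch of the core comparison step — computed versus reference ephemerides, reduced to a worst-case angular separation per test particle — is below; it does not reproduce iCompare's interfaces to JPL/HORIZONS, OpenOrb or OrbFit.

```python
import numpy as np

def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcseconds between two sets of sky
    positions given in degrees (standard spherical-trigonometry formula)."""
    ra1, dec1, ra2, dec2 = map(np.radians, (ra1, dec1, ra2, dec2))
    cos_sep = (np.sin(dec1) * np.sin(dec2)
               + np.cos(dec1) * np.cos(dec2) * np.cos(ra1 - ra2))
    return np.degrees(np.arccos(np.clip(cos_sep, -1.0, 1.0))) * 3600.0

def compare_integrators(reference, computed):
    """reference/computed: dicts mapping particle id -> (ra, dec) arrays in
    degrees over a common set of epochs. Returns the maximum deviation per
    particle, in arcseconds (illustrative reduction of the comparison)."""
    return {pid: angular_separation(*reference[pid], *computed[pid]).max()
            for pid in reference}
```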


Open Biology ◽ 2021 ◽ Vol 11 (10)
Author(s): James M. Parkhurst, Maud Dumoux, Mark Basham, Daniel Clare, C. Alistair Siebert, ...

In cryo-electron tomography (cryo-ET) of biological samples, the quality of tomographic reconstructions can vary depending on the transmission electron microscope (TEM) instrument and data acquisition parameters. In this paper, we present Parakeet, a ‘digital twin’ software pipeline for assessing the impact of various TEM experiment parameters on the quality of three-dimensional tomographic reconstructions. The Parakeet digital twin is a digital model of a physical instrument that enables in silico optimization of sample geometries, data acquisition schemes and instrument parameters. The digital twin performs virtual sample generation, TEM image simulation, and tilt series reconstruction and analysis within a convenient software framework. As well as producing physically realistic simulated cryo-ET datasets to aid the development of tomographic reconstruction and subtomogram averaging programs, Parakeet aims to enable convenient assessment of the effects of different microscope and data acquisition parameters on reconstruction quality. To illustrate the use of the software, we present a quantitative analysis of missing-wedge artefacts on simulated planar and cylindrical biological samples and discuss how data collection parameters can be modified for cylindrical samples, where a full 180° tilt range might be measured.
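
To make the missing-wedge discussion concrete, here is a generic numpy construction of the Fourier-space region sampled by a limited tilt range; it is a textbook illustration, not Parakeet's implementation.

```python
import numpy as np

def missing_wedge_mask(shape, tilt_range_deg=(-60.0, 60.0)):
    """Binary Fourier-space mask for a 2D (z, x) slice: by the projection-slice
    theorem, a projection at tilt theta samples the central line at angle theta,
    so frequencies outside the acquired tilt fan are never measured. The unmeasured
    region is the classic missing wedge (generic construction, not Parakeet code)."""
    nz, nx = shape
    kz = np.fft.fftfreq(nz)[:, None]
    kx = np.fft.fftfreq(nx)[None, :]
    angle = np.degrees(np.arctan2(kz, kx))   # orientation of each frequency sample
    lo, hi = tilt_range_deg
    # Keep the sampled fan and its point-symmetric copy (Friedel symmetry).
    in_fan = ((angle >= lo) & (angle <= hi)) \
             | ((angle - 180 >= lo) & (angle - 180 <= hi)) \
             | ((angle + 180 >= lo) & (angle + 180 <= hi))
    return in_fan.astype(float)

mask = missing_wedge_mask((128, 128), (-60, 60))
print(f"Sampled fraction of Fourier space: {mask.mean():.2f}")  # ~2/3 for a +/-60 deg tilt range
```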


2021 ◽ Vol ahead-of-print (ahead-of-print)
Author(s): João Pedro C. de Souza, António M. Amorim, Luís F. Rocha, Vítor H. Pinto, António Paulo Moreira

Purpose The purpose of this paper is to present a programming by demonstration (PbD) system based on 3D stereoscopic vision and inertial sensing that provides a cost-effective pose tracking system, even during error-prone situations such as camera occlusions. Design/methodology/approach The proposed PbD system is based on the 6D Mimic innovative solution, whose six-degrees-of-freedom marker hardware had to be revised and restructured to accommodate an IMU sensor. Additionally, a new software pipeline was designed to include this new sensing device, seeking to improve the overall system’s robustness in stereoscopic vision occlusion situations. Findings The IMU component and the new software pipeline allow the 6D Mimic system to maintain pose tracking when the main tracking tool, i.e. the stereoscopic vision, fails. The system therefore improves in reliability, robustness and accuracy, as verified by real experiments. Practical implications With this proposal, the 6D Mimic system provides a reliable and low-cost PbD methodology, so the robot can accurately replicate, on an industrial scale, the artisan-level performance of highly skilled shop-floor operators. Originality/value To the best of the authors’ knowledge, the sensor fusion between stereoscopic images and an IMU applied to robot PbD is a novel approach. The system is designed to reduce costs and takes advantage of an offline processing step for data analysis, filtering and fusion, enhancing the reliability of the PbD system.
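
A deliberately simplified, position-only sketch of the fallback behaviour described in the Findings: use the stereoscopic estimate when available and dead-reckon from the IMU during occlusions. All names are ours; the 6D Mimic fusion is richer and runs as an offline processing step.

```python
import numpy as np

def track_pose(stereo_poses, imu_velocities, dt):
    """stereo_poses: list of (x, y, z) positions, or None when the marker is
    occluded; imu_velocities: linear velocities derived from the IMU at the
    same time steps. Trusts the camera when it sees the marker and falls back
    to IMU dead reckoning otherwise (illustrative, position-only)."""
    track, last = [], None
    for vision, vel in zip(stereo_poses, imu_velocities):
        if vision is not None:
            last = np.asarray(vision, dtype=float)             # vision available
        elif last is not None:
            last = last + np.asarray(vel, dtype=float) * dt    # IMU dead reckoning
        track.append(None if last is None else last.copy())
    return track
```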


PLoS ONE ◽ 2021 ◽ Vol 16 (8) ◽ pp. e0255859
Author(s): Andrew Salch, Adam Regalski, Hassan Abdallah, Raviteja Suryadevara, Michael J. Catanzaro, ...

fMRI is the preeminent method for collecting signals from the human brain in vivo, for using these signals in the service of functional discovery, and for relating these discoveries to anatomical structure. Numerous computational and mathematical techniques have been deployed to extract information from the fMRI signal. Yet the application of topological data analysis (TDA) remains limited to certain sub-areas such as connectomics (that is, to summarized versions of fMRI data). While connectomics is a natural and important area of application of TDA, applications of TDA aimed at extracting structure from the (non-summarized) fMRI data itself are heretofore nonexistent. “Structure” within fMRI data is determined by dynamic fluctuations in spatially distributed signals over time, and TDA is well positioned to help researchers better characterize mass dynamics of the signal by rigorously capturing shape within it. To motivate this idea, we a) survey an established method in TDA (“persistent homology”) to reveal and describe how complex structures can be extracted from data sets generally, and b) describe how persistent homology can be applied specifically to fMRI data. We provide explanations for some of the mathematical underpinnings of TDA (with expository figures), building ideas in the following sequence: a) fMRI researchers can and should use TDA to extract structure from their data; b) this extraction serves an important role in the endeavor of functional discovery; and c) TDA approaches can complement other established approaches to fMRI analysis (for which we provide examples). We also provide detailed applications of TDA to fMRI data collected using established paradigms, and offer our software pipeline for readers interested in emulating our methods. This working overview is both an interdisciplinary synthesis of ideas (to draw researchers in TDA and fMRI toward each other) and a detailed description of methods that can motivate collaborative research.
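
For readers wanting a concrete entry point to persistent homology, here is a minimal computation on a toy point cloud using the open-source ripser package; how fMRI time series are embedded as a point cloud is a modelling choice the paper discusses, and the toy data below are purely illustrative.

```python
import numpy as np
from ripser import ripser   # pip install ripser (assumed available)

# Toy "point cloud": in an fMRI setting each point could be, e.g., the spatial
# pattern of the signal at one time point, reduced to a few dimensions.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 200)
cloud = np.column_stack([np.cos(theta), np.sin(theta)]) + 0.05 * rng.normal(size=(200, 2))

# Persistent homology up to dimension 1: H0 tracks connected components,
# H1 tracks loops. A noisy circle should show one long-lived H1 feature.
diagrams = ripser(cloud, maxdim=1)["dgms"]
h1 = diagrams[1]                       # array of (birth, death) pairs for loops
lifetimes = h1[:, 1] - h1[:, 0]
print("Most persistent loop lifetime:", lifetimes.max())
```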


2021 ◽ Vol 3 (3)
Author(s): Gregor Sturm, Markus List, Jitao David Zhang

Abstract Lack of reproducibility in gene expression studies is a serious issue being actively addressed by the biomedical research community. Besides established factors such as batch effects and incorrect sample annotations, we recently reported tissue heterogeneity, a consequence of unintended profiling of cells of other origins than the tissue of interest, as a source of variance. Although tissue heterogeneity exacerbates irreproducibility, its prevalence in gene expression data remains unknown. Here, we systematically analyse 2,667 publicly available gene expression datasets covering 76,576 samples. Using two independent data compendia and a reproducible, open-source software pipeline, we find a prevalence of tissue heterogeneity in gene expression data that affects between 1% and 40% of the samples, depending on the tissue type. We discover both cases of severe heterogeneity, which may be caused by mistakes in annotation or sample handling, and cases of moderate heterogeneity, which are likely caused by tissue infiltration or sample contamination. Our analysis establishes tissue heterogeneity as a widespread phenomenon in publicly available gene expression datasets, which constitutes an important source of variance that should not be ignored. Consequently, we advocate the application of quality-control methods such as BioQC to detect tissue heterogeneity prior to mining or analysing gene expression data.
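
A schematic of the kind of per-sample check being advocated — score each sample against tissue marker gene sets and flag samples whose strongest unexpected signature exceeds a threshold. The data structures are hypothetical; BioQC itself is an R/Bioconductor package with a different interface.

```python
import numpy as np

def flag_heterogeneous_samples(expr, gene_index, signatures, expected_tissue,
                               z_threshold=3.0):
    """expr: genes x samples matrix of expression values (z-scored per gene);
    gene_index: dict gene -> row index; signatures: dict tissue -> list of
    marker genes. Flags samples whose mean marker score for any tissue other
    than the expected one exceeds z_threshold (simplified BioQC-style check)."""
    flagged = {}
    for tissue, markers in signatures.items():
        if tissue == expected_tissue:
            continue
        rows = [gene_index[g] for g in markers if g in gene_index]
        if not rows:
            continue
        score = expr[rows, :].mean(axis=0)        # per-sample signature score
        for s in np.where(score > z_threshold)[0]:
            flagged.setdefault(int(s), []).append(tissue)
    return flagged
```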


F1000Research ◽ 2021 ◽ Vol 9 ◽ pp. 1095
Author(s): Oliver W. Butters, Rebecca C. Wilson, Hugh Garner, Thomas W. Y. Burton

Cohort studies collect, generate and distribute data over long periods of time – often over the lifecourse of their participants. It is common for these studies to host a list of publications (which can number many thousands) on their website to demonstrate the impact of the study and to facilitate the search of existing research to which the study data have contributed. The ability to search and explore these publication lists varies greatly between studies. We believe a lack of rich search and exploration functionality for study publications is a barrier to entry for new or prospective users of a study’s data, since it may be difficult to find and evaluate previous work in a given area. These lists of publications are also typically manually curated, resulting in a lack of rich metadata to analyse and making bibliometric analysis difficult. We present here a software pipeline that aggregates metadata from a variety of third-party providers to power a web-based search and exploration tool for lists of publications. Alongside core publication metadata (i.e. author lists, keywords, etc.), we include geocoding of first authors and citation counts in our pipeline. This allows a characterisation of a study as a whole based on common locations of authors, frequency of keywords, citation profile, etc. This enriched publication metadata can be useful for generating study impact metrics and web-based graphics for public dissemination. In addition, the pipeline produces a research data set for bibliometric analysis or social studies of science. We use a previously published list of publications from a cohort study as an exemplar input data set to show the output and utility of the pipeline.
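
A hedged sketch of the metadata-aggregation step for one third-party provider (the public Crossref REST works endpoint); the response field names reflect our reading of that API and should be verified, and the geocoding and keyword steps are omitted.

```python
import requests

def enrich_doi(doi):
    """Fetch core metadata for one DOI from the Crossref works endpoint and
    flatten it into a small record. Field names follow our reading of the
    Crossref response schema and may need adjustment."""
    resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=30)
    resp.raise_for_status()
    msg = resp.json()["message"]
    return {
        "doi": doi,
        "title": (msg.get("title") or [""])[0],
        "authors": [f"{a.get('given', '')} {a.get('family', '')}".strip()
                    for a in msg.get("author", [])],
        "citations": msg.get("is-referenced-by-count"),
    }

def enrich_publication_list(dois):
    """Aggregate enriched records for a study's publication list."""
    return [enrich_doi(doi) for doi in dois]
```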


eLife ◽ 2021 ◽ Vol 10
Author(s): Chentao Wen, Takuya Miura, Venkatakaushik Voleti, Kazushi Yamaguchi, Motosuke Tsutsumi, ...

Despite recent improvements in microscope technologies, segmenting and tracking cells in three-dimensional time-lapse images (3D + T images) to extract their dynamic positions and activities remains a considerable bottleneck in the field. We developed a deep learning-based software pipeline, 3DeeCellTracker, by integrating multiple existing and new techniques, including deep learning for tracking. With only one volume of training data, one initial correction, and a few parameter changes, 3DeeCellTracker successfully segmented and tracked ~100 cells in the brains of both semi-immobilized and ‘straightened’ freely moving worms, ~100 cells in a naturally beating zebrafish heart, and ~1000 cells in a 3D cultured tumor spheroid. Although these datasets were imaged with highly divergent optical systems, our method tracked 90–100% of the cells in most cases, which is comparable to or better than previous results. These results suggest that 3DeeCellTracker could pave the way for revealing dynamic cell activities in image datasets that have been difficult to analyze.
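
As a point of reference for the tracking step, here is a classical frame-to-frame assignment of segmented cell centroids by Hungarian matching; this is a generic baseline, not the deep-learning tracking used by 3DeeCellTracker.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def match_cells(centroids_t, centroids_t1, max_dist=10.0):
    """Match segmented cell centroids between consecutive 3D volumes by
    minimising total displacement; pairs farther apart than max_dist are
    rejected. A classical baseline for comparison, not 3DeeCellTracker."""
    cost = cdist(centroids_t, centroids_t1)      # pairwise Euclidean distances
    rows, cols = linear_sum_assignment(cost)     # optimal one-to-one assignment
    return [(int(r), int(c)) for r, c in zip(rows, cols) if cost[r, c] <= max_dist]

# Tiny example: three cells drift slightly between time points.
prev = np.array([[10.0, 10, 5], [40, 42, 7], [80, 15, 9]])
curr = prev + np.array([[1.0, 0, 0], [0, 2, 0], [-1, 1, 0]])
print(match_cells(prev, curr))   # [(0, 0), (1, 1), (2, 2)]
```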

