analysis workflow
Recently Published Documents

TOTAL DOCUMENTS: 353 (FIVE YEARS: 207)
H-INDEX: 17 (FIVE YEARS: 7)

2022 ◽  
Author(s):  
Roger Beecham ◽  
Robin Lovelace

Road safety research is a data-rich field with large social impacts. As in medical research, the ambition is to build knowledge around risk factors that can save lives. Unlike medical research, however, road safety research generates empirical findings from messy observational datasets. Records of road crashes contain numerous intersecting categorical variables; dominant patterns are complicated by confounding, and when conditioning on the data to make inferences net of this, observed effects become subject to uncertainty as sample sizes diminish. We demonstrate how visual data analysis approaches can inject rigour into the exploratory analysis of such datasets. We present a framework in which graphics are used to expose, model and evaluate spatial patterns in observational data, as well as to protect against false discovery. The framework is supported through an applied analysis of national crash patterns recorded in STATS19, the main source of road crash information in Great Britain. Our framework moves beyond typical depictions of exploratory data analysis and helps navigate the complex data analysis decision spaces typical of modern geographical analysis settings, generating data-driven outputs that support effective policy interventions and public debate.
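One common guard against over-reading small-sample patterns of the kind this abstract describes is to shrink noisy per-area rates toward a global rate before mapping them. This is not the authors' method, only a minimal illustrative sketch; the smoothing weight `k` is a hypothetical prior strength, and the counts are invented, not STATS19 values.

```python
# Illustrative only: shrink per-area crash rates toward the national rate,
# so areas with few observations do not dominate an exploratory map.

def shrunk_rate(crashes, exposure, national_rate, k=100.0):
    """Empirical-Bayes-style shrinkage of a raw rate toward a global rate.

    crashes: observed crash count in the area
    exposure: traffic volume (or population) for the area
    national_rate: crashes per unit exposure across all areas
    k: pseudo-exposure controlling the strength of shrinkage
    """
    return (crashes + k * national_rate) / (exposure + k)

# A sparsely observed area is pulled strongly toward the national rate...
sparse = shrunk_rate(crashes=3, exposure=10, national_rate=0.05)
# ...while a well-observed area keeps close to its raw rate.
dense = shrunk_rate(crashes=5000, exposure=100000, national_rate=0.05)
```

The effect is that extreme rates from tiny samples no longer stand out spuriously in a choropleth, which is one way to "protect against false discovery" in exploratory maps.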


2022 ◽  
Author(s):  
Soumya Banerjee

Abstract: Objective: Achieving sufficient statistical power in a survival analysis usually requires large amounts of data from multiple sites. The sensitivity of individual-level data, together with ethical and practical constraints on sharing data across institutions, can make this added power difficult to achieve. We therefore implemented a federated meta-analysis approach for survival models in DataSHIELD, in which only anonymous aggregated data are shared across institutions while still allowing exploratory, interactive modelling. Meta-analysis techniques can combine analysis results from each site, but a manual analysis workflow hinders exploration. Our aim is thus to provide a framework for performing meta-analysis of Cox regression models across institutions without manual analysis steps for the data providers. Results: We introduce a package (dsSurvival) that allows privacy-preserving meta-analysis of survival models, including the calculation of hazard ratios. Our tool can be of great use in biomedical research where survival models must be built but there are privacy concerns about sharing data.
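The federated idea above can be sketched in a few lines: each site shares only an aggregate (log hazard ratio, standard error) pair from its own Cox model, and a fixed-effect inverse-variance meta-analysis pools them. This is a generic illustration, not the dsSurvival API (which is an R/DataSHIELD package), and the site values below are made up.

```python
import math

def pool_log_hazard_ratios(site_estimates):
    """Fixed-effect, inverse-variance pooling.

    site_estimates: list of (log_hr, se) tuples, one per institution.
    Returns (pooled log hazard ratio, pooled standard error).
    """
    weights = [1.0 / se**2 for _, se in site_estimates]
    pooled_log_hr = sum(w * lhr for (lhr, _), w in zip(site_estimates, weights)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled_log_hr, pooled_se

# Hypothetical per-site Cox results: no individual-level data leaves a site.
sites = [(0.30, 0.10), (0.25, 0.15), (0.40, 0.20)]
log_hr, se = pool_log_hazard_ratios(sites)
hazard_ratio = math.exp(log_hr)
```

Note that the pooled standard error is smaller than any single site's, which is exactly the added power the abstract refers to.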


2022 ◽  
Vol 3 ◽  
Author(s):  
Rocco D’Antuono ◽  
Giuseppina Pisignano

Bioimage analysis workflows allow the measurement of sample properties such as fluorescence intensity and polarization, cell number, and vesicle distribution, but often require the integration of multiple software tools. Furthermore, it is increasingly appreciated that, to overcome the limitations of 2D-view-based image analysis and to correctly understand and interpret biological processes, 3D segmentation of microscopy datasets becomes imperative. Despite the availability of numerous algorithms for 2D and 3D segmentation, the latter still poses challenges for end-users, who often have neither extensive knowledge of the existing software nor the coding skills to link the outputs of multiple tools. While several commercial packages are available on the market, fewer open-source solutions can execute a complete 3D analysis workflow. Here we present ZELDA, a new napari plugin that easily integrates cutting-edge solutions from the Python ecosystem, such as scikit-image for image segmentation, matplotlib for data visualization, and the napari multi-dimensional image viewer for 3D rendering. This plugin aims to provide interactive, zero-scripting, customizable workflows for cell segmentation, vesicle counting, parent-child relations between objects, signal quantification, and results presentation, all included in the same open-source napari viewer and "few clicks away".
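A core step such workflows chain together is 3D segmentation of a binary mask into labeled objects. In ZELDA itself this role is played by scikit-image; the dependency-free toy below only illustrates the idea, using breadth-first search with 6-connectivity on nested lists.

```python
from collections import deque

def label_3d(mask):
    """Label 3D connected components (6-connectivity).

    mask: nested lists [z][y][x] of 0/1. Returns (labels, n_objects).
    """
    nz, ny, nx = len(mask), len(mask[0]), len(mask[0][0])
    labels = [[[0] * nx for _ in range(ny)] for _ in range(nz)]
    current = 0
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if mask[z][y][x] and not labels[z][y][x]:
                    current += 1                 # start a new object
                    queue = deque([(z, y, x)])
                    labels[z][y][x] = current
                    while queue:
                        cz, cy, cx = queue.popleft()
                        # visit the six face-adjacent neighbors
                        for dz, dy, dx in ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
                                           (0, -1, 0), (0, 0, 1), (0, 0, -1)):
                            z2, y2, x2 = cz + dz, cy + dy, cx + dx
                            if (0 <= z2 < nz and 0 <= y2 < ny and 0 <= x2 < nx
                                    and mask[z2][y2][x2] and not labels[z2][y2][x2]):
                                labels[z2][y2][x2] = current
                                queue.append((z2, y2, x2))
    return labels, current

# Two voxels touching across z form one object; an isolated voxel is a second.
mask = [[[1, 0], [0, 0]],
        [[1, 0], [0, 1]]]
labels, n = label_3d(mask)
```

Once objects are labeled, the per-object measurements and parent-child relations mentioned above reduce to lookups on the label volume.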


2021 ◽  
Vol 12 ◽  
Author(s):  
Carter Allen ◽  
Brittany N. Kuhn ◽  
Nazzareno Cannella ◽  
Ayteria D. Crow ◽  
Analyse T. Roberts ◽  
...  

Opioid use disorder (OUD) is a psychological condition that affects over 200,000 people per year in the U.S., leading the Centers for Disease Control and Prevention to label the crisis a rapidly spreading public health epidemic. The behavioral relationship between opioid exposure and the development of OUD varies greatly between individuals, implying the existence of sub-populations with varying degrees of opioid vulnerability. However, effective pre-clinical identification of these sub-populations remains challenging due to the complex multivariate measurements employed in animal models of OUD. In this study, we propose a novel non-linear, network-based data analysis workflow that employs seven behavioral traits to identify opioid use sub-populations and assesses the contributions of behavioral variables to opioid vulnerability and resiliency. Through this workflow we determined how behavioral variables across heroin taking, refraining and seeking interact with one another to identify potentially heroin-resilient and heroin-vulnerable behavioral sub-populations. Data were collected from over 400 heterogeneous stock rats in two geographically distinct locations. Rats underwent heroin self-administration training, followed by a progressive ratio and a heroin-primed reinstatement test. Next, rats underwent extinction training and a cue-induced reinstatement test. To enter the analysis workflow, we integrated data from different cohorts of rats and removed possible batch effects. We then constructed a rat-rat similarity network based on behavioral patterns and implemented community detection on this network using a Bayesian degree-corrected stochastic block model to uncover sub-populations of rats with differing levels of opioid vulnerability. We identified three statistically distinct clusters corresponding to distinct behavioral sub-populations, vulnerable, resilient and intermediate, for heroin use, refraining and seeking. We implemented this analysis workflow as an open-source R package, mlsbm.
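The first stage of the workflow, turning each animal's behavioral profile into a similarity network, can be sketched as below. The paper fits a Bayesian degree-corrected stochastic block model (the R package mlsbm) to such a network; here connected components of a thresholded cosine-similarity graph stand in for the real community detection, and the profiles are invented.

```python
import math

def cosine(u, v):
    """Cosine similarity between two behavioral vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def similarity_graph(profiles, threshold=0.95):
    """profiles: dict rat_id -> behavioral vector. Returns adjacency dict."""
    ids = list(profiles)
    adj = {i: set() for i in ids}
    for i in range(len(ids)):
        for j in range(i + 1, len(ids)):
            if cosine(profiles[ids[i]], profiles[ids[j]]) >= threshold:
                adj[ids[i]].add(ids[j])
                adj[ids[j]].add(ids[i])
    return adj

def components(adj):
    """Connected components: a naive placeholder for SBM communities."""
    seen, groups = set(), []
    for node in adj:
        if node in seen:
            continue
        stack, group = [node], set()
        while stack:
            n = stack.pop()
            if n not in group:
                group.add(n)
                stack.extend(adj[n] - group)
        seen |= group
        groups.append(group)
    return groups

# Rats r1 and r2 behave similarly; r3 has a very different profile.
profiles = {"r1": [9, 8, 7], "r2": [8, 8, 8], "r3": [1, 0, 9]}
groups = components(similarity_graph(profiles))
```

A stochastic block model improves on this naive grouping by inferring the number of communities and quantifying uncertainty in each rat's assignment rather than relying on a hard threshold.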


2021 ◽  
Author(s):  
Douglas F Porter ◽  
Raghav M Garg ◽  
Robin M Meyers ◽  
Weili Miao ◽  
Luca Ducoli ◽  
...  

The easyCLIP protocol describes a method for both standard CLIP library construction and the absolute quantification of RNA cross-linking rates, data that can be usefully combined to analyze RNA-protein interactions. Using these cross-linking metrics, significant interactions can be defined relative to a set of random non-RBPs. The original easyCLIP protocol did not use index reads, required custom sequencing primers, and lacked an easily reproducible analysis workflow. This short paper amends these deficiencies. It also includes some additional technical experiments and investigates the use of alternative adapters. The results are intended to provide more options for easily performing and analyzing easyCLIP experiments.


Author(s):  
Ralph Kube ◽  
Randy Michael Churchill ◽  
Choong Seock Chang ◽  
Jong Choi ◽  
Ruonan Wang ◽  
...  

Abstract: Experiments on fusion plasmas produce high-dimensional time-series data of ever-increasing magnitude and velocity, but turn-around times for analyzing these data have not kept up. For example, many data analysis tasks are still performed manually and ad hoc, some time after an experiment. In this article we introduce the DELTA framework, which facilitates near-real-time streaming analysis of big and fast fusion data. By streaming measurement data from fusion experiments to a high-performance computing center, DELTA allows computationally expensive data analysis tasks to be performed between plasma pulses. This article describes the modular and expandable software architecture of DELTA and presents performance benchmarks of its individual components as well as of an example workflow. For a streaming analysis workflow in which electron cyclotron emission imaging (ECEi) data measured at KSTAR are analyzed on NERSC's supercomputer, we routinely observe data transfer rates of about 4 gigabit per second. At NERSC, a demanding turbulence analysis workflow effectively utilizes multiple nodes and graphics processing units and executes in under 5 minutes. We further discuss how DELTA uses modern database systems and container orchestration services to provide web-based real-time data visualization. For the case of ECEi data we demonstrate how data visualizations can be augmented with outputs from machine learning models. By providing session leaders and physics operators with the results of higher-order data analysis through live visualizations, DELTA may support more informed decisions on how to configure the machine for the next shot.
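The streaming pattern underlying such a framework can be reduced to a producer/consumer pipeline: measurement chunks are analyzed as they arrive instead of after the full dataset lands on disk. The sketch below is a toy stand-in, not the DELTA API; the "analysis" (a running mean per chunk) and all names are illustrative.

```python
import queue
import threading

def producer(q, chunks):
    """Emit measurement chunks; in a real system these arrive over the network."""
    for chunk in chunks:
        q.put(chunk)
    q.put(None)  # sentinel: end of stream

def consumer(q, results):
    """Analyze each chunk as soon as it is available."""
    while True:
        chunk = q.get()
        if chunk is None:
            break
        results.append(sum(chunk) / len(chunk))  # per-chunk analysis task

stream = [[1.0, 3.0], [2.0, 2.0], [5.0, 7.0]]
q, results = queue.Queue(maxsize=2), []
threads = [threading.Thread(target=producer, args=(q, stream)),
           threading.Thread(target=consumer, args=(q, results))]
for t in threads:
    t.start()
for t in threads:
    t.join()
# results now holds one analysis output per streamed chunk
```

The bounded queue (`maxsize=2`) models back-pressure: a slow consumer throttles the producer rather than letting unprocessed data pile up, which matters when transfer rates reach gigabits per second.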


2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Errol L. G. Samuel ◽  
Secondra L. Holmes ◽  
Damian W. Young

Abstract: The thermal shift assay (TSA), also known as differential scanning fluorimetry (DSF), thermofluor, and Tm shift, is one of the most popular biophysical screening techniques used in fragment-based ligand discovery (FBLD) to detect protein–ligand interactions. By comparing the thermal stability of a target protein in the presence and absence of a ligand, potential binders can be identified. The technique is easy to set up, has low protein consumption, and can be run on most real-time polymerase chain reaction (PCR) instruments. While data analysis is straightforward in principle, it becomes cumbersome and time-consuming when screens involve multiple 96- or 384-well plates. Several approaches aim to streamline this process, but most involve proprietary software, require programming knowledge, or are designed for specific instrument output files. We therefore developed an analysis workflow implemented in the Konstanz Information Miner (KNIME), a free and open-source data analytics platform, which greatly streamlined our data-processing timeline for 384-well plates. The implementation is code-free and freely available to the community for improvement and customization to accommodate a wide range of instrument input files and workflows.
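The per-well calculation that such a workflow automates is conceptually simple: estimate the melting temperature Tm from the fluorescence melt curve, then report the ligand-induced shift. One common heuristic, sketched below on synthetic curves (these are not from the paper), takes Tm as the temperature where the curve rises fastest.

```python
def estimate_tm(temps, fluorescence):
    """Tm ~ temperature of the steepest fluorescence increase (max dF/dT)."""
    derivs = [(fluorescence[i + 1] - fluorescence[i]) / (temps[i + 1] - temps[i])
              for i in range(len(temps) - 1)]
    i_max = max(range(len(derivs)), key=derivs.__getitem__)
    return 0.5 * (temps[i_max] + temps[i_max + 1])  # midpoint of steepest step

temps = [40, 45, 50, 55, 60, 65]          # degrees C
apo   = [1, 2, 5, 20, 28, 30]             # protein alone
bound = [1, 1, 2, 6, 25, 30]              # with ligand: melts later

delta_tm = estimate_tm(temps, bound) - estimate_tm(temps, apo)
# a positive delta_tm suggests the ligand stabilizes the target protein
```

In practice the curve is often fit to a Boltzmann sigmoid for sub-degree precision; the derivative heuristic above is merely the simplest robust version, and the tedium the KNIME workflow removes is repeating it across hundreds of wells per plate.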


2021 ◽  
Vol 17 (11) ◽  
pp. e1008946
Author(s):  
Niksa Praljak ◽  
Shamreen Iram ◽  
Utku Goreke ◽  
Gundeep Singh ◽  
Ailis Hill ◽  
...  

Sickle cell disease, a genetic disorder affecting a sizeable global demographic, manifests in sickle red blood cells (sRBCs) with altered shape and biomechanics. sRBCs show heightened adhesive interactions with inflamed endothelium, triggering painful vascular occlusion events. Numerous studies employ microfluidic-assay-based monitoring tools to quantify characteristics of adhered sRBCs from high-resolution channel images. The current image analysis workflow relies on detailed morphological characterization and cell counting by a specially trained worker, which is time- and labor-intensive and prone to user bias. Here we establish a morphology-based classification scheme to identify two naturally arising sRBC subpopulations, deformable and non-deformable sRBCs, utilizing novel visual markers that link to underlying cell biomechanical properties and hold promise for clinically relevant insights. We then set up a standardized, reproducible, and fully automated image analysis workflow designed to carry out this classification. It relies on a two-part deep neural network architecture that works in tandem for segmentation of channel images and classification of adhered cells into subtypes. Network training utilized an extensive dataset of images generated by the SCD BioChip, a microfluidic assay which injects clinical whole-blood samples into protein-functionalized microchannels, mimicking physiological conditions in the microvasculature. Here we carried out the assay with the sub-endothelial protein laminin. The machine learning approach segmented the resulting channel images with 99.1±0.3% mean IoU and classified detected sRBCs with 96.0±0.3% mean accuracy (both on the validation set across 5 k-folds), and matched trained personnel in the overall characterization of whole channel images with R2 = 0.992, 0.987 and 0.834 for total, deformable and non-deformable sRBC counts, respectively. Average analysis time per channel image improved by two orders of magnitude (∼2 minutes vs. ∼2-3 hours) over manual characterization. Finally, the network results show an order of magnitude less variance in counts on repeat trials than humans. Such standardization is a prerequisite for the viability of any diagnostic technology, making our system suitable for affordable and high-throughput disease monitoring.
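The segmentation quality quoted above is measured as mean IoU (intersection over union). For readers unfamiliar with the metric, a minimal sketch on binary masks, with toy pixel data:

```python
def iou(pred, truth):
    """Intersection over union of two binary masks.

    pred, truth: flat lists of 0/1 pixel labels of equal length.
    """
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    union = sum(1 for p, t in zip(pred, truth) if p or t)
    return intersection / union if union else 1.0

truth = [0, 1, 1, 1, 0, 0]
pred  = [0, 1, 1, 0, 1, 0]
score = iou(pred, truth)  # 2 overlapping pixels / 4 pixels in the union
```

A mean IoU of 99.1% therefore means the predicted and hand-drawn cell masks overlap almost perfectly, averaged over the validation images.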


Molecules ◽  
2021 ◽  
Vol 26 (23) ◽  
pp. 7192
Author(s):  
Simona De Vita ◽  
Maria Giovanna Chini ◽  
Giuseppe Bifulco ◽  
Gianluigi Lauro

The binding of a set of molecules against the BRD9 protein was estimated through an in silico, molecular dynamics-driven exhaustive analysis to guide the identification of potential novel ligands. Starting from eight crystal structures of the protein co-complexed with known binders and one apo form, we conducted an exhaustive molecular docking/molecular dynamics (MD) investigation. To balance accuracy against an affordable calculation time, the systems were simulated for 100 ns in explicit solvent. Moreover, one complex was simulated for 1 µs to assess the influence of simulation time on the results. A set of MD-derived parameters was computed and compared with molecular docking-derived and experimental data. MM-GBSA and the per-residue interaction energy emerged as the main indicators of a good interaction between a specific binder and the protein counterpart. To assess the performance of the proposed analysis workflow, we tested six molecules featuring different binding affinities for BRD9, obtaining promising outcomes. Further insights are reported to highlight the influence of the starting structure on the evolution of the molecular dynamics simulations. The data confirm that ranking BRD9 binders using key parameters arising from molecular dynamics is advisable to discard poor ligands before moving on to synthesis and biological testing.
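The ranking step described above amounts to averaging a per-frame binding-energy estimate over each ligand's trajectory and sorting by the result. A minimal sketch follows; the energies are invented placeholder values (not BRD9 data), and real MM-GBSA values would come from a package such as AMBER's MMPBSA tooling.

```python
def rank_ligands(per_frame_energies):
    """Rank ligands by mean MM-GBSA energy over an MD trajectory.

    per_frame_energies: dict ligand -> list of energies (kcal/mol).
    More negative mean = stronger predicted binding, so it sorts first.
    """
    means = {lig: sum(e) / len(e) for lig, e in per_frame_energies.items()}
    return sorted(means, key=means.get)

# Hypothetical per-frame energies for three candidate ligands.
trajectories = {
    "ligA": [-42.1, -40.3, -41.6],
    "ligB": [-25.0, -27.2, -26.1],
    "ligC": [-33.4, -35.0, -34.2],
}
ranking = rank_ligands(trajectories)
```

Averaging over frames, rather than scoring a single docked pose, is what lets the MD-driven protocol discard ligands whose apparent binding is an artifact of one favorable snapshot.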


Genes ◽  
2021 ◽  
Vol 12 (12) ◽  
pp. 1865
Author(s):  
Baoting Nong ◽  
Mengbiao Guo ◽  
Weiwen Wang ◽  
Songyang Zhou ◽  
Yuanyan Xiong

Various abnormalities of transcriptional regulation revealed by RNA sequencing (RNA-seq) have been reported in cancers. However, strategies to integrate multi-modal information from RNA-seq, which would help uncover more disease mechanisms, are still limited. Here, we present PipeOne, a cross-platform, one-stop analysis workflow for large-scale transcriptome data. It was developed on Nextflow, a reproducible workflow management system. PipeOne is composed of three modules: data processing and feature-matrix construction, disease feature prioritization, and disease subtyping. It first integrates eight different tools to extract different kinds of information from RNA-seq data, and then uses a random forest algorithm to study and stratify patients according to evidence from multi-modal information. Its application to five cancers (colon, liver, kidney, stomach, or thyroid; total samples n = 2024) identified various dysregulated key features (such as PVT1 expression and ABI3BP alternative splicing) and pathways (especially liver and kidney dysfunction) shared by multiple cancers. Furthermore, we demonstrated clinically relevant patient subtypes in four of the five cancers, with most subtypes characterized by distinct driver somatic mutations, such as TP53, TTN, BRAF, HRAS, MET, KMT2D, and KMT2C mutations. Importantly, these subtyping results were frequently driven by dysregulated biological processes, such as ribosome biogenesis, RNA binding, and mitochondrial functions. PipeOne is efficient and accurate in studying different cancer types, revealing both the specificity of each cancer and cross-cancer contributing factors. It can be easily applied to other diseases and is available on GitHub.
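The feature-matrix construction stage boils down to joining per-modality tables (e.g. expression, splicing) on shared sample IDs so a downstream model such as a random forest sees all modalities at once. The sketch below is a hypothetical stand-in for that step, not PipeOne code; feature names and values are invented, with the gene symbols borrowed from the abstract for flavor.

```python
def merge_feature_matrices(matrices):
    """Join per-modality feature tables on shared sample IDs.

    matrices: dict modality -> {sample_id: {feature: value}}.
    Returns {sample_id: {"modality:feature": value}} for samples
    present in every modality.
    """
    shared = set.intersection(*(set(m) for m in matrices.values()))
    merged = {}
    for sample in sorted(shared):
        row = {}
        for modality, table in matrices.items():
            for feature, value in table[sample].items():
                # prefix avoids name collisions between modalities
                row[f"{modality}:{feature}"] = value
        merged[sample] = row
    return merged

matrices = {
    "expr":   {"s1": {"PVT1": 8.2}, "s2": {"PVT1": 3.1}},
    "splice": {"s1": {"ABI3BP_psi": 0.4}, "s2": {"ABI3BP_psi": 0.7},
               "s3": {"ABI3BP_psi": 0.5}},
}
merged = merge_feature_matrices(matrices)
# only s1 and s2 appear in both modalities, each with two combined features
```

Restricting to the intersection of samples keeps the matrix rectangular; a production pipeline would additionally handle missing values rather than silently dropping samples like s3.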

