On the objectivity, reliability, and validity of deep learning enabled bioimage analyses

eLife ◽  
2020 ◽  
Vol 9 ◽  
Author(s):  
Dennis Segebarth ◽  
Matthias Griebel ◽  
Nikolai Stein ◽  
Cora R von Collenberg ◽  
Corinna Martin ◽  
...  

Bioimage analysis of fluorescent labels is widely used in the life sciences. Recent advances in deep learning (DL) allow automating time-consuming manual image analysis processes based on annotated training data. However, manual annotation of fluorescent features with a low signal-to-noise ratio is somewhat subjective. Training DL models on subjective annotations may be unstable or yield biased models. In turn, these models may be unable to reliably detect biological effects. An analysis pipeline integrating data annotation, ground truth estimation, and model training can mitigate this risk. To evaluate this integrated process, we compared different DL-based analysis approaches. With data from two model organisms (mice, zebrafish) and five laboratories, we show that ground truth estimation from multiple human annotators helps to establish objectivity in fluorescent feature annotations. Furthermore, ensembles of multiple models trained on the estimated ground truth establish reliability and validity. Our research provides guidelines for reproducible DL-based bioimage analyses.
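The two key ingredients described here, ground-truth estimation from multiple annotators and model ensembling, can be sketched in miniature. This is a hypothetical illustration, not the authors' pipeline (which uses more sophisticated annotation fusion than a plain majority vote):

```python
# Minimal sketch: estimate a binary ground-truth mask by pixel-wise
# majority vote over several human annotations, and form an ensemble
# prediction by averaging per-model probability maps. Masks and maps
# are flat lists of pixels here for simplicity.

def majority_vote(annotations):
    """Pixel-wise majority vote over binary masks from several annotators."""
    n = len(annotations)
    return [1 if 2 * sum(pixel) > n else 0 for pixel in zip(*annotations)]

def ensemble_predict(prob_maps, threshold=0.5):
    """Average the models' probability maps, then threshold the mean."""
    n = len(prob_maps)
    return [1 if sum(pixel) / n >= threshold else 0 for pixel in zip(*prob_maps)]
```

A pixel marked by two of three raters survives the vote, while a feature that only a single model detects with low confidence is suppressed by the ensemble average.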

2018 ◽  
Author(s):  
Dennis Segebarth ◽  
Matthias Griebel ◽  
Nikolai Stein ◽  
Cora R. von Collenberg ◽  
Corinna Martin ◽  
...  

Abstract Fluorescent labeling of biomolecules is widely used for bioimage analyses throughout the life sciences. Recent advances in deep learning (DL) have opened new possibilities to scale the image analysis processes through automation. However, the annotation of fluorescent features with a low signal-to-noise ratio is frequently based on subjective criteria. Training on subjective annotations may ultimately lead to biased DL models yielding irreproducible results. An end-to-end analysis process that integrates data annotation, ground truth estimation, and model training can mitigate this risk. To highlight the importance of this integrated process, we compare different DL-based analysis approaches. Based on data from different laboratories, we show that ground truth estimation from multiple human annotators is indispensable to establish objectivity in fluorescent feature annotations. We demonstrate that ensembles of multiple models trained on the estimated ground truth establish reliability and validity. Our research provides guidelines for reproducible and transparent bioimage analyses using DL methods.


2021 ◽  
Vol 10 (1) ◽  
Author(s):  
Xinyang Li ◽  
Guoxun Zhang ◽  
Hui Qiao ◽  
Feng Bao ◽  
Yue Deng ◽  
...  

Abstract The development of deep learning and open access to a substantial collection of imaging data together provide a potential solution for computational image transformation, which is gradually changing the landscape of optical imaging and biomedical research. However, current implementations of deep learning usually operate in a supervised manner, and their reliance on laborious and error-prone data annotation procedures remains a barrier to more general applicability. Here, we propose an unsupervised image transformation to facilitate the utilization of deep learning for optical microscopy, even in some cases in which supervised models cannot be applied. Through the introduction of a saliency constraint, the unsupervised model, named Unsupervised content-preserving Transformation for Optical Microscopy (UTOM), can learn the mapping between two image domains without requiring paired training data while avoiding distortions of the image content. UTOM shows promising performance in a wide range of biomedical image transformation tasks, including in silico histological staining, fluorescence image restoration, and virtual fluorescence labeling. Quantitative evaluations reveal that UTOM achieves stable and high-fidelity image transformations across different imaging conditions and modalities. We anticipate that our framework will encourage a paradigm shift in training neural networks and enable more applications of artificial intelligence in biomedical imaging.
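UTOM's saliency constraint is specific to this paper, but the underlying unpaired-translation idea can be illustrated with the generic cycle-consistency criterion: mapping an image to the other domain and back should reproduce the input. A toy sketch with linear stand-ins for the two networks (hypothetical functions, not the authors' model):

```python
# Toy illustration of cycle consistency for unpaired image translation:
# the composition backward(forward(x)) should recover x. The two lambdas
# below are stand-ins for the forward and backward neural networks.

def cycle_loss(forward, backward, images):
    """Mean absolute reconstruction error |backward(forward(x)) - x|."""
    total, count = 0.0, 0
    for img in images:
        recon = backward(forward(img))
        total += sum(abs(r - x) for r, x in zip(recon, img))
        count += len(img)
    return total / count

# Stand-in "networks": a gain/offset transform and its exact inverse.
f = lambda img: [2.0 * p + 1.0 for p in img]
g = lambda img: [(p - 1.0) / 2.0 for p in img]
```

With an exact inverse the loss is zero; any mismatch between the two mappings drives the loss up, which is the training signal unpaired models exploit in place of pixel-wise supervision.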


2019 ◽  
Vol 38 (11) ◽  
pp. 872a1-872a9 ◽  
Author(s):  
Mauricio Araya-Polo ◽  
Stuart Farris ◽  
Manuel Florez

Exploration seismic data are heavily manipulated before human interpreters are able to extract meaningful information regarding subsurface structures. This manipulation adds modeling and human biases and is limited by methodological shortcomings. Alternatively, using seismic data directly is becoming possible thanks to deep learning (DL) techniques. A DL-based workflow is introduced that uses analog velocity models and realistic raw seismic waveforms as input and produces subsurface velocity models as output. When insufficient data are used for training, DL algorithms tend to overfit or fail. Gathering large amounts of labeled and standardized seismic data sets is not straightforward. This shortage of quality data is addressed by building a generative adversarial network (GAN) to augment the original training data set, which is then used by DL-driven seismic tomography as input. The DL tomographic operator predicts velocity models with high statistical and structural accuracy after being trained with GAN-generated velocity models. Beyond the field of exploration geophysics, the use of machine learning in earth science is challenged by the lack of labeled data or properly interpreted ground truth, since we seldom know what truly exists beneath the earth's surface. The unsupervised approach (using GANs to generate labeled data) illustrates a way to mitigate this problem and opens geology, geophysics, and planetary sciences to more DL applications.
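The paper trains a GAN for the augmentation step; purely as a hypothetical stand-in for a trained generator, the sketch below draws random layered velocity models (velocity increasing with depth), which is how synthetic velocity-model augmentation is often seeded. All names and parameter ranges here are illustrative, not from the paper:

```python
import random

# Toy generator of synthetic 1-D layered velocity models: random layer
# boundaries, with layer velocity increasing monotonically with depth.
# A GAN would learn such a generator from data; this hand-written one
# merely illustrates the data-augmentation role it plays.

def synthetic_velocity_model(depth_samples=50, n_layers=5, v0=1500.0,
                             rng=random):
    boundaries = sorted(rng.sample(range(1, depth_samples), n_layers - 1))
    boundaries = boundaries + [depth_samples]
    model, v, start = [], v0, 0
    for end in boundaries:
        model.extend([v] * (end - start))  # constant velocity per layer
        v += rng.uniform(100.0, 800.0)     # jump up at each boundary
        start = end
    return model
```

Each call yields a fresh labeled training example (the velocity model itself is the label; the corresponding waveform would be simulated from it).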


2020 ◽  
Vol 36 (12) ◽  
pp. 3863-3870
Author(s):  
Mischa Schwendy ◽  
Ronald E Unger ◽  
Sapun H Parekh

Abstract Motivation: Deep learning use for quantitative image analysis is exponentially increasing. However, training accurate, widely deployable deep learning algorithms requires a plethora of annotated (ground truth) data. Image collections must contain not only thousands of images to provide sufficient example objects (i.e. cells), but also an adequate degree of image heterogeneity. Results: We present a new dataset, EVICAN—Expert visual cell annotation, comprising partially annotated grayscale images of 30 different cell lines from multiple microscopes, contrast mechanisms and magnifications that is readily usable as training data for computer vision applications. With 4600 images and ∼26 000 segmented cells, our collection offers an unparalleled heterogeneous training dataset for cell biology deep learning application development. Availability and implementation: The dataset is freely available (https://edmond.mpdl.mpg.de/imeji/collection/l45s16atmi6Aa4sI?q=). Using a Mask R-CNN implementation, we demonstrate automated segmentation of cells and nuclei from brightfield images with a mean average precision of 61.6% at a Jaccard Index above 0.5.
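The Jaccard Index used as the matching criterion here (a predicted cell counts as detected when its overlap with the annotation exceeds 0.5) is simple to state in code; a minimal sketch for flat binary masks:

```python
def jaccard_index(mask_a, mask_b):
    """Intersection over union of two binary masks (flat lists of 0/1)."""
    inter = sum(a & b for a, b in zip(mask_a, mask_b))
    union = sum(a | b for a, b in zip(mask_a, mask_b))
    return inter / union if union else 1.0  # two empty masks match trivially
```

Mean average precision is then computed over detections, with this index deciding which predicted cells match annotated ones.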


Microscopy ◽  
2021 ◽  
Author(s):  
Kohki Konishi ◽  
Takao Nonaka ◽  
Shunsuke Takei ◽  
Keisuke Ohta ◽  
Hideo Nishioka ◽  
...  

Abstract Three-dimensional (3D) observation of a biological sample using serial-section electron microscopy is widely used. However, organelle segmentation requires a significant amount of manual time. Therefore, several studies have been conducted to improve its efficiency. One such promising method is 3D deep learning (DL), which is highly accurate. However, the creation of training data for 3D DL still requires manual time and effort. In this study, we developed a highly efficient integrated image segmentation tool that includes stepwise DL with manual correction. The tool has four functions: efficient tracers for annotation, model training/inference for organelle segmentation using a lightweight convolutional neural network, efficient proofreading, and model refinement. We applied this tool to increase the training data step by step (stepwise annotation method) to segment the mitochondria in the cells of the cerebral cortex. We found that the stepwise annotation method reduced the manual operation time by one-third compared with that of the fully manual method, where all the training data were created manually. Moreover, we demonstrated that the F1 score, the metric of segmentation accuracy, was 0.9 by training the 3D DL model with these training data. The stepwise annotation method using this tool and the 3D DL model improved the segmentation efficiency for various organelles.
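The F1 score reported here is the harmonic mean of precision and recall over segmented voxels (or objects); a minimal sketch from confusion counts:

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

An F1 of 0.9, as achieved by the stepwise-trained model, corresponds for example to 90% precision at 90% recall.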


2020 ◽  
Author(s):  
Wim Wiegerinck

Deep learning is a modeling approach that has shown impressive results in image processing and is arguably a promising tool for dealing with spatially extended complex systems such as the earth's atmosphere with its visually interpretable patterns. A disadvantage of the neural network approach is that it typically requires an enormous amount of training data.

Another recently proposed modeling approach is supermodeling. In supermodeling it is assumed that a dynamical system, the truth, is modelled by a set of good but imperfect models. The idea is to improve model performance by dynamically combining imperfect models during the simulation. The resulting combination of models is called the supermodel. The combination strength has to be learned from data. However, since supermodels do not start from scratch, but make use of existing domain knowledge, they may learn from less data.

One of the ways to combine models is to define the tendencies of the supermodel as linear (weighted) combinations of the imperfect model tendencies. Several methods including linear regression have been proposed to optimize the weights. However, the combination method might also be nonlinear. In this work we propose and explore a novel combination of deep learning and supermodeling, in which convolutional neural networks are used as a tool to combine the predictions of the imperfect models. The different supermodeling strategies are applied in simulations in a controlled environment with a three-level, quasi-geostrophic spectral model that serves as ground truth and perturbed models that serve as the imperfect models.
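The linear weighted-tendency combination can be made concrete: with two imperfect models, the least-squares weights follow from 2×2 normal equations. A toy sketch under that assumption (function names are illustrative; the paper's nonlinear variant replaces this with a convolutional network):

```python
# Minimal sketch of linear supermodeling: the supermodel tendency is a
# weighted sum of two imperfect-model tendencies, with weights fitted
# to observed true tendencies by least squares (2x2 normal equations).

def fit_weights(tend_a, tend_b, tend_true):
    aa = sum(a * a for a in tend_a)
    bb = sum(b * b for b in tend_b)
    ab = sum(a * b for a, b in zip(tend_a, tend_b))
    ay = sum(a * y for a, y in zip(tend_a, tend_true))
    by = sum(b * y for b, y in zip(tend_b, tend_true))
    det = aa * bb - ab * ab        # assumes the two tendencies are independent
    wa = (ay * bb - by * ab) / det
    wb = (by * aa - ay * ab) / det
    return wa, wb

def supermodel_tendency(wa, wb, tend_a, tend_b):
    return [wa * a + wb * b for a, b in zip(tend_a, tend_b)]
```

When the truth really is a fixed linear blend of the two models, the fitted weights recover that blend exactly; in practice they give the best linear approximation at each state.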


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Avantika Lal ◽  
Zachary D. Chiang ◽  
Nikolai Yakovenko ◽  
Fabiana M. Duarte ◽  
Johnny Israeli ◽  
...  

Abstract ATAC-seq is a widely applied assay used to measure genome-wide chromatin accessibility; however, its ability to detect active regulatory regions can depend on the depth of sequencing coverage and the signal-to-noise ratio. Here we introduce AtacWorks, a deep learning toolkit to denoise sequencing coverage and identify regulatory peaks at base-pair resolution from low cell count, low-coverage, or low-quality ATAC-seq data. Models trained by AtacWorks can detect peaks from cell types not seen in the training data, and are generalizable across diverse sample preparations and experimental platforms. We demonstrate that AtacWorks enhances the sensitivity of single-cell experiments by producing results on par with those of conventional methods using ~10 times as many cells, and further show that this framework can be adapted to enable cross-modality inference of protein-DNA interactions. Finally, we establish that AtacWorks can enable new biological discoveries by identifying active regulatory regions associated with lineage priming in rare subpopulations of hematopoietic stem cells.
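AtacWorks performs the denoising with a deep neural network; purely as a toy stand-in for the denoise-then-call-peaks workflow it describes, the sketch below smooths a noisy coverage track with a rolling mean and thresholds the result. The function names and window/threshold choices are illustrative, not from the toolkit:

```python
# Toy stand-in for coverage denoising and peak calling: a rolling mean
# suppresses per-base noise, then positions above a threshold are
# reported as candidate peak bases.

def rolling_mean(track, window=3):
    half = window // 2
    out = []
    for i in range(len(track)):
        lo, hi = max(0, i - half), min(len(track), i + half + 1)
        out.append(sum(track[lo:hi]) / (hi - lo))
    return out

def call_peaks(track, threshold):
    return [i for i, v in enumerate(track) if v >= threshold]
```

A learned denoiser plays the role of `rolling_mean` here, but at base-pair resolution and with far better recovery of true signal from low-coverage data.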


Electronics ◽  
2020 ◽  
Vol 9 (1) ◽  
pp. 135 ◽  
Author(s):  
Siti Nurmaini ◽  
Annisa Darmawahyuni ◽  
Akhmad Noviar Sakti Mukti ◽  
Muhammad Naufal Rachmatullah ◽  
Firdaus Firdaus ◽  
...  

The electrocardiogram (ECG) is a widely used, noninvasive test for analyzing arrhythmia. However, the ECG signal is prone to contamination by different kinds of noise. Such noise may cause deformation of the ECG heartbeat waveform, leading to cardiologists’ mislabeling or misinterpreting heartbeats due to varying types of artifacts and interference. To address this problem, some previous studies propose a computerized technique based on machine learning (ML) to distinguish between normal and abnormal heartbeats. Unfortunately, ML works on a handcrafted, feature-based approach and lacks feature representation. To overcome such drawbacks, deep learning (DL) is proposed in the pre-training and fine-tuning phases to produce an automated feature representation for multi-class classification of arrhythmia conditions. In the pre-training phase, stacked denoising autoencoders (DAEs) and autoencoders (AEs) are used for feature learning; in the fine-tuning phase, deep neural networks (DNNs) are implemented as a classifier. To the best of our knowledge, this research is the first to implement stacked autoencoders by using DAEs and AEs for feature learning in DL. The experiments use Physionet’s well-known MIT-BIH Arrhythmia Database, as well as the MIT-BIH Noise Stress Test Database (NSTDB). Only four records are used from the NSTDB dataset: 118 24 dB, 118 −6 dB, 119 24 dB, and 119 −6 dB, with two levels of signal-to-noise ratio (SNR) at 24 dB and −6 dB. In the validation process, six models are compared to select the best DL model. For all fine-tuned hyperparameters, the best model of ECG heartbeat classification achieves an accuracy, sensitivity, specificity, precision, and F1-score of 99.34%, 93.83%, 99.57%, 89.81%, and 91.44%, respectively. As the results demonstrate, the proposed DL model can extract high-level features not only from the training data but also from unseen data. Such a model has good application prospects in clinical practice.
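The denoising-autoencoder idea, reconstructing the clean signal from a noise-corrupted copy, can be shown at its smallest possible scale with a single trainable weight. This is a hypothetical illustration of the training principle only, not the paper's stacked architecture:

```python
import random

# Toy one-weight "denoising autoencoder": the model y = w * noisy is
# trained by stochastic gradient descent to reconstruct the clean
# sample x from its noise-corrupted copy.

def train_denoiser(clean, noise_std=0.1, lr=0.01, epochs=200, seed=0):
    rng = random.Random(seed)
    w = 0.0
    for _ in range(epochs):
        for x in clean:
            noisy = x + rng.gauss(0.0, noise_std)
            err = w * noisy - x       # reconstruction error
            w -= lr * err * noisy     # descend on the squared error
    return w
```

With low noise the learned weight approaches 1, i.e. the model learns to pass the underlying signal through while having been forced, during training, to see only corrupted inputs; stacking many such layers yields the feature-learning behavior the paper exploits.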


2021 ◽  
Vol 7 (3) ◽  
pp. 44
Author(s):  
Johannes Leuschner ◽  
Maximilian Schmidt ◽  
Poulami Somanya Ganguly ◽  
Vladyslav Andriiashen ◽  
Sophia Bethany Coban ◽  
...  

The reconstruction of computed tomography (CT) images is an active area of research. Following the rise of deep learning methods, many data-driven models have been proposed in recent years. In this work, we present the results of a data challenge that we organized, bringing together algorithm experts from different institutes to jointly work on quantitative evaluation of several data-driven methods on two large, public datasets during a ten-day sprint. We focus on two applications of CT, namely, low-dose CT and sparse-angle CT. This enables us to fairly compare different methods using standardized settings. As a general result, we observe that the deep learning-based methods are able to improve the reconstruction quality metrics in both CT applications while the top performing methods show only minor differences in terms of peak signal-to-noise ratio (PSNR) and structural similarity (SSIM). We further discuss a number of other important criteria that should be taken into account when selecting a method, such as the availability of training data, the knowledge of the physical measurement model and the reconstruction speed.
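PSNR, one of the two quality metrics compared in the challenge, is fully determined by the mean squared error and the peak intensity; a minimal sketch for flat image arrays:

```python
import math

def psnr(reference, reconstruction, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images (flat lists)."""
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstruction)) \
        / len(reference)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)
```

Because PSNR is a monotone function of MSE alone, methods with nearly identical MSE are indistinguishable by it, which is why the challenge also reports SSIM and discusses non-metric criteria.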


2022 ◽  
Vol 14 (2) ◽  
pp. 263
Author(s):  
Haixia Zhao ◽  
Tingting Bai ◽  
Zhiqiang Wang

Seismic field data are usually contaminated by random or complex noise, which seriously degrades data quality and hampers seismic imaging and interpretation. Improving the signal-to-noise ratio (SNR) of seismic data has always been a key step in seismic data processing. Deep learning approaches have been successfully applied to suppress seismic random noise. Training examples are essential in deep learning methods, especially for geophysical problems, where complete training data are not easy to acquire due to the high cost of acquisition. In this work, we propose a deep learning method pre-trained on natural images to suppress seismic random noise, drawing on insights from transfer learning. Our network contains pre-trained and post-trained networks: the former is trained on natural images to obtain preliminary denoising results, while the latter is trained on a small number of seismic images by semi-supervised learning to fine-tune the denoising and enhance the continuity of geological structures. Results on four types of synthetic seismic data and six field datasets demonstrate that our network performs well in seismic random noise suppression in terms of both quantitative metrics and visual quality.
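The SNR that denoising aims to improve is conventionally measured in decibels against a clean reference, as the ratio of signal power to residual-noise power; a minimal sketch:

```python
import math

# SNR in dB of a (possibly denoised) trace against a clean reference:
# 10 * log10( power of clean signal / power of residual noise ).

def snr_db(clean, noisy):
    p_sig = sum(c * c for c in clean)
    p_res = sum((n - c) ** 2 for c, n in zip(clean, noisy))
    return 10.0 * math.log10(p_sig / p_res)
```

On synthetic data the clean trace is known exactly, so SNR before and after denoising quantifies the improvement; on field data, where no clean reference exists, visual continuity of geological structures serves as the complementary check.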

