A single latent channel is sufficient for biomedical image segmentation

Glottis segmentation is a crucial step to quantify endoscopic footage in laryngeal high-speed videoendoscopy. Recent advances in using deep neural networks for glottis segmentation allow a fully automatic workflow. However, which parts of these segmentation networks are essential remains unknown. Here, we show through systematic ablations that a single latent channel as the bottleneck layer is sufficient for glottal area segmentation. We further show that the latent space is an abstraction of the glottal area segmentation relying on three spatially defined pixel subtypes. We provide evidence that the latent space is highly correlated with the glottal area waveform, can be encoded with four bits, and can be decoded using lean decoders while maintaining a high reconstruction accuracy. Our findings suggest that glottis segmentation is a task that can be highly optimized to obtain very efficient and clinically applicable deep neural networks. We believe that, in the future, online deep learning-assisted monitoring will be a game changer in laryngeal examinations.
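
To make the "encoded with four bits" claim concrete, the following is a minimal sketch of uniform 4-bit quantization applied to a short list of latent activations. The abstract does not state how the authors quantize the latent space, so the min/max scaling, the function names `quantize_4bit` and `dequantize_4bit`, and the sample values are all assumptions for illustration only.

```python
def quantize_4bit(values):
    """Map floats to integer codes 0..15 (4 bits) using the min/max of the input.

    Assumption: uniform quantization; not necessarily the scheme used in the paper.
    """
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # All values identical: a single code is enough.
        return [0] * len(values), lo, hi
    codes = [round((v - lo) / span * 15) for v in values]
    return codes, lo, hi


def dequantize_4bit(codes, lo, hi):
    """Reconstruct approximate floats from 4-bit codes and the stored range."""
    span = hi - lo
    return [lo + c / 15 * span for c in codes]


# Hypothetical latent activations, e.g. a flattened single-channel bottleneck.
latent = [0.0, 0.12, 0.5, 0.93, 1.0, 0.4]
codes, lo, hi = quantize_4bit(latent)
restored = dequantize_4bit(codes, lo, hi)

# Worst-case reconstruction error is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(latent, restored))
```

With 16 levels, the reconstruction error of each value is at most half a quantization step, that is, 1/30 of the value range.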

Representing Deep Neural Networks Latent Space Geometries with Graphs

Algorithms ◽

10.3390/a14020039 ◽

2021 ◽

Vol 14 (2) ◽

pp. 39

Author(s):

Carlos Lassance ◽

Vincent Gripon ◽

Antonio Ortega

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Deep Learning ◽

Objective Function ◽

Learning Process ◽

Deep Neural Networks ◽

State Of The Art ◽

The Core ◽

Learning Tasks ◽

Latent Space

Deep Learning (DL) has attracted a lot of attention for its ability to reach state-of-the-art performance in many machine learning tasks. The core principle of DL methods consists of training composite architectures in an end-to-end fashion, where inputs are associated with outputs trained to optimize an objective function. Because of their compositional nature, DL architectures naturally exhibit several intermediate representations of the inputs, which belong to so-called latent spaces. When treated individually, these intermediate representations are most of the time unconstrained during the learning process, as it is unclear which properties should be favored. However, when processing a batch of inputs concurrently, the corresponding set of intermediate representations exhibits relations (what we call a geometry) on which desired properties can be sought. In this work, we show that it is possible to introduce constraints on these latent geometries to address various problems. In more detail, we propose to represent geometries by constructing similarity graphs from the intermediate representations obtained when processing a batch of inputs. By constraining these Latent Geometry Graphs (LGGs), we address the following three problems: (i) reproducing the behavior of a teacher architecture is achieved by mimicking its geometry, (ii) designing efficient embeddings for classification is achieved by targeting specific geometries, and (iii) robustness to deviations on inputs is achieved by enforcing smooth variation of geometry between consecutive latent spaces. Using standard vision benchmarks, we demonstrate the ability of the proposed geometry-based methods to solve the considered problems.
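
The idea of building a similarity graph from the intermediate representations of a batch can be sketched as follows. This is a hedged illustration only: the choice of cosine similarity, the k-nearest-neighbor sparsification, the function names, and the toy batch are assumptions, and the abstract does not specify how the authors construct their graphs.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_graph(batch, k):
    """Build a symmetric weighted adjacency matrix over a batch of representations.

    Each node keeps edges to its k most similar neighbors; an edge is kept
    if either endpoint selects it, which makes the matrix symmetric.
    """
    n = len(batch)
    adjacency = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sims = [(cosine_similarity(batch[i], batch[j]), j) for j in range(n) if j != i]
        sims.sort(reverse=True)
        for weight, j in sims[:k]:
            adjacency[i][j] = weight
            adjacency[j][i] = weight
    return adjacency


# Hypothetical intermediate representations for a batch of four inputs.
batch = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
graph = similarity_graph(batch, k=1)
```

In this toy batch, inputs 0 and 1 end up connected, as do inputs 2 and 3, while the two pairs stay disconnected from each other. A graph of this kind could be computed at each layer and compared across layers or across architectures.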

Fully Automatic Segmentation of the Right Ventricle Via Multi-Task Deep Neural Networks

2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp.2018.8461556 ◽

2018 ◽

Cited By ~ 2

Author(s):

Liang Zhang ◽

Georgios Vasileios Karanikolas ◽

Mehmet Akcakaya ◽

Georgios B. Giannakis

Keyword(s):

Neural Networks ◽

Right Ventricle ◽

Deep Neural Networks ◽

Automatic Segmentation ◽

Fully Automatic ◽

The Right

A Deep Learning Enhanced Novel Software Tool for Laryngeal Dynamics Analysis

Journal of Speech Language and Hearing Research ◽

10.1044/2021_jslhr-20-00498 ◽

2021 ◽

pp. 1-15

Author(s):

Andreas M. Kist ◽

Pablo Gómez ◽

Denis Dubrovskiy ◽

Patrick Schlegel ◽

Melda Kunduk ◽

...

Keyword(s):

Neural Networks ◽

Quantitative Analysis ◽

High Speed ◽

Voice Disorders ◽

Vocal Folds ◽

Video Data ◽

Audio Data ◽

Fully Automatic ◽

Video And Audio

Purpose: High-speed videoendoscopy (HSV) is an emerging, but barely used, endoscopy technique in the clinic to assess and diagnose voice disorders because of the lack of dedicated software to analyze the data. HSV allows quantification of the vocal fold oscillations by segmenting the glottal area. This challenging task has been tackled by various studies; however, the proposed approaches are mostly limited and not suitable for daily clinical routine. Method: We developed user-friendly software in C# that allows the editing, motion correction, segmentation, and quantitative analysis of HSV data. We further provide pretrained deep neural networks for fully automatic glottis segmentation. Results: We freely provide our software Glottis Analysis Tools (GAT). Using GAT, we provide a general threshold-based region growing platform that enables the user to analyze data from various sources, such as in vivo recordings, ex vivo recordings, and high-speed footage of artificial vocal folds. Additionally, especially for in vivo recordings, we provide three robust neural networks at various speed and quality settings to allow the fully automatic glottis segmentation needed for application by untrained personnel. GAT further evaluates video and audio data in parallel and is able to extract various features from the video data, among others the glottal area waveform, that is, the changing glottal area over time. In total, GAT provides 79 unique quantitative analysis parameters for video- and audio-based signals. Many of these parameters have already been shown to reflect voice disorders, highlighting the clinical importance and usefulness of the GAT software. Conclusion: GAT is a unique tool to process HSV and audio data to determine quantitative, clinically relevant parameters for research, diagnosis, and treatment of laryngeal disorders. Supplemental Material: https://doi.org/10.23641/asha.14575533
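
The glottal area waveform described above, the glottal area as a function of time, can be illustrated with a short sketch that counts segmented pixels per frame. This is not GAT code (GAT is written in C#); the function name `glottal_area_waveform` and the tiny binary masks are hypothetical and only show the principle.

```python
def glottal_area_waveform(masks):
    """Return one area value per frame: the number of pixels labeled as glottis.

    Each mask is a 2D list of 0/1 values (1 = glottis pixel).
    """
    return [sum(sum(row) for row in mask) for mask in masks]


# Hypothetical 3x3 segmentation masks for one opening/closing cycle.
closed = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
half_open = [[0, 1, 0], [0, 1, 0], [0, 0, 0]]
fully_open = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]

masks = [closed, half_open, fully_open, half_open, closed]
gaw = glottal_area_waveform(masks)  # → [0, 2, 5, 2, 0]
```

Real recordings would use one mask per video frame at the full image resolution, and the area could be converted from pixel counts to physical units if the spatial calibration is known.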

Biomedical Image Reconstruction: From the Foundations to Deep Neural Networks

Foundations and Trends® in Signal Processing ◽

10.1561/2000000101 ◽

2019 ◽

Vol 13 (3) ◽

pp. 283-357

Author(s):

Michael T. McCann ◽

Michael Unser

Keyword(s):

Neural Networks ◽

Image Reconstruction ◽

Deep Neural Networks ◽

Biomedical Image

Ω-Net (Omega-Net): Fully automatic, multi-view cardiac MR detection, orientation, and segmentation with deep neural networks

Medical Image Analysis ◽

10.1016/j.media.2018.05.008 ◽

2018 ◽

Vol 48 ◽

pp. 95-106 ◽

Cited By ~ 34

Author(s):

Davis M. Vigneault ◽

Weidi Xie ◽

Carolyn Y. Ho ◽

David A. Bluemke ◽

J. Alison Noble

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Cardiac MR ◽

Fully Automatic

Tool-Use Model to Reproduce the Goal Situations Considering Relationship Among Tools, Objects, Actions and Effects Using Multimodal Deep Neural Networks

Frontiers in Robotics and AI ◽

10.3389/frobt.2021.748716 ◽

2021 ◽

Vol 8 ◽

Author(s):

Namiko Saito ◽

Tetsuya Ogata ◽

Hiroki Mori ◽

Shingo Murata ◽

Shigeki Sugano

Keyword(s):

Neural Networks ◽

Real Time ◽

Tool Use ◽

Deep Neural Networks ◽

Joint Angle ◽

Image Force ◽

Training Data ◽

Task Goal ◽

Latent Space ◽

Target Effects

We propose a tool-use model that enables a robot to act toward a provided goal. It is important to consider the features of four factors (tools, objects, actions, and effects) at the same time because they are related to each other and one factor can influence the others. The tool-use model is constructed with deep neural networks (DNNs) using multimodal sensorimotor data: image, force, and joint angle information. To allow the robot to learn tool use, we collect training data by controlling the robot to perform various object operations using several tools with multiple actions that lead to different effects. The tool-use model is thereby trained; it learns sensorimotor coordination and acquires the relationships among tools, objects, actions, and effects in its latent space. We can give the robot a task goal by providing an image showing the target placement and orientation of the object. Using the goal image with the tool-use model, the robot detects the features of tools and objects and determines how to act to reproduce the target effects automatically. The robot then generates actions that adjust to real-time situations, even when the tools and objects are unknown and more complicated than those seen in training.

Rethinking glottal midline detection

10.1101/2020.08.20.257428 ◽

2020 ◽

Author(s):

Andreas M. Kist ◽

Julian Zilker ◽

Pablo Gómez ◽

Anne Schützenberger ◽

Michael Döllinger

Keyword(s):

Neural Networks ◽

Computer Vision ◽

Vocal Fold ◽

High Speed ◽

Deep Neural Networks ◽

Comprehensive Evaluation ◽

Vocal Folds ◽

Biophysical Model ◽

Analysis Workflow ◽

Classical Computer

A healthy voice is crucial for verbal communication and hence in daily as well as professional life. The basis for a healthy voice is the sound-producing vocal folds in the larynx. A hallmark of healthy vocal fold oscillation is the symmetric motion of the left and right vocal fold. Clinically, videoendoscopy is applied to assess the symmetry of the oscillation, which is evaluated subjectively. High-speed videoendoscopy, an emerging method that allows quantification of the vocal fold oscillation, is more commonly employed in research due to the amount of data and the complex, semi-automatic analysis. In this study, we provide a comprehensive evaluation of methods that detect the glottal midline fully automatically. We used a biophysical model to simulate different vocal fold oscillations, extended the openly available BAGLS dataset using manual annotations, utilized both simulations and annotated endoscopic images to train deep neural networks at different stages of the analysis workflow, and compared these to established computer vision algorithms. We found that classical computer vision algorithms perform well on detecting the glottal midline in glottis segmentation data, but are outperformed by deep neural networks on this task. We further suggest GlottisNet, a multi-task neural architecture featuring the simultaneous prediction of both the opening between the vocal folds and the symmetry axis. By fully automating segmentation and midline detection, it represents a huge step forward towards clinical applicability of quantitative, deep learning-assisted laryngeal endoscopy.

Fully Automatic Brain Tumor Segmentation using End-To-End Incremental Deep Neural Networks in MRI images

Computer Methods and Programs in Biomedicine ◽

10.1016/j.cmpb.2018.09.007 ◽

2018 ◽

Vol 166 ◽

pp. 39-49 ◽

Cited By ~ 37

Author(s):

Mostefa Ben naceur ◽

Rachida Saouli ◽

Mohamed Akil ◽

Rostom Kachouri

Keyword(s):

Neural Networks ◽

Brain Tumor ◽

Deep Neural Networks ◽

Tumor Segmentation ◽

Brain Tumor Segmentation ◽

Fully Automatic ◽

End To End

Biomedical image augmentation using Augmentor

Bioinformatics ◽

10.1093/bioinformatics/btz259 ◽

2019 ◽

Vol 35 (21) ◽

pp. 4522-4524 ◽

Cited By ~ 26

Author(s):

Marcus D Bloice ◽

Peter M Roth ◽

Andreas Holzinger

Keyword(s):

Neural Networks ◽

Software Package ◽

Biomedical Imaging ◽

Deep Neural Networks ◽

Source Code ◽

Imaging Features ◽

Software Library ◽

Biomedical Image ◽

To Come ◽

Python Package

Motivation: Image augmentation is a frequently used technique in computer vision and has been seeing increased interest since the popularity of deep learning. Its usefulness is becoming more and more recognized due to deep neural networks requiring larger amounts of data to train, and because in certain fields, such as biomedical imaging, large amounts of labelled data are difficult to come by or expensive to produce. In biomedical imaging, features specific to this domain need to be addressed. Results: Here we present the Augmentor software package for image augmentation. It provides a stochastic, pipeline-based approach to image augmentation with a number of features that are relevant to biomedical imaging, such as z-stack augmentation and randomized elastic distortions. The software has been designed to be highly extensible, meaning an operation that might be specific to a highly specialized task can easily be added to the library, even at runtime. Although it has been designed as a general software library, it has features that are particularly relevant to biomedical imaging and the techniques required for this domain. Availability and implementation: Augmentor is a Python package made available under the terms of the MIT licence. Source code can be found on GitHub under https://github.com/mdbloice/Augmentor and installation is via the pip package manager (a Julia version of the package, developed in parallel by Christof Stocker, is also available under https://github.com/Evizero/Augmentor.jl).
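
The "stochastic, pipeline-based approach" can be illustrated with a small stand-alone sketch in which each operation in a chain is applied with its own probability. This deliberately does not use the Augmentor API: the class `StochasticPipeline`, its methods, and the two flip operations are hypothetical names written for this example, and images are plain nested lists to keep it dependency-free.

```python
import random


def flip_left_right(image):
    """Mirror each row of a 2D image (list of rows)."""
    return [row[::-1] for row in image]


def flip_top_bottom(image):
    """Reverse the order of the rows of a 2D image."""
    return image[::-1]


class StochasticPipeline:
    """Hypothetical pipeline: operations run in order, each with its own probability."""

    def __init__(self, seed=None):
        self.operations = []
        self.rng = random.Random(seed)

    def add(self, operation, probability):
        """Append an operation; new operations can be added at any time."""
        if not 0.0 <= probability <= 1.0:
            raise ValueError("probability must be between 0 and 1")
        self.operations.append((operation, probability))
        return self

    def apply(self, image):
        """Pass one image through the pipeline, drawing a random number per operation."""
        for operation, probability in self.operations:
            if self.rng.random() < probability:
                image = operation(image)
        return image

    def sample(self, image, n):
        """Generate n randomly augmented variants of the same image."""
        return [self.apply(image) for _ in range(n)]


image = [[1, 2], [3, 4]]
pipeline = StochasticPipeline(seed=0).add(flip_left_right, 0.5).add(flip_top_bottom, 0.5)
samples = pipeline.sample(image, 8)
```

Because any callable that maps an image to an image can be passed to `add`, a task-specific operation can be plugged in without changing the pipeline class, which mirrors the extensibility described in the abstract.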
