A single latent channel is sufficient for biomedical image segmentation
Glottis segmentation is a crucial step in quantifying endoscopic footage in laryngeal high-speed videoendoscopy. Recent advances in deep neural networks for glottis segmentation allow a fully automatic workflow. However, which components of these segmentation networks are essential remains unknown. Here, we show using systematic ablations that a single latent channel as the bottleneck layer is sufficient for glottal area segmentation. We further show that the latent space is an abstraction of the glottal area segmentation that relies on three spatially defined pixel subtypes. We provide evidence that the latent space is highly correlated with the glottal area waveform, can be encoded with four bits, and can be decoded using lean decoders while maintaining high reconstruction accuracy. Our findings suggest that glottis segmentation is a task that can be highly optimized to yield very efficient and clinically applicable deep neural networks. We believe that, in the future, online deep learning-assisted monitoring will be a game changer in laryngeal examinations.
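To make two of the quantities above concrete, the following minimal sketch computes a glottal area waveform as the per-frame count of segmented glottis pixels and encodes a list of latent values with four bits, i.e. 16 levels. It is an illustration only: the uniform quantization scheme, the value range `[lo, hi]`, and the function names `glottal_area_waveform`, `quantize_4bit`, and `dequantize_4bit` are assumptions made here and are not taken from the method described in this work.

```python
def glottal_area_waveform(masks):
    """Count the glottis pixels (value 1) in each frame's binary mask.

    `masks` is a list of frames; each frame is a list of rows of 0/1 ints.
    """
    return [sum(sum(row) for row in mask) for mask in masks]


def quantize_4bit(values, lo, hi):
    """Map floats to integer codes 0..15 by uniform quantization.

    Values outside [lo, hi] are clipped first (an assumption of this sketch).
    """
    levels = 2 ** 4 - 1  # 15 steps between 16 representable levels
    codes = []
    for v in values:
        clipped = min(max(v, lo), hi)
        codes.append(round((clipped - lo) / (hi - lo) * levels))
    return codes


def dequantize_4bit(codes, lo, hi):
    """Map 4-bit codes back to approximate float values in [lo, hi]."""
    levels = 2 ** 4 - 1
    return [lo + c / levels * (hi - lo) for c in codes]


if __name__ == "__main__":
    # Two hypothetical 2x2 frames with 3 and 1 glottis pixels.
    frames = [[[0, 1], [1, 1]], [[0, 0], [0, 1]]]
    print(glottal_area_waveform(frames))

    # Hypothetical latent activations, assumed to lie in [0, 1].
    latent = [0.0, 0.2, 0.73, 1.0]
    codes = quantize_4bit(latent, 0.0, 1.0)
    print(codes, dequantize_4bit(codes, 0.0, 1.0))
```

With uniform quantization of this kind, the reconstruction error for any in-range value is at most half a quantization step, (hi - lo) / 30.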