semantic segmentation
Recently Published Documents





Shilpa Pandey ◽  
Gaurav Harit

In this article, we address the problem of localizing text and symbolic annotations on the scanned image of a printed document. Previous approaches have treated annotation extraction as a binary classification into printed and handwritten text. In this work, we further subcategorize the annotations as underlines, encirclements, inline text, and marginal text. We have collected a new dataset of 300 documents covering all classes of annotations marked around or in between printed text. Using the dataset as a benchmark, we report the results of two saliency formulations, CRF Saliency and Discriminant Saliency, for predicting salient patches, which can correspond to different types of annotations. We also compare our work with recent semantic segmentation techniques using deep models. Our analysis shows that Discriminant Saliency can be considered the preferred approach for fast localization of patches containing different types of annotations. The saliency models were learned on a small dataset but still give performance comparable to that of deep networks for pixel-level semantic segmentation. We show that saliency-based methods give better outcomes with limited annotated data than more sophisticated segmentation techniques that require a large training set to learn the model.

2022 ◽  
Vol 15 (1) ◽  
pp. 1-35
Vladimir Rybalkin ◽  
Jonas Ney ◽  
Menbere Kina Tekleyohannes ◽  
Norbert Wehn

The Multidimensional Long Short-Term Memory (MD-LSTM) neural network is an extension of the one-dimensional LSTM to data with more than one dimension. MD-LSTM achieves state-of-the-art results in various applications, including handwritten text recognition and medical imaging. However, its implementation suffers from inherently sequential execution, which slows down both training and inference considerably compared to other neural networks. The main goal of the current research is to accelerate MD-LSTM inference. We argue that a Field-Programmable Gate Array (FPGA) is an alternative platform for deep learning that can offer a solution when the massive parallelism of GPUs does not provide the performance required by the application. In this article, we present the first hardware architecture for MD-LSTM. We conduct a systematic exploration to analyze the tradeoff between precision and accuracy. We use a challenging dataset for semantic segmentation, namely historical document image binarization from the DIBCO 2017 contest, and the well-known MNIST dataset for handwritten digit recognition. Based on our new architecture, we implement FPGA-based accelerators that outperform the Nvidia Geforce RTX 2080 Ti with respect to throughput by a factor of up to 9.9 and the Nvidia Jetson AGX Xavier with respect to energy efficiency by a factor of up to 48. Our accelerators achieve higher throughput, energy efficiency, and resource efficiency than FPGA-based implementations of convolutional neural networks (CNNs) for semantic segmentation tasks. For the handwritten digit recognition task, our FPGA implementations provide higher accuracy and can be considered as a solution when accuracy is a priority. Furthermore, they outperform earlier FPGA implementations of one-dimensional LSTMs with respect to throughput, energy efficiency, and resource efficiency.

2022 ◽  
Vol 193 ◽  
pp. 106653
Hejun Wei ◽  
Enyong Xu ◽  
Jinlai Zhang ◽  
Yanmei Meng ◽  
Jin Wei ◽  

2022 ◽  
Vol 122 ◽  
pp. 108290
Quan Zhou ◽  
Xiaofu Wu ◽  
Suofei Zhang ◽  
Bin Kang ◽  
Zongyuan Ge ◽  

10.29007/r6cd ◽  
2022 ◽  
Hoang Nhut Huynh ◽  
My Duyen Nguyen ◽  
Thai Hong Truong ◽  
Quoc Tuan Nguyen Diep ◽  
Anh Tu Tran ◽  

Segmentation is one of the most common methods for analyzing and processing medical images, assisting doctors in making accurate diagnoses by providing detailed information about the required body part. However, segmenting medical images manually presents a number of challenges: it requires trained medical professionals, it is time-consuming, and it is prone to errors. As a result, an automated medical image segmentation system is needed. Deep learning algorithms have recently demonstrated superior performance for segmentation tasks, particularly semantic segmentation networks that provide a pixel-level understanding of images. U-Net is one of the modern complex networks for image segmentation in the field of medical imaging; several segmentation networks have been built on its foundation, including the recurrent residual convolutional neural network based on U-Net (R2U-Net), which uses recurrent residual convolutional units. R2U-Net is used to perform trachea and bronchial segmentation on a dataset of 36,000 images. Across a variety of experiments, the proposed segmentation achieved a Dice coefficient of 0.8394 on the test dataset. Finally, a number of research issues are raised, indicating the need for future improvements.
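
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a Dice coefficient is commonly computed for a pair of binary masks, as 2|A ∩ B| / (|A| + |B|). It is not the authors' evaluation code: the function name `dice_coefficient`, the list-of-rows mask layout, and the convention of returning 1.0 when both masks are empty are all assumptions made for illustration.

```python
from itertools import chain


def dice_coefficient(pred_mask, true_mask):
    """Dice coefficient between two binary masks given as lists of rows.

    Hypothetical helper for illustration only; nonzero entries count as
    foreground pixels.
    """
    # Flatten both 2D masks into flat lists of booleans.
    pred = [bool(v) for v in chain.from_iterable(pred_mask)]
    true = [bool(v) for v in chain.from_iterable(true_mask)]
    if len(pred) != len(true):
        raise ValueError("masks must contain the same number of pixels")

    # |A ∩ B|: pixels marked foreground in both masks.
    intersection = sum(p and t for p, t in zip(pred, true))
    # |A| + |B|: total foreground pixels across the two masks.
    total = sum(pred) + sum(true)

    # Assumed convention: two empty masks agree perfectly.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    predicted = [[1, 1, 0],
                 [0, 1, 0]]
    reference = [[1, 0, 0],
                 [0, 1, 1]]
    # 2 overlapping pixels, 3 + 3 foreground pixels -> 2 * 2 / 6
    print(round(dice_coefficient(predicted, reference), 4))  # 0.6667
```

A value of 1.0 means the predicted and reference masks overlap exactly, and 0.0 means they share no foreground pixels, so the reported 0.8394 indicates substantial but imperfect overlap.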
