Binarization of music score with complex background by deep convolutional neural networks

Author(s):  
Minh-Trieu Tran ◽  
Quang-Nhat Vo ◽  
Guee-Sang Lee

Binarization is an important step in most document analysis systems. For music score images with a complex background, background clutter with a variety of shapes and colors poses many challenges for binarization. This paper presents a model for binarizing complex-background music score images through a fusion of deep convolutional neural networks. Our model is trained directly on image regions, using pixel values as inputs and the binary ground truth as labels. By exploiting the generalization capability of a residual network backbone and the feature learning ability of a dense layer, the proposed network structures can differentiate foreground pixels from background clutter and minimize the risk of overfitting, and thus can handle the complex background noise that appears in music score images. Compared with traditional algorithms, binary images generated by our method have a cleaner background and better-preserved strokes. Experiments with captured and synthetic music score images show promising results compared to existing methods.
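A minimal sketch of the general idea, not the authors' exact architecture: a per-pixel binary classifier over image regions built from a small residual backbone, where a 1x1 convolution plays the role of a per-pixel dense layer. Layer widths, depths, and the loss setup are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.conv2(self.relu(self.conv1(x)))
        return self.relu(out + x)  # identity shortcut aids generalization

class BinarizationNet(nn.Module):
    """Maps an RGB score image to a per-pixel foreground probability map."""
    def __init__(self):
        super().__init__()
        self.stem = nn.Conv2d(3, 32, 3, padding=1)
        self.body = nn.Sequential(ResidualBlock(32), ResidualBlock(32))
        self.head = nn.Conv2d(32, 1, 1)  # 1x1 conv: per-pixel dense layer

    def forward(self, x):
        return torch.sigmoid(self.head(self.body(self.stem(x))))

model = BinarizationNet()
loss_fn = nn.BCELoss()  # binary ground truth as labels, as described above
```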

Author(s):  
Ridha Ilyas Bendjillali ◽  
Mohammed Beladgham ◽  
Khaled Merit ◽  
Abdelmalik Taleb-Ahmed

Over the last decade, facial recognition techniques have become one of the most important fields of research in biometric technology. In this paper, we present a face recognition (FR) system divided into three steps: the Viola-Jones face detection algorithm, facial image enhancement using a Modified Contrast Limited Adaptive Histogram Equalization (M-CLAHE) algorithm, and feature learning for classification. For feature learning followed by classification, we used the VGG16, ResNet50, and Inception-v3 convolutional neural network (CNN) architectures. Our experiments were performed on the Extended Yale B and CMU PIE face databases. Comparison with other methods on both databases shows the robustness and effectiveness of the proposed approach; the Inception-v3 architecture achieved rates of 99.44% and 99.89%, respectively.
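An illustrative sketch of the first two pipeline stages using OpenCV. Standard CLAHE stands in for the authors' Modified CLAHE (M-CLAHE), whose specifics are not given in the abstract; the image path and CLAHE parameters are placeholders.

```python
import cv2

# 1) Viola-Jones face detection (OpenCV's bundled Haar cascade)
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
gray = cv2.imread("subject.png", cv2.IMREAD_GRAYSCALE)  # placeholder path
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

# 2) Contrast enhancement of each detected face crop
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
crops = [clahe.apply(gray[y:y + h, x:x + w]) for (x, y, w, h) in faces]

# 3) Each enhanced crop would then be resized (e.g., 224x224) and fed to a
#    pretrained CNN such as VGG16, ResNet50, or Inception-v3 for classification.
```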


Water ◽  
2020 ◽  
Vol 12 (12) ◽  
pp. 3412
Author(s):  
Joakim Bruslund Haurum ◽  
Chris H. Bahnsen ◽  
Malte Pedersen ◽  
Thomas B. Moeslund

Sewer pipe inspections are currently conducted by professionals who remotely control a robot from above ground. This expensive and slow approach is prone to human mistakes. Therefore, there is both an economic and scientific interest in automating the inspection process by creating systems able to recognize sewer defects. However, research into automatic water level estimation in sewers has been limited, despite it being a prerequisite for further analysis of the pipe, as only sections above the water level can be visually inspected. In this work, we utilize a dataset of still images obtained from over 5000 inspections carried out for three different Danish water utility companies. This dataset is used for training and testing decision tree methods and convolutional neural networks (CNNs) for automatic water level estimation. We pose the estimation problem as both a classification and a regression problem, and compare the results of the two approaches. Furthermore, we compare the effect of using different inspection standards for labeling the ground truth water level. By treating the problem as a classification task and using the 2015 Danish sewer inspection standard, where water levels are clustered based on visual appearance, we achieve an average F1 score of 79.29% using a fine-tuned ResNet-50 CNN. This shows the potential of using CNNs for water level estimation. We believe including temporal and contextual information will improve the results further.
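A sketch of the classification variant under stated assumptions: a ResNet-50 fine-tuned to predict a discrete water-level class. The number of classes follows the idea of the 2015 Danish standard's visually clustered levels, but the exact count here is an assumption.

```python
import torch.nn as nn
from torchvision import models

NUM_LEVELS = 5  # assumption: discrete water-level bins per the inspection standard

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, NUM_LEVELS)  # replace 1000-way head

# For the regression variant, the head would instead output a single scalar:
# model.fc = nn.Linear(model.fc.in_features, 1), trained with an L1/L2 loss.
```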


2021 ◽  
pp. 1-13
Author(s):  
Xiang-Min Liu ◽  
Jian Hu ◽  
Deborah Simon Mwakapesa ◽  
Y.A. Nanehkaran ◽  
Yi-Min Mao ◽  
...  

Deep convolutional neural networks (DCNNs), with their complex network structure and powerful feature learning and feature expression capabilities, have achieved remarkable success in many large-scale recognition tasks. However, under the constraints of memory overhead and response time, and with the increasing scale of data, DCNNs face three non-trivial challenges in a big data environment: excessive network parameters, slow convergence, and inefficient parallelism. To tackle these three problems, this paper develops a deep convolutional neural network optimization algorithm (PDCNNO) in the MapReduce framework. The proposed method first prunes the network to obtain a compressed network, effectively reducing redundant parameters. Next, a conjugate gradient method based on a modified secant equation (CGMSE) is developed in the Map phase to further accelerate the convergence of the network. Finally, a load balancing strategy based on a regulated load rate (LBRLA) is proposed in the Reduce phase to quickly achieve equal grouping of data and thus improve the parallel performance of the system. We compared the PDCNNO algorithm with other algorithms on three datasets: SVHN, EMNIST Digits, and ILSVRC2012. The experimental results show that our algorithm not only reduces the space and time overhead of network training but also achieves a good speed-up ratio in a big data environment.
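A minimal sketch of the pruning step only, not the paper's full pipeline: global magnitude pruning to zero out redundant weights, using PyTorch's built-in pruning utility as a stand-in for the authors' compression scheme. The CGMSE optimizer and the MapReduce load balancing are not reproduced here.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

def compress(model: nn.Module, amount: float = 0.5) -> nn.Module:
    """Zero out the smallest-|w| weights across all conv/linear layers."""
    params = [(m, "weight") for m in model.modules()
              if isinstance(m, (nn.Conv2d, nn.Linear))]
    prune.global_unstructured(params, pruning_method=prune.L1Unstructured,
                              amount=amount)
    for m, name in params:
        prune.remove(m, name)  # bake the sparsity into the weight tensors
    return model
```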


2021 ◽  
Vol 20 (1) ◽  
pp. 117-136
Author(s):  
Cheng Yang ◽  
Xiang Yu ◽  
Arun Kumar ◽  
G.G. Md. Nawaz Ali ◽  
Peter Han Joo Chong ◽  
...  

This paper introduces a method that uses deep convolutional neural networks (CNNs) to automatically replace an advertisement (AD) photo in social (or self-media) videos, and provides a suitable evaluation method for comparing different CNNs. An AD photo can replace a picture inside a video. However, if a person occludes the replaced picture in the original video, the newly pasted AD photo will block the occluded part of the person. A deep learning algorithm is used to segment the person from the video; the segmented pixels are then pasted back over the occluded area, so that the AD photo replacement looks natural in the video. This process requires the predicted occlusion edge to be close to the ground-truth occlusion edge, so that the AD photo is occluded naturally. Therefore, this research introduces a curve fitting method to measure the error of the predicted occlusion edge. Using this measure, three CNN methods are applied and compared for AD replacement: the mask region-based convolutional neural network (Mask R-CNN), a recurrent network for video object segmentation (ROVS), and DeepLabV3. The experimental results show the comparative segmentation accuracy of the different models, with DeepLabV3 performing best.
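A hedged sketch of one way such a curve-fitting edge-error measure could work: fit a polynomial to the ground-truth occlusion edge and score the predicted edge by its mean vertical distance to that curve. The polynomial degree and the distance metric are assumptions, not necessarily the paper's exact formulation.

```python
import numpy as np

def edge_error(gt_edge: np.ndarray, pred_edge: np.ndarray, degree: int = 3) -> float:
    """gt_edge, pred_edge: (N, 2) arrays of (x, y) edge pixel coordinates."""
    coeffs = np.polyfit(gt_edge[:, 0], gt_edge[:, 1], degree)   # fit y = f(x)
    expected_y = np.polyval(coeffs, pred_edge[:, 0])            # curve at predicted x
    return float(np.mean(np.abs(pred_edge[:, 1] - expected_y)))
```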


Author(s):  
Yuan Yuan ◽  
Zhitong Xiong ◽  
Qi Wang

RGB image classification has achieved significant performance improvements with the resurgence of deep convolutional neural networks. However, mono-modal deep models for RGB images still have several limitations when applied to RGB-D scene recognition. 1) Images for scene classification usually contain more than one typical object with flexible spatial distribution, so object-level local features should be considered in addition to the global scene representation. 2) Multi-modal features in RGB-D scene classification are still under-utilized; simply combining these modal-specific features suffers from the semantic gap between modalities. 3) Most existing methods neglect the complex relationships among multiple modality features. Considering these limitations, this paper proposes an adaptive cross-modal (ACM) feature learning framework based on graph convolutional neural networks for RGB-D scene recognition. To make better use of modal-specific cues, the approach mines intra-modality relationships among the selected local features from one modality. To leverage multi-modal knowledge more effectively, the proposed approach models inter-modality relationships between the two modalities through a cross-modal graph (CMG). We evaluate the proposed method on two public RGB-D scene classification datasets, SUN-RGBD and NYUD V2, where it achieves state-of-the-art performance.
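A conceptual sketch of cross-modal graph reasoning under stated assumptions: local features from the RGB and depth streams become nodes of a joint graph, a feature-similarity affinity serves as a soft adjacency, and one round of message passing mixes information across the cross-modal edges. Feature dimensions and the affinity choice are illustrative, not the paper's ACM/CMG formulation.

```python
import torch
import torch.nn as nn

class CrossModalGCNLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, rgb_nodes, depth_nodes):
        # rgb_nodes, depth_nodes: (N, dim) local features from each modality
        x = torch.cat([rgb_nodes, depth_nodes], dim=0)  # joint node set
        adj = torch.softmax(x @ x.t(), dim=-1)          # similarity as soft adjacency
        return torch.relu(adj @ self.proj(x))           # one message-passing step

layer = CrossModalGCNLayer(dim=512)
out = layer(torch.randn(49, 512), torch.randn(49, 512))  # e.g., 7x7 feature grids
```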

