Patch-CNN: Deep learning for logo detection and brand recognition

2021 ◽  
pp. 1-14
Author(s):  
Waqas Yousaf ◽  
Arif Umar ◽  
Syed Hamad Shirazi ◽  
Zakir Khan ◽  
Imran Razzak ◽  
...  

Automatic logo detection and recognition is growing in importance because of the increasing demand for intelligent document analysis and retrieval. The main problem in logo detection is intra-class variation, which is caused by variation in image quality and degradation. Misclassification also occurs when a tiny logo appears in a large image alongside other objects. To address this problem, Patch-CNN is proposed for logo recognition; it is trained on small patches of logos to reduce misclassification. The logo images are divided into small patches, and a threshold is applied to drop non-logo areas according to the ground truth. The AlexNet and ResNet architectures are also used for logo detection. We propose a segmentation-free architecture for logo detection and recognition. In the literature, region proposal generation is used to solve logo detection, but these techniques struggle with tiny logos. The proposed CNN is specifically designed to extract detailed features from logo patches. The technique attains an accuracy of 0.9901 with acceptable training and testing loss on the dataset used in this work.
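The patch-and-threshold step described in this abstract can be illustrated with a minimal sketch. The abstract does not give the patch size, stride, threshold value, or overlap measure, so the 32-pixel patches, the 0.5 threshold, and the "fraction of patch covered by the ground-truth box" criterion below are all assumptions, and the function names are hypothetical.

```python
def patch_grid(width, height, patch, stride):
    """Yield (x0, y0, x1, y1) boxes that tile an image of the given size."""
    for y in range(0, height - patch + 1, stride):
        for x in range(0, width - patch + 1, stride):
            yield (x, y, x + patch, y + patch)


def overlap_fraction(box, logo_box):
    """Fraction of the patch area covered by the ground-truth logo box."""
    ix = max(0, min(box[2], logo_box[2]) - max(box[0], logo_box[0]))
    iy = max(0, min(box[3], logo_box[3]) - max(box[1], logo_box[1]))
    patch_area = (box[2] - box[0]) * (box[3] - box[1])
    return (ix * iy) / patch_area


def select_logo_patches(width, height, logo_box, patch=32, stride=32, threshold=0.5):
    """Keep only patches whose overlap with the logo box reaches the threshold.

    Assumption: patch size, stride and threshold are illustrative values,
    not the ones used in the paper.
    """
    return [
        box
        for box in patch_grid(width, height, patch, stride)
        if overlap_fraction(box, logo_box) >= threshold
    ]


# A 128x128 image with a 64x64 logo in the centre.
kept = select_logo_patches(128, 128, logo_box=(32, 32, 96, 96))
print(len(kept), "patches kept for training")
```

Patches that fall outside the ground-truth box are dropped, so only logo regions would reach the classifier during training.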

2021 ◽  
Vol 2021 (1) ◽  
pp. 21-26
Author(s):  
Abderrezzaq Sendjasni ◽  
Mohamed-Chaker Larabi ◽  
Faouzi Alaya Cheikh

360-degree image quality assessment (IQA) faces the major challenge of a lack of ground-truth databases. This problem is accentuated for deep learning-based approaches, whose performance is only as good as the available data. In this context, only two databases are used to train and validate deep learning-based IQA models. To compensate for this lack, a data-augmentation technique is investigated in this paper. We use visual scan-paths to increase the number of learning examples from existing training data. Multiple scan-paths are predicted to account for the diversity of human observers. These scan-paths are then used to select viewports from the spherical representation. The results showed that the data-augmentation training scheme improved performance over training without it. We also address the question of using the MOS obtained for the 360-degree image as the quality anchor for the whole set of extracted viewports, in comparison to 2D blind quality metrics. The comparison showed the superiority of using the MOS when adopting patch-based learning.
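As a rough illustration of the augmentation idea, the sketch below maps scan-path fixations to viewport centres on an equirectangular image and labels every viewport with the image-level MOS. The coordinate convention (longitude in [-180, 180), latitude in [-90, 90]), the equirectangular format, and the function names are assumptions; the abstract does not specify how viewports are extracted.

```python
def fixation_to_pixel(lon_deg, lat_deg, width, height):
    """Map a fixation (longitude, latitude in degrees) to pixel coordinates.

    Assumption: the image is equirectangular, longitude -180 maps to the
    left edge and latitude +90 maps to the top edge.
    """
    x = (lon_deg + 180.0) / 360.0 * width
    y = (90.0 - lat_deg) / 180.0 * height
    # Wrap horizontally, clamp vertically so the result is a valid index.
    return int(x) % width, min(int(y), height - 1)


def label_viewports(scan_paths, mos, width, height):
    """Return (viewport_centre, mos) pairs, one per fixation in every scan-path.

    Every viewport inherits the MOS of the whole 360-degree image, which is
    the labelling strategy the abstract compares against 2D blind metrics.
    """
    samples = []
    for path in scan_paths:
        for lon, lat in path:
            samples.append((fixation_to_pixel(lon, lat, width, height), mos))
    return samples


# Two hypothetical scan-paths for one image with MOS 3.7.
paths = [[(0.0, 0.0), (90.0, 0.0)], [(-90.0, 45.0)]]
samples = label_viewports(paths, mos=3.7, width=4096, height=2048)
print(samples)
```

Each predicted scan-path adds several training samples per image, which is how the scheme increases the number of learning examples without new subjective scores.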


2021 ◽  
Vol 11 (1) ◽  
pp. 6724-6729
Author(s):  
S. Sahel ◽  
M. Alsahafi ◽  
M. Alghamdi ◽  
T. Alsubait

Logo detection in images and videos is considered a key task for various applications, such as vehicle logo detection for traffic-monitoring systems, copyright infringement detection, and contextual content placement. The main contribution of this work is the application of emerging deep learning techniques to brand and logo recognition tasks through the use of multiple modern convolutional neural network models. In this work, pre-trained object detection models are used to improve logo detection performance when only a limited number of labeled training images taken in a realistic context is available, avoiding extensive manual labeling costs. Superior logo detection results were obtained. In this study, the FlickrLogos-32 dataset was used, which is a common public dataset for logo detection and brand recognition from real-world product images. For model evaluation, both the efficiency of creating the model and its accuracy were considered.


2019 ◽  
Vol 2019 (1) ◽  
pp. 360-368
Author(s):  
Mekides Assefa Abebe ◽  
Jon Yngve Hardeberg

Whiteboard image degradations substantially reduce the legibility of pen-stroke content as well as the overall quality of the images. Consequently, researchers have addressed the problem through a range of image enhancement techniques. Most of the state-of-the-art approaches apply common image processing techniques such as background-foreground segmentation, text extraction, contrast and color enhancement, and white balancing. However, such conventional enhancement methods are incapable of recovering severely degraded pen-stroke content and produce artifacts in the presence of complex pen-stroke illustrations. To overcome these problems, the authors have proposed a deep learning-based solution. They have contributed a new whiteboard image data set and adopted two deep convolutional neural network architectures for whiteboard image quality enhancement. Their evaluations of the trained models demonstrated superior performance over the conventional methods.


2010 ◽  
Vol 30 (8) ◽  
pp. 2244-2246
Author(s):  
Shao-qing MO ◽  
Zheng-guang LIU ◽  
Jun ZHANG

Author(s):  
Luuk J. Oostveen ◽  
Frederick J. A. Meijer ◽  
Frank de Lange ◽  
Ewoud J. Smit ◽  
Sjoert A. Pegge ◽  
...  

Abstract
Objectives: To evaluate image quality and reconstruction times of a commercial deep learning reconstruction algorithm (DLR) compared to hybrid-iterative reconstruction (Hybrid-IR) and model-based iterative reconstruction (MBIR) algorithms for cerebral non-contrast CT (NCCT).
Methods: Cerebral NCCT acquisitions of 50 consecutive patients were reconstructed using DLR, Hybrid-IR and MBIR with a clinical CT system. Image quality, in terms of six subjective characteristics (noise, sharpness, grey-white matter differentiation, artefacts, natural appearance and overall image quality), was scored by five observers. As objective metrics of image quality, the noise magnitude and signal-difference-to-noise ratio (SDNR) of the grey and white matter were calculated. Mean values for the image quality characteristics scored by the observers were estimated using a general linear model to account for multiple readers. The estimated means for the reconstruction methods were pairwise compared. Calculated measures were compared using paired t tests.
Results: For all image quality characteristics, DLR images were scored significantly higher than MBIR images. Compared to Hybrid-IR, perceived noise and grey-white matter differentiation were better with DLR, while no difference was detected for other image quality characteristics. Noise magnitude was lower for DLR compared to Hybrid-IR and MBIR (5.6, 6.4 and 6.2, respectively) and SDNR higher (2.4, 1.9 and 2.0, respectively). Reconstruction times were 27 s, 44 s and 176 s for Hybrid-IR, DLR and MBIR respectively.
Conclusions: With a slight increase in reconstruction time, DLR results in lower noise and improved tissue differentiation compared to Hybrid-IR. Image quality of MBIR is significantly lower compared to DLR with much longer reconstruction times.
Key Points
• Deep learning reconstruction of cerebral non-contrast CT results in lower noise and improved tissue differentiation compared to hybrid-iterative reconstruction.
• Deep learning reconstruction of cerebral non-contrast CT results in better image quality in all aspects evaluated compared to model-based iterative reconstruction.
• Deep learning reconstruction only needs a slight increase in reconstruction time compared to hybrid-iterative reconstruction, while model-based iterative reconstruction requires considerably longer processing time.
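The objective metric named in the Methods can be sketched as follows. The abstract does not define how SDNR was computed, so this uses one common form, the absolute difference between grey- and white-matter mean values divided by a noise estimate, and assumes the noise is the standard deviation within the white-matter region of interest. The ROI values are made up for illustration.

```python
from statistics import mean, stdev


def sdnr(grey_roi, white_roi):
    """Signal-difference-to-noise ratio between two regions of interest.

    Assumption: noise is taken as the sample standard deviation of the
    white-matter ROI; the study may have used a different noise estimate.
    """
    noise = stdev(white_roi)
    return abs(mean(grey_roi) - mean(white_roi)) / noise


# Hypothetical CT numbers (HU) sampled from two small ROIs.
grey = [38.0, 40.0, 42.0]
white = [28.0, 30.0, 32.0]
print(f"SDNR = {sdnr(grey, white):.2f}")
```

Under this definition, lower noise at the same grey-white difference raises SDNR, which matches the direction of the DLR results reported above.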


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Christian Crouzet ◽  
Gwangjin Jeong ◽  
Rachel H. Chae ◽  
Krystal T. LoPresti ◽  
Cody E. Dunn ◽  
...  

Abstract Cerebral microhemorrhages (CMHs) are associated with cerebrovascular disease, cognitive impairment, and normal aging. One method to study CMHs is to analyze histological sections (5–40 μm) stained with Prussian blue. Currently, users manually and subjectively identify and quantify Prussian blue-stained regions of interest, which is prone to inter-individual variability and can lead to significant delays in data analysis. To improve this labor-intensive process, we developed and compared three digital pathology approaches to identify and quantify CMHs from Prussian blue-stained brain sections: (1) ratiometric analysis of RGB pixel values, (2) phasor analysis of RGB images, and (3) deep learning using a mask region-based convolutional neural network. We applied these approaches to a preclinical mouse model of inflammation-induced CMHs. One-hundred CMHs were imaged using a 20× objective and RGB color camera. To determine the ground truth, four users independently annotated Prussian blue-labeled CMHs. The deep learning and ratiometric approaches performed better than the phasor analysis approach compared to the ground truth. The deep learning approach had the most precision of the three methods. The ratiometric approach has the most versatility and maintained accuracy, albeit with less precision. Our data suggest that implementing these methods to analyze CMH images can drastically increase the processing speed while maintaining precision and accuracy.
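A ratiometric analysis of RGB pixel values might look like the sketch below. The abstract does not state which channel ratio or threshold was used, so the blue fraction B / (R + G + B) and the 0.45 cut-off are assumptions chosen only to show the general shape of the approach.

```python
def blue_fraction(pixel):
    """Share of the blue channel in the total RGB intensity of one pixel."""
    r, g, b = pixel
    total = r + g + b
    return b / total if total else 0.0


def stained_fraction(pixels, threshold=0.45):
    """Fraction of pixels classified as Prussian blue-positive.

    Assumption: a pixel counts as stained when its blue fraction exceeds
    the threshold; both the ratio and the cut-off are illustrative.
    """
    stained = sum(1 for p in pixels if blue_fraction(p) > threshold)
    return stained / len(pixels)


# Four hypothetical pixels: two bluish, one pale background, one black.
region = [(40, 60, 200), (200, 190, 195), (0, 0, 0), (30, 40, 180)]
print(f"stained area fraction: {stained_fraction(region):.2f}")
```

Because the rule is a fixed per-pixel computation, it needs no training data, which is consistent with the versatility the abstract attributes to the ratiometric approach.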


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Danuta M. Sampson ◽  
David Alonso-Caneiro ◽  
Avenell L. Chew ◽  
Jonathan La ◽  
Danial Roshandel ◽  
...  

Abstract Adaptive optics flood illumination ophthalmoscopy (AO-FIO) is an established imaging tool in the investigation of retinal diseases. However, the clinical interpretation of AO-FIO images can be challenging due to varied image quality. Therefore, image quality assessment is essential before interpretation. An image assessment tool will also assist further work on improving the image quality, either during acquisition or post-processing. In this paper, we describe, validate and compare two automated image quality assessment methods: the energy of Laplacian focus operator (LAPE; not commonly used but easily implemented) and a convolutional neural network (CNN; an effective but more complex approach). We also evaluate the effects of subject age, axial length, refractive error, fixation stability, disease status and retinal location on AO-FIO image quality. Based on analysis of 10,250 images of 50 × 50 μm size, at 41 retinal locations, from 50 subjects, we demonstrate that the CNN slightly outperforms LAPE in image quality assessment. The CNN achieves an accuracy of 89%, whereas the LAPE metric achieves 73% and 80% (for linear regression and random forest multiclass classifier methods, respectively) compared to ground truth. Furthermore, retinal location, age and disease are factors that can influence the likelihood of poor image quality.
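The energy of Laplacian focus measure is simple enough to sketch directly: convolve the image with a Laplacian kernel and sum the squared responses. The version below assumes the common 4-neighbour kernel and skips border pixels; the kernel and any normalisation used in the paper are not given in the abstract.

```python
def energy_of_laplacian(image):
    """Sum of squared 4-neighbour Laplacian responses over interior pixels.

    `image` is a list of equal-length rows of grey-level values. Sharper
    images have stronger second derivatives and therefore a higher score.
    Assumption: 4-neighbour kernel, borders ignored, no normalisation.
    """
    rows, cols = len(image), len(image[0])
    total = 0.0
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            lap = (
                image[i - 1][j] + image[i + 1][j]
                + image[i][j - 1] + image[i][j + 1]
                - 4 * image[i][j]
            )
            total += lap * lap
    return total


# A flat patch scores zero; a single bright point gives a non-zero energy.
flat = [[10] * 5 for _ in range(5)]
point = [[0] * 5 for _ in range(5)]
point[2][2] = 1
print(energy_of_laplacian(flat), energy_of_laplacian(point))
```

A score like this is a single number per image, so it would still need a regression or classifier stage to map it to quality grades, as the abstract describes.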


Electronics ◽  
2021 ◽  
Vol 10 (10) ◽  
pp. 1136
Author(s):  
David Augusto Ribeiro ◽  
Juan Casavílca Silva ◽  
Renata Lopes Rosa ◽  
Muhammad Saadi ◽  
Shahid Mumtaz ◽  
...  

Light field (LF) imaging has multi-view properties that enable many applications, including auto-refocusing, depth estimation and 3D reconstruction of images, which are required particularly for intelligent transportation systems (ITSs). However, cameras can have limited angular resolution, which becomes a bottleneck in vision applications. Thus, incorporating angular data is challenging because of disparities in the LF images. In recent years, different machine learning algorithms have been applied to both image processing and ITS research areas for different purposes. In this work, a Lightweight Deformable Deep Learning Framework is implemented to address the problem of disparity in LF images. To this end, an angular alignment module and a soft activation function are implemented in the Convolutional Neural Network (CNN). For performance assessment, the proposed solution is compared with recent state-of-the-art methods using different LF datasets, each one with specific characteristics. Experimental results demonstrated that the proposed solution achieved better performance than the other methods. The image quality results obtained outperform state-of-the-art LF image reconstruction methods. Furthermore, our model has lower computational complexity, decreasing the execution time.


2020 ◽  
Vol 2020 (1) ◽  
Author(s):  
Guangyi Yang ◽  
Xingyu Ding ◽  
Tian Huang ◽  
Kun Cheng ◽  
Weizheng Jin

Abstract The communications industry has changed remarkably with the development of fifth-generation cellular networks. Images, as an indispensable component of communication, have attracted wide attention. Thus, finding a suitable approach to assess image quality is important. We therefore propose a deep learning model for image quality assessment (IQA) based on an explicit-implicit dual-stream network. We use frequency domain features of kurtosis based on the wavelet transform to represent explicit features and spatial features extracted by a convolutional neural network (CNN) to represent implicit features. Thus, we constructed an explicit-implicit (EI) parallel deep learning model, namely, the EI-IQA model. The EI-IQA model is based on VGGNet, which extracts the spatial domain features. On this basis, the number of network layers of VGGNet is reduced by adding the parallel wavelet kurtosis frequency domain features. Thus, the number of training parameters and the sample requirements decline. We verified, by cross-validation on different databases, that the wavelet kurtosis feature fusion method based on deep learning has a more complete feature extraction effect and a better generalisation ability. Thus, the method can simulate the human visual perception system better, and its quality estimates are closer to subjective human judgments. The source code for the proposed EI-IQA model is available on GitHub at https://github.com/jacob6/EI-IQA.
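The explicit feature described here, kurtosis of wavelet coefficients, can be sketched in a few lines. This is not the code from the linked repository: it assumes a one-level Haar transform on a 1-D signal and the non-excess (Pearson) kurtosis m4 / m2², whereas the paper works on images and does not name the wavelet in the abstract.

```python
import math


def haar_detail(signal):
    """One-level Haar detail coefficients of an even-length 1-D signal."""
    return [
        (signal[k] - signal[k + 1]) / math.sqrt(2.0)
        for k in range(0, len(signal) - 1, 2)
    ]


def kurtosis(values):
    """Pearson kurtosis m4 / m2**2 (a normal distribution gives 3).

    Returns 0.0 for constant input, a convention chosen for this sketch.
    """
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m4 = sum((v - mu) ** 4 for v in values) / n
    return m4 / (m2 * m2) if m2 else 0.0


def wavelet_kurtosis(signal):
    """Kurtosis of the Haar detail band, used here as one explicit feature."""
    return kurtosis(haar_detail(signal))


# An alternating pattern gives two-valued details (kurtosis 1);
# a single isolated step gives a heavier-tailed detail band.
print(wavelet_kurtosis([1, 0, 0, 1, 1, 0, 0, 1]))
print(wavelet_kurtosis([0, 0, 0, 0, 0, 0, 2, 0]))
```

In a dual-stream design, a small vector of such statistics would be concatenated with the CNN features before the final regression layers, which is the fusion the abstract describes.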

