computer vision applications
Recently Published Documents


2022 ◽  
Vol 16 (4) ◽  
pp. 1-21
Author(s):  
Honghui Xu ◽  
Zhipeng Cai ◽  
Wei Li

Multi-label image recognition is an indispensable component of many real-world computer vision applications. However, existing studies have overlooked a severe threat of privacy leakage in multi-label image recognition. To fill this gap, this article develops two privacy-preserving models, Privacy-Preserving Multi-label Graph Convolutional Networks (P2-ML-GCN) and Robust P2-ML-GCN (RP2-ML-GCN), in which a differential privacy mechanism is applied to the model’s outputs so as to defend against black-box attacks while avoiding large aggregated noise. In particular, a regularization term is added to the loss function of RP2-ML-GCN to increase the model’s prediction accuracy and robustness. A suitable differential privacy mechanism is then designed to decrease the bias of the loss function in P2-ML-GCN and increase prediction accuracy. In addition, we show that a bounded global sensitivity can mitigate the side effects of excessive noise and improve performance for multi-label image recognition in our models. Theoretical proof shows that both models guarantee differential privacy for the model’s outputs, weights, and input features while preserving model robustness. Finally, comprehensive experiments validate the advantages of the proposed models, including the implementation of differential privacy on the model’s outputs, the incorporation of a regularization term into the loss function, and the adoption of bounded global sensitivity for multi-label image recognition.
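To make the idea of perturbing model outputs under a bounded sensitivity concrete, the following is a minimal sketch of a generic Laplace mechanism applied to a vector of label scores. It is not the mechanism used in P2-ML-GCN or RP2-ML-GCN, whose details the abstract does not give. The function name `privatize_outputs`, the clipping range, and the choice of L1 sensitivity are all assumptions made for illustration.

```python
import random


def privatize_outputs(scores, epsilon, lower=0.0, upper=1.0, rng=None):
    """Add Laplace noise to clipped label scores (illustrative sketch only).

    Assumption: changing one input may change every score, so after clipping
    to [lower, upper] the L1 sensitivity of the score vector is bounded by
    len(scores) * (upper - lower).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()

    # Clipping is what bounds the sensitivity, and therefore the noise scale.
    clipped = [min(max(s, lower), upper) for s in scores]
    sensitivity = len(clipped) * (upper - lower)
    scale = sensitivity / epsilon

    def laplace():
        # The difference of two i.i.d. exponentials with mean `scale`
        # is distributed as Laplace(0, scale).
        return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

    return [c + laplace() for c in clipped]


if __name__ == "__main__":
    noisy = privatize_outputs([0.9, 0.1, 1.4], epsilon=1.0, rng=random.Random(0))
    print(noisy)
```

A smaller `epsilon` gives stronger privacy but a larger noise scale, which is why a tight bound on the sensitivity matters for accuracy.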


Information ◽  
2021 ◽  
Vol 13 (1) ◽  
pp. 9
Author(s):  
Ulrike Faltings ◽  
Tobias Bettinger ◽  
Swen Barth ◽  
Michael Schäfer

Collecting and labeling good, balanced training data is usually difficult under real conditions. In addition to classic modeling methods, Generative Adversarial Networks (GANs) offer a powerful way to generate synthetic training data. In this paper, we evaluate the hybrid use of real-life and generated synthetic training data in different fractions and its effect on model performance. We found that using up to 75% synthetic training data can compensate for time-consuming and costly manual annotation, while the model performance in our Deep Learning (DL) use case stays in the same range as with a 100% share of hand-annotated real images. By using synthetic training data specifically tailored to produce a balanced dataset, special care can be taken of events that happen only rarely, and ML models can be applied industrially without much delay, making them feasible and economically attractive for a wide range of applications in the process and manufacturing industries. Hence, the main outcome of this paper is that our methodology can help the implementation of many different industrial Machine Learning and Computer Vision applications by making them economically maintainable. We conclude that, following the findings presented in this study, a multitude of industrial ML use cases that require large and balanced training data containing all information relevant to the target model can be solved in the future.
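As an illustration of what mixing real and synthetic data "in different fractions" could look like in practice, here is a small sketch that assembles a training set with a chosen synthetic share. The helper name `mix_training_data` and its parameters are hypothetical and are not taken from the paper; the sampling strategy (uniform, without replacement) is likewise an assumption.

```python
import random


def mix_training_data(real, synthetic, synthetic_fraction, total, seed=0):
    """Build a training set of `total` samples with a given synthetic share.

    Hypothetical helper: samples are drawn uniformly without replacement
    from each pool and then shuffled together.
    """
    if not 0.0 <= synthetic_fraction <= 1.0:
        raise ValueError("synthetic_fraction must lie in [0, 1]")

    n_synthetic = round(total * synthetic_fraction)
    n_real = total - n_synthetic
    if n_synthetic > len(synthetic) or n_real > len(real):
        raise ValueError("not enough samples in one of the pools")

    rng = random.Random(seed)
    mixed = rng.sample(synthetic, n_synthetic) + rng.sample(real, n_real)
    rng.shuffle(mixed)
    return mixed


if __name__ == "__main__":
    real = [("real", i) for i in range(100)]
    synthetic = [("synthetic", i) for i in range(100)]
    # Sweep over several fractions, including the 75% share from the abstract.
    for fraction in (0.0, 0.25, 0.5, 0.75):
        batch = mix_training_data(real, synthetic, fraction, total=100)
        share = sum(1 for source, _ in batch if source == "synthetic") / len(batch)
        print(fraction, share)
```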


2021 ◽  
Author(s):  
Solvi Thrastarson ◽  
Robert Torfason ◽  
Sara Klaasen ◽  
Patrick Paitz ◽  
Yesim CUBUK SABUNCU ◽  
...  

2021 ◽  
pp. 1-14
Author(s):  
Prathibha Varghese ◽  
G. Arockia Selva Saroja

Nature-inspired computing has been a real source of motivation for the development of many meta-heuristic algorithms. The biological optic system can be modeled as a cascade of sub-filters, from the photoreceptors through the ganglion cells in the fovea to simple cells in the visual cortex. This has inspired many researchers to examine the biological retina in order to learn more about its information processing capabilities. The photoreceptor cones and rods in the human fovea resemble a hexagonal structure more than a rectangular one. Moreover, hexagonal meshes provide higher packing density, consistent neighborhood connectivity, and better angular correction than the rectilinear square mesh. In this paper, a novel 2-D interpolation hexagonal lattice conversion algorithm is proposed to develop an efficient hexagonal mesh framework for computer vision applications. The proposed algorithm comprises effective pseudo-hexagonal structures that are guaranteed to remain aligned with the human visual system. It provides simulated hexagonal images that can be verified visually without any hexagonal capture or display device. The simulation results show that the proposed algorithm achieves a higher Peak Signal-to-Noise Ratio of 98.45 and offers a high-resolution image with a lower mean square error of 0.59.


2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Qinglin Cao ◽  
Letu Qingge ◽  
Pei Yang

Image thresholding is widely used in computer vision applications, and among the various global thresholding algorithms, Otsu-based approaches are very popular due to their simplicity and effectiveness. While the usage of Otsu-based thresholding methods is well discussed, performance analyses of these methods are rather limited. In this paper, we first review nine Otsu-based approaches and categorize them by their objective functions, preprocessing, and postprocessing strategies. Second, we conduct several experiments to analyze the model characteristics using different scene parameters on both synthetic images and real-world cell images. We pay particular attention to the variance of the foreground object and the effect of the distance between the mean values of foreground and background. Third, we explore the robustness of the algorithms by introducing two typical kinds of noise at different intensities, and we compare the running time of each method. Experimental results show that NVE, WOV, and Xing’s methods are more robust to the distance between the mean values of foreground and background. A large foreground variance causes a larger threshold value. Experiments on cell images show that missed foreground detection becomes serious when the intensities of foreground pixels change drastically. We conclude that almost all algorithms are significantly affected by Salt&Pepper and Gaussian noise. Interestingly, we find that ME increases almost linearly with the intensity of Salt&Pepper noise. In terms of time cost, methods with no preprocessing or postprocessing steps have an advantage. All these findings can serve as a guideline for image thresholding with Otsu-based approaches.
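For readers unfamiliar with the baseline that these variants build on, the sketch below implements the classic Otsu criterion, which picks the threshold that maximizes the between-class variance of the gray-level histogram. It covers only the original method, not the nine variants reviewed in the paper (NVE, WOV, and the others are not reproduced here). The function name and the flat-list input format are choices made for this example.

```python
def otsu_threshold(pixels, levels=256):
    """Return the classic Otsu threshold for integer gray levels.

    `pixels` is a flat, non-empty list of integers in [0, levels).
    Pixels <= threshold form one class, pixels > threshold the other.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_score = 0, -1.0
    w0 = 0      # number of pixels in the lower class
    sum0 = 0    # sum of gray levels in the lower class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue        # lower class still empty
        if w1 == 0:
            break           # upper class empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Proportional to the between-class variance (the constant
        # factor 1 / total**2 does not change the argmax).
        score = w0 * w1 * (mu0 - mu1) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


if __name__ == "__main__":
    # Bimodal toy "image": dark background, bright foreground.
    image = [10] * 50 + [200] * 50
    t = otsu_threshold(image)
    foreground = [p for p in image if p > t]
    print(t, len(foreground))
```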


Information ◽  
2021 ◽  
Vol 12 (10) ◽  
pp. 431
Author(s):  
Roberto G. Pacheco ◽  
Kaylani Bochie ◽  
Mateus S. Gilbert ◽  
Rodrigo S. Couto ◽  
Miguel Elias M. Campista

In computer vision applications, mobile devices can offload the inference of Convolutional Neural Networks (CNNs) to the cloud because of their computational restrictions. However, besides adding network load toward the cloud, this approach can make applications that require low latency infeasible. A possible solution is to use CNNs with early exits at the network edge. These CNNs can pre-classify part of the samples in the intermediate layers based on a confidence criterion. Hence, the device sends to the cloud only the samples that have not been satisfactorily classified. This work evaluates the performance of these CNNs at the computational edge, considering an object detection application. For this, we employ a MobileNetV2 with early exits. The experiments show that early classification can reduce the data load and the inference time without degrading application performance.
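The confidence criterion at an early exit can be sketched as follows. The abstract does not specify which criterion the authors use, so this example assumes a common choice: the maximum softmax probability compared against a fixed threshold. The function names, the threshold value of 0.8, and the return format are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_decision(exit_logits, threshold=0.8):
    """Decide whether a sample is classified at the edge or sent to the cloud.

    Assumption: confidence is the maximum softmax probability at the
    early-exit branch; the paper's actual criterion may differ.
    """
    probs = softmax(exit_logits)
    confidence = max(probs)
    if confidence >= threshold:
        # Confident enough: classify locally, skip the remaining layers.
        return ("edge", probs.index(confidence), confidence)
    # Not confident: forward the sample to the cloud for full inference.
    return ("cloud", None, confidence)


if __name__ == "__main__":
    print(early_exit_decision([5.0, 0.0, 0.0]))    # one class dominates
    print(early_exit_decision([0.1, 0.0, 0.05]))   # nearly uniform scores
```

Raising the threshold sends more samples to the cloud, trading data load and latency for confidence in the edge predictions.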


Author(s):  
Guanglong Liao ◽  
Zhongjie Zhu ◽  
Yongqiang Bai ◽  
Tingna Liu ◽  
Zhibo Xie

Text detection is a key technique and plays an important role in computer vision applications, but efficient and precise text detection is still challenging. In this paper, an efficient scene text detection scheme is proposed based on the Progressive Scale Expansion Network (PSENet). A Mixed Pooling Module (MPM) is designed to effectively capture the dependence of text information at different distances, where different pooling operations are employed to better extract information about text shape. The backbone network is optimized by combining two extensions of the Residual Network (ResNet), i.e., ResNeXt and Res2Net, to enhance feature extraction. Experimental results show that the precision of our scheme is improved by more than 5% compared with the original PSENet.


2021 ◽  
Vol 11 (5) ◽  
pp. 7730-7737
Author(s):  
L. Loyani ◽  
D. Machuve

With advances in technology, computer vision applications using deep learning methods such as Convolutional Neural Networks (CNNs) have been extensively applied in agriculture. Deploying these CNN models on mobile phones makes them accessible to everyone, especially farmers and agricultural extension officers. This paper aims to automate the detection of damage caused by a devastating tomato pest known as Tuta Absoluta. To accomplish this objective, a CNN segmentation model trained on a tomato leaf image dataset is deployed in a smartphone application for early, real-time diagnosis of the pest and effective management at early tomato growth stages. The application can precisely detect and segment the shapes of Tuta Absoluta-infected areas on tomato leaves with a minimum confidence of 70% in only 5 seconds.

