computer vision Latest Research Papers

A Survey on Generative Adversarial Networks: Variants, Applications, and Training

Generative models have gained considerable attention in unsupervised learning through Generative Adversarial Networks (GANs), a new and practical framework with outstanding data generation capability. Many GAN models have been proposed, and several practical applications have emerged in various domains of computer vision and machine learning. Despite GANs' excellent success, there are still obstacles to stable training: difficulty reaching a Nash equilibrium, internal covariate shift, mode collapse, vanishing gradients, and the lack of proper evaluation metrics. Stable training is therefore crucial to the success of GANs across applications. Herein, we survey several training solutions proposed by different researchers to stabilize GAN training. We discuss (I) the original GAN model and its modified versions, (II) a detailed analysis of various GAN applications in different domains, and (III) a detailed study of the various GAN training obstacles as well as training solutions. Finally, we outline several open issues and research directions on the topic.
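
For orientation, the training obstacles listed in this abstract concern the two-player minimax game of the original GAN formulation (Goodfellow et al., 2014). The abstract does not reproduce the objective; the symbols below (generator $G$, discriminator $D$, data distribution $p_{\text{data}}$, noise prior $p_z$) follow the usual notation and are assumptions with respect to this page:

```latex
\min_{G}\,\max_{D}\; V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

A Nash equilibrium of this game is a pair $(G, D)$ in which neither player can improve its own objective unilaterally. Alternating gradient updates are not guaranteed to reach such a point, and the second term saturates when $D$ confidently rejects generated samples, which is one way the vanishing-gradient problem arises.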

Efficient Channel Attention Based Encoder–Decoder Approach for Image Captioning in Hindi

ACM Transactions on Asian and Low-Resource Language Information Processing ◽ 10.1145/3483597 ◽ 2022 ◽ Vol 21 (3) ◽ pp. 1-17

Author(s): Santosh Kumar Mishra ◽ Gaurav Rai ◽ Sriparna Saha ◽ Pushpak Bhattacharyya

Keyword(s): Computer Vision ◽ Natural Language Processing ◽ Natural Language ◽ Language Processing ◽ English Language ◽ Image Understanding ◽ Attention Mechanism ◽ Image Captioning ◽ Textual Description ◽ Hindi Language

Image captioning refers to the process of generating a textual description of the objects and activities present in a given image. It connects two fields of artificial intelligence: computer vision and natural language processing, which deal with image understanding and language modeling, respectively. In the existing literature, most work on image captioning has been carried out for the English language. This article presents a novel method for image captioning in the Hindi language using an encoder–decoder based deep learning architecture with efficient channel attention. The key contribution of this work is the deployment of an efficient channel attention mechanism with Bahdanau attention and a gated recurrent unit for developing an image captioning model in the Hindi language. Color images usually consist of three channels, namely red, green, and blue. The channel attention mechanism focuses on an image’s important channels while performing the convolution; that is, it assigns higher importance to specific channels over others. The channel attention mechanism has been shown to have great potential for improving the efficiency of deep convolutional neural networks (CNNs). The proposed encoder–decoder architecture utilizes the recently introduced ECA-NET CNN to integrate the channel attention mechanism. Hindi is the fourth most spoken language globally, widely spoken in India and South Asia; it is India’s official language. A dataset for image captioning in Hindi is created manually by translating the well-known MSCOCO dataset from English to Hindi. The proposed method is compared with other baselines in terms of Bilingual Evaluation Understudy (BLEU) scores, and the results show that it outperforms them. The proposed method attains improvements of 0.59%, 2.51%, 4.38%, and 3.30% in terms of BLEU-1, BLEU-2, BLEU-3, and BLEU-4 scores, respectively, with respect to the state of the art. The quality of the generated captions is further assessed manually in terms of adequacy and fluency to illustrate the proposed method’s efficacy.
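
The channel attention idea described in this abstract can be illustrated with a minimal sketch in the style of efficient channel attention: pool each channel to one number, mix neighbouring channel statistics with a small 1-D kernel, squash with a sigmoid, and rescale the channels. This is not the authors' implementation. The function names, the toy 2×2 input, and the fixed kernel values are hypothetical; in an ECA-style network the kernel is learned and the operation is applied to CNN feature maps rather than raw RGB pixels.

```python
import math


def eca_channel_weights(feature_map, kernel):
    """Return one attention weight in (0, 1) per channel.

    feature_map: list of C channels, each a list of rows (H x W floats).
    kernel: odd-length list of 1-D weights mixing neighbouring channels
            (hypothetical fixed values here; learned in a real network).
    """
    # Global average pooling: one scalar per channel.
    pooled = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Zero-pad so every channel has a full neighbourhood.
    pad = len(kernel) // 2
    padded = [0.0] * pad + pooled + [0.0] * pad
    # 1-D cross-correlation across the channel axis.
    mixed = [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(pooled))
    ]
    # Sigmoid turns the mixed statistics into per-channel weights.
    return [1.0 / (1.0 + math.exp(-m)) for m in mixed]


def apply_channel_attention(feature_map, weights):
    """Scale every value of each channel by that channel's weight."""
    return [
        [[w * value for value in row] for row in channel]
        for channel, w in zip(feature_map, weights)
    ]


# Toy three-channel input (think red, green, blue), 2 x 2 pixels each.
rgb = [
    [[0.0, 0.0], [0.0, 0.0]],  # channel mean 0
    [[1.0, 1.0], [1.0, 1.0]],  # channel mean 1
    [[2.0, 2.0], [2.0, 2.0]],  # channel mean 2
]
weights = eca_channel_weights(rgb, kernel=[0.0, 1.0, 0.0])
scaled = apply_channel_attention(rgb, weights)
```

With the identity-like kernel `[0.0, 1.0, 0.0]` each weight depends only on its own channel's mean, so channels with larger mean activation receive larger weights; a wider kernel lets each weight depend on neighbouring channels as well.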

Feature Matching-based Approaches to Improve the Robustness of Android Visual GUI Testing

ACM Transactions on Software Engineering and Methodology ◽ 10.1145/3477427 ◽ 2022 ◽ Vol 31 (2) ◽ pp. 1-32

Author(s): Luca Ardito ◽ Andrea Bottino ◽ Riccardo Coppola ◽ Fabrizio Lamberti ◽ Francesco Manigrasso ◽ ...

Keyword(s): Computer Vision ◽ Feature Matching ◽ State Of The Art ◽ Design Of Algorithms ◽ Computational Burden ◽ Domain Specific ◽ GUI Testing ◽ Wide Range ◽ Full Screen ◽ Feature Based

In automated Visual GUI Testing (VGT) for Android devices, the available tools often suffer from low robustness to mobile fragmentation, leading to incorrect results when running the same tests on different devices. To mitigate these issues, we evaluate two feature matching-based approaches for widget detection in VGT scripts, which use, respectively, the complete full-screen snapshot of the application (Fullscreen) and the cropped images of its widgets (Cropped) as visual locators to match on emulated devices. Our analysis includes validating the portability of different feature-based visual locators over various apps and devices and evaluating their robustness in terms of cross-device portability and correctly executed interactions. We assessed our results through a comparison with two state-of-the-art tools, EyeAutomate and Sikuli. Despite a limited increase in the computational burden, our Fullscreen approach outperformed state-of-the-art tools in terms of correctly identified locators across a wide range of devices and led to a 30% increase in passing tests. Our work shows that VGT tools’ dependability can be improved by bridging the testing and computer vision communities. This connection enables the design of algorithms targeted to domain-specific needs and thus inherently more usable and robust.
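
As a rough illustration of what feature matching for widget detection involves, the sketch below matches descriptor vectors from a widget image against descriptors from a screenshot and estimates the widget position from the matched keypoints. The abstract does not say which detectors, descriptors, or matching rules the authors use, so everything here is an assumption: the function names are hypothetical, the descriptors are toy 2-D vectors, and the nearest-neighbour ratio test with a threshold of 0.75 is simply a common choice in feature matching, not the paper's method.

```python
import math


def match_features(widget_desc, screen_desc, ratio=0.75):
    """Return (widget_index, screen_index) pairs that pass the ratio test.

    A widget descriptor is matched to its nearest screen descriptor only
    if that neighbour is clearly closer than the second nearest one.
    """
    matches = []
    for i, d in enumerate(widget_desc):
        dists = sorted((math.dist(d, s), j) for j, s in enumerate(screen_desc))
        if len(dists) < 2:
            continue  # the ratio test needs two candidates
        (best, j), (second, _) = dists[0], dists[1]
        if best < ratio * second:
            matches.append((i, j))
    return matches


def locate_widget(matches, screen_keypoints):
    """Estimate the widget position as the centroid of matched keypoints."""
    if not matches:
        return None
    xs = [screen_keypoints[j][0] for _, j in matches]
    ys = [screen_keypoints[j][1] for _, j in matches]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


# Toy data: two widget descriptors, four screenshot descriptors with
# the pixel coordinates of the keypoints they were computed at.
widget_desc = [(0.0, 0.0), (1.0, 1.0)]
screen_desc = [(0.1, 0.0), (5.0, 5.0), (1.0, 0.9), (9.0, 9.0)]
screen_keypoints = [(10, 20), (100, 200), (30, 40), (300, 400)]

matches = match_features(widget_desc, screen_desc)
position = locate_widget(matches, screen_keypoints)
```

In this toy setup both widget descriptors find an unambiguous neighbour, so the estimated tap position is the midpoint of the two matched keypoints. A real pipeline would extract descriptors from images and would typically fit a geometric transform rather than take a centroid.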

Computer vision to recognize construction waste compositions: A novel boundary-aware transformer (BAT) model

Journal of Environmental Management ◽ 10.1016/j.jenvman.2021.114405 ◽ 2022 ◽ Vol 305 ◽ pp. 114405

Author(s): Zhiming Dong ◽ Junjie Chen ◽ Weisheng Lu

Keyword(s): Computer Vision ◽ Construction Waste

Computer Vision for Autonomous UAV Flight Safety: An Overview and a Vision-based Safe Landing Pipeline Example

ACM Computing Surveys ◽ 10.1145/3472288 ◽ 2022 ◽ Vol 54 (9) ◽ pp. 1-37

Author(s): Efstratios Kakaletsis ◽ Charalampos Symeonidis ◽ Maria Tzelepi ◽ Ioannis Mademlis ◽ Anastasios Tefas ◽ ...

Keyword(s): Computer Vision ◽ Point Of View ◽ Wildlife Monitoring ◽ Flight Safety ◽ Crucial Issue ◽ Military Applications ◽ Crowd Monitoring ◽ UAV Navigation ◽ Landing Control ◽ Broad Interest

Recent years have seen an unprecedented spread of Unmanned Aerial Vehicles (UAVs, or “drones”), which are highly useful for both civilian and military applications. Flight safety is a crucial issue in UAV navigation, which must ensure accurate compliance with recently legislated rules and regulations. The emerging use of autonomous drones and UAV swarms raises additional issues, making it necessary to build safety and regulation awareness into the relevant algorithms and architectures. Computer vision plays a pivotal role in such autonomous functionalities. Although the main aspects of autonomous UAV technologies (e.g., path planning, navigation control, landing control, mapping and localization, target detection/tracking) are already mature and well covered, safe flying in the vicinity of crowds, avoidance of passing over persons, and guaranteed emergency landing capabilities in case of malfunctions are generally treated as an afterthought when designing autonomous UAV platforms for unstructured environments. This fact is reflected in the fragmentary coverage of these issues in the current literature. This overview attempts to remedy this situation from the point of view of computer vision. It examines the field from multiple aspects, including regulations across the world and relevant current technologies. Finally, since very few attempts have been made so far towards a complete UAV flight safety and landing pipeline, an example computer vision-based UAV flight safety pipeline is introduced, taking into account all issues present in current autonomous drones. The content is relevant to any kind of autonomous drone flight (e.g., for movie/TV production, news-gathering, search and rescue, surveillance, inspection, mapping, wildlife monitoring, crowd monitoring/management), making this a topic of broad interest.

Automatic recognition and classification of microseismic waveforms based on computer vision

Tunnelling and Underground Space Technology ◽ 10.1016/j.tust.2021.104327 ◽ 2022 ◽ Vol 121 ◽ pp. 104327

Author(s): Jiaming Li ◽ Shibin Tang ◽ Kunyao Li ◽ Shichao Zhang ◽ Liexian Tang ◽ ...

Keyword(s): Computer Vision ◽ Automatic Recognition

Promises and pitfalls of using computer vision to make inferences about landscape preferences: Evidence from an urban-proximate park system

Landscape and Urban Planning ◽ 10.1016/j.landurbplan.2021.104315 ◽ 2022 ◽ Vol 219 ◽ pp. 104315

Author(s): Emily J. Wilkins ◽ Derek Van Berkel ◽ Hongchao Zhang ◽ Monica A. Dorning ◽ Scott M. Beck ◽ ...

Keyword(s): Computer Vision

Weight-Sharing Neural Architecture Search: A Battle to Shrink the Optimization Gap

ACM Computing Surveys ◽ 10.1145/3473330 ◽ 2022 ◽ Vol 54 (9) ◽ pp. 1-37

Author(s): Lingxi Xie ◽ Xin Chen ◽ Kaifeng Bi ◽ Longhui Wei ◽ Yuhui Xu ◽ ...

Keyword(s): Computer Vision ◽ Literature Review ◽ Search Efficiency ◽ Search Methods ◽ Future Directions ◽ Advantages And Disadvantages ◽ Neural Architecture ◽ The Future

Neural architecture search (NAS) has attracted increasing attention. In recent years, individual search methods have been replaced by weight-sharing search methods for higher search efficiency, but the latter often suffer from instability. This article provides a literature review on these methods and attributes this issue to the optimization gap. From this perspective, we group existing approaches into several categories according to how they try to bridge the gap, and we analyze the advantages and disadvantages of each. Finally, we share our opinions on the future directions of NAS and AutoML. Reflecting the expertise of the authors, this article mainly focuses on the application of NAS to computer vision problems.

Assessing surface drainage conditions at the street and neighborhood scale: A computer vision and flow direction method applied to lidar data

Computers Environment and Urban Systems ◽ 10.1016/j.compenvurbsys.2021.101755 ◽ 2022 ◽ Vol 93 ◽ pp. 101755

Author(s): Cheng-Chun Lee ◽ Nasir G. Gharaibeh

Keyword(s): Computer Vision ◽ Flow Direction ◽ Lidar Data ◽ Surface Drainage ◽ Neighborhood Scale

computer vision
Recently Published Documents

2D Computer Vision

A Survey on Generative Adversarial Networks: Variants, Applications, and Training

Efficient Channel Attention Based Encoder–Decoder Approach for Image Captioning in Hindi

Feature Matching-based Approaches to Improve the Robustness of Android Visual GUI Testing

Computer vision to recognize construction waste compositions: A novel boundary-aware transformer (BAT) model

Computer Vision for Autonomous UAV Flight Safety: An Overview and a Vision-based Safe Landing Pipeline Example

Automatic recognition and classification of microseismic waveforms based on computer vision

Promises and pitfalls of using computer vision to make inferences about landscape preferences: Evidence from an urban-proximate park system

Weight-Sharing Neural Architecture Search: A Battle to Shrink the Optimization Gap

Assessing surface drainage conditions at the street and neighborhood scale: A computer vision and flow direction method applied to lidar data
