visual feature
Recently Published Documents

TOTAL DOCUMENTS: 550 (FIVE YEARS: 139)
H-INDEX: 37 (FIVE YEARS: 4)

2021 ◽ Vol 2021 ◽ pp. 1-11
Author(s): Zhaoyin Jiang ◽ Fuyou Zhang ◽ Laishuang Sun

We live in an information age, and image processing technology is now applied widely across many fields; sports action recognition is a natural application of it. This article implements such recognition with a spatial visual feature analysis algorithm. The algorithm first completes a pipeline of image collection, feature extraction, and action recognition, realized through texture functions and other related functions, and it performs image-based sports action recognition at minimal time cost. The algorithm can help athletes train more effectively and, to a certain extent, standardize their movements. Meanwhile, the structure of China's sports industry continues to improve steadily: public enthusiasm for sports keeps growing, which in turn benefits the industry's development.
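The abstract does not specify its texture functions, so the sketch below is only a plausible reconstruction of such a pipeline: it extracts gray-level co-occurrence matrix (GLCM) texture statistics from grayscale frames with scikit-image and trains a standard SVM on them. The feature set, classifier, and function names are illustrative assumptions, not the authors' implementation.

```python
# Illustrative texture-feature pipeline for action-frame classification.
# Assumes 8-bit grayscale frames; GLCM features and an SVM are stand-ins
# for the unspecified "texture functions and other related functions".
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.svm import SVC

def texture_features(frame: np.ndarray) -> np.ndarray:
    """Extract GLCM texture statistics from one 8-bit grayscale frame."""
    glcm = graycomatrix(frame, distances=[1, 2], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    props = ["contrast", "homogeneity", "energy", "correlation"]
    return np.hstack([graycoprops(glcm, p).ravel() for p in props])

def train_action_classifier(frames, labels):
    """frames: list of grayscale images; labels: action class per frame."""
    X = np.stack([texture_features(f) for f in frames])
    clf = SVC(kernel="rbf")  # any standard classifier would do here
    clf.fit(X, labels)
    return clf
```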


Author(s): Nguyen Chi Thanh

Colonoscopy image classification is an image classification task that predicts whether colonoscopy images contain polyps. It is an important input for an automatic polyp detection system. Recently, deep neural networks have been widely used for colonoscopy image classification because they extract features automatically with high accuracy. However, training these networks requires a large amount of manually annotated data, which is expensive to acquire and limited by the available resources of endoscopy specialists. We propose a novel method for training colonoscopy image classification networks that uses self-supervised visual feature learning to overcome this challenge. We adopt image denoising as a pretext task for self-supervised visual feature learning from an unlabeled colonoscopy image dataset: noise is added to the image to form the input, and the original image serves as the label. We use an unlabeled colonoscopy image dataset of 8,500 images collected from the PACS system of Hospital 103 to train the pretext network. The feature extractor of the pretext network, trained in this self-supervised way, is then used for colonoscopy image classification, and a small labeled subset of the public colonoscopy image dataset Kvasir is used to fine-tune the classifier. Our experiments demonstrate that the proposed self-supervised learning method achieves higher colonoscopy image classification accuracy than a classifier trained from scratch, especially with a small training dataset. When only 200 annotated images are used for training, the proposed method improves accuracy from 72.16% to 93.15% over the baseline classifier.
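As a rough illustration of the denoising pretext task described above, the following PyTorch sketch trains a generic encoder-decoder to reconstruct clean images from noise-corrupted inputs. The architecture, noise model, and hyperparameters are illustrative assumptions, not the paper's exact network.

```python
# Minimal sketch of self-supervised pretext training via image denoising:
# noisy image in, clean image as the label.
import torch
import torch.nn as nn

class DenoisingNet(nn.Module):
    def __init__(self):
        super().__init__()
        # The encoder is later reused as the classification feature extractor.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

net = DenoisingNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def pretext_step(clean_batch, noise_std=0.1):
    """One self-supervised step: noise-corrupted input, clean image as label."""
    noisy = clean_batch + noise_std * torch.randn_like(clean_batch)
    loss = loss_fn(net(noisy), clean_batch)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

After pretext training, `net.encoder` would serve as the feature extractor, with a small classification head fine-tuned on the labeled Kvasir subset.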


2021
Author(s): Xiaoqiang Chen ◽ Zhihao Jin ◽ Qicong Wang ◽ Wenming Yang ◽ Qingmin Liao ◽ ...

Author(s): Xiaohui Liu ◽ Fei Liu ◽ Yijing Li ◽ Huizhang Shen ◽ Eric T.K. Lim ◽ ...

2021
Author(s): Mohamed Elkholy ◽ Mohamed Elsheikh ◽ Naser El-Sheimy

2021 ◽ Vol 12 (1)
Author(s): Elisa Castaldi ◽ Antonella Pomè ◽ Guido Marco Cicchini ◽ David Burr ◽ Paola Binda

Abstract
Although luminance is the main determinant of pupil size, the amplitude of the pupillary light response is also modulated by stimulus appearance and attention. Here we ask whether perceived numerosity modulates the pupillary light response. Participants passively observed arrays of black or white dots of matched physical luminance but different physical or illusory numerosity. In half the patterns, pairs of dots were connected by lines to create dumbbell-like shapes, inducing an illusory underestimation of perceived numerosity; in the other half, connectors were either displaced or removed. Constriction to white arrays and dilation to black were stronger for patterns with higher perceived numerosity, either physical or illusory, with the strength of the pupillary light response scaling with the perceived numerosity of the arrays. Our results show that even without an explicit task, numerosity modulates a simple automatic reflex, suggesting that numerosity is a spontaneously encoded visual feature.


2021 ◽ Vol 2021 ◽ pp. 1-12
Author(s): Chunxiao Wang ◽ Jingjing Zhang ◽ Wei Jiang ◽ Shuang Wang

Predicting the emotions evoked in viewers watching movies is an important problem in affective video content analysis, with a wide range of applications. Generally, the audience's emotion is evoked by the combined effect of a movie's audio and visual content. Existing research has mainly used coarse middle- and high-level audio and visual features to predict experienced emotions, but refining those features with semantic information to improve prediction remains understudied. Therefore, taking the temporal structure and semantic units of a movie into account, this paper proposes a shot-based audio-visual feature representation and a long short-term memory (LSTM) model with a temporal attention mechanism for experienced emotion prediction. First, the shot-based representation defines how audio and visual features are extracted and combined for each shot clip, using advanced pretrained models from related audio-visual tasks to extract features at different semantic levels. Then, the prediction model comprises four components: a nonlinear multimodal feature fusion layer, a temporal feature capture layer, a temporal attention layer, and a sentiment prediction layer. This paper focuses on experienced emotion prediction and evaluates the proposed method on the extended COGNIMUSE dataset. The method performs significantly better than the state of the art while substantially reducing computation, raising the Pearson correlation coefficient (PCC) from 0.46 to 0.62 for arousal and from 0.18 to 0.34 for valence in experienced emotion.
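A minimal PyTorch sketch of the four-component predictor described above (nonlinear fusion, temporal LSTM, temporal attention, prediction head), assuming precomputed per-shot audio and visual feature vectors; all dimensions and layer choices are illustrative, not the paper's configuration.

```python
# Shot-sequence emotion regressor: fuse per-shot audio/visual features,
# model temporal dynamics with an LSTM, pool with temporal attention,
# and regress arousal and valence.
import torch
import torch.nn as nn

class ShotEmotionLSTM(nn.Module):
    def __init__(self, audio_dim=128, visual_dim=512, hidden=256):
        super().__init__()
        self.fuse = nn.Sequential(              # nonlinear multimodal fusion
            nn.Linear(audio_dim + visual_dim, hidden), nn.Tanh())
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)        # temporal attention scores
        self.head = nn.Linear(hidden, 2)        # arousal and valence

    def forward(self, audio, visual):
        # audio: (B, T, audio_dim), visual: (B, T, visual_dim); T = shots
        x = self.fuse(torch.cat([audio, visual], dim=-1))
        h, _ = self.lstm(x)                          # (B, T, hidden)
        w = torch.softmax(self.attn(h), dim=1)       # weights over time
        context = (w * h).sum(dim=1)                 # attention-weighted pool
        return self.head(context)                    # (B, 2)
```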


2021 ◽ Vol 2021 ◽ pp. 1-12
Author(s): Yongzhi Wang ◽ Lei Zhao ◽ Qian Zhang ◽ Ran Zhou ◽ Liping Wu ◽ ...

Tactile perception can accurately reflect the contact state by collecting force and torque information, but it is insensitive to changes in the relative position and posture of the assembled objects. Visual perception is very sensitive to such pose and posture changes, but it cannot accurately reflect the contact state, especially when the objects occlude each other. A robot can therefore perceive its environment more accurately if visual and tactile perception are combined. To this end, this paper proposes a combined-perception alignment method for peg-in-hole assembly based on self-supervised deep reinforcement learning. The agent first observes the environment through visual sensors and predicts an alignment-adjustment action from the visual features of the contact state. It then judges the contact state from the force and torque information collected by a force/torque sensor, selects the alignment-adjustment action corresponding to that contact state, and uses it as the label for the visual prediction. The visual perception network then backpropagates against this label to correct its weights, so that with iterative training the agent learns the combined-perception alignment skill. A robot system built on CoppeliaSim is used for simulation training and testing; the simulation results show that combined perception achieves higher assembly efficiency than single-modality perception.
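As a rough illustration of the self-supervision loop described above, the sketch below lets a force/torque-derived contact state label the action that the visual network should have predicted. The network architecture, the `contact_state_from_ft` mapping, and the discrete action set are placeholder assumptions, not the authors' system.

```python
# Tactile-labeled training of a visual alignment-action predictor.
import torch
import torch.nn as nn

N_ACTIONS = 6                        # e.g. +/- translation in x, y and tilt
vision_net = nn.Sequential(          # predicts alignment action from an image
    nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(), nn.Flatten(),
    nn.LazyLinear(N_ACTIONS))
opt = torch.optim.Adam(vision_net.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

def contact_state_from_ft(force, torque):
    """Map a force/torque reading to an action index (placeholder rule)."""
    return int(torch.argmax(torch.cat([force, torque])) % N_ACTIONS)

def train_step(image, force, torque):
    logits = vision_net(image.unsqueeze(0))      # visual prediction
    label = torch.tensor([contact_state_from_ft(force, torque)])
    loss = loss_fn(logits, label)                # tactile label supervises vision
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```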

