Tracklet Pair Proposal and Context Reasoning for Video Scene Graph Generation

Sensors ◽  
2021 ◽  
Vol 21 (9) ◽  
pp. 3164
Author(s):  
Gayoung Jung ◽  
Jonghun Lee ◽  
Incheol Kim

Video scene graph generation (ViDSGG), the creation of video scene graphs that enable deeper and better visual scene understanding, is a challenging task. Segment-based and sliding-window-based methods have been proposed to perform this task, but they all have certain limitations. This study proposes a novel deep neural network model called VSGG-Net for video scene graph generation. The model uses a sliding-window scheme to detect object tracklets of various lengths throughout the entire video. In particular, the proposed model presents a new tracklet pair proposal method that evaluates the relatedness of object tracklet pairs using a pretrained neural network and statistical information. To effectively utilize the spatio-temporal context, low-level visual context reasoning is performed using a spatio-temporal context graph and a graph neural network, along with high-level semantic context reasoning. To improve the detection performance for sparse relationships, the proposed model applies a class weighting technique that adjusts the weights of sparse relationships to a higher level. This study demonstrates the positive effect and high performance of the proposed model through experiments on the benchmark datasets VidOR and VidVRD.
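
The class weighting idea mentioned above can be illustrated with a minimal sketch: rare relationship classes receive larger loss weights so the predicate classifier is not dominated by frequent relationships. The inverse-frequency weighting, the smoothing term, and all counts below are illustrative assumptions, not the paper's exact scheme.

```python
import torch
import torch.nn as nn

def build_class_weights(class_counts, smoothing=1.0):
    # Inverse-frequency weighting: sparse relationship classes get larger weights.
    counts = torch.tensor(class_counts, dtype=torch.float)
    weights = counts.sum() / (counts + smoothing)
    return weights / weights.mean()          # normalize around 1.0

# e.g. three frequent and two sparse relationship classes (toy counts)
class_counts = [12000, 8000, 9500, 150, 80]
criterion = nn.CrossEntropyLoss(weight=build_class_weights(class_counts))

logits = torch.randn(4, len(class_counts))   # predicate scores for 4 tracklet pairs
labels = torch.tensor([0, 3, 4, 1])
loss = criterion(logits, labels)             # sparse classes 3 and 4 contribute more
```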

Agriculture ◽  
2021 ◽  
Vol 11 (7) ◽  
pp. 651
Author(s):  
Shengyi Zhao ◽  
Yun Peng ◽  
Jizhan Liu ◽  
Shuo Wu

Crop disease diagnosis is of great significance to crop yield and agricultural production, and deep learning methods have become the main research direction for diagnosing crop diseases. This paper proposes a deep convolutional neural network that integrates an attention mechanism and can better adapt to the diagnosis of a variety of tomato leaf diseases. The network structure mainly comprises residual blocks and attention extraction modules, allowing the model to accurately extract the complex features of various diseases. Extensive comparative experiments show that the proposed model achieves an average identification accuracy of 96.81% on the tomato leaf disease dataset and has significant advantages in terms of network complexity and real-time performance compared with other models. Moreover, in a model comparison experiment on a public grape leaf disease dataset, the proposed model also achieves better results, with an average identification accuracy of 99.24%. This confirms that adding the attention module enables more accurate extraction of the complex features of a variety of diseases while using fewer parameters. The proposed model provides a high-performance solution for crop diagnosis in real agricultural environments.
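
As a rough illustration of combining residual blocks with an attention extraction module, the sketch below uses a squeeze-and-excitation-style channel attention inside a residual block. The layer sizes, the reduction ratio, and the attention design itself are assumptions for illustration and may differ from the paper's architecture.

```python
import torch
import torch.nn as nn

class AttentionResidualBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)
        # Channel attention: global pooling followed by a small bottleneck MLP.
        self.attention = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        out = out * self.attention(out)   # reweight channels by learned attention
        return self.relu(out + x)         # residual connection

features = torch.randn(1, 64, 56, 56)     # stand-in leaf feature map
print(AttentionResidualBlock(64)(features).shape)
```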


2021 ◽  
Vol 2021 ◽  
pp. 1-7
Author(s):  
Peizhen Xie ◽  
Ke Zuo ◽  
Jie Liu ◽  
Mingliang Chen ◽  
Shuang Zhao ◽  
...  

At present, deep learning-based medical image diagnosis has achieved high performance in several diseases. However, the black-box nature of the convolutional neural network (CNN) limits its role in diagnosis. In this study, a novel interpretable diagnosis pipeline using a CNN model was proposed. Furthermore, a sizeable melanoma database containing 841 digital whole-slide images (WSIs) was built to train and evaluate the model. The model achieved strong melanoma classification ability (0.962 area under the receiver operating characteristic curve, 0.887 sensitivity, and 0.925 specificity). Moreover, the proposed model outperformed the comparison group of 20 pathologists in terms of accuracy (0.933 vs. 0.732). Finally, the gradient-weighted class activation mapping (Grad-CAM) method was used to reveal the inner logic of the proposed model and its feasibility for improving the diagnosis process in healthcare. The feature heat maps, visualized through saliency mapping, demonstrate that the features learned or extracted by the proposed model are compatible with accepted pathological features. Conclusively, the proposed model provides a rapid and accurate diagnosis by locating the distinctive features of melanoma, helping to build doctors' trust in CNN diagnosis results.
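
For reference, Grad-CAM itself can be sketched in a few lines: the feature maps of the last convolutional layer are weighted by the spatially pooled gradients of the target class score, summed, and passed through a ReLU to give a class-discriminative heat map. The sketch below uses a generic ResNet-18 and a random tensor as a stand-in input; it is not the authors' pipeline or backbone.

```python
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights=None).eval()
activations, gradients = {}, {}

def fwd_hook(_, __, output):
    activations["value"] = output          # feature maps of the hooked layer

def bwd_hook(_, grad_in, grad_out):
    gradients["value"] = grad_out[0]       # gradients flowing into those maps

layer = model.layer4[-1]
layer.register_forward_hook(fwd_hook)
layer.register_full_backward_hook(bwd_hook)

image = torch.randn(1, 3, 224, 224)        # stand-in for a WSI patch
scores = model(image)
scores[0, scores.argmax()].backward()      # backprop the top class score

weights = gradients["value"].mean(dim=(2, 3), keepdim=True)   # pooled gradients
cam = F.relu((weights * activations["value"]).sum(dim=1))     # weighted sum of maps
cam = F.interpolate(cam.unsqueeze(1), size=(224, 224), mode="bilinear")
```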


2001 ◽  
Vol 11 (01) ◽  
pp. 11-22 ◽  
Author(s):  
GUILHERME DE A. BARRETO ◽  
ALUIZIO F. R. ARAÚJO

An unsupervised neural network is proposed to learn and recall complex robot trajectories. Two cases are considered: (i) A single trajectory in which a particular arm configuration (state) may occur more than once, and (ii) trajectories sharing states with each other. Ambiguities occur in both cases during recall of such trajectories. The proposed model consists of two groups of synaptic weights trained by competitive and Hebbian learning laws. They are responsible for encoding spatial and temporal features of the input sequences, respectively. Three mechanisms allow the network to deal with repeated or shared states: local and global context units, neurons disabled from learning, and redundancy. The network reproduces the current and the next state of the learned sequences and is able to resolve ambiguities. The model was simulated over various sets of robot trajectories in order to evaluate learning and recall, trajectory sampling effects and robustness.
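
The two weight groups described above can be sketched schematically: competitive (winner-take-all) weights act as prototypes of arm states, while Hebbian weights between consecutive winners encode temporal order and support recall of the next state. The network sizes, learning rates, and toy trajectory below are illustrative assumptions, not the authors' configuration, and the context-unit and redundancy mechanisms are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, dim = 20, 3
W_comp = rng.random((n_neurons, dim))        # competitive (spatial) weights
W_hebb = np.zeros((n_neurons, n_neurons))    # Hebbian (temporal) weights
eta_c, eta_h = 0.3, 0.5

trajectory = rng.random((10, dim))           # toy sequence of arm states
prev_winner = None
for x in trajectory:
    winner = int(np.argmin(np.linalg.norm(W_comp - x, axis=1)))  # winner-take-all
    W_comp[winner] += eta_c * (x - W_comp[winner])               # move prototype toward input
    if prev_winner is not None:
        W_hebb[prev_winner, winner] += eta_h                     # strengthen temporal link
    prev_winner = winner

# Recall: given the current state, predict the next state via the strongest temporal link.
current = int(np.argmin(np.linalg.norm(W_comp - trajectory[3], axis=1)))
next_state = W_comp[np.argmax(W_hebb[current])]
```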


2018 ◽  
Vol 2018 ◽  
pp. 1-9 ◽  
Author(s):  
Muhammad Haroon ◽  
Junaid Baber ◽  
Ihsan Ullah ◽  
Sher Muhammad Daudpota ◽  
Maheen Bakhtyar ◽  
...  

Video segmentation into shots is the first step for video indexing and searching. Video shots are mostly very short in duration and do not give meaningful insight into the visual contents. However, grouping shots based on similar visual contents gives a better understanding of the video scene; grouping of similar shots is known as scene boundary detection or video segmentation into scenes. In this paper, we propose a model for video segmentation into visual scenes using the bag of visual words (BoVW) model. Initially, the video is divided into shots, which are then represented by a set of key frames. Key frames are further represented by BoVW feature vectors, which are quite short and compact compared to classical BoVW model implementations. Two variations of the BoVW model are used: (1) the classical BoVW model and (2) the Vector of Locally Aggregated Descriptors (VLAD), which is an extension of the classical BoVW model. The similarity of shots is computed from the distances between their key-frame feature vectors within a sliding window of length L (here, L = 4), rather than comparing each shot with a very long list of shots, as has been practiced previously. Experiments on cinematic and drama videos show the effectiveness of our proposed framework. In the proposed model, the BoVW representation is a 25,000-dimensional vector, whereas VLAD is only a 2048-dimensional vector. BoVW achieves 0.90 segmentation accuracy, whereas VLAD achieves 0.83.
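
The sliding-window grouping idea can be sketched as follows: each shot is represented by the feature vectors of its key frames (e.g. VLAD-sized vectors), and a shot starts a new scene only if no sufficiently similar shot exists in the preceding window of length L = 4. The distance measure, the threshold value, and the random stand-in features below are assumptions for illustration.

```python
import numpy as np

def shot_distance(shot_a, shot_b):
    # Minimum pairwise distance between the key-frame vectors of two shots.
    return min(np.linalg.norm(f1 - f2) for f1 in shot_a for f2 in shot_b)

def group_shots_into_scenes(shots, window=4, threshold=0.8):
    boundaries = [0]
    for i in range(1, len(shots)):
        start = max(0, i - window)
        nearest = min(shot_distance(shots[i], shots[j]) for j in range(start, i))
        if nearest > threshold:
            boundaries.append(i)      # no similar shot in the window: a new scene starts
    return boundaries

rng = np.random.default_rng(1)
shots = [rng.random((3, 2048)) for _ in range(12)]   # 12 shots, 3 key frames each, VLAD-sized
print(group_shots_into_scenes(shots))
```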


2021 ◽  
Vol 11 (2) ◽  
pp. 826
Author(s):  
Seongyong Kim ◽  
Tae Hyeon Jeon ◽  
Ilsun Rhiu ◽  
Jinhyun Ahn ◽  
Dong-Hyuk Im

Over the last several years, in parallel with the general global advancement of mobile technology and the rise in social media content consumption, multimedia content production and reproduction have increased exponentially. Therefore, enabled by the rapid recent advancements in deep learning technology, research on scene graph generation is being actively conducted to more efficiently search for and classify images desired by users within a large amount of content. This approach lets users accurately find the images they are searching for by expressing meaningful information about image content as the nodes and edges of a graph. In this study, we propose a scene graph generation method based on the Resource Description Framework (RDF) model to clarify semantic relations. Furthermore, we use convolutional neural network (CNN) and recurrent neural network (RNN) deep learning models to generate a scene graph expressed in a controlled vocabulary of the RDF model and to understand the relations between image object tags. Finally, we experimentally demonstrate that our proposed technique can express semantic content more effectively than existing approaches.
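
To make the RDF representation concrete, the sketch below expresses predicted (subject, predicate, object) relations as RDF triples with the rdflib library. The namespace URI and the relation names are illustrative placeholders, not the paper's controlled vocabulary.

```python
from rdflib import Graph, Namespace

# Illustrative namespace; the paper's controlled vocabulary would go here.
EX = Namespace("http://example.org/scene/")
g = Graph()
g.bind("ex", EX)

# Triples produced by object detection + relation prediction (toy examples).
detected_relations = [("person", "rides", "horse"), ("person", "wears", "hat")]
for subj, pred, obj in detected_relations:
    g.add((EX[subj], EX[pred], EX[obj]))

print(g.serialize(format="turtle"))   # scene graph as RDF/Turtle
```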


2021 ◽  
Author(s):  
Yu Huang ◽  
James Li ◽  
Min Shi ◽  
Hanqi Zhuang ◽  
Yufei Tang ◽  
...  

Ocean currents, fluid mechanics, and many other physical systems with spatio-temporal dynamics are essential components of the universe. One key characteristic of such systems is that they can be represented by certain physics laws, such as ordinary/partial differential equations (ODEs/PDEs), irrespective of time or location. Physics-informed machine learning has recently emerged to learn physics from data for accurate prediction, but such methods often lack a mechanism to leverage localized spatial and temporal correlations or rely on hard-coded physics parameters. In this paper, we advocate a physics-coupled neural network model that learns the parameters governing the physics of the system and further couples the learned physics to assist the learning of recurring dynamics. A spatio-temporal physics-coupled neural network (ST-PCNN) model is proposed to achieve three goals: (1) learning the underlying physics parameters, (2) transferring local information between spatio-temporal regions, and (3) forecasting future values of the dynamical system. The physics-coupled learning ensures that the proposed model can be substantially improved by the learned physics parameters and can achieve useful long-range forecasting (e.g., more than two weeks). Experiments using simulated wave propagation and field-collected ocean current data validate that ST-PCNN outperforms typical deep learning models and existing physics-informed models.
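
As a heavily simplified illustration of the physics-coupling idea (not the ST-PCNN architecture), the sketch below makes a physics parameter learnable: a wave speed c drives a finite-difference step of the 2D wave equation, and a small convolutional head learns a residual correction on top of the physics step. The equation, parameterization, and grid sizes are all assumptions.

```python
import torch
import torch.nn as nn

class PhysicsCoupledStep(nn.Module):
    def __init__(self):
        super().__init__()
        self.c = nn.Parameter(torch.tensor(0.5))                 # learnable physics parameter
        self.correction = nn.Conv2d(2, 1, kernel_size=3, padding=1)

    def laplacian(self, u):
        return (torch.roll(u, 1, -1) + torch.roll(u, -1, -1)
                + torch.roll(u, 1, -2) + torch.roll(u, -1, -2) - 4 * u)

    def forward(self, u_prev, u_curr, dt=0.1):
        # Discretized wave equation: u_next = 2u - u_prev + (c*dt)^2 * laplacian(u).
        physics = 2 * u_curr - u_prev + (self.c * dt) ** 2 * self.laplacian(u_curr)
        residual = self.correction(torch.stack([u_prev, u_curr], dim=1)).squeeze(1)
        return physics + residual          # physics step plus learned correction

step = PhysicsCoupledStep()
u0, u1 = torch.randn(1, 32, 32), torch.randn(1, 32, 32)   # toy field snapshots
u2 = step(u0, u1)                                          # forecast of the next snapshot
```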


Sensors ◽  
2021 ◽  
Vol 21 (23) ◽  
pp. 7987
Author(s):  
Naresh K. Trivedi ◽  
Vinay Gautam ◽  
Abhineet Anand ◽  
Hani Moaiteq Aljahdali ◽  
Santos Gracia Villar ◽  
...  

Tomato is one of the most essential and widely consumed crops in the world. Tomato yields vary depending on how the plants are fertilized, and leaf disease is the primary factor affecting the quantity and quality of the crop yield. As a result, it is critical to diagnose and classify these disorders appropriately. Different kinds of diseases influence tomato production, and earlier identification of these diseases would reduce their effect on tomato plants and enhance crop yield. Different innovative ways of identifying and classifying such diseases have been used extensively. The motive of this work is to support farmers in accurately identifying early-stage diseases and to inform them about these diseases. A Convolutional Neural Network (CNN) is used to effectively define and classify tomato diseases. Google Colab is used to conduct the complete experiment with a dataset containing 3000 images of tomato leaves affected by nine different diseases plus a healthy class. The complete process is as follows: first, the input images are preprocessed and the targeted areas of the images are segmented from the original images; second, the images are further processed with varying hyper-parameters of the CNN model; finally, the CNN extracts characteristics from the images such as color, texture, and edges. The findings demonstrate that the proposed model's predictions are 98.49% accurate.
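
A minimal sketch of such a 10-class leaf classifier (nine diseases plus healthy) is shown below. The layer sizes, learning rate, and input resolution are assumed illustrative hyper-parameters, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(128, 10),                            # 9 disease classes + 1 healthy class
)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # tunable hyper-parameter
criterion = nn.CrossEntropyLoss()

batch = torch.randn(8, 3, 224, 224)    # stand-in for preprocessed, segmented leaf images
labels = torch.randint(0, 10, (8,))
loss = criterion(model(batch), labels)
loss.backward()
optimizer.step()                        # one training step
```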


Sensors ◽  
2019 ◽  
Vol 19 (5) ◽  
pp. 1085 ◽  
Author(s):  
Yeongtaek Song ◽  
Incheol Kim

This paper proposes a novel deep neural network model for solving the spatio-temporal action detection problem by localizing all multiple-action regions and classifying the corresponding actions in an untrimmed video. The proposed model uses a spatio-temporal region proposal method to effectively detect multiple-action regions. First, in the temporal region proposal, anchor boxes are generated by targeting regions expected to potentially contain actions. Unlike conventional temporal region proposal methods, the proposed method uses a complementary two-stage method to effectively detect the temporal regions of the respective actions occurring asynchronously. In addition, a spatial region proposal process is used to detect the principal agent performing an action among the people appearing in a video. Further, coarse-level features contain comprehensive information about the whole video and have been frequently used in conventional action-detection studies; however, they cannot provide detailed information about each person performing an action in a video. To overcome this limitation of coarse-level features, the proposed model additionally learns fine-level features from the proposed action tubes in the video. Various experiments conducted using the LIRIS-HARL and UCF-101 datasets confirm the high performance and effectiveness of the proposed deep neural network model.
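
The temporal anchor generation step can be sketched simply: candidate temporal regions of several lengths are placed at regular positions over the untrimmed video, and each anchor is later scored for whether it is likely to contain an action. The scales and stride below are illustrative values, not the paper's settings.

```python
def generate_temporal_anchors(num_frames, scales=(16, 32, 64, 128), stride=8):
    # Place anchors of each scale at regular temporal positions, clipped to the video.
    anchors = []
    for center in range(0, num_frames, stride):
        for scale in scales:
            start = max(0, center - scale // 2)
            end = min(num_frames, center + scale // 2)
            anchors.append((start, end))
    return anchors

anchors = generate_temporal_anchors(num_frames=300)
print(len(anchors), anchors[:4])   # candidate temporal regions to be scored
```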


Insects ◽  
2020 ◽  
Vol 11 (9) ◽  
pp. 565
Author(s):  
Zhiliang Zhang ◽  
Wei Zhan ◽  
Zhangzhang He ◽  
Yafeng Zou

Statistical analysis of and research on insect grooming behavior can help identify more effective methods for pest control. Traditional manual statistical methods for insect grooming behavior are time-consuming, labor-intensive, and error-prone. Based on computer vision technology, this paper uses spatio-temporal context to extract video features, uses a self-built Convolutional Neural Network (CNN) to train the detection model, and proposes a simple and effective method for detecting Bactrocera minax grooming behavior, in which a computer program automatically detects the grooming behaviors of the flies and analyzes the results. Using the detection model trained with the proposed method, videos of 22 adult flies totaling 1320 min of grooming behavior were detected and analyzed: the overall detection accuracy was over 95%, the standard error of the per-fly behavior detection accuracy was less than 3%, and the difference from manual observation was less than 15%. The experimental results show that the proposed method greatly reduces the time needed for manual observation while ensuring the accuracy of insect behavior detection and analysis, providing a new informatization-based analysis method for Bactrocera minax behavior statistics and a new approach for related insect behavior recognition research.
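
One way such automatic analysis can turn per-frame model outputs into behavior statistics is sketched below: frame-level grooming predictions are grouped into intervals, with very short runs discarded as noise. The minimum-length filter and the toy label sequence are assumptions for illustration, not the authors' post-processing.

```python
def frames_to_intervals(per_frame_labels, min_length=15):
    # Merge consecutive positive frames into intervals; drop runs shorter than min_length.
    intervals, start = [], None
    for i, label in enumerate(per_frame_labels):
        if label and start is None:
            start = i
        elif not label and start is not None:
            if i - start >= min_length:
                intervals.append((start, i))
            start = None
    if start is not None and len(per_frame_labels) - start >= min_length:
        intervals.append((start, len(per_frame_labels)))
    return intervals

# Toy per-frame CNN outputs: a 40-frame grooming bout and a 10-frame noise burst.
per_frame = [0] * 30 + [1] * 40 + [0] * 20 + [1] * 10 + [0] * 30
print(frames_to_intervals(per_frame))   # -> [(30, 70)]
```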


2020 ◽  
Vol 96 (3s) ◽  
pp. 585-588
Author(s):  
С.Е. Фролова ◽  
Е.С. Янакова

Methods are proposed for building prototyping platforms for high-performance systems-on-chip for artificial intelligence tasks. The requirements for platforms of this class and the principles for modifying the SoC design for implementation in a prototype are described, as well as methods for debugging designs on the prototyping platform. The results of running computer vision algorithms using neural network technologies on the FPGA prototype of the ELcore semantic cores are presented.

