Tracklet Pair Proposal and Context Reasoning for Video Scene Graph Generation

Sensors ◽  
2021 ◽  
Vol 21 (9) ◽  
pp. 3164
Author(s):  
Gayoung Jung ◽  
Jonghun Lee ◽  
Incheol Kim

Video scene graph generation (ViDSGG), the creation of video scene graphs that enable deeper and better visual scene understanding, is a challenging task. Segment-based and sliding-window-based methods have been proposed to perform this task, but they all have certain limitations. This study proposes a novel deep neural network model called VSGG-Net for video scene graph generation. The model uses a sliding-window scheme to detect object tracklets of various lengths throughout the entire video. In particular, the proposed model presents a new tracklet pair proposal method that evaluates the relatedness of object tracklet pairs using a pretrained neural network and statistical information. To effectively utilize the spatio-temporal context, low-level visual context reasoning is performed using a spatio-temporal context graph and a graph neural network, along with high-level semantic context reasoning. To improve the detection performance for sparse relationships, the proposed model applies a class weighting technique that adjusts the weights of sparse relationships to a higher level. This study demonstrates the positive effect and high performance of the proposed model through experiments on the benchmark datasets VidOR and VidVRD.
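
The class weighting idea mentioned above can be illustrated with a minimal sketch: rare relationship classes receive larger loss weights so the predicate classifier is not dominated by frequent relationships. The inverse-frequency weighting, the smoothing term, and all counts below are illustrative assumptions, not the paper's exact scheme.

```python
import torch
import torch.nn as nn

def build_class_weights(class_counts, smoothing=1.0):
    # Inverse-frequency weighting: sparse relationship classes get larger weights.
    counts = torch.tensor(class_counts, dtype=torch.float)
    weights = counts.sum() / (counts + smoothing)
    return weights / weights.mean()          # normalize around 1.0

# e.g. three frequent and two sparse relationship classes (toy counts)
class_counts = [12000, 8000, 9500, 150, 80]
criterion = nn.CrossEntropyLoss(weight=build_class_weights(class_counts))

logits = torch.randn(4, len(class_counts))   # predicate scores for 4 tracklet pairs
labels = torch.tensor([0, 3, 4, 1])
loss = criterion(logits, labels)             # sparse classes 3 and 4 contribute more
```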

Agriculture ◽  
2021 ◽  
Vol 11 (7) ◽  
pp. 651
Author(s):  
Shengyi Zhao ◽  
Yun Peng ◽  
Jizhan Liu ◽  
Shuo Wu

Crop disease diagnosis is of great significance to crop yield and agricultural production, and deep learning methods have become the main research direction for diagnosing crop diseases. This paper proposes a deep convolutional neural network that integrates an attention mechanism and can better adapt to the diagnosis of a variety of tomato leaf diseases. The network structure mainly comprises residual blocks and attention extraction modules, allowing the model to accurately extract the complex features of various diseases. Extensive comparative experiments show that the proposed model achieves an average identification accuracy of 96.81% on the tomato leaf disease dataset and has significant advantages in terms of network complexity and real-time performance compared with other models. Moreover, in a model comparison experiment on a public grape leaf disease dataset, the proposed model also achieves better results, with an average identification accuracy of 99.24%. This confirms that adding the attention module enables more accurate extraction of the complex features of a variety of diseases while using fewer parameters. The proposed model provides a high-performance solution for crop diagnosis in real agricultural environments.
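
As a rough illustration of combining residual blocks with an attention extraction module, the sketch below uses a squeeze-and-excitation-style channel attention inside a residual block. The layer sizes, the reduction ratio, and the attention design itself are assumptions for illustration and may differ from the paper's architecture.

```python
import torch
import torch.nn as nn

class AttentionResidualBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)
        # Channel attention: global pooling followed by a small bottleneck MLP.
        self.attention = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        out = out * self.attention(out)   # reweight channels by learned attention
        return self.relu(out + x)         # residual connection

features = torch.randn(1, 64, 56, 56)     # stand-in leaf feature map
print(AttentionResidualBlock(64)(features).shape)
```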


2021 ◽  
Vol 2021 ◽  
pp. 1-7
Author(s):  
Peizhen Xie ◽  
Ke Zuo ◽  
Jie Liu ◽  
Mingliang Chen ◽  
Shuang Zhao ◽  
...  

At present, deep learning-based medical image diagnosis has achieved high performance in several diseases. However, the black-box nature of the convolutional neural network (CNN) limits its role in diagnosis. In this study, a novel interpretable diagnosis pipeline using a CNN model was proposed. Furthermore, a sizeable melanoma database containing 841 digital whole-slide images (WSIs) was built to train and evaluate the model. The model achieved strong melanoma classification ability (0.962 area under the receiver operating characteristic curve, 0.887 sensitivity, and 0.925 specificity). Moreover, the proposed model outperformed the comparison group of 20 pathologists in terms of accuracy (0.933 vs. 0.732). Finally, the gradient-weighted class activation mapping (Grad-CAM) method was used to reveal the inner logic of the proposed model and its feasibility for improving the diagnosis process in healthcare. The feature heat maps, visualized through saliency mapping, demonstrate that the features learned or extracted by the proposed model are compatible with accepted pathological features. Conclusively, the proposed model provides a rapid and accurate diagnosis by locating the distinctive features of melanoma, helping to build doctors' trust in CNN diagnosis results.
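
For reference, Grad-CAM itself can be sketched in a few lines: the feature maps of the last convolutional layer are weighted by the spatially pooled gradients of the target class score, summed, and passed through a ReLU to give a class-discriminative heat map. The sketch below uses a generic ResNet-18 and a random tensor as a stand-in input; it is not the authors' pipeline or backbone.

```python
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights=None).eval()
activations, gradients = {}, {}

def fwd_hook(_, __, output):
    activations["value"] = output          # feature maps of the hooked layer

def bwd_hook(_, grad_in, grad_out):
    gradients["value"] = grad_out[0]       # gradients flowing into those maps

layer = model.layer4[-1]
layer.register_forward_hook(fwd_hook)
layer.register_full_backward_hook(bwd_hook)

image = torch.randn(1, 3, 224, 224)        # stand-in for a WSI patch
scores = model(image)
scores[0, scores.argmax()].backward()      # backprop the top class score

weights = gradients["value"].mean(dim=(2, 3), keepdim=True)   # pooled gradients
cam = F.relu((weights * activations["value"]).sum(dim=1))     # weighted sum of maps
cam = F.interpolate(cam.unsqueeze(1), size=(224, 224), mode="bilinear")
```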


2001 ◽  
Vol 11 (01) ◽  
pp. 11-22 ◽  
Author(s):  
GUILHERME DE A. BARRETO ◽  
ALUIZIO F. R. ARAÚJO

An unsupervised neural network is proposed to learn and recall complex robot trajectories. Two cases are considered: (i) A single trajectory in which a particular arm configuration (state) may occur more than once, and (ii) trajectories sharing states with each other. Ambiguities occur in both cases during recall of such trajectories. The proposed model consists of two groups of synaptic weights trained by competitive and Hebbian learning laws. They are responsible for encoding spatial and temporal features of the input sequences, respectively. Three mechanisms allow the network to deal with repeated or shared states: local and global context units, neurons disabled from learning, and redundancy. The network reproduces the current and the next state of the learned sequences and is able to resolve ambiguities. The model was simulated over various sets of robot trajectories in order to evaluate learning and recall, trajectory sampling effects and robustness.
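
The two weight groups described above can be sketched schematically: competitive (winner-take-all) weights act as prototypes of arm states, while Hebbian weights between consecutive winners encode temporal order and support recall of the next state. The network sizes, learning rates, and toy trajectory below are illustrative assumptions, not the authors' configuration, and the context-unit and redundancy mechanisms are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, dim = 20, 3
W_comp = rng.random((n_neurons, dim))        # competitive (spatial) weights
W_hebb = np.zeros((n_neurons, n_neurons))    # Hebbian (temporal) weights
eta_c, eta_h = 0.3, 0.5

trajectory = rng.random((10, dim))           # toy sequence of arm states
prev_winner = None
for x in trajectory:
    winner = int(np.argmin(np.linalg.norm(W_comp - x, axis=1)))  # winner-take-all
    W_comp[winner] += eta_c * (x - W_comp[winner])               # move prototype toward input
    if prev_winner is not None:
        W_hebb[prev_winner, winner] += eta_h                     # strengthen temporal link
    prev_winner = winner

# Recall: given the current state, predict the next state via the strongest temporal link.
current = int(np.argmin(np.linalg.norm(W_comp - trajectory[3], axis=1)))
next_state = W_comp[np.argmax(W_hebb[current])]
```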


2018 ◽  
Vol 2018 ◽  
pp. 1-9 ◽  
Author(s):  
Muhammad Haroon ◽  
Junaid Baber ◽  
Ihsan Ullah ◽  
Sher Muhammad Daudpota ◽  
Maheen Bakhtyar ◽  
...  

Video segmentation into shots is the first step for video indexing and searching. Video shots are mostly very short in duration and do not give meaningful insight into the visual contents. However, grouping shots based on similar visual contents gives a better understanding of the video scene; grouping of similar shots is known as scene boundary detection or video segmentation into scenes. In this paper, we propose a model for video segmentation into visual scenes using the bag of visual words (BoVW) model. Initially, the video is divided into shots, which are then represented by a set of key frames. Key frames are further represented by BoVW feature vectors, which are quite short and compact compared to classical BoVW model implementations. Two variations of the BoVW model are used: (1) the classical BoVW model and (2) the Vector of Locally Aggregated Descriptors (VLAD), which is an extension of the classical BoVW model. The similarity of shots is computed from the distances between their key-frame feature vectors within a sliding window of length L (here, L = 4), rather than comparing each shot with a very long list of shots, as has been practiced previously. Experiments on cinematic and drama videos show the effectiveness of our proposed framework. In the proposed model, the BoVW representation is a 25,000-dimensional vector, whereas VLAD is only a 2048-dimensional vector. BoVW achieves 0.90 segmentation accuracy, whereas VLAD achieves 0.83.
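
The sliding-window grouping idea can be sketched as follows: each shot is represented by the feature vectors of its key frames (e.g. VLAD-sized vectors), and a shot starts a new scene only if no sufficiently similar shot exists in the preceding window of length L = 4. The distance measure, the threshold value, and the random stand-in features below are assumptions for illustration.

```python
import numpy as np

def shot_distance(shot_a, shot_b):
    # Minimum pairwise distance between the key-frame vectors of two shots.
    return min(np.linalg.norm(f1 - f2) for f1 in shot_a for f2 in shot_b)

def group_shots_into_scenes(shots, window=4, threshold=0.8):
    boundaries = [0]
    for i in range(1, len(shots)):
        start = max(0, i - window)
        nearest = min(shot_distance(shots[i], shots[j]) for j in range(start, i))
        if nearest > threshold:
            boundaries.append(i)      # no similar shot in the window: a new scene starts
    return boundaries

rng = np.random.default_rng(1)
shots = [rng.random((3, 2048)) for _ in range(12)]   # 12 shots, 3 key frames each, VLAD-sized
print(group_shots_into_scenes(shots))
```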


2021 ◽  
Vol 11 (2) ◽  
pp. 826
Author(s):  
Seongyong Kim ◽  
Tae Hyeon Jeon ◽  
Ilsun Rhiu ◽  
Jinhyun Ahn ◽  
Dong-Hyuk Im

Over the last several years, in parallel with the general global advancement of mobile technology and the rise in social media content consumption, multimedia content production and reproduction have increased exponentially. Therefore, enabled by the rapid recent advancements in deep learning technology, research on scene graph generation is being actively conducted to more efficiently search for and classify images desired by users within a large amount of content. This approach lets users accurately find the images they are searching for by expressing meaningful information about image content as the nodes and edges of a graph. In this study, we propose a scene graph generation method based on the Resource Description Framework (RDF) model to clarify semantic relations. Furthermore, we use convolutional neural network (CNN) and recurrent neural network (RNN) deep learning models to generate a scene graph expressed in a controlled vocabulary of the RDF model and to understand the relations between image object tags. Finally, we experimentally demonstrate that our proposed technique can express semantic content more effectively than existing approaches.
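
To make the RDF representation concrete, the sketch below expresses predicted (subject, predicate, object) relations as RDF triples with the rdflib library. The namespace URI and the relation names are illustrative placeholders, not the paper's controlled vocabulary.

```python
from rdflib import Graph, Namespace

# Illustrative namespace; the paper's controlled vocabulary would go here.
EX = Namespace("http://example.org/scene/")
g = Graph()
g.bind("ex", EX)

# Triples produced by object detection + relation prediction (toy examples).
detected_relations = [("person", "rides", "horse"), ("person", "wears", "hat")]
for subj, pred, obj in detected_relations:
    g.add((EX[subj], EX[pred], EX[obj]))

print(g.serialize(format="turtle"))   # scene graph as RDF/Turtle
```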


2021 ◽  
Author(s):  
Yu Huang ◽  
James Li ◽  
Min Shi ◽  
Hanqi Zhuang ◽  
Yufei Tang ◽  
...  

Ocean currents, fluid mechanics, and many other physical systems with spatio-temporal dynamics are essential components of the universe. One key characteristic of such systems is that they can be represented by certain physics laws, such as ordinary/partial differential equations (ODEs/PDEs), irrespective of time or location. Physics-informed machine learning has recently emerged to learn physics from data for accurate prediction, but such methods often lack a mechanism to leverage localized spatial and temporal correlations or rely on hard-coded physics parameters. In this paper, we advocate a physics-coupled neural network model that learns the parameters governing the physics of the system and further couples the learned physics to assist the learning of recurring dynamics. A spatio-temporal physics-coupled neural network (ST-PCNN) model is proposed to achieve three goals: (1) learning the underlying physics parameters, (2) transferring local information between spatio-temporal regions, and (3) forecasting future values of the dynamical system. The physics-coupled learning ensures that the proposed model can be substantially improved by the learned physics parameters and can achieve useful long-range forecasting (e.g., more than two weeks). Experiments using simulated wave propagation and field-collected ocean current data validate that ST-PCNN outperforms typical deep learning models and existing physics-informed models.
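
As a heavily simplified illustration of the physics-coupling idea (not the ST-PCNN architecture), the sketch below makes a physics parameter learnable: a wave speed c drives a finite-difference step of the 2D wave equation, and a small convolutional head learns a residual correction on top of the physics step. The equation, parameterization, and grid sizes are all assumptions.

```python
import torch
import torch.nn as nn

class PhysicsCoupledStep(nn.Module):
    def __init__(self):
        super().__init__()
        self.c = nn.Parameter(torch.tensor(0.5))                 # learnable physics parameter
        self.correction = nn.Conv2d(2, 1, kernel_size=3, padding=1)

    def laplacian(self, u):
        return (torch.roll(u, 1, -1) + torch.roll(u, -1, -1)
                + torch.roll(u, 1, -2) + torch.roll(u, -1, -2) - 4 * u)

    def forward(self, u_prev, u_curr, dt=0.1):
        # Discretized wave equation: u_next = 2u - u_prev + (c*dt)^2 * laplacian(u).
        physics = 2 * u_curr - u_prev + (self.c * dt) ** 2 * self.laplacian(u_curr)
        residual = self.correction(torch.stack([u_prev, u_curr], dim=1)).squeeze(1)
        return physics + residual          # physics step plus learned correction

step = PhysicsCoupledStep()
u0, u1 = torch.randn(1, 32, 32), torch.randn(1, 32, 32)   # toy field snapshots
u2 = step(u0, u1)                                          # forecast of the next snapshot
```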


Sensors ◽  
2021 ◽  
Vol 21 (23) ◽  
pp. 7987
Author(s):  
Naresh K. Trivedi ◽  
Vinay Gautam ◽  
Abhineet Anand ◽  
Hani Moaiteq Aljahdali ◽  
Santos Gracia Villar ◽  
...  

Tomato is one of the most essential and widely consumed crops in the world. Tomato yields vary depending on how the plants are fertilized, and leaf disease is the primary factor affecting the quantity and quality of the crop yield. As a result, it is critical to diagnose and classify these disorders appropriately. Different kinds of diseases influence tomato production, and earlier identification of these diseases would reduce their effect on tomato plants and enhance crop yield. Different innovative ways of identifying and classifying such diseases have been used extensively. The motive of this work is to support farmers in accurately identifying early-stage diseases and to inform them about these diseases. A Convolutional Neural Network (CNN) is used to effectively define and classify tomato diseases. Google Colab is used to conduct the complete experiment with a dataset containing 3000 images of tomato leaves affected by nine different diseases plus a healthy class. The complete process is as follows: first, the input images are preprocessed and the targeted areas of the images are segmented from the original images; second, the images are further processed with varying hyper-parameters of the CNN model; finally, the CNN extracts characteristics from the images such as color, texture, and edges. The findings demonstrate that the proposed model's predictions are 98.49% accurate.
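
A minimal sketch of such a 10-class leaf classifier (nine diseases plus healthy) is shown below. The layer sizes, learning rate, and input resolution are assumed illustrative hyper-parameters, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(128, 10),                            # 9 disease classes + 1 healthy class
)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # tunable hyper-parameter
criterion = nn.CrossEntropyLoss()

batch = torch.randn(8, 3, 224, 224)    # stand-in for preprocessed, segmented leaf images
labels = torch.randint(0, 10, (8,))
loss = criterion(model(batch), labels)
loss.backward()
optimizer.step()                        # one training step
```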


Sensors ◽  
2019 ◽  
Vol 19 (5) ◽  
pp. 1085 ◽  
Author(s):  
Yeongtaek Song ◽  
Incheol Kim

This paper proposes a novel deep neural network model for solving the spatio-temporal action detection problem by localizing all multiple-action regions and classifying the corresponding actions in an untrimmed video. The proposed model uses a spatio-temporal region proposal method to effectively detect multiple-action regions. First, in the temporal region proposal, anchor boxes are generated by targeting regions expected to potentially contain actions. Unlike conventional temporal region proposal methods, the proposed method uses a complementary two-stage method to effectively detect the temporal regions of the respective actions occurring asynchronously. In addition, a spatial region proposal process is used to detect the principal agent performing an action among the people appearing in a video. Further, coarse-level features contain comprehensive information about the whole video and have been frequently used in conventional action-detection studies; however, they cannot provide detailed information about each person performing an action in a video. To overcome this limitation of coarse-level features, the proposed model additionally learns fine-level features from the proposed action tubes in the video. Various experiments conducted using the LIRIS-HARL and UCF-101 datasets confirm the high performance and effectiveness of the proposed deep neural network model.
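
The temporal anchor generation step can be sketched simply: candidate temporal regions of several lengths are placed at regular positions over the untrimmed video, and each anchor is later scored for whether it is likely to contain an action. The scales and stride below are illustrative values, not the paper's settings.

```python
def generate_temporal_anchors(num_frames, scales=(16, 32, 64, 128), stride=8):
    # Place anchors of each scale at regular temporal positions, clipped to the video.
    anchors = []
    for center in range(0, num_frames, stride):
        for scale in scales:
            start = max(0, center - scale // 2)
            end = min(num_frames, center + scale // 2)
            anchors.append((start, end))
    return anchors

anchors = generate_temporal_anchors(num_frames=300)
print(len(anchors), anchors[:4])   # candidate temporal regions to be scored
```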


Insects ◽  
2020 ◽  
Vol 11 (9) ◽  
pp. 565
Author(s):  
Zhiliang Zhang ◽  
Wei Zhan ◽  
Zhangzhang He ◽  
Yafeng Zou

Statistical analysis of and research on insect grooming behavior can help identify more effective methods for pest control. Traditional manual statistical methods for insect grooming behavior are time-consuming, labor-intensive, and error-prone. Based on computer vision technology, this paper uses spatio-temporal context to extract video features, uses a self-built Convolutional Neural Network (CNN) to train the detection model, and proposes a simple and effective method for detecting Bactrocera minax grooming behavior, in which a computer program automatically detects the grooming behaviors of the flies and analyzes the results. Using the detection model trained with the proposed method, videos of 22 adult flies totaling 1320 min of grooming behavior were detected and analyzed: the overall detection accuracy was over 95%, the standard error of the per-fly behavior detection accuracy was less than 3%, and the difference from manual observation was less than 15%. The experimental results show that the proposed method greatly reduces the time needed for manual observation while ensuring the accuracy of insect behavior detection and analysis, providing a new informatization-based analysis method for Bactrocera minax behavior statistics and a new approach for related insect behavior recognition research.
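
One way such automatic analysis can turn per-frame model outputs into behavior statistics is sketched below: frame-level grooming predictions are grouped into intervals, with very short runs discarded as noise. The minimum-length filter and the toy label sequence are assumptions for illustration, not the authors' post-processing.

```python
def frames_to_intervals(per_frame_labels, min_length=15):
    # Merge consecutive positive frames into intervals; drop runs shorter than min_length.
    intervals, start = [], None
    for i, label in enumerate(per_frame_labels):
        if label and start is None:
            start = i
        elif not label and start is not None:
            if i - start >= min_length:
                intervals.append((start, i))
            start = None
    if start is not None and len(per_frame_labels) - start >= min_length:
        intervals.append((start, len(per_frame_labels)))
    return intervals

# Toy per-frame CNN outputs: a 40-frame grooming bout and a 10-frame noise burst.
per_frame = [0] * 30 + [1] * 40 + [0] * 20 + [1] * 10 + [0] * 30
print(frames_to_intervals(per_frame))   # -> [(30, 70)]
```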


2020 ◽  
Vol 96 (3s) ◽  
pp. 585-588
Author(s):  
С.Е. Фролова ◽  
Е.С. Янакова

Methods are proposed for building prototyping platforms for high-performance systems-on-chip for artificial intelligence tasks. The requirements for platforms of this class and the principles for modifying the SoC design for implementation in a prototype are described, as well as methods for debugging designs on the prototyping platform. The results of running computer vision algorithms using neural network technologies on the FPGA prototype of the ELcore semantic cores are presented.

