Multi-Evidence and Multi-Modal Fusion Network for Ground-Based Cloud Recognition

2020 ◽  
Vol 12 (3) ◽  
pp. 464
Author(s):  
Shuang Liu ◽  
Mei Li ◽  
Zhong Zhang ◽  
Baihua Xiao ◽  
Tariq S. Durrani

In recent times, deep neural networks have drawn much attention in ground-based cloud recognition. Yet such approaches center only on learning global features from visual information, which leads to incomplete representations of ground-based clouds. In this paper, we propose a novel method named multi-evidence and multi-modal fusion network (MMFN) for ground-based cloud recognition, which can learn extended cloud information by fusing heterogeneous features in a unified framework. Specifically, MMFN exploits multiple pieces of evidence, i.e., global and local visual features, from ground-based cloud images using the main network and the attentive network. In the attentive network, local visual features are extracted from attentive maps, which are obtained by refining salient patterns from convolutional activation maps. Meanwhile, the multi-modal network in MMFN learns multi-modal features for ground-based clouds. To fully fuse the multi-modal and multi-evidence visual features, we design two fusion layers in MMFN to incorporate the multi-modal features with the global and local visual features, respectively. Furthermore, we release the first multi-modal ground-based cloud dataset, named MGCD, which contains not only the ground-based cloud images but also the multi-modal information corresponding to each cloud image. MMFN is evaluated on MGCD and achieves a classification accuracy of 88.63% in comparison with state-of-the-art methods, which validates its effectiveness for ground-based cloud recognition.
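To make the fusion design concrete, the following is a minimal PyTorch-style sketch of a two-branch visual encoder plus a multi-modal branch joined by two fusion layers; the module names, layer sizes, and number of classes are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class FusionLayer(nn.Module):
    """Concatenate a visual feature with the multi-modal feature and project it."""
    def __init__(self, visual_dim, modal_dim, out_dim):
        super().__init__()
        self.fc = nn.Linear(visual_dim + modal_dim, out_dim)

    def forward(self, visual_feat, modal_feat):
        return torch.relu(self.fc(torch.cat([visual_feat, modal_feat], dim=1)))

class MMFNSketch(nn.Module):
    """Illustrative MMFN-style model: a main (global) branch, an attentive (local)
    branch, and a multi-modal branch, fused by two separate fusion layers."""
    def __init__(self, num_classes=7, modal_inputs=4):
        super().__init__()
        backbone = lambda: nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.main_net = backbone()        # global visual evidence
        self.attentive_net = backbone()   # local evidence from attentive maps
        self.modal_net = nn.Sequential(nn.Linear(modal_inputs, 64), nn.ReLU())
        self.fuse_global = FusionLayer(64, 64, 128)
        self.fuse_local = FusionLayer(64, 64, 128)
        self.classifier = nn.Linear(256, num_classes)

    def forward(self, image, modal):
        g, l, m = self.main_net(image), self.attentive_net(image), self.modal_net(modal)
        fused = torch.cat([self.fuse_global(g, m), self.fuse_local(l, m)], dim=1)
        return self.classifier(fused)
```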

2022 ◽  
Vol 14 (2) ◽  
pp. 259
Author(s):  
Yuting Yang ◽  
Kenneth Kin-Man Lam ◽  
Xin Sun ◽  
Junyu Dong ◽  
Redouane Lguensat

Marine hydrological elements are of vital importance in marine surveys. The evolution of these elements can have a profound effect on the relationship between human activities and marine hydrology. Therefore, detecting and explaining the evolution laws of marine hydrological elements is urgently needed. In this paper, a novel method named Evolution Trend Recognition (ETR) is proposed to recognize the trend of ocean fronts, which are among the most important features of the ocean dynamic process. Accordingly, this paper focuses on the task of ocean-front trend classification. A novel classification algorithm is first proposed for recognizing the ocean-front trend in terms of ocean-front scale and strength. Then, the GoogLeNet Inception network is trained to classify the ocean-front trend, i.e., enhancing or attenuating. The ocean-front trend is thus classified by the deep neural network as well as by a physics-informed classification algorithm, and the two classification results are combined to make the final decision on the trend. Furthermore, two novel databases were created for this research, and their generation method is described, to foster research in this direction. These two databases are called the Ocean-Front Tracking Dataset (OFTraD) and the Ocean-Front Trend Dataset (OFTreD). Moreover, experimental results show that, on OFTreD, the proposed method achieves a classification accuracy of 97.5%, higher than that of state-of-the-art networks. This demonstrates that the proposed ETR algorithm is highly promising for trend classification.
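A minimal sketch of the final decision step is given below, assuming the network's softmax probabilities and the physics-informed rule's label are averaged; the class coding and weighting are illustrative assumptions rather than the paper's exact procedure.

```python
import numpy as np

def combine_trend_decisions(cnn_probs, rule_label, rule_weight=0.5):
    """Fuse CNN class probabilities with a rule-based (scale/strength) label
    into a final enhancing/attenuating decision. `rule_weight` is illustrative."""
    rule_probs = np.zeros_like(cnn_probs)
    rule_probs[rule_label] = 1.0
    fused = (1.0 - rule_weight) * cnn_probs + rule_weight * rule_probs
    return int(np.argmax(fused))  # assumed coding: 0 = attenuating, 1 = enhancing

# Example: the CNN slightly favours "attenuating" while the rule says "enhancing".
print(combine_trend_decisions(np.array([0.55, 0.45]), rule_label=1))  # -> 1
```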


Author(s):  
Yun-Peng Liu ◽  
Ning Xu ◽  
Yu Zhang ◽  
Xin Geng

The performance of deep neural networks (DNNs) crucially relies on the quality of labeling. In some situations, labels are easily corrupted and become noisy. Designing algorithms that deal with noisy labels is therefore of great importance for learning robust DNNs. However, it is difficult to distinguish between clean labels and noisy labels, which becomes the bottleneck of many methods. To address this problem, this paper proposes a novel method named Label Distribution based Confidence Estimation (LDCE). LDCE estimates the confidence of the observed labels based on label distributions. The boundary between clean labels and noisy labels then becomes clear according to the confidence scores. To verify the effectiveness of the method, LDCE is combined with an existing learning algorithm to train robust DNNs. Experiments on both synthetic and real-world datasets substantiate the superiority of the proposed algorithm over state-of-the-art methods.
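The confidence-and-threshold idea can be sketched as follows, assuming confidence is read off as the probability mass a label-distribution model assigns to each observed label; the threshold value and this exact definition are assumptions for illustration.

```python
import numpy as np

def label_confidence(pred_dist, observed_labels):
    """Confidence of each observed label = probability the label-distribution
    model assigns to it (one illustrative reading of the confidence score)."""
    return pred_dist[np.arange(len(observed_labels)), observed_labels]

def split_clean_noisy(pred_dist, observed_labels, threshold=0.5):
    """Samples whose observed label has high confidence are treated as clean."""
    conf = label_confidence(pred_dist, observed_labels)
    return conf >= threshold, conf

# Example: two samples, three classes; both observed labels are class 0.
dist = np.array([[0.8, 0.1, 0.1],   # label 0 looks clean
                 [0.2, 0.1, 0.7]])  # label 0 looks noisy
print(split_clean_noisy(dist, np.array([0, 0])))  # (array([ True, False]), array([0.8, 0.2]))
```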


2020 ◽  
Vol 34 (6) ◽  
pp. 1963-1983
Author(s):  
Maryam Habibi ◽  
Johannes Starlinger ◽  
Ulf Leser

Tables are a common way to present information in an intuitive and concise manner. They are used extensively in media such as scientific articles or web pages. Automatically analyzing the content of tables bears special challenges. One of the most basic tasks is determining the orientation of a table: in column tables, each column represents one entity, with its attribute values spread across the rows; row tables are organized the other way around; and matrix tables give information on pairs of entities. In this paper, we address the problem of classifying a given table into one of three layouts: horizontal (for row tables), vertical (for column tables), or matrix. We describe DeepTable, a novel method based on deep neural networks designed for learning from sets. In contrast to previous state-of-the-art methods, this basis makes DeepTable invariant to permutations of rows or columns, a highly desirable property, as in most tables the order of rows and columns does not carry specific information. We evaluate our method using a silver-standard corpus of 5500 tables extracted from biomedical articles, where the layout was determined heuristically. DeepTable outperforms previous methods in both precision and recall on this corpus. In a second evaluation, we manually labeled a corpus of 300 tables and confirmed that DeepTable reaches superior performance in the table layout classification task. The code and resources introduced here are available at https://github.com/Marhabibi/DeepTable.
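The permutation-invariance property comes from encoding rows and columns as sets. A minimal PyTorch sketch, with illustrative dimensions and module names rather than DeepTable's actual architecture, could look like this:

```python
import torch
import torch.nn as nn

class SetPool(nn.Module):
    """Embed each element of a set independently, then pool with a symmetric
    function (mean), so reordering the elements leaves the output unchanged."""
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU())

    def forward(self, x):                      # x: (batch, set_size, in_dim)
        return self.phi(x).mean(dim=1)

class TableLayoutSketch(nn.Module):
    """Classify a table into horizontal, vertical, or matrix layout."""
    def __init__(self, cell_dim=32, hid=64):
        super().__init__()
        self.cell_pool = SetPool(cell_dim, hid)  # pool cell embeddings within a row
        self.row_pool = SetPool(hid, hid)        # pool row vectors into a table vector
        self.classifier = nn.Linear(hid, 3)

    def forward(self, table):                  # table: (batch, rows, cols, cell_dim)
        b, r, c, d = table.shape
        rows = self.cell_pool(table.view(b * r, c, d)).view(b, r, -1)
        return self.classifier(self.row_pool(rows))
```

Because mean pooling is symmetric, shuffling the rows or columns of the input leaves the logits unchanged.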


Genes ◽  
2021 ◽  
Vol 12 (11) ◽  
pp. 1689
Author(s):  
Lan Huang ◽  
Shaoqing Jiao ◽  
Sen Yang ◽  
Shuangquan Zhang ◽  
Xiaopeng Zhu ◽  
...  

Long noncoding RNA (lncRNA) plays a crucial role in many critical biological processes and participates in complex human diseases through interactions with proteins. Considering that identifying lncRNA–protein interactions through experimental methods is expensive and time-consuming, we propose a novel deep-learning-based method that combines raw sequence composition features and hand-designed features, called LGFC-CNN, to predict lncRNA–protein interactions. Two sequence preprocessing methods and two CNN modules (GloCNN and LocCNN) are used to extract global and local features from the raw sequences. Meanwhile, we select hand-designed features by comparing the predictive effect of different combinations of lncRNA and protein features. Furthermore, we obtain structure features and unify their dimensions through the Fourier transform. In the end, the four types of features are integrated to comprehensively predict the lncRNA–protein interactions. Compared with other state-of-the-art methods on three lncRNA–protein interaction datasets, LGFC-CNN achieves the best performance, with an accuracy of 94.14% on RPI21850, 92.94% on RPI7317, and 98.19% on RPI1847. The results show that LGFC-CNN can effectively predict lncRNA–protein interactions by combining raw sequence composition features and hand-designed features.
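One way a Fourier transform can unify feature dimensions is sketched below; the target length and the choice of keeping the leading magnitude coefficients are assumptions for illustration, not necessarily the paper's exact recipe.

```python
import numpy as np

def unify_dim_fft(feature, target_len=64):
    """Map a variable-length structure feature to a fixed length by keeping the
    magnitudes of the first `target_len` Fourier coefficients."""
    spectrum = np.abs(np.fft.fft(np.asarray(feature, dtype=float)))
    if spectrum.size >= target_len:
        return spectrum[:target_len]
    return np.pad(spectrum, (0, target_len - spectrum.size))

# Sequences of different lengths yield features of identical dimension.
print(unify_dim_fft(np.random.rand(150)).shape)  # (64,)
print(unify_dim_fft(np.random.rand(80)).shape)   # (64,)
```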


Author(s):  
Gaode Chen ◽  
Xinghua Zhang ◽  
Yanyan Zhao ◽  
Cong Xue ◽  
Ji Xiang

Sequential recommendation systems alleviate the problem of information overload and have attracted increasing attention in the literature. Most prior works obtain an overall representation based on the user's behavior sequence, which cannot sufficiently reflect the user's multiple interests. To this end, we propose a novel method called PIMI to mitigate this issue. PIMI models the user's multi-interest representation effectively by considering both the periodicity and the interactivity in the item sequence. Specifically, we design a periodicity-aware module to utilize the time-interval information between the user's behaviors. Meanwhile, a graph structure is proposed to enhance the interactivity between items in the user's behavior sequence, capturing both global and local item features. Finally, a multi-interest extraction module is applied to describe the user's multiple interests based on the obtained item representation. Extensive experiments on two real-world datasets, Amazon and Taobao, show that PIMI consistently outperforms state-of-the-art methods.
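The periodicity-aware use of time intervals can be illustrated with a small sketch that buckets the gaps between consecutive actions and embeds the bucket index; the bucket edges and embedding size are assumptions, not the paper's parameters.

```python
import torch
import torch.nn as nn

class IntervalEmbedding(nn.Module):
    """Embed time gaps between consecutive user actions so periodic behaviour
    (hourly, daily, weekly, ...) is visible to the sequence model."""
    def __init__(self, dim=32):
        super().__init__()
        # Illustrative bucket edges in seconds: <1 hour, <1 day, <1 week, <30 days.
        self.register_buffer("edges", torch.tensor([3600.0, 86400.0, 604800.0, 2592000.0]))
        self.emb = nn.Embedding(5, dim)        # 5 buckets, including "longer than 30 days"

    def forward(self, timestamps):             # (batch, seq_len), seconds
        gaps = timestamps[:, 1:] - timestamps[:, :-1]
        return self.emb(torch.bucketize(gaps, self.edges))  # (batch, seq_len - 1, dim)
```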


Atmosphere ◽  
2021 ◽  
Vol 12 (5) ◽  
pp. 584
Author(s):  
Wei-Ta Chu ◽  
Yu-Hsuan Liang ◽  
Kai-Chia Ho

We employed convolutional neural networks to extract visual features and developed recurrent neural networks for weather property estimation using only image data. Four common weather properties are estimated, i.e., temperature, humidity, visibility, and wind speed. Building on the success of previous works on temperature prediction, we extended them in two respects. First, considering the effectiveness of deep multi-task learning, we jointly estimated the four weather properties on the basis of the same visual information. Second, we propose that weather property estimation considering temporal evolution can be conducted from two perspectives, i.e., day-wise or hour-wise; a two-dimensional recurrent neural network is thus proposed to unify the two perspectives. In the evaluation, we show that better prediction accuracy can be obtained compared to state-of-the-art models. We believe the proposed approach is the first visual weather property estimation model trained with multi-task learning.
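The joint estimation of the four properties from shared visual features can be sketched as a multi-task head; the head names, feature dimension, and equal loss weighting are assumptions for illustration, not the authors' configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeatherMultiTaskHead(nn.Module):
    """Shared visual feature -> four regression heads, one per weather property."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.heads = nn.ModuleDict({
            name: nn.Linear(feat_dim, 1)
            for name in ["temperature", "humidity", "visibility", "wind_speed"]
        })

    def forward(self, feat):                    # feat: (batch, feat_dim)
        return {name: head(feat).squeeze(-1) for name, head in self.heads.items()}

def multitask_loss(preds, targets):
    """Sum of per-property MSE losses; equal task weights are an assumption."""
    return sum(F.mse_loss(preds[k], targets[k]) for k in preds)
```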


2019 ◽  
Vol 9 (16) ◽  
pp. 3260 ◽  
Author(s):  
Jiangyun Li ◽  
Peng Yao ◽  
Longteng Guo ◽  
Weicun Zhang

Image captioning attempts to generate a description of a given image. It usually takes a convolutional neural network as the encoder to extract visual features and a sequence model as the decoder to generate the description; among sequence models, the self-attention mechanism has recently made notable progress. However, this predominant encoder-decoder architecture still has problems to be solved. On the encoder side, without semantic concepts, the extracted visual features do not make full use of the image information. On the decoder side, the sequence self-attention relies only on word representations, lacks the guidance of visual information, and is easily influenced by the language prior. In this paper, we propose a novel boosted transformer model with two attention modules to address these problems, i.e., "Concept-Guided Attention" (CGA) and "Vision-Guided Attention" (VGA). Our model uses CGA in the encoder to obtain boosted visual features by integrating instance-level concepts into the visual features. In the decoder, we stack VGA, which uses visual information as a bridge to model internal relationships among the sequences and serves as an auxiliary module to sequence self-attention. Quantitative and qualitative results on the Microsoft COCO dataset demonstrate that our model outperforms state-of-the-art approaches.
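The intent of vision-guided attention can be sketched with a single cross-attention layer in which word representations attend over visual features; the dimensions and residual/normalization choices below are illustrative assumptions, not the authors' exact module.

```python
import torch
import torch.nn as nn

class VisionGuidedAttention(nn.Module):
    """Words query the visual features, so relations between words are mediated
    by the image rather than by language statistics alone."""
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, words, visual):           # (B, T, D), (B, R, D)
        guided, _ = self.attn(query=words, key=visual, value=visual)
        return self.norm(words + guided)        # residual connection
```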


F1000Research ◽  
2018 ◽  
Vol 7 ◽  
pp. 660
Author(s):  
Alvydas Šoliūnas

Background: The present study concerns parallel and serial processing of visual information, or more specifically, whether visual objects are identified successively or simultaneously in a multiple-object stimulus. Some findings in scene perception demonstrate the potential parallel processing of different sources of information in a stimulus; however, more extensive investigation is needed. Methods: We presented one, two, or three visual objects of different categories for 100 ms and afterwards asked subjects whether a specified category was present in the stimulus. We varied the number of objects, the number of categories, and the type of object shape distortion (distortion of either global or local features). Results: The response time and accuracy data corresponded to data from a previous experiment, which demonstrated that performance efficiency depends mostly on the number of categories but not on the number of objects. Two and three objects of the same category were identified with the same accuracy and the same response time, but two objects were identified faster and more accurately than three objects if they belonged to different categories. Distortion type did not affect the pattern of performance. Conclusions: The findings suggest that objects of the same category can be identified simultaneously and that identification involves both local and global features.


Author(s):  
Tuan Hoang ◽  
Thanh-Toan Do ◽  
Tam V. Nguyen ◽  
Ngai-Man Cheung

This paper proposes two novel techniques for training deep convolutional neural networks with low bit-width weights and activations. First, to obtain low bit-width weights, most existing methods derive the quantized weights by performing quantization on the full-precision network weights. However, this approach results in a mismatch: gradient descent updates the full-precision weights, but it does not update the quantized weights. To address this issue, we propose a novel method that enables direct updating of quantized weights, with learnable quantization levels, to minimize the cost function using gradient descent. Second, to obtain low bit-width activations, existing works consider all channels equally. However, the activation quantizers could be biased toward a few channels with high variance. To address this issue, we propose a method that takes into account the quantization errors of individual channels. With this approach, we can learn activation quantizers that minimize the quantization errors in the majority of channels. Experimental results demonstrate that our proposed method achieves state-of-the-art performance on the image classification task, using AlexNet, ResNet, and MobileNetV2 architectures on the CIFAR-100 and ImageNet datasets.
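A minimal sketch of directly learnable quantization levels with a straight-through estimator is shown below; the number of levels and the exact gradient path are assumptions for illustration, not the paper's formulation.

```python
import torch
import torch.nn as nn

class LearnableQuantizer(nn.Module):
    """Quantize weights to a small set of learnable levels; gradients reach both
    the levels and the underlying full-precision weights."""
    def __init__(self, num_levels=4):
        super().__init__()
        self.levels = nn.Parameter(torch.linspace(-1.0, 1.0, num_levels))

    def forward(self, w):
        # Snap each weight to its nearest quantization level.
        idx = (w.unsqueeze(-1) - self.levels).abs().argmin(dim=-1)
        w_q = self.levels[idx]                  # differentiable w.r.t. the levels
        # Straight-through estimator: forward returns w_q, backward treats the
        # mapping as identity so the full-precision weights also receive gradients.
        return w_q + (w - w.detach())
```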


2009 ◽  
Vol 101 (3) ◽  
pp. 1444-1462 ◽  
Author(s):  
Hiroki Tanaka ◽  
Izumi Ohzawa

Neurons with surround suppression have been implicated in the processing of high-order visual features such as contrast- or texture-defined boundaries and subjective contours. However, little is known about how these neurons encode high-order visual information in a systematic manner as a population. To address this issue, we measured detailed spatial structures of the classical center and suppressive surround regions of receptive fields of primary visual cortex (V1) neurons and examined how a population of such neurons allows encoding of various high-order features and shapes in visual scenes. Using a novel method to reconstruct these structures, we found that the center and surround regions are often both elongated and parallel to each other, reminiscent of the on and off subregions of simple cells without surround suppression. These structures allow V1 neurons to extract high-order contours of various orientations and spatial frequencies, with a variety of optimal values across neurons. The results show that a wide range of orientations and widths of high-order features are systematically represented by the population of V1 neurons with surround suppression.

