Repetitive Reprediction Deep Decipher for Semi-Supervised Learning

Most recent semi-supervised deep learning (deep SSL) methods used a similar paradigm: use network predictions to update pseudo-labels and use pseudo-labels to update network parameters iteratively. However, they lack theoretical support and cannot explain why predictions are good candidates for pseudo-labels. In this paper, we propose a principled end-to-end framework named deep decipher (D2) for SSL. Within the D2 framework, we prove that pseudo-labels are related to network predictions by an exponential link function, which gives a theoretical support for using predictions as pseudo-labels. Furthermore, we demonstrate that updating pseudo-labels by network predictions will make them uncertain. To mitigate this problem, we propose a training strategy called repetitive reprediction (R2). Finally, the proposed R2-D2 method is tested on the large-scale ImageNet dataset and outperforms state-of-the-art methods by 5 percentage points.

Download Full-text

SHEDR: An End-to-End Deep Neural Event Detection and Recommendation Framework for Hyperlocal News Using Social Media

INFORMS Journal on Computing ◽

10.1287/ijoc.2021.1112 ◽

2021 ◽

Author(s):

Yuheng Hu ◽

Yili Hong

Keyword(s):

Neural Network ◽

Social Media ◽

Deep Learning ◽

Event Detection ◽

Large Scale ◽

Short Term Memory ◽

State Of The Art ◽

Neural Network Models ◽

Neural Event ◽

End To End

Residents often rely on newspapers and television to gather hyperlocal news for community awareness and engagement. More recently, social media have emerged as an increasingly important source of hyperlocal news. Thus far, the literature on using social media to create desirable societal benefits, such as civic awareness and engagement, is still in its infancy. One key challenge in this research stream is to timely and accurately distill information from noisy social media data streams to community members. In this work, we develop SHEDR (social media–based hyperlocal event detection and recommendation), an end-to-end neural event detection and recommendation framework with a particular use case for Twitter to facilitate residents’ information seeking of hyperlocal events. The key model innovation in SHEDR lies in the design of the hyperlocal event detector and the event recommender. First, we harness the power of two popular deep neural network models, the convolutional neural network (CNN) and long short-term memory (LSTM), in a novel joint CNN-LSTM model to characterize spatiotemporal dependencies for capturing unusualness in a region of interest, which is classified as a hyperlocal event. Next, we develop a neural pairwise ranking algorithm for recommending detected hyperlocal events to residents based on their interests. To alleviate the sparsity issue and improve personalization, our algorithm incorporates several types of contextual information covering topic, social, and geographical proximities. We perform comprehensive evaluations based on two large-scale data sets comprising geotagged tweets covering Seattle and Chicago. We demonstrate the effectiveness of our framework in comparison with several state-of-the-art approaches. We show that our hyperlocal event detection and recommendation models consistently and significantly outperform other approaches in terms of precision, recall, and F-1 scores. Summary of Contribution: In this paper, we focus on a novel and important, yet largely underexplored application of computing—how to improve civic engagement in local neighborhoods via local news sharing and consumption based on social media feeds. To address this question, we propose two new computational and data-driven methods: (1) a deep learning–based hyperlocal event detection algorithm that scans spatially and temporally to detect hyperlocal events from geotagged Twitter feeds; and (2) A personalized deep learning–based hyperlocal event recommender system that systematically integrates several contextual cues such as topical, geographical, and social proximity to recommend the detected hyperlocal events to potential users. We conduct a series of experiments to examine our proposed models. The outcomes demonstrate that our algorithms are significantly better than the state-of-the-art models and can provide users with more relevant information about the local neighborhoods that they live in, which in turn may boost their community engagement.

Download Full-text

PestNet: An End-to-End Deep Learning Approach for Large-Scale Multi-Class Pest Detection and Classification

IEEE Access ◽

10.1109/access.2019.2909522 ◽

2019 ◽

Vol 7 ◽

pp. 45301-45312 ◽

Cited By ~ 14

Author(s):

Liu Liu ◽

Rujing Wang ◽

Chengjun Xie ◽

Po Yang ◽

Fangyuan Wang ◽

...

Keyword(s):

Deep Learning ◽

Large Scale ◽

Learning Approach ◽

Pest Detection ◽

End To End

Download Full-text

Hearing Lips: Improving Lip Reading by Distilling Speech Recognizers

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.6174 ◽

2020 ◽

Vol 34 (04) ◽

pp. 6917-6924 ◽

Cited By ~ 1

Author(s):

Ya Zhao ◽

Rui Xu ◽

Xinchao Wang ◽

Peng Hou ◽

Haihong Tang ◽

...

Keyword(s):

Deep Learning ◽

Speech Recognition ◽

Error Rate ◽

Large Scale ◽

State Of The Art ◽

Lip Reading ◽

Speech Recognizers ◽

Lip Movement ◽

Knowledge Distillation ◽

The One

Lip reading has witnessed unparalleled development in recent years thanks to deep learning and the availability of large-scale datasets. Despite the encouraging results achieved, the performance of lip reading, unfortunately, remains inferior to the one of its counterpart speech recognition, due to the ambiguous nature of its actuations that makes it challenging to extract discriminant features from the lip movement videos. In this paper, we propose a new method, termed as Lip by Speech (LIBS), of which the goal is to strengthen lip reading by learning from speech recognizers. The rationale behind our approach is that the features extracted from speech recognizers may provide complementary and discriminant clues, which are formidable to be obtained from the subtle movements of the lips, and consequently facilitate the training of lip readers. This is achieved, specifically, by distilling multi-granularity knowledge from speech recognizers to lip readers. To conduct this cross-modal knowledge distillation, we utilize an efficacious alignment scheme to handle the inconsistent lengths of the audios and videos, as well as an innovative filtering strategy to refine the speech recognizer's prediction. The proposed method achieves the new state-of-the-art performance on the CMLR and LRS2 datasets, outperforming the baseline by a margin of 7.66% and 2.75% in character error rate, respectively.

Download Full-text

A Hybrid Network for Large-Scale Action Recognition from RGB and Depth Modalities

Sensors ◽

10.3390/s20113305 ◽

2020 ◽

Vol 20 (11) ◽

pp. 3305 ◽

Cited By ~ 1

Author(s):

Huogen Wang ◽

Zhanjie Song ◽

Wanqing Li ◽

Pichao Wang

Keyword(s):

Neural Network ◽

Action Recognition ◽

Canonical Correlation ◽

Large Scale ◽

State Of The Art ◽

Hybrid Network ◽

Support Vector ◽

Multiple Modalities ◽

Large Margin ◽

Percentage Points

The paper presents a novel hybrid network for large-scale action recognition from multiple modalities. The network is built upon the proposed weighted dynamic images. It effectively leverages the strengths of the emerging Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) based approaches to specifically address the challenges that occur in large-scale action recognition and are not fully dealt with by the state-of-the-art methods. Specifically, the proposed hybrid network consists of a CNN based component and an RNN based component. Features extracted by the two components are fused through canonical correlation analysis and then fed to a linear Support Vector Machine (SVM) for classification. The proposed network achieved state-of-the-art results on the ChaLearn LAP IsoGD, NTU RGB+D and Multi-modal & Multi-view & Interactive ( M 2 I ) datasets and outperformed existing methods by a large margin (over 10 percentage points in some cases).

Download Full-text

Cooperative Hybrid Semi-Supervised Learning for Text Sentiment Classification

Symmetry ◽

10.3390/sym11020133 ◽

2019 ◽

Vol 11 (2) ◽

pp. 133 ◽

Cited By ~ 2

Author(s):

Yang Li ◽

Ying Lv ◽

Suge Wang ◽

Jiye Liang ◽

Juanzi Li ◽

...

Keyword(s):

Supervised Learning ◽

Large Scale ◽

Ensemble Classifier ◽

Sentiment Classification ◽

Training Dataset ◽

Support Vector ◽

Seed Selection ◽

Training Strategy ◽

Whole Process ◽

Self Learning

A large-scale and high-quality training dataset is an important guarantee to learn an ideal classifier for text sentiment classification. However, manually constructing such a training dataset with sentiment labels is a labor-intensive and time-consuming task. Therefore, based on the idea of effectively utilizing unlabeled samples, a synthetical framework that covers the whole process of semi-supervised learning from seed selection, iterative modification of the training text set, to the co-training strategy of the classifier is proposed in this paper for text sentiment classification. To provide an important basis for selecting the seed texts and modifying the training text set, three kinds of measures—the cluster similarity degree of an unlabeled text, the cluster uncertainty degree of a pseudo-label text to a learner, and the reliability degree of a pseudo-label text to a learner—are defined. With these measures, a seed selection method based on Random Swap clustering, a hybrid modification method of the training text set based on active learning and self-learning, and an alternately co-training strategy of the ensemble classifier of the Maximum Entropy and Support Vector Machine are proposed and combined into our framework. The experimental results on three Chinese datasets (COAE2014, COAE2015, and a Hotel review, respectively) and five English datasets (Books, DVD, Electronics, Kitchen, and MR, respectively) in the real world verify the effectiveness of the proposed framework.

Download Full-text

Deep Learning-Based Sentimental Analysis for Large-Scale Imbalanced Twitter Data

Future Internet ◽

10.3390/fi11090190 ◽

2019 ◽

Vol 11 (9) ◽

pp. 190 ◽

Cited By ~ 3

Author(s):

Jamal ◽

Xianqiao ◽

Aldabbas

Keyword(s):

Deep Learning ◽

Large Scale ◽

State Of The Art ◽

Hybrid Approach ◽

Principal Component ◽

Specific Topic ◽

Weighting Method ◽

Psychological Conditions ◽

Twitter Data ◽

Wide Range

Emotions detection in social media is very effective to measure the mood of people about a specific topic, news, or product. It has a wide range of applications, including identifying psychological conditions such as anxiety or depression in users. However, it is a challenging task to distinguish useful emotions’ features from a large corpus of text because emotions are subjective, with limited fuzzy boundaries that may be expressed in different terminologies and perceptions. To tackle this issue, this paper presents a hybrid approach of deep learning based on TensorFlow with Keras for emotions detection on a large scale of imbalanced tweets’ data. First, preprocessing steps are used to get useful features from raw tweets without noisy data. Second, the entropy weighting method is used to compute the importance of each feature. Third, class balancer is applied to balance each class. Fourth, Principal Component Analysis (PCA) is applied to transform high correlated features into normalized forms. Finally, the TensorFlow based deep learning with Keras algorithm is proposed to predict high-quality features for emotions classification. The proposed methodology is analyzed on a dataset of 1,600,000 tweets collected from the website ‘kaggle’. Comparison is made of the proposed approach with other state of the art techniques on different training ratios. It is proved that the proposed approach outperformed among other techniques.

Download Full-text

Mixed Maximum Loss Design for Optic Disc and Optic Cup Segmentation with Deep Learning from Imbalanced Samples

Sensors ◽

10.3390/s19204401 ◽

2019 ◽

Vol 19 (20) ◽

pp. 4401 ◽

Cited By ~ 3

Author(s):

Yong-li Xu ◽

Shuai Lu ◽

Han-xiong Li ◽

Rui-rui Li

Keyword(s):

Deep Learning ◽

Optic Disc ◽

Learning Strategy ◽

State Of The Art ◽

Loss Minimization ◽

Training Strategy ◽

Multi Scale ◽

Maximum Loss ◽

Screening Performance ◽

Optic Cup

Glaucoma is a serious eye disease that can cause permanent blindness and is difficult to diagnose early. Optic disc (OD) and optic cup (OC) play a pivotal role in the screening of glaucoma. Therefore, accurate segmentation of OD and OC from fundus images is a key task in the automatic screening of glaucoma. In this paper, we designed a U-shaped convolutional neural network with multi-scale input and multi-kernel modules (MSMKU) for OD and OC segmentation. Such a design gives MSMKU a rich receptive field and is able to effectively represent multi-scale features. In addition, we designed a mixed maximum loss minimization learning strategy (MMLM) for training the proposed MSMKU. This training strategy can adaptively sort the samples by the loss function and re-weight the samples through data enhancement, thereby synchronously improving the prediction performance of all samples. Experiments show that the proposed method has obtained a state-of-the-art breakthrough result for OD and OC segmentation on the RIM-ONE-V3 and DRISHTI-GS datasets. At the same time, the proposed method achieved satisfactory glaucoma screening performance on the RIM-ONE-V3 and DRISHTI-GS datasets. On datasets with an imbalanced distribution between typical and rare sample images, the proposed method obtained a higher accuracy than existing deep learning methods.

Download Full-text

Scene text removal via cascaded text stroke detection and erasing

Computational Visual Media ◽

10.1007/s41095-021-0242-8 ◽

2021 ◽

Vol 8 (2) ◽

pp. 273-287

Author(s):

Xuewei Bian ◽

Chaoqun Wang ◽

Weize Quan ◽

Juntao Ye ◽

Xiaopeng Zhang ◽

...

Keyword(s):

Performance Improvement ◽

Real World ◽

Large Scale ◽

State Of The Art ◽

The State ◽

Experimental Results ◽

Processing Unit ◽

Final Model ◽

Scene Text ◽

End To End

AbstractRecent learning-based approaches show promising performance improvement for the scene text removal task but usually leave several remnants of text and provide visually unpleasant results. In this work, a novel end-to-end framework is proposed based on accurate text stroke detection. Specifically, the text removal problem is decoupled into text stroke detection and stroke removal; we design separate networks to solve these two subproblems, the latter being a generative network. These two networks are combined as a processing unit, which is cascaded to obtain our final model for text removal. Experimental results demonstrate that the proposed method substantially outperforms the state-of-the-art for locating and erasing scene text. A new large-scale real-world dataset with 12,120 images has been constructed and is being made available to facilitate research, as current publicly available datasets are mainly synthetic so cannot properly measure the performance of different methods.

Download Full-text

Crop Rotation Modeling for Deep Learning-Based Parcel Classification from Satellite Time Series

Remote Sensing ◽

10.3390/rs13224599 ◽

2021 ◽

Vol 13 (22) ◽

pp. 4599

Author(s):

Félix Quinton ◽

Loic Landrieu

Keyword(s):

Time Series ◽

Deep Learning ◽

Crop Rotation ◽

Large Scale ◽

State Of The Art ◽

Crop Rotations ◽

Learning Approach ◽

Type Mapping ◽

Current State ◽

Crop Type

While annual crop rotations play a crucial role for agricultural optimization, they have been largely ignored for automated crop type mapping. In this paper, we take advantage of the increasing quantity of annotated satellite data to propose to model simultaneously the inter- and intra-annual agricultural dynamics of yearly parcel classification with a deep learning approach. Along with simple training adjustments, our model provides an improvement of over 6.3% mIoU over the current state-of-the-art of crop classification, and a reduction of over 21% of the error rate. Furthermore, we release the first large-scale multi-year agricultural dataset with over 300,000 annotated parcels.

Download Full-text

A Deep Learning Algorithm for the Max-Cut Problem Based on Pointer Network Structure with Supervised Learning and Reinforcement Learning Strategies

Mathematics ◽

10.3390/math8020298 ◽

2020 ◽

Vol 8 (2) ◽

pp. 298 ◽

Cited By ~ 2

Author(s):

Shenshen Gu ◽

Yue Yang

Keyword(s):

Neural Network ◽

Deep Learning ◽

Reinforcement Learning ◽

Combinatorial Optimization ◽

Supervised Learning ◽

Learning Strategies ◽

Large Scale ◽

Deep Neural Network ◽

Max Cut Problem ◽

Cut Problems

The Max-cut problem is a well-known combinatorial optimization problem, which has many real-world applications. However, the problem has been proven to be non-deterministic polynomial-hard (NP-hard), which means that exact solution algorithms are not suitable for large-scale situations, as it is too time-consuming to obtain a solution. Therefore, designing heuristic algorithms is a promising but challenging direction to effectively solve large-scale Max-cut problems. For this reason, we propose a unique method which combines a pointer network and two deep learning strategies (supervised learning and reinforcement learning) in this paper, in order to address this challenge. A pointer network is a sequence-to-sequence deep neural network, which can extract data features in a purely data-driven way to discover the hidden laws behind data. Combining the characteristics of the Max-cut problem, we designed the input and output mechanisms of the pointer network model, and we used supervised learning and reinforcement learning to train the model to evaluate the model performance. Through experiments, we illustrated that our model can be well applied to solve large-scale Max-cut problems. Our experimental results also revealed that the new method will further encourage broader exploration of deep neural network for large-scale combinatorial optimization problems.

Download Full-text