Use of Distributed Semi-Supervised Clustering for Text Classification

Pei Li; Ze Deng

doi:10.1142/s0218126619501275

Sentiment Classification Using Convolutional Neural Networks

Applied Sciences ◽

10.3390/app9112347 ◽

2019 ◽

Vol 9 (11) ◽

pp. 2347 ◽

Cited By ~ 18

Author(s):

Hannah Kim ◽

Young-Seob Jeong

Keyword(s):

Neural Network ◽

Neural Networks ◽

Convolutional Neural Networks ◽

Text Classification ◽

State Of The Art ◽

Sentiment Classification ◽

Learning Models ◽

Text Data ◽

Textual Data ◽

Better Than

As the volume of textual data grows exponentially, it becomes more important to develop models that analyze text automatically. Texts may carry various labels such as gender, age, country, and sentiment. Using such labels may benefit some industrial fields, so many studies of text classification have appeared. Recently, the Convolutional Neural Network (CNN) has been adopted for text classification and has shown quite successful results. In this paper, we propose convolutional neural networks for the task of sentiment classification. Through experiments with three well-known datasets, we show that employing consecutive convolutional layers is effective for relatively longer texts, and that our networks outperform other state-of-the-art deep learning models.

Deep Learning for text in limted data settings

10.36227/techrxiv.12100692 ◽

2020 ◽

Author(s):

Pathikkumar Patel ◽

Bhargav Lad ◽

Jinan Fiaidhi

Keyword(s):

Machine Learning ◽

Time Series ◽

Deep Learning ◽

Sentiment Analysis ◽

Transfer Learning ◽

Text Classification ◽

State Of The Art ◽

Time Series Forecasting ◽

Text Data ◽

Performance Levels

During the last few years, RNN models have been used extensively and have proven well suited to sequence and text data. RNNs have achieved state-of-the-art performance levels in several applications such as text classification, sequence-to-sequence modelling and time series forecasting. In this article we review different machine learning and deep learning based approaches for text data and look at the results obtained from these methods. This work also explores the use of transfer learning in NLP and how it affects the performance of models on a specific application, sentiment analysis.

CharTeC-Net: An Efficient and Lightweight Character-Based Convolutional Network for Text Classification

Journal of Electrical and Computer Engineering ◽

10.1155/2020/9701427 ◽

2020 ◽

Vol 2020 ◽

pp. 1-7 ◽

Cited By ~ 2

Author(s):

Aboubakar Nasser Samatin Njikam ◽

Huan Zhao

Keyword(s):

Text Classification ◽

Building Block ◽

Large Scale ◽

State Of The Art ◽

Building Blocks ◽

Training Data ◽

Superior Performance ◽

Classification Problems ◽

Computationally Efficient ◽

Convolutional Network

This paper introduces an extremely lightweight (roughly two hundred thousand parameters) and computationally efficient CNN architecture, named CharTeC-Net (Character-based Text Classification Network), for character-based text classification problems. This new architecture is composed of four building blocks for feature extraction. Each of these building blocks, except the last one, uses 1 × 1 pointwise convolutional layers to add more nonlinearity to the network and to increase the dimensions within each building block. In addition, shortcut connections are used in each building block to facilitate the flow of gradients over the network, but more importantly to ensure that the original signal present in the training data is shared across each building block. Experiments on eight standard large-scale text classification and sentiment analysis datasets demonstrate CharTeC-Net’s superior performance over baseline methods and competitive accuracy compared with state-of-the-art methods, although CharTeC-Net has only between 181,427 and 225,323 parameters and weighs less than 1 megabyte.
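To make the two ingredients named in this abstract concrete, the following is a minimal pure-Python sketch of a 1 × 1 pointwise convolution followed by an identity shortcut. It is not the CharTeC-Net implementation: the function names, the ReLU nonlinearity, and the choice to keep the channel count unchanged (so that the shortcut can be a plain addition) are all assumptions made for illustration.

```python
def pointwise_conv(x, weights, bias):
    """1x1 convolution: mix channels independently at every sequence position.

    x is a list of positions, each a list of in_ch values;
    weights is a list of out_ch rows, each holding in_ch values.
    """
    return [
        [sum(w * v for w, v in zip(row, pos)) + b for row, b in zip(weights, bias)]
        for pos in x
    ]


def relu(x):
    """Element-wise ReLU (an assumed choice of nonlinearity)."""
    return [[max(0.0, v) for v in pos] for pos in x]


def block_with_shortcut(x, weights, bias):
    """Pointwise conv + ReLU, then add the input back (identity shortcut).

    Assumes out_ch == in_ch so the shapes match for the addition.
    """
    y = relu(pointwise_conv(x, weights, bias))
    return [[a + b for a, b in zip(px, py)] for px, py in zip(x, y)]


def pointwise_param_count(in_ch, out_ch):
    """Weights plus biases of a single 1x1 convolution."""
    return in_ch * out_ch + out_ch
```

Because a 1 × 1 kernel has no spatial extent, its parameter count depends only on the channel counts, which is one reason such layers are cheap. The shortcut passes the input through unchanged alongside the transformed signal.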

Enhancement of Text Analysis Using Context-Aware Normalization of Social Media Informal Text

Applied Sciences ◽

10.3390/app11178172 ◽

2021 ◽

Vol 11 (17) ◽

pp. 8172

Author(s):

Jebran Khan ◽

Sungchang Lee

Keyword(s):

Social Media ◽

Text Analysis ◽

State Of The Art ◽

Context Aware ◽

Intended Meaning ◽

Text Data ◽

Recall Accuracy ◽

Wide Range ◽

Textual Data ◽

Noisy Text

We propose a generic social media Textual Variations Handler (TVH), independent of application and data variations, to deal with a wide range of noise in textual data generated in various social media (SM) applications for enhanced text analysis. The aim is to build an effective hybrid normalization technique that uses the information in noisy text in its intended form, instead of filtering it out, to analyze SM text better. The proposed TVH performs context-aware text normalization based on intended meaning to avoid wrong word substitutions. We integrate the TVH with state-of-the-art (SOTA) deep-learning-based text analysis methods to enhance their performance on noisy SM text data. In simulation, the proposed scheme shows promising improvement in the text analysis of informal SM text in terms of precision, recall, accuracy, and F1-score.

Active semi-supervised framework with data editing

Computer Science and Information Systems ◽

10.2298/csis120202045z ◽

2012 ◽

Vol 9 (4) ◽

pp. 1513-1532 ◽

Cited By ~ 4

Author(s):

Xue Zhang ◽

Wangxin Xiao

Keyword(s):

Active Learning ◽

Supervised Learning ◽

Text Classification ◽

State Of The Art ◽

The Self ◽

Training Data ◽

Data Sets ◽

Text Data ◽

Data Editing ◽

Data Problem

To address the problem of insufficient training data, many active semi-supervised algorithms have been proposed. The self-labeled training data in semi-supervised learning may contain much noise because the training data are insufficient. Such noise may snowball in the subsequent learning process and thus hurt the generalization ability of the final hypothesis. The extremely small amount of labeled training data in sparsely labeled text classification aggravates this situation. If such noise could be identified and removed by some strategy, the performance of active semi-supervised algorithms should improve. However, techniques for identifying and removing noise have seldom been explored in existing active semi-supervised algorithms. In this paper, we propose an active semi-supervised framework with data editing (we call it ASSDE) to improve sparsely labeled text classification. A data editing technique is used to identify and remove noise introduced by semi-supervised labeling. We carry out the data editing technique by fully utilizing the advantage of active learning, which is novel to the best of our knowledge. The fusion of active learning with data editing makes ASSDE more robust to the sparsity and the distribution bias of the training data. It further simplifies the design of semi-supervised learning, which makes ASSDE more efficient. An extensive experimental study on several real-world text data sets shows encouraging results for the proposed framework on sparsely labeled text classification, compared with several state-of-the-art methods.

A Bottom-Up Clustering Approach to Unsupervised Person Re-Identification

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33018738 ◽

2019 ◽

Vol 33 ◽

pp. 8738-8745 ◽

Cited By ~ 41

Author(s):

Yutian Lin ◽

Xuanyi Dong ◽

Liang Zheng ◽

Yan Yan ◽

Yi Yang

Keyword(s):

Supervised Learning ◽

Large Scale ◽

State Of The Art ◽

Training Data ◽

Real World Data ◽

Bottom Up ◽

Data Volume ◽

Clustering Approach ◽

The Individual ◽

The Relationship

Most person re-identification (re-ID) approaches are based on supervised learning, which requires intensive manual annotation for training data. However, it is not only resource-intensive to acquire identity annotation but also impractical to label large-scale real-world data. To relieve this problem, we propose a bottom-up clustering (BUC) approach to jointly optimize a convolutional neural network (CNN) and the relationship among the individual samples. Our algorithm considers two fundamental facts in the re-ID task, i.e., diversity across different identities and similarity within the same identity. Specifically, our algorithm starts by regarding each individual sample as a different identity, which maximizes the diversity over each identity. Then it gradually groups similar samples into one identity, which increases the similarity within each identity. We utilize a diversity regularization term in the bottom-up clustering procedure to balance the data volume of each cluster. Finally, the model achieves an effective trade-off between diversity and similarity. We conduct extensive experiments on large-scale image and video re-ID datasets, including Market-1501, DukeMTMC-reID, MARS and DukeMTMC-VideoReID. The experimental results demonstrate that our algorithm is not only superior to state-of-the-art unsupervised re-ID approaches, but also performs favorably against competing transfer learning and semi-supervised learning methods.
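The bottom-up grouping described in this abstract can be illustrated with a small pure-Python sketch. It covers only the clustering step: the CNN training that the paper optimizes jointly is omitted. The merge cost used here (nearest-pair squared distance plus `lam` times the combined cluster size) is an assumed form of the diversity regularization term, and all function and parameter names are hypothetical.

```python
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def merge_cost(ca, cb, points, lam):
    """Cost of merging clusters ca and cb (lists of point indices).

    Assumed form: nearest-pair distance plus a size penalty that
    discourages already-large clusters from absorbing more samples.
    """
    nearest = min(sq_dist(points[i], points[j]) for i in ca for j in cb)
    return nearest + lam * (len(ca) + len(cb))


def bottom_up_cluster(points, n_clusters, lam=0.0):
    """Start with one cluster per sample, then merge the cheapest pair
    repeatedly until n_clusters remain."""
    n_clusters = max(1, n_clusters)  # at least one cluster must remain
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: merge_cost(clusters[p[0]], clusters[p[1]], points, lam),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe: combinations guarantees a < b
    return clusters
```

With `lam=0.0` this reduces to plain single-linkage agglomerative clustering. Raising `lam` makes merges involving large clusters more expensive, which is the balancing effect on cluster sizes that the abstract attributes to the regularization term.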

MEDA: Meta-Learning with Data Augmentation for Few-Shot Text Classification

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/541 ◽

2021 ◽

Author(s):

Pengfei Sun ◽

Yawen Ouyang ◽

Wenming Zhang ◽

Xin-yu Dai

Keyword(s):

Text Classification ◽

Data Augmentation ◽

State Of The Art ◽

Promising Technique ◽

Learning Methods ◽

Text Data ◽

Art Methods ◽

Visual Tasks ◽

Meta Learning

Meta-learning has recently emerged as a promising technique to address the challenge of few-shot learning. However, standard meta-learning methods mainly focus on visual tasks, which makes it hard for them to deal with diverse text data directly. In this paper, we introduce a novel framework for few-shot text classification, named MEta-learning with Data Augmentation (MEDA). MEDA is composed of two modules, a ball generator and a meta-learner, which are learned jointly. The ball generator increases the number of shots per class by generating more samples, so that the meta-learner can be trained with both original and augmented samples. It is worth noting that the ball generator is agnostic to the choice of meta-learning method. Experimental results show that, on both datasets, MEDA outperforms existing state-of-the-art methods and significantly improves the performance of meta-learning on few-shot text classification.

Design, Operating Results and Experiences of Two Large-Scale Treatment Plants with Biological Nitrogen and Phosphate Elimination

Water Science & Technology ◽

10.2166/wst.1992.0499 ◽

1992 ◽

Vol 25 (4-5) ◽

pp. 225-232

Author(s):

C. F. Seyfried ◽

P. Hartwig

Keyword(s):

Waste Water ◽

Water Treatment ◽

Large Scale ◽

Waste Water Treatment ◽

Treatment Plant ◽

Water Treatment Plant ◽

Waste Water Treatment Plant ◽

Main Stream ◽

Side Stream ◽

Biological Nitrogen

This is a report on the design and operating results of two waste water treatment plants which make use of biological nitrogen and phosphate elimination. Both plants are characterized by load situations that are unfavourable for biological P elimination. The influent of the HILDESHEIM WASTE WATER TREATMENT PLANT contains nitrates and little BOD5. Use of the ISAH process ensures the optimum exploitation of the easily degradable substrate for the redissolution of phosphates. Over 70 % phosphate elimination and effluent concentrations of 1.3 mg PO4-P/l have been achieved. Due to severe seasonal fluctuations in loading the activated sludge plant of the HUSUM WASTE WATER TREATMENT PLANT has to be operated in the stabilization range (F/M ≤ 0.05 kg/(kg·d)) in order not to infringe the required effluent values of 3.9 mg NH4-N/l (2-h-average). The production of surplus sludge is at times too small to allow biological phosphate elimination to be effected in the main stream process. The CISAH (Combined ISAH) process is a combination of the fullstream with the side stream process. It is used in order to achieve the optimum exploitation of biological phosphate elimination by the precipitation of a stripped side stream with a high phosphate content when necessary.

Documentary data and the study of past droughts: a global state of the art

Climate of the Past ◽

10.5194/cp-14-1915-2018 ◽

2018 ◽

Vol 14 (12) ◽

pp. 1915-1960 ◽

Cited By ~ 34

Author(s):

Rudolf Brázdil ◽

Andrea Kiss ◽

Jürg Luterbacher ◽

David J. Nash ◽

Ladislava Řezníčková

Keyword(s):

Large Scale ◽

State Of The Art ◽

Drought Indices ◽

Documentary Evidence ◽

Climatic Trends ◽

Instrumental Observations ◽

Spatio Temporal ◽

Epigraphic Evidence ◽

Administrative Evidence

Abstract. The use of documentary evidence to investigate past climatic trends and events has become a recognised approach in recent decades. This contribution presents the state of the art in its application to droughts. The range of documentary evidence is very wide, including general annals, chronicles, memoirs and diaries kept by missionaries, travellers and those specifically interested in the weather; records kept by administrators tasked with keeping accounts and other financial and economic records; legal-administrative evidence; religious sources; letters; songs; newspapers and journals; pictographic evidence; chronograms; epigraphic evidence; early instrumental observations; society commentaries; and compilations and books. These are available from many parts of the world. This variety of documentary information is evaluated with respect to the reconstruction of hydroclimatic conditions (precipitation, drought frequency and drought indices). Documentary-based drought reconstructions are then addressed in terms of long-term spatio-temporal fluctuations, major drought events, relationships with external forcing and large-scale climate drivers, socio-economic impacts and human responses. Documentary-based drought series are also considered from the viewpoint of spatio-temporal variability for certain continents, and their employment together with hydroclimate reconstructions from other proxies (in particular tree rings) is discussed. Finally, conclusions are drawn, and challenges for the future use of documentary evidence in the study of droughts are presented.