Recurrent Coupled Topic Modeling over Sequential Documents

2021, Vol 16 (1), pp. 1-32
Author(s): Jinjin Guo, Longbing Cao, Zhiguo Gong

Abundant sequential documents such as online archives, social media, and news feeds are updated in streams, where each chunk of documents carries smoothly evolving yet dependent topics. Such digital texts have attracted extensive research on dynamic topic modeling to infer hidden evolving topics and their temporal dependencies. However, most existing approaches focus on single-topic-thread evolution and ignore the fact that a current topic may be coupled with multiple relevant prior topics. In addition, these approaches incur intractable inference when estimating latent parameters, resulting in high computational cost and degraded performance. In this work, we assume that a current topic evolves from all prior topics with corresponding coupling weights, forming a multi-topic-thread evolution. Our method models the dependencies between evolving topics and thoroughly encodes their complex multi-couplings across time steps. To conquer the intractable inference challenge, a new solution with a set of novel data augmentation techniques is proposed, which successfully decomposes the multi-couplings between evolving topics. A fully conjugate model is thus obtained, guaranteeing the effectiveness and efficiency of the inference technique. A novel Gibbs sampler with a backward–forward filter algorithm efficiently learns the latent time-evolving parameters in closed form. In addition, the latent Indian Buffet Process compound distribution is exploited to automatically infer the overall topic number and customize the sparse topic proportions for each sequential document without bias. The proposed method is evaluated on both synthetic and real-world datasets against competitive baselines, demonstrating its superiority in terms of lower per-word perplexity, more coherent topics, and better document time prediction.
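A minimal sketch of the multi-topic-thread evolution idea described above, assuming each current topic is drawn around a coupling-weighted mixture of all prior topics; the variable names and the Dirichlet parameterization are illustrative and not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

K, V = 5, 1000          # number of topics, vocabulary size
prev_topics = rng.dirichlet(np.ones(V) * 0.1, size=K)   # topics at time t-1, shape (K, V)

# Coupling weights: each current topic depends on ALL prior topics,
# not just a single predecessor (multi-topic-thread evolution).
coupling = rng.dirichlet(np.ones(K), size=K)             # shape (K, K), rows sum to 1

# Expected base measure for each current topic is a weighted mix of prior topics.
base = coupling @ prev_topics                            # shape (K, V)

# Draw the current topics around that mixture (Dirichlet with concentration tau).
tau = 100.0
curr_topics = np.vstack([rng.dirichlet(tau * base[k] + 1e-6) for k in range(K)])

print(curr_topics.shape)   # (5, 1000); each row is a word distribution at time t
```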

2020, Vol 10 (3), pp. 62
Author(s): Tittaya Mairittha, Nattaya Mairittha, Sozo Inoue

The integration of digital voice assistants in nursing residences is becoming increasingly important to facilitate nursing productivity with documentation. A key idea behind such a system is training natural language understanding (NLU) modules that enable the machine to classify the purpose of a user utterance (intent) and extract pieces of valuable information present in the utterance (entities). One of the main obstacles when creating robust NLU is the lack of sufficient labeled data, which generally relies on human labeling. This process is cost-intensive and time-consuming, particularly in the high-level nursing care domain, which requires abstract knowledge. In this paper, we propose an automatic dialogue labeling framework for NLU tasks, specifically for nursing record systems. First, we apply data augmentation techniques to create a collection of variant sample utterances. The individual evaluation results show a strong satisfaction rate with regard to both the fluency and accuracy of the utterances. We also investigate the possibility of applying deep generative models to our augmented dataset. A preliminary character-based model based on long short-term memory (LSTM) obtains an accuracy of 90% and generates varied, reasonable texts with a BLEU score of 0.76. Secondly, we introduce an approach to intent and entity labeling that uses feature embeddings and semantic similarity-based clustering. We also empirically evaluate different embedding methods for learning representations best suited to our data and clustering tasks. Experimental results show that fastText embeddings produce strong performance for both intent labeling and entity labeling, achieving an accuracy of 0.79, an f1-score of 0.78, and silhouette scores of 0.67 and 0.61, respectively.
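A brief illustration, not the authors' pipeline, of embedding utterances with fastText and clustering them by semantic similarity for intent labeling; the toy utterances, gensim training settings, and cluster count are assumptions:

```python
import numpy as np
from gensim.models import FastText
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Toy utterances standing in for (augmented) nursing-record dialogue data.
utterances = [
    "record blood pressure for room 12",
    "log blood pressure reading",
    "note that the patient had lunch",
    "add a meal record for the patient",
]
tokens = [u.split() for u in utterances]

# Train subword-aware fastText embeddings on the corpus itself.
ft = FastText(sentences=tokens, vector_size=50, window=3, min_count=1, epochs=50)

# Represent each utterance as the mean of its word vectors.
X = np.array([ft.wv[t].mean(axis=0) for t in tokens])

# Cluster utterances; clusters can then be inspected and named as intents.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_, silhouette_score(X, km.labels_))
```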


Sensors, 2021, Vol 21 (13), pp. 4496
Author(s): Vlad Pandelea, Edoardo Ragusa, Tommaso Apicella, Paolo Gastaldo, Erik Cambria

Emotion recognition, among other natural language processing tasks, has greatly benefited from the use of large transformer models. Deploying these models on resource-constrained devices, however, is a major challenge due to their computational cost. In this paper, we show that the combination of large transformers, used as high-quality feature extractors, and simple hardware-friendly classifiers based on linear separators can achieve competitive performance while allowing real-time inference and fast training. Various solutions, including batch and Online Sequential Learning, are analyzed. Additionally, our experiments show that latency and performance can be further improved via dimensionality reduction and pre-training, respectively. The resulting system is implemented on two types of edge devices, namely an edge accelerator and two smartphones.
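A hedged sketch of the general recipe of a frozen transformer feature extractor feeding a hardware-friendly linear classifier; the choice of DistilBERT, mean pooling, and scikit-learn's SGDClassifier are stand-ins rather than the authors' exact edge implementation:

```python
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import SGDClassifier

tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
enc = AutoModel.from_pretrained("distilbert-base-uncased").eval()

def embed(texts):
    # Frozen transformer as a feature extractor: mean-pool the last hidden state.
    with torch.no_grad():
        batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
        hidden = enc(**batch).last_hidden_state          # (B, T, H)
        return hidden.mean(dim=1).numpy()                # (B, H)

texts = ["I am so happy today!", "This is terrible news.",
         "What a wonderful surprise", "I feel awful"]
labels = np.array([1, 0, 1, 0])                          # toy emotion labels

X = embed(texts)

# Hardware-friendly linear separator; partial_fit also allows online (sequential) updates.
clf = SGDClassifier(loss="log_loss", random_state=0)
clf.partial_fit(X, labels, classes=np.array([0, 1]))
print(clf.predict(embed(["great fantastic day"])))
```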


2021, Vol 11 (14), pp. 6368
Author(s): Fátima A. Saiz, Garazi Alfaro, Iñigo Barandiaran, Manuel Graña

This paper describes the application of Semantic Networks for the detection of defects in images of metallic manufactured components in a situation where the number of available defect samples is small, which is rather common in real industrial environments. To overcome this shortage of data, the common approach is to use conventional data augmentation techniques. We resort to Generative Adversarial Networks (GANs), which have shown the capability to generate highly convincing samples of a specific class as the result of a game between a discriminator and a generator module. Here, we apply GANs to generate samples of images of metallic manufactured components with specific defects in order to improve the training of the Semantic Networks (specifically DeepLabV3+ and Pyramid Attention Network (PAN)) that carry out defect detection and segmentation. Our process generates defect images using StyleGAN2 with the DiffAugment method, followed by conventional data augmentation over the entire enriched dataset, yielding a large balanced dataset that allows robust training of the Semantic Network. We demonstrate the approach on a private dataset generated for an industrial client, where images are captured by an ad hoc photometric-stereo image acquisition system, and on a public dataset, the Northeastern University surface defect database (NEU). The proposed approach improves the intersection over union (IoU) detection measure by 7% and 6% on the respective datasets compared with conventional data augmentation alone.
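A rough sketch of enriching a defect dataset with GAN-generated samples followed by conventional augmentation; `generate_defect_image` is a hypothetical placeholder for a trained StyleGAN2 (+DiffAugment) generator, and the transforms shown are illustrative rather than the paper's exact settings:

```python
import numpy as np
from PIL import Image
from torchvision import transforms

def generate_defect_image(rng):
    # Placeholder for a trained StyleGAN2 (+DiffAugment) generator; here we just
    # return random noise with the right shape so the pipeline runs end to end.
    return Image.fromarray(rng.integers(0, 255, (256, 256, 3), dtype=np.uint8))

# Conventional augmentation applied on top of both real and synthetic samples.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
])

rng = np.random.default_rng(0)
real_images = [Image.fromarray(rng.integers(0, 255, (256, 256, 3), dtype=np.uint8))
               for _ in range(4)]
synthetic_images = [generate_defect_image(rng) for _ in range(12)]  # rebalance the rare defect class

enriched_dataset = [augment(img) for img in real_images + synthetic_images]
print(len(enriched_dataset))   # 16 images ready for segmentation-network training
```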


2021, Vol 189, pp. 292-299
Author(s): Caroline Sabty, Islam Omar, Fady Wasfalla, Mohamed Islam, Slim Abdennadher

2021, Vol 8 (1)
Author(s): Huu-Thanh Duong, Tram-Anh Nguyen-Thi

In the literature, machine learning-based studies of sentiment analysis are usually supervised, requiring pre-labeled datasets that are large enough in the target domains. Building such datasets is tedious, expensive, and time-consuming, and the resulting models are hard to apply to unseen data. This paper approaches semi-supervised learning for Vietnamese sentiment analysis, where labeled datasets are limited. We summarize many preprocessing techniques performed to clean and normalize the data, along with negation handling and intensification handling, to improve performance. Moreover, data augmentation techniques, which generate new data from the original data to enrich the training data without user intervention, are also presented. In the experiments, we evaluate various aspects and obtain competitive results, which may motivate future work.
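A small sketch of label-free text augmentation in the spirit described above (random swap and random deletion); the example sentence and the specific operations are illustrative and not the paper's exact techniques:

```python
import random

def random_swap(tokens, n_swaps=1, seed=None):
    # Swap two random token positions; produces a new variant of the sentence.
    rng = random.Random(seed)
    tokens = tokens[:]
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

def random_deletion(tokens, p=0.1, seed=None):
    # Drop each token with probability p, keeping at least one token.
    rng = random.Random(seed)
    kept = [t for t in tokens if rng.random() > p]
    return kept or [rng.choice(tokens)]

sentence = "san pham nay rat tot va dang mua".split()  # "this product is very good and worth buying"
augmented = (
    [" ".join(random_swap(sentence, seed=s)) for s in range(3)]
    + [" ".join(random_deletion(sentence, seed=s)) for s in range(3)]
)
print(augmented)   # new variants used to enrich the training data without manual labeling
```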


2022, Vol 18 (1), pp. 1-24
Author(s): Yi Zhang, Yue Zheng, Guidong Zhang, Kun Qian, Chen Qian, ...

Gait, the walking manner of a person, has been perceived as a physical and behavioral trait for human identification. Compared with cameras and wearable sensors, Wi-Fi-based gait recognition is more attractive because Wi-Fi infrastructure is available almost everywhere and can sense passively without requiring on-body devices. However, existing Wi-Fi sensing approaches impose strong assumptions of fixed user walking trajectories, sufficient training data, and identification of already known users. In this article, we present GaitSense, a Wi-Fi-based human identification system, to overcome these unrealistic assumptions. To deal with various walking trajectories and speeds, GaitSense first extracts target-specific features that best characterize gait patterns and applies novel normalization algorithms to eliminate gait-irrelevant perturbations in the signals. On this basis, GaitSense reduces the training effort in new deployment scenarios through transfer learning and data augmentation techniques. GaitSense also enables a distinct feature of illegal-user identification by anomaly detection, making the system readily available for real-world deployment. Our implementation and evaluation with commodity Wi-Fi devices demonstrate consistent identification accuracy across various deployment scenarios with few training samples, pushing the limit of gait recognition with Wi-Fi signals.
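A simplified sketch of the illegal-user (unknown walker) detection idea via anomaly detection; the one-class SVM and the synthetic feature vectors are assumptions standing in for GaitSense's Wi-Fi CSI-derived gait features and its actual models:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Stand-ins for gait feature vectors extracted from Wi-Fi CSI of enrolled users.
enrolled = rng.normal(loc=0.0, scale=1.0, size=(200, 16))

# Features of an unseen walker drawn from a shifted distribution (an "illegal" user).
intruder = rng.normal(loc=3.0, scale=1.0, size=(5, 16))

# Fit only on legitimate users; anything far from their feature manifold is flagged.
detector = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(enrolled)

print(detector.predict(enrolled[:5]))   # mostly +1: recognized as known-user gait
print(detector.predict(intruder))       # mostly -1: flagged as an unknown walker
```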


2021, Vol 12 (4), pp. 118-131
Author(s): Jaya Krishna Raguru, Devi Prasad Sharma

The problem of identifying a seed set of K nodes that maximizes influence spread over a social network is known as influence maximization (IM). Past work showed this problem to be NP-hard, and the greedy approximation algorithm only guarantees about 63% of the optimal spread. Moreover, this approach suffers from high computational cost. Furthermore, in a network with communities, IM spread is not always certain. In this paper, the heterogeneous influence maximization through community detection (HIMCD) algorithm is proposed. This approach selects initial seed nodes within communities using various centrality measures, and these seed nodes act as sources for influence spread. Influence maximization is then computed in parallel with the aid of the seed node set contained in each community: the graph is partitioned and IM computations are carried out in a distributed manner. Extensive experiments on two real-world datasets reveal that HIMCD achieves substantial performance improvement over state-of-the-art techniques.
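A toy sketch, not the HIMCD algorithm itself, of community-based seed selection with a centrality measure followed by an independent cascade spread simulation; the graph, propagation probability, and one-seed-per-community rule are assumptions:

```python
import random
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def independent_cascade(G, seeds, p=0.1, seed=0):
    # Simulate one run of the independent cascade model and return the spread size.
    rng = random.Random(seed)
    active, frontier = set(seeds), list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in G.neighbors(u):
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return len(active)

G = nx.karate_club_graph()

# Detect communities, then pick the highest-degree-centrality node in each
# community as a seed (one simple centrality-based choice per community).
communities = greedy_modularity_communities(G)
centrality = nx.degree_centrality(G)
seeds = [max(c, key=centrality.get) for c in communities]

print("seeds:", seeds, "spread:", independent_cascade(G, seeds))
```

Because each community contributes its own seed, the cascade simulations for different communities can be run in parallel, which is the intuition behind the distributed IM computation described above.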

