Recurrent Coupled Topic Modeling over Sequential Documents

2021, Vol 16 (1), pp. 1-32
Author(s): Jinjin Guo, Longbing Cao, Zhiguo Gong

Abundant sequential documents such as online archives, social media, and news feeds are updated in streams, where each chunk of documents carries smoothly evolving yet dependent topics. Such digital texts have attracted extensive research on dynamic topic modeling to infer hidden evolving topics and their temporal dependencies. However, most existing approaches focus on single-topic-thread evolution and ignore the fact that a current topic may be coupled with multiple relevant prior topics. In addition, these approaches incur intractable inference when estimating latent parameters, resulting in high computational cost and degraded performance. In this work, we assume that a current topic evolves from all prior topics with corresponding coupling weights, forming a multi-topic-thread evolution. Our method models the dependencies between evolving topics and thoroughly encodes their complex multi-couplings across time steps. To conquer the intractable inference challenge, a new solution with a set of novel data augmentation techniques is proposed, which successfully decomposes the multi-couplings between evolving topics. A fully conjugate model is thus obtained, guaranteeing the effectiveness and efficiency of the inference technique. A novel Gibbs sampler with a backward–forward filter algorithm efficiently learns the latent time-evolving parameters in closed form. In addition, the latent Indian Buffet Process compound distribution is exploited to automatically infer the overall topic number and customize the sparse topic proportions for each sequential document without bias. The proposed method is evaluated on both synthetic and real-world datasets against competitive baselines, demonstrating its superiority in terms of lower per-word perplexity, more coherent topics, and better document time prediction.
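A minimal sketch of the multi-topic-thread evolution idea described above, assuming each current topic is drawn around a coupling-weighted mixture of all prior topics; the variable names and the Dirichlet parameterization are illustrative and not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

K, V = 5, 1000          # number of topics, vocabulary size
prev_topics = rng.dirichlet(np.ones(V) * 0.1, size=K)   # topics at time t-1, shape (K, V)

# Coupling weights: each current topic depends on ALL prior topics,
# not just a single predecessor (multi-topic-thread evolution).
coupling = rng.dirichlet(np.ones(K), size=K)             # shape (K, K), rows sum to 1

# Expected base measure for each current topic is a weighted mix of prior topics.
base = coupling @ prev_topics                            # shape (K, V)

# Draw the current topics around that mixture (Dirichlet with concentration tau).
tau = 100.0
curr_topics = np.vstack([rng.dirichlet(tau * base[k] + 1e-6) for k in range(K)])

print(curr_topics.shape)   # (5, 1000); each row is a word distribution at time t
```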

2020, Vol 10 (3), pp. 62
Author(s): Tittaya Mairittha, Nattaya Mairittha, Sozo Inoue

The integration of digital voice assistants in nursing residences is becoming increasingly important to facilitate nursing productivity with documentation. A key idea behind such a system is training natural language understanding (NLU) modules that enable the machine to classify the purpose of a user utterance (intent) and extract pieces of valuable information present in the utterance (entities). One of the main obstacles when creating robust NLU is the lack of sufficient labeled data, which generally relies on human labeling. This process is cost-intensive and time-consuming, particularly in the high-level nursing care domain, which requires abstract knowledge. In this paper, we propose an automatic dialogue labeling framework for NLU tasks, specifically for nursing record systems. First, we apply data augmentation techniques to create a collection of variant sample utterances. The individual evaluation results show a strong satisfaction rate with regard to both the fluency and accuracy of the utterances. We also investigate the possibility of applying deep generative models to our augmented dataset. A preliminary character-based model based on long short-term memory (LSTM) obtains an accuracy of 90% and generates varied, reasonable texts with a BLEU score of 0.76. Secondly, we introduce an approach to intent and entity labeling that uses feature embeddings and semantic similarity-based clustering. We also empirically evaluate different embedding methods for learning representations best suited to our data and clustering tasks. Experimental results show that fastText embeddings produce strong performance for both intent labeling and entity labeling, achieving an accuracy of 0.79, an f1-score of 0.78, and silhouette scores of 0.67 and 0.61, respectively.
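A brief illustration, not the authors' pipeline, of embedding utterances with fastText and clustering them by semantic similarity for intent labeling; the toy utterances, gensim training settings, and cluster count are assumptions:

```python
import numpy as np
from gensim.models import FastText
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Toy utterances standing in for (augmented) nursing-record dialogue data.
utterances = [
    "record blood pressure for room 12",
    "log blood pressure reading",
    "note that the patient had lunch",
    "add a meal record for the patient",
]
tokens = [u.split() for u in utterances]

# Train subword-aware fastText embeddings on the corpus itself.
ft = FastText(sentences=tokens, vector_size=50, window=3, min_count=1, epochs=50)

# Represent each utterance as the mean of its word vectors.
X = np.array([ft.wv[t].mean(axis=0) for t in tokens])

# Cluster utterances; clusters can then be inspected and named as intents.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_, silhouette_score(X, km.labels_))
```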


Sensors, 2021, Vol 21 (13), pp. 4496
Author(s): Vlad Pandelea, Edoardo Ragusa, Tommaso Apicella, Paolo Gastaldo, Erik Cambria

Emotion recognition, among other natural language processing tasks, has greatly benefited from the use of large transformer models. Deploying these models on resource-constrained devices, however, is a major challenge due to their computational cost. In this paper, we show that the combination of large transformers, used as high-quality feature extractors, and simple hardware-friendly classifiers based on linear separators can achieve competitive performance while allowing real-time inference and fast training. Various solutions, including batch and Online Sequential Learning, are analyzed. Additionally, our experiments show that latency and performance can be further improved via dimensionality reduction and pre-training, respectively. The resulting system is implemented on two types of edge devices, namely an edge accelerator and two smartphones.
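A hedged sketch of the general recipe of a frozen transformer feature extractor feeding a hardware-friendly linear classifier; the choice of DistilBERT, mean pooling, and scikit-learn's SGDClassifier are stand-ins rather than the authors' exact edge implementation:

```python
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import SGDClassifier

tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
enc = AutoModel.from_pretrained("distilbert-base-uncased").eval()

def embed(texts):
    # Frozen transformer as a feature extractor: mean-pool the last hidden state.
    with torch.no_grad():
        batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
        hidden = enc(**batch).last_hidden_state          # (B, T, H)
        return hidden.mean(dim=1).numpy()                # (B, H)

texts = ["I am so happy today!", "This is terrible news.",
         "What a wonderful surprise", "I feel awful"]
labels = np.array([1, 0, 1, 0])                          # toy emotion labels

X = embed(texts)

# Hardware-friendly linear separator; partial_fit also allows online (sequential) updates.
clf = SGDClassifier(loss="log_loss", random_state=0)
clf.partial_fit(X, labels, classes=np.array([0, 1]))
print(clf.predict(embed(["great fantastic day"])))
```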


2021, Vol 11 (14), pp. 6368
Author(s): Fátima A. Saiz, Garazi Alfaro, Iñigo Barandiaran, Manuel Graña

This paper describes the application of Semantic Networks for the detection of defects in images of metallic manufactured components in a situation where the number of available defect samples is small, which is rather common in real industrial environments. To overcome this shortage of data, the common approach is to use conventional data augmentation techniques. We resort to Generative Adversarial Networks (GANs), which have shown the capability to generate highly convincing samples of a specific class as the result of a game between a discriminator and a generator module. Here, we apply GANs to generate samples of images of metallic manufactured components with specific defects in order to improve the training of the Semantic Networks (specifically DeepLabV3+ and Pyramid Attention Network (PAN)) that carry out defect detection and segmentation. Our process generates defect images using StyleGAN2 with the DiffAugment method, followed by conventional data augmentation over the entire enriched dataset, yielding a large balanced dataset that allows robust training of the Semantic Network. We demonstrate the approach on a private dataset generated for an industrial client, where images are captured by an ad hoc photometric-stereo image acquisition system, and on a public dataset, the Northeastern University surface defect database (NEU). The proposed approach improves the intersection over union (IoU) detection measure by 7% and 6% on the respective datasets compared with conventional data augmentation alone.
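A rough sketch of enriching a defect dataset with GAN-generated samples followed by conventional augmentation; `generate_defect_image` is a hypothetical placeholder for a trained StyleGAN2 (+DiffAugment) generator, and the transforms shown are illustrative rather than the paper's exact settings:

```python
import numpy as np
from PIL import Image
from torchvision import transforms

def generate_defect_image(rng):
    # Placeholder for a trained StyleGAN2 (+DiffAugment) generator; here we just
    # return random noise with the right shape so the pipeline runs end to end.
    return Image.fromarray(rng.integers(0, 255, (256, 256, 3), dtype=np.uint8))

# Conventional augmentation applied on top of both real and synthetic samples.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
])

rng = np.random.default_rng(0)
real_images = [Image.fromarray(rng.integers(0, 255, (256, 256, 3), dtype=np.uint8))
               for _ in range(4)]
synthetic_images = [generate_defect_image(rng) for _ in range(12)]  # rebalance the rare defect class

enriched_dataset = [augment(img) for img in real_images + synthetic_images]
print(len(enriched_dataset))   # 16 images ready for segmentation-network training
```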


2021, Vol 189, pp. 292-299
Author(s): Caroline Sabty, Islam Omar, Fady Wasfalla, Mohamed Islam, Slim Abdennadher

2021, Vol 8 (1)
Author(s): Huu-Thanh Duong, Tram-Anh Nguyen-Thi

In the literature, machine learning-based studies of sentiment analysis are usually supervised, requiring pre-labeled datasets that are large enough in the target domains. Building such datasets is tedious, expensive, and time-consuming, and the resulting models are hard to apply to unseen data. This paper approaches semi-supervised learning for Vietnamese sentiment analysis, where labeled datasets are limited. We summarize many preprocessing techniques performed to clean and normalize the data, along with negation handling and intensification handling, to improve performance. Moreover, data augmentation techniques, which generate new data from the original data to enrich the training data without user intervention, are also presented. In the experiments, we evaluate various aspects and obtain competitive results, which may motivate future work.
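A small sketch of label-free text augmentation in the spirit described above (random swap and random deletion); the example sentence and the specific operations are illustrative and not the paper's exact techniques:

```python
import random

def random_swap(tokens, n_swaps=1, seed=None):
    # Swap two random token positions; produces a new variant of the sentence.
    rng = random.Random(seed)
    tokens = tokens[:]
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

def random_deletion(tokens, p=0.1, seed=None):
    # Drop each token with probability p, keeping at least one token.
    rng = random.Random(seed)
    kept = [t for t in tokens if rng.random() > p]
    return kept or [rng.choice(tokens)]

sentence = "san pham nay rat tot va dang mua".split()  # "this product is very good and worth buying"
augmented = (
    [" ".join(random_swap(sentence, seed=s)) for s in range(3)]
    + [" ".join(random_deletion(sentence, seed=s)) for s in range(3)]
)
print(augmented)   # new variants used to enrich the training data without manual labeling
```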


2022, Vol 18 (1), pp. 1-24
Author(s): Yi Zhang, Yue Zheng, Guidong Zhang, Kun Qian, Chen Qian, ...

Gait, the walking manner of a person, has been perceived as a physical and behavioral trait for human identification. Compared with cameras and wearable sensors, Wi-Fi-based gait recognition is more attractive because Wi-Fi infrastructure is available almost everywhere and can sense passively without requiring on-body devices. However, existing Wi-Fi sensing approaches impose strong assumptions of fixed user walking trajectories, sufficient training data, and identification of already known users. In this article, we present GaitSense, a Wi-Fi-based human identification system, to overcome these unrealistic assumptions. To deal with various walking trajectories and speeds, GaitSense first extracts target-specific features that best characterize gait patterns and applies novel normalization algorithms to eliminate gait-irrelevant perturbations in the signals. On this basis, GaitSense reduces the training effort in new deployment scenarios through transfer learning and data augmentation techniques. GaitSense also enables a distinct feature of illegal-user identification by anomaly detection, making the system readily available for real-world deployment. Our implementation and evaluation with commodity Wi-Fi devices demonstrate consistent identification accuracy across various deployment scenarios with few training samples, pushing the limit of gait recognition with Wi-Fi signals.
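A simplified sketch of the illegal-user (unknown walker) detection idea via anomaly detection; the one-class SVM and the synthetic feature vectors are assumptions standing in for GaitSense's Wi-Fi CSI-derived gait features and its actual models:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Stand-ins for gait feature vectors extracted from Wi-Fi CSI of enrolled users.
enrolled = rng.normal(loc=0.0, scale=1.0, size=(200, 16))

# Features of an unseen walker drawn from a shifted distribution (an "illegal" user).
intruder = rng.normal(loc=3.0, scale=1.0, size=(5, 16))

# Fit only on legitimate users; anything far from their feature manifold is flagged.
detector = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(enrolled)

print(detector.predict(enrolled[:5]))   # mostly +1: recognized as known-user gait
print(detector.predict(intruder))       # mostly -1: flagged as an unknown walker
```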


2021, Vol 12 (4), pp. 118-131
Author(s): Jaya Krishna Raguru, Devi Prasad Sharma

The problem of identifying a seed set of K nodes that maximizes influence spread over a social network is known as influence maximization (IM). Past work showed this problem to be NP-hard, and the greedy approximation algorithm only guarantees about 63% of the optimal spread. Moreover, this approach suffers from high computational cost. Furthermore, in a network with communities, IM spread is not always certain. In this paper, the heterogeneous influence maximization through community detection (HIMCD) algorithm is proposed. This approach selects initial seed nodes within communities using various centrality measures, and these seed nodes act as sources for influence spread. Influence maximization is then computed in parallel with the aid of the seed node set contained in each community: the graph is partitioned and IM computations are carried out in a distributed manner. Extensive experiments on two real-world datasets reveal that HIMCD achieves substantial performance improvement over state-of-the-art techniques.
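A toy sketch, not the HIMCD algorithm itself, of community-based seed selection with a centrality measure followed by an independent cascade spread simulation; the graph, propagation probability, and one-seed-per-community rule are assumptions:

```python
import random
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def independent_cascade(G, seeds, p=0.1, seed=0):
    # Simulate one run of the independent cascade model and return the spread size.
    rng = random.Random(seed)
    active, frontier = set(seeds), list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in G.neighbors(u):
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return len(active)

G = nx.karate_club_graph()

# Detect communities, then pick the highest-degree-centrality node in each
# community as a seed (one simple centrality-based choice per community).
communities = greedy_modularity_communities(G)
centrality = nx.degree_centrality(G)
seeds = [max(c, key=centrality.get) for c in communities]

print("seeds:", seeds, "spread:", independent_cascade(G, seeds))
```

Because each community contributes its own seed, the cascade simulations for different communities can be run in parallel, which is the intuition behind the distributed IM computation described above.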

