DCNN-LSTM Based Audio Classification Combining Multiple Feature Engineering and Data Augmentation Techniques

The integration of digital voice assistants in nursing residences is becoming increasingly important for improving nursing productivity in documentation. A key idea behind such a system is training natural language understanding (NLU) modules that enable the machine to classify the purpose of the user utterance (intent) and extract the valuable pieces of information present in the utterance (entity). One of the main obstacles to creating a robust NLU is the lack of sufficient labeled data, which generally relies on human labeling. This process is cost-intensive and time-consuming, particularly in the high-level nursing care domain, which requires abstract knowledge. In this paper, we propose an automatic dialogue labeling framework for NLU tasks, specifically for nursing record systems. First, we apply data augmentation techniques to create a collection of variant sample utterances. The individual evaluation result strongly shows a stratification rate, with regard to both fluency and accuracy in utterances. We also investigate the possibility of applying deep generative models to our augmented dataset. The preliminary character-based model based on long short-term memory (LSTM) obtains an accuracy of 90% and generates various reasonable texts with a BLEU score of 0.76. Second, we introduce an idea for intent and entity labeling using feature embeddings and semantic similarity-based clustering. We also empirically evaluate different embedding methods to learn the representations most suitable for our data and clustering tasks. Experimental results show that fastText embeddings perform strongly on both intent labeling and entity labeling, achieving F1-scores of 0.79 and 0.78 and silhouette scores of 0.67 and 0.61, respectively.
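The abstract above reports silhouette scores for embedding-based clustering. As a rough illustration of what that metric measures, here is a minimal pure-Python sketch. It assumes Euclidean distance and at least two clusters; the function name and the toy 2-D points are hypothetical and are not taken from the paper, whose embeddings and distance measure may differ.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette over all points (Euclidean distance, >= 2 clusters)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            # Common convention: a singleton cluster contributes 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster
        # (the zero distance to the point itself adds nothing to the sum).
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(point, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)

# Toy stand-ins for utterance embeddings: two well-separated groups.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(toy_points, [0, 0, 1, 1])
bad = silhouette_score(toy_points, [0, 1, 0, 1])
print(good > bad)
```

Scores near 1 indicate compact, well-separated clusters, while scores near 0 or below indicate overlapping or mislabeled ones.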

Generative Adversarial Networks to Improve the Robustness of Visual Defect Segmentation by Semantic Networks in Manufacturing Components

Applied Sciences ◽ 10.3390/app11146368 ◽ 2021 ◽ Vol 11 (14) ◽ pp. 6368

Author(s): Fátima A. Saiz ◽ Garazi Alfaro ◽ Iñigo Barandiaran ◽ Manuel Graña

Keyword(s): Ad Hoc ◽ Data Augmentation ◽ Semantic Network ◽ Semantic Networks ◽ Stereo Image ◽ Generative Adversarial Networks ◽ Specific Class ◽ Adversarial Networks ◽ Augmentation Techniques ◽ Image Acquisition System

This paper describes the application of Semantic Networks to the detection of defects in images of metallic manufactured components when only a small number of defect samples is available, which is rather common in real practical environments. To overcome this shortage of data, the common approach is to use conventional data augmentation techniques. We instead resort to Generative Adversarial Networks (GANs), which have shown the capability to generate highly convincing samples of a specific class as a result of a game between a discriminator and a generator module. Here, we apply GANs to generate sample images of metallic manufactured components with specific defects, in order to improve the training of the Semantic Networks (specifically DeepLabV3+ and Pyramid Attention Network (PAN)) that carry out defect detection and segmentation. Our process generates defect images using StyleGAN2 with the DiffAugment method, followed by conventional data augmentation over the entire enriched dataset, yielding a large balanced dataset that allows robust training of the Semantic Network. We demonstrate the approach on a private dataset generated for an industrial client, where images are captured by an ad hoc photometric-stereo image acquisition system, and on a public dataset, the Northeastern University surface defect database (NEU). The proposed approach improves the intersection over union (IoU) measure of detection performance by 7% and 6% on the two datasets, respectively, over conventional data augmentation.
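For readers unfamiliar with the IoU measure cited above, the following is a minimal sketch of how it can be computed for a pair of binary segmentation masks. The nested-list mask representation, the function name, and the empty-mask convention are assumptions made for illustration, not details from the paper.

```python
def iou(pred, target):
    """Intersection over union of two equally sized binary masks (lists of 0/1 rows)."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1  # pixel marked as defect in both masks
            if p or t:
                union += 1         # pixel marked as defect in at least one mask
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union

predicted_mask = [[0, 1, 1],
                  [0, 1, 0]]
ground_truth = [[0, 1, 0],
                [0, 1, 1]]
print(iou(predicted_mask, ground_truth))  # 2 shared pixels / 4 marked pixels = 0.5
```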

Data Augmentation Techniques on Arabic Data for Named Entity Recognition

Procedia Computer Science ◽ 10.1016/j.procs.2021.05.092 ◽ 2021 ◽ Vol 189 ◽ pp. 292-299

Author(s): Caroline Sabty ◽ Islam Omar ◽ Fady Wasfalla ◽ Mohamed Islam ◽ Slim Abdennadher

Keyword(s): Data Augmentation ◽ Named Entity Recognition ◽ Entity Recognition ◽ Named Entity ◽ Augmentation Techniques

Neural Data Augmentation Techniques for Time Series Data and its Benefits

2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA) ◽ 10.1109/icmla51294.2020.00026 ◽ 2020

Author(s): Anindya Sarkar ◽ Anirudh Sunder Raj ◽ Raghu Sesha Iyengar

Keyword(s): Time Series ◽ Data Augmentation ◽ Time Series Data ◽ Series Data ◽ Neural Data ◽ Augmentation Techniques

A review: preprocessing techniques and data augmentation for sentiment analysis

Computational Social Networks ◽ 10.1186/s40649-020-00080-x ◽ 2021 ◽ Vol 8 (1)

Author(s): Huu-Thanh Duong ◽ Tram-Anh Nguyen-Thi

Keyword(s): Machine Learning ◽ Sentiment Analysis ◽ Supervised Learning ◽ Data Augmentation ◽ Original Data ◽ Training Data ◽ Unseen Data ◽ Augmentation Techniques ◽ User Intervention

In the literature, machine learning-based studies of sentiment analysis usually rely on supervised learning, which requires pre-labeled datasets that are large enough for the target domain. Building such datasets is tedious, expensive, and time-consuming, and the resulting models handle unseen data poorly. This paper applies semi-supervised learning to Vietnamese sentiment analysis, for which datasets are limited. We summarize the preprocessing techniques used to clean and normalize the data, together with negation handling and intensification handling, to improve performance. Moreover, we present data augmentation techniques, which generate new data from the original data to enrich the training data without user intervention. In experiments, we evaluate various aspects and obtain competitive results that may motivate further proposals.

Image Data Augmentation techniques for Deep Learning - A Mirror Review

10.1109/icrito51393.2021.9596262 ◽ 2021

Author(s): Dipen Saini ◽ Rahul Malik

Keyword(s): Deep Learning ◽ Data Augmentation ◽ Image Data ◽ Augmentation Techniques

GaitSense: Towards Ubiquitous Gait-Based Human Identification with Wi-Fi

ACM Transactions on Sensor Networks ◽ 10.1145/3466638 ◽ 2022 ◽ Vol 18 (1) ◽ pp. 1-24

Author(s): Yi Zhang ◽ Yue Zheng ◽ Guidong Zhang ◽ Kun Qian ◽ Chen Qian ◽ ...

Keyword(s): Data Augmentation ◽ Gait Recognition ◽ Wearable Sensors ◽ Human Identification ◽ Training Data ◽ Identification Accuracy ◽ Identification System ◽ Gait Patterns ◽ Training Samples ◽ Augmentation Techniques

Gait, the walking manner of a person, has been perceived as a physical and behavioral trait for human identification. Compared with cameras and wearable sensors, Wi-Fi-based gait recognition is more attractive because Wi-Fi infrastructure is available almost everywhere and can sense passively without requiring on-body devices. However, existing Wi-Fi sensing approaches impose strong assumptions of fixed user walking trajectories, sufficient training data, and identification of already known users. In this article, we present GaitSense, a Wi-Fi-based human identification system, to overcome these unrealistic assumptions. To deal with various walking trajectories and speeds, GaitSense first extracts target-specific features that best characterize gait patterns and applies novel normalization algorithms to eliminate gait-irrelevant perturbations in signals. On this basis, GaitSense reduces the training effort in new deployment scenarios through transfer learning and data augmentation techniques. GaitSense also enables the distinct feature of illegal user identification through anomaly detection, making the system readily available for real-world deployment. Our implementation and evaluation with commodity Wi-Fi devices demonstrate consistent identification accuracy across various deployment scenarios with few training samples, pushing the limit of gait recognition with Wi-Fi signals.

Data Augmentation for Roller Bearing Health Indicator Estimation Using Multi-Channel Frequency Data Representations

10.1115/detc2021-66701 ◽ 2021

Author(s): Jacob Hendriks ◽ Patrick Dumond

Keyword(s): Data Augmentation ◽ Roller Bearing ◽ Health Indicators ◽ Remaining Useful Life ◽ Failure Data ◽ Data Representations ◽ Masking Noise ◽ Augmentation Techniques ◽ Added Benefit ◽ Useful Life

This paper demonstrates various data augmentation techniques that can be used when working with limited run-to-failure data to estimate health indicators related to the remaining useful life of roller bearings. The PRONOSTIA bearing prognosis dataset is used for benchmarking the data augmentation techniques. The inputs to the networks are multi-dimensional frequency representations obtained by combining the spectra taken from two accelerometers. The data augmentation techniques are adapted from other machine learning fields and include adding Gaussian noise, region masking, masking noise, and pitch shifting. Augmented datasets are used to train a conventional CNN architecture comprising two convolutional and pooling layer sequences with batch normalization. Results from holding out each bearing's data individually for validation show that all methods, except pitch shifting, improve validation accuracy on average. Masking noise and region masking both show the added benefit of dataset regularization, giving more consistent results when each configuration is repeatedly trained with new randomly generated augmented datasets. It is shown that gradually deteriorating bearings and bearings with abrupt failure are not treated significantly differently by the augmentation techniques.
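Three of the augmentations named above can be sketched on a single frequency spectrum as follows. This is only an illustration under stated assumptions: the spectrum is a plain list of magnitudes, masked bins are set to zero, and the function names and parameters (`sigma`, `width`, `fraction`) are hypothetical rather than taken from the paper. Pitch shifting and the two-accelerometer multi-channel layout are not covered.

```python
import random

def add_gaussian_noise(spectrum, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma to every bin."""
    return [x + rng.gauss(0.0, sigma) for x in spectrum]

def region_mask(spectrum, width, rng):
    """Zero out one contiguous block of `width` bins at a random position."""
    start = rng.randrange(0, len(spectrum) - width + 1)
    return [0.0 if start <= i < start + width else x
            for i, x in enumerate(spectrum)]

def masking_noise(spectrum, fraction, rng):
    """Zero out each bin independently with probability `fraction`."""
    return [0.0 if rng.random() < fraction else x for x in spectrum]

rng = random.Random(0)  # fixed seed so repeated runs give the same augmentations
spectrum = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # toy magnitudes, all nonzero

noisy = add_gaussian_noise(spectrum, sigma=0.1, rng=rng)
blocked = region_mask(spectrum, width=3, rng=rng)
sparse = masking_noise(spectrum, fraction=0.25, rng=rng)
print(len(noisy), blocked.count(0.0), len(sparse))
```

Each function returns a new list and leaves the input untouched, so a fresh augmented dataset can be generated from the same original data on every training run.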

Lesion Detection in Breast Tomosynthesis Using Efficient Deep Learning and Data Augmentation Techniques

10.3233/faia210150 ◽ 2021

Author(s): Loay Hassan ◽ Mohamed Abdel-Nasser ◽ Adel Saleh ◽ Domenec Puig

Keyword(s): Breast Cancer ◽ Deep Learning ◽ Breast Lesion ◽ Data Augmentation ◽ Digital Breast Tomosynthesis ◽ Lesion Detection ◽ Detection Methods ◽ Breast Tomosynthesis ◽ Mammographic Images ◽ Augmentation Techniques

Digital breast tomosynthesis (DBT) is a powerful breast cancer screening technology. DBT can improve the ability of radiologists to detect breast cancer, especially in the case of dense breasts, where it outperforms mammography. Although many automated methods have been proposed to detect breast lesions in mammographic images, very few have been proposed for DBT because not enough annotated DBT images are available for training object detectors. In this paper, we present fully automated deep-learning breast lesion detection methods. Specifically, we study the effectiveness of two data augmentation techniques (channel replication and channel concatenation) with five state-of-the-art deep learning detection models. Our preliminary results on a challenging publicly available DBT dataset show that the channel-concatenation data augmentation technique can significantly improve the results of deep learning-based breast lesion detectors.
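The abstract does not define the two techniques, so the sketch below shows one plausible reading, stated as an assumption: channel replication copies a single grayscale slice into every input channel, whereas channel concatenation stacks a slice with its two neighbours from the DBT volume. The function names, the channels-first nested-list layout, and the boundary clamping are all hypothetical.

```python
def replicate_channels(slice_2d, n_channels=3):
    """Copy one grayscale slice into n_channels identical channels."""
    return [[row[:] for row in slice_2d] for _ in range(n_channels)]

def concatenate_channels(volume, index):
    """Stack slice `index` with its neighbours as a 3-channel image.

    Neighbour indices are clamped at the volume boundaries, so the first
    and last slices repeat themselves in place of the missing neighbour.
    """
    last = len(volume) - 1
    picks = [max(index - 1, 0), index, min(index + 1, last)]
    return [[row[:] for row in volume[i]] for i in picks]

# Toy volume: 4 slices of 2x2 pixels, each slice filled with its own index.
volume = [[[float(k)] * 2 for _ in range(2)] for k in range(4)]

replicated = replicate_channels(volume[1])
stacked = concatenate_channels(volume, 1)
print(replicated[0][0][0], replicated[2][0][0])               # 1.0 1.0
print(stacked[0][0][0], stacked[1][0][0], stacked[2][0][0])   # 0.0 1.0 2.0
```

Under this reading, both variants produce a three-channel input suitable for detectors pretrained on RGB images, but only the concatenated version carries information from adjacent slices.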
