data generator: Recently Published Documents

TOTAL DOCUMENTS: 195 (five years: 76)
H-INDEX: 13 (five years: 3)

2021
Author(s): Kianoosh Kazemi, Juho Laitala, Iman Azimi, Pasi Liljeberg, Amir M. Rahmani

Accurate peak determination from a noise-corrupted photoplethysmogram (PPG) signal is the basis for further analysis of physiological quantities such as heart rate and heart rate variability. Over the past decades, many methods have been proposed for reliable peak detection, including rule-based algorithms, adaptive thresholds, and signal processing techniques. However, these are designed for noise-free PPG signals and are insufficient for PPG signals with a low signal-to-noise ratio (SNR). This paper focuses on enhancing PPG noise resiliency and proposes a robust peak detection algorithm for PPG signals corrupted by noise and motion artifacts. Our algorithm is based on a convolutional neural network (CNN) with dilated convolutions, which provide a large receptive field and make the model well suited to time-series processing. In this study, we use a dataset collected from wearable devices during health monitoring under free-living conditions. In addition, a data generator is developed to produce the noisy PPG data used for training the network. The method is compared against other state-of-the-art methods and tested at SNRs ranging from 0 to 45 dB. It achieves better accuracy than the existing adaptive-threshold and transform-based methods at every SNR, with an overall precision, recall, and F1-score of 80% across all SNR ranges, whereas the corresponding figures for the other methods remain below 78%, 77%, and 77%, respectively. The proposed method thus proves accurate for detecting PPG peaks even in the presence of noise.
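The receptive-field benefit of dilated convolutions mentioned above can be made concrete with a small calculation. The layer count, kernel size, and dilation schedule below are illustrative assumptions, not the paper's actual architecture:

```python
def receptive_field(kernel_sizes, dilations):
    """Receptive field of a stack of 1-D dilated convolutions (stride 1)."""
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d
    return rf

# Five layers with kernel size 3 and dilations doubling each layer
# cover a 63-sample window, versus only 11 samples without dilation.
print(receptive_field([3] * 5, [1, 2, 4, 8, 16]))  # 63
print(receptive_field([3] * 5, [1] * 5))           # 11
```

Exponentially growing dilations let a compact network see a long stretch of the PPG waveform at once, which is what makes it robust for time-series processing.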


Technologies, 2021, Vol 9 (4), pp. 94
Author(s): Daniel Canedo, Pedro Fonseca, Petia Georgieva, António J. R. Neves

Floor-cleaning robots are becoming increasingly sophisticated, and with the addition of digital cameras supported by a robust vision system they become more autonomous, both in their navigation skills and in their ability to analyze the surrounding environment. This document proposes a vision system based on the YOLOv5 framework for detecting dirty spots on the floor. The purpose of such a vision system is to save energy and resources, since the cleaning system of the robot is activated only when a dirty spot is detected and the quantity of resources varies according to the dirty area. In this context, false positives are highly undesirable, while false negatives lead to poor cleaning performance. For this reason, a synthetic data generator found in the literature was improved and adapted for this work to tackle the lack of real data in this area. This generator allows large datasets to be built with numerous samples of floors and dirty spots. A novel approach to selecting floor images for the training dataset is proposed: the floor is segmented from other objects in the image so that dirty spots are generated only on the floor and do not overlap those objects. This helps the models distinguish between dirty spots and objects, reducing the number of false positives. Furthermore, a relevant dataset from the Automation and Control Institute (ACIN) was found to be only partially labelled; consequently, it was annotated from scratch, tripling the number of labelled images and correcting some poor annotations in the original labels. Finally, this document describes the process of generating the synthetic data used for training YOLOv5 models. These models were tested on a real dataset (ACIN), and the best model attained a mean average precision (mAP) of 0.874 for detecting solid dirt. These results further show that our proposal can use synthetic data for the training step and effectively detect dirt on real data. To the best of our knowledge, no previous works report the use of YOLOv5 models in this application.
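The core idea of generating dirt only on segmented floor pixels can be sketched as masked pasting. The exact compositing pipeline of the generator is not detailed here, so the function below is a simplified illustration with hypothetical inputs (a 2-D image, a binary floor mask, and a patch where `None` marks transparent pixels):

```python
def paste_dirt(image, floor_mask, dirt_patch, top, left):
    """Paste a dirt patch onto the image only where the floor mask allows.

    image: 2-D list of pixel values; floor_mask: same shape, 1 = floor.
    dirt_patch entries of None are transparent. Patch pixels falling on
    non-floor regions are skipped, so generated dirt never overlaps
    furniture or other objects.
    """
    for i, row in enumerate(dirt_patch):
        for j, v in enumerate(row):
            y, x = top + i, left + j
            if v is not None and floor_mask[y][x] == 1:
                image[y][x] = v
    return image
```

Because dirt never spills onto non-floor objects, the detector learns a cleaner decision boundary between dirt and background clutter, which is exactly the false-positive reduction the abstract describes.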


Author(s): Yaser Ismail, Lei Wan, Jiayun Chen, Jianqiao Ye, Dongmin Yang

This paper presents a robust ABAQUS® plug-in called Virtual Data Generator (VDGen) for generating virtual data to identify the uncertain material properties of unidirectional laminae through artificial neural networks (ANNs). The plug-in supports 3D finite element models of unit cells with square and hexagonal fibre arrays, uses Latin-Hypercube sampling, and robustly imposes periodic boundary conditions. Using the data generated by the plug-in, an ANN is demonstrated to explicitly and accurately parameterise the relationship between fibre mechanical properties and fibre/matrix interphase parameters at the microscale and the mechanical properties of a unidirectional (UD) lamina at the macroscale. The plug-in is applicable to general unidirectional laminae and enables easy establishment of high-fidelity micromechanical finite element models with identified material properties.
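Latin-Hypercube sampling, which the plug-in uses to cover the material-parameter space, stratifies each variable's range so that every stratum is sampled exactly once. A minimal pure-Python sketch follows; the parameter bounds in the usage example are made up for illustration and are not the paper's values:

```python
import random

def latin_hypercube(n_samples, bounds, seed=0):
    """Latin-Hypercube sample: each variable's range is split into
    n_samples equal strata, and each stratum is used exactly once."""
    rng = random.Random(seed)
    samples = [[0.0] * len(bounds) for _ in range(n_samples)]
    for j, (lo, hi) in enumerate(bounds):
        strata = list(range(n_samples))
        rng.shuffle(strata)  # independent stratum order per variable
        for i in range(n_samples):
            u = (strata[i] + rng.random()) / n_samples  # point in stratum
            samples[i][j] = lo + u * (hi - lo)
    return samples

# e.g. 10 design points over two hypothetical ranges, such as a fibre
# modulus in [200, 300] and an interphase parameter in [0.2, 0.4]
design = latin_hypercube(10, [(200.0, 300.0), (0.2, 0.4)])
```

Compared with plain random sampling, this guarantees that even a small number of finite element runs spreads evenly across every parameter's range, which is why it is the standard choice for building surrogate-model training sets.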


2021
Author(s): Fida Dankar, Mahmoud K. Ibrahim, Leila Ismail

BACKGROUND: Synthetic datasets are gradually emerging as a solution for fast and inclusive health data sharing. Multiple synthetic data generators have been introduced in the last decade, fueled by advances in machine learning, yet their utility is not well understood. A few recent papers have tried to compare the utility of synthetic data generators, each focusing on different evaluation metrics and presenting conclusions targeted at specific analyses.

OBJECTIVE: This work aims to understand the overall utility (referred to as quality) of four recent synthetic data generators by identifying multiple criteria for high-utility synthetic data.

METHODS: We investigate commonly used utility metrics for masked data evaluation and classify them into categories according to the function they attempt to preserve: attribute fidelity, bivariate fidelity, population fidelity, and application fidelity. We then choose a representative metric from each category based on popularity and consistency. Together, this set of metrics, referred to as the quality criteria, is used to evaluate the overall utility of four recent synthetic data generators across 19 datasets of different sizes and feature counts. Moreover, correlations between the identified metrics are investigated in an attempt to streamline synthetic data utility evaluation.

RESULTS: Our results indicate that a non-parametric machine learning synthetic data generator (Synthpop) provides the best utility values across all quality criteria, along with the highest stability. It displays the best overall accuracy in supervised machine learning and often agrees with the real dataset on the learning model with the highest accuracy. On another front, our results suggest no strong correlation between the different metrics, which implies that all categories are required when evaluating the overall utility of synthetic data.

CONCLUSIONS: The paper used four quality criteria to identify the synthesizer with the best overall utility. The results are promising, with only small decreases in accuracy observed for the winning synthesizer when tested on real datasets (in comparison with models trained on real data). Further research into a single overall quality measure would greatly help data holders optimize the utility of released datasets.
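Of the four fidelity categories, bivariate fidelity is easy to illustrate: it asks how well the synthetic data preserves pairwise relationships between attributes. The paper's exact metric is not reproduced here; the sketch below uses a common choice, the mean absolute difference between pairwise Pearson correlations:

```python
def mean(xs):
    return sum(xs) / len(xs)

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def bivariate_fidelity(real_cols, synth_cols):
    """Mean absolute difference between the pairwise correlations of the
    real and synthetic datasets; 0 means relationships are perfectly
    preserved, larger values mean worse fidelity."""
    diffs = []
    n = len(real_cols)
    for i in range(n):
        for j in range(i + 1, n):
            diffs.append(abs(pearson(real_cols[i], real_cols[j])
                             - pearson(synth_cols[i], synth_cols[j])))
    return sum(diffs) / len(diffs)
```

A score of zero on this metric alone would not certify a synthesizer, which is the abstract's point: the categories are weakly correlated, so each must be checked separately.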


Author(s): Qiaokang Liang, Qiao Ge, Wei Sun, Dan Zhang, ...

In the food and beverage industry, existing recognition of code characters on the surface of complex packaging usually suffers from low accuracy and low speed. This work presents an efficient and accurate inkjet code recognition system combining deep learning and traditional image processing methods. The proposed system consists of three sequential modules: character region extraction by a modified YOLOv3-tiny network; character processing by traditional image processing methods such as binarization and a modified character projection segmentation; and character recognition by a convolutional recurrent neural network (CRNN) model based on a modified version of MobileNetV3. In this system, only a small amount of tagged data needs to be created, and an effective character data generator is designed to randomly generate varied training data for the CRNN model. To the best of our knowledge, this is the first report of deep learning applied to the recognition of codes on complex backgrounds in a real-life industrial application. Experimental results verify the accuracy and effectiveness of the proposed model, demonstrating a recognition accuracy of 0.986 and a processing speed of 100 ms per bottle in the end-to-end character recognition system.
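The classic projection-segmentation step in the middle module can be sketched simply. This is a generic version of the technique, not the paper's modified variant: sum the ink pixels in each column of the binarized strip, and split wherever the profile drops to zero:

```python
def projection_segment(binary):
    """Split a binarized character strip into character spans using the
    vertical projection profile: columns with no ink separate glyphs.

    binary: 2-D list of 0/1 rows. Returns a list of (start, end) column
    ranges, end-exclusive, one per detected character.
    """
    profile = [sum(col) for col in zip(*binary)]  # ink count per column
    segments, start = [], None
    for x, ink in enumerate(profile):
        if ink > 0 and start is None:
            start = x                 # character begins
        elif ink == 0 and start is not None:
            segments.append((start, x))  # character ends at blank column
            start = None
    if start is not None:
        segments.append((start, len(profile)))
    return segments
```

On clean binarized inkjet codes this zero-gap rule is fast and deterministic, which is why the hybrid design reserves the neural network only for the harder detection and recognition stages.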


2021
Author(s): Henrique Matheus F. da Silva, Rafael S. Pereira Silva, Fábio Porto

The accuracy of machine learning models implementing classification tasks is strongly dependent on the quality of the training dataset. This is a challenge for domains where data is not abundant, such as personalized medicine, or is unbalanced, as in the case of images of plant species, where some species have very few samples while others offer a large number. In both scenarios, the resulting models tend to perform poorly. In this paper we present two techniques to face this challenge. First, we present a data augmentation method called SAGAD, based on conditional entropy. SAGAD can balance minority classes while increasing the overall size of the training set. In our experiments, applying SAGAD to small-data problems with different machine learning algorithms yielded significant improvements in performance. We additionally present an extension of SAGAD for iterative learning algorithms, called DABEL, which generates new samples for each epoch using an optimization approach that continuously improves the model's performance. The adoption of SAGAD and DABEL consistently extends the training dataset towards improved target classification performance.
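The conditional entropy underlying SAGAD measures how much uncertainty remains in the class label once a feature is known. How SAGAD uses it internally is not spelled out in the abstract, but the quantity itself is standard and can be computed from paired samples:

```python
from collections import Counter
from math import log2

def conditional_entropy(xs, ys):
    """H(Y | X) in bits over paired samples: the expected uncertainty
    left in the label Y once the feature value X is observed."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of (x, y) pairs
    margin = Counter(xs)           # counts of x alone
    return -sum(c / n * log2(c / margin[x])
                for (x, y), c in joint.items())
```

When Y is fully determined by X the result is 0 bits; when X tells us nothing about Y it rises to H(Y). An augmentation method can use such a measure to decide which feature combinations are most informative to synthesize for minority classes.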


2021, Vol 7 (1), pp. 13
Author(s): Rubén Pérez-Jove, Roberto R. Expósito, Juan Touriño

This paper presents RGen, a parallel data generator for benchmarking Big Data workloads, which integrates existing features and new functionalities in a standalone tool. The main functionalities developed in this work were the generation of text and graphs that meet the characteristics defined by the 4 Vs of Big Data. On the one hand, the LDA model has been used for text generation, which extracts topics or themes covered in a series of documents. On the other hand, graph generation is based on the Kronecker model. The experimental evaluation carried out on a 16-node cluster has shown that RGen provides very good weak and strong scalability results. RGen is publicly available to download at https://github.com/rubenperez98/RGen, accessed on 30 September 2021.
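The Kronecker model used for graph generation grows a large, self-similar adjacency matrix from a small seed by repeated Kronecker products. RGen's actual (parallel, stochastic) implementation is more involved; the deterministic core of the model looks like this:

```python
def kronecker_graph(seed_matrix, iterations):
    """Grow an adjacency matrix by repeatedly taking the Kronecker
    product of the seed with itself (deterministic Kronecker model).

    After k iterations, a seed of size s x s yields an s**k x s**k
    adjacency matrix with the seed's structure repeated at every scale.
    """
    def kron(a, b):
        n, m = len(b), len(b[0])
        return [[a[i // n][j // m] * b[i % n][j % m]
                 for j in range(len(a[0]) * m)]
                for i in range(len(a) * n)]

    g = seed_matrix
    for _ in range(iterations - 1):
        g = kron(g, seed_matrix)
    return g

# a 2x2 seed grown for 2 iterations gives a 4x4 self-similar graph
g = kronecker_graph([[1, 1], [1, 0]], 2)
```

The recursive structure reproduces heavy-tailed degree distributions seen in real networks, which is why the Kronecker model is a popular choice for Big Data graph benchmarks.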


Geophysics, 2021, Vol 86 (6), pp. KS151-KS160
Author(s): Claire Birnie, Haithem Jarraya, Fredrik Hansteen

Deep learning applications are progressing rapidly in seismic processing and interpretation tasks. However, most approaches subsample data volumes and restrict model sizes to minimize computational requirements. Subsampling the data risks losing vital spatiotemporal information that could aid training, whereas restricting model sizes can impact model performance, or in extreme cases render more complicated tasks such as segmentation impossible. We show how to tackle the two main issues in training large neural networks (NNs): memory limitations and impractically long training times. Typically, training data are preloaded into memory prior to training, a particular challenge for seismic applications in which the data format is typically four times larger than that used for standard image processing tasks (float32 versus uint8). Based on an example from microseismic monitoring, we evaluate how more than 750 GB of data can be used to train a model by using a data generator approach, which only stores in memory the data required for the current training batch. Furthermore, efficient training over large models is illustrated through the training of a seven-layer U-Net with input data dimensions of [Formula: see text] (approximately [Formula: see text] million parameters). Through a batch-splitting distributed training approach, training times are reduced by a factor of four. The combination of data generators and distributed training removes any necessity for data subsampling or restriction of NN sizes, offering the opportunity to use larger networks, higher-resolution input data, or a move from 2D to 3D problem spaces.
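The data-generator idea at the heart of this approach is a lazy loader: rather than preloading 750 GB into memory, it materializes only one batch at a time. The sketch below is a framework-agnostic simplification with a hypothetical `load` callback standing in for reading one seismic example from disk:

```python
def batch_generator(file_paths, batch_size, load):
    """Yield training batches lazily: only `batch_size` examples are
    ever held in memory, regardless of the size of the dataset on disk.

    file_paths: iterable of per-example sources; load: callable that
    reads one example (e.g. a float32 seismic gather) from its source.
    """
    batch = []
    for path in file_paths:
        batch.append(load(path))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller, batch
```

A training loop consumes this generator one batch per step, so peak memory is set by the batch size rather than the dataset size; the paper's distributed batch-splitting then shards each batch across workers to cut wall-clock training time.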



