data generator: Recently Published Documents

TOTAL DOCUMENTS: 195 (five years: 76)
H-INDEX: 13 (five years: 3)

2021
Author(s): Kianoosh Kazemi, Juho Laitala, Iman Azimi, Pasi Liljeberg, Amir M. Rahmani

Accurate peak determination from a noise-corrupted photoplethysmogram (PPG) signal is the basis for further analysis of physiological quantities such as heart rate and heart rate variability. Over the past decades, many methods have been proposed for reliable peak detection, including rule-based algorithms, adaptive thresholds, and signal processing techniques. However, these are designed for noise-free PPG signals and are insufficient for PPG signals with a low signal-to-noise ratio (SNR). This paper focuses on enhancing PPG noise resiliency and proposes a robust peak detection algorithm for PPG signals corrupted by noise and motion artifacts. Our algorithm is based on a convolutional neural network (CNN) with dilated convolutions, which provide a large receptive field and make the model well suited to time-series processing. In this study, we use a dataset collected from wearable devices during health monitoring under free-living conditions. In addition, a data generator is developed to produce the noisy PPG data used for training the network. The method is compared against other state-of-the-art methods and tested at SNRs ranging from 0 to 45 dB. It achieves better accuracy than the existing adaptive-threshold and transform-based methods at every SNR, with an overall precision, recall, and F1-score of 80% across all SNR ranges, whereas the corresponding figures for the other methods remain below 78%, 77%, and 77%, respectively. The proposed method thus proves accurate for detecting PPG peaks even in the presence of noise.
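The receptive-field benefit of dilated convolutions mentioned above can be made concrete with a small calculation. The layer count, kernel size, and dilation schedule below are illustrative assumptions, not the paper's actual architecture:

```python
def receptive_field(kernel_sizes, dilations):
    """Receptive field of a stack of 1-D dilated convolutions (stride 1)."""
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d
    return rf

# Five layers with kernel size 3 and dilations doubling each layer
# cover a 63-sample window, versus only 11 samples without dilation.
print(receptive_field([3] * 5, [1, 2, 4, 8, 16]))  # 63
print(receptive_field([3] * 5, [1] * 5))           # 11
```

Exponentially growing dilations let a compact network see a long stretch of the PPG waveform at once, which is what makes it robust for time-series processing.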


Technologies, 2021, Vol 9 (4), pp. 94
Author(s): Daniel Canedo, Pedro Fonseca, Petia Georgieva, António J. R. Neves

Floor-cleaning robots are becoming increasingly sophisticated, and with the addition of digital cameras supported by a robust vision system they become more autonomous, both in their navigation skills and in their ability to analyze the surrounding environment. This document proposes a vision system based on the YOLOv5 framework for detecting dirty spots on the floor. The purpose of such a vision system is to save energy and resources, since the cleaning system of the robot is activated only when a dirty spot is detected and the quantity of resources varies according to the dirty area. In this context, false positives are highly undesirable, while false negatives lead to poor cleaning performance. For this reason, a synthetic data generator found in the literature was improved and adapted for this work to tackle the lack of real data in this area. This generator allows large datasets to be built with numerous samples of floors and dirty spots. A novel approach to selecting floor images for the training dataset is proposed: the floor is segmented from other objects in the image so that dirty spots are generated only on the floor and do not overlap those objects. This helps the models distinguish between dirty spots and objects, reducing the number of false positives. Furthermore, a relevant dataset from the Automation and Control Institute (ACIN) was found to be only partially labelled; consequently, it was annotated from scratch, tripling the number of labelled images and correcting some poor annotations in the original labels. Finally, this document describes the process of generating the synthetic data used for training YOLOv5 models. These models were tested on a real dataset (ACIN), and the best model attained a mean average precision (mAP) of 0.874 for detecting solid dirt. These results further show that our proposal can use synthetic data for the training step and effectively detect dirt on real data. To the best of our knowledge, no previous works report the use of YOLOv5 models in this application.
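The core idea of generating dirt only on segmented floor pixels can be sketched as masked pasting. The exact compositing pipeline of the generator is not detailed here, so the function below is a simplified illustration with hypothetical inputs (a 2-D image, a binary floor mask, and a patch where `None` marks transparent pixels):

```python
def paste_dirt(image, floor_mask, dirt_patch, top, left):
    """Paste a dirt patch onto the image only where the floor mask allows.

    image: 2-D list of pixel values; floor_mask: same shape, 1 = floor.
    dirt_patch entries of None are transparent. Patch pixels falling on
    non-floor regions are skipped, so generated dirt never overlaps
    furniture or other objects.
    """
    for i, row in enumerate(dirt_patch):
        for j, v in enumerate(row):
            y, x = top + i, left + j
            if v is not None and floor_mask[y][x] == 1:
                image[y][x] = v
    return image
```

Because dirt never spills onto non-floor objects, the detector learns a cleaner decision boundary between dirt and background clutter, which is exactly the false-positive reduction the abstract describes.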


Author(s): Yaser Ismail, Lei Wan, Jiayun Chen, Jianqiao Ye, Dongmin Yang

This paper presents a robust ABAQUS® plug-in called Virtual Data Generator (VDGen) for generating virtual data to identify the uncertain material properties of unidirectional laminae through artificial neural networks (ANNs). The plug-in supports 3D finite element models of unit cells with square and hexagonal fibre arrays, uses Latin-Hypercube sampling, and robustly imposes periodic boundary conditions. Using the data generated by the plug-in, an ANN is demonstrated to explicitly and accurately parameterise the relationship between fibre mechanical properties and fibre/matrix interphase parameters at the microscale and the mechanical properties of a unidirectional (UD) lamina at the macroscale. The plug-in is applicable to general unidirectional laminae and enables easy establishment of high-fidelity micromechanical finite element models with identified material properties.
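Latin-Hypercube sampling, which the plug-in uses to cover the material-parameter space, stratifies each variable's range so that every stratum is sampled exactly once. A minimal pure-Python sketch follows; the parameter bounds in the usage example are made up for illustration and are not the paper's values:

```python
import random

def latin_hypercube(n_samples, bounds, seed=0):
    """Latin-Hypercube sample: each variable's range is split into
    n_samples equal strata, and each stratum is used exactly once."""
    rng = random.Random(seed)
    samples = [[0.0] * len(bounds) for _ in range(n_samples)]
    for j, (lo, hi) in enumerate(bounds):
        strata = list(range(n_samples))
        rng.shuffle(strata)  # independent stratum order per variable
        for i in range(n_samples):
            u = (strata[i] + rng.random()) / n_samples  # point in stratum
            samples[i][j] = lo + u * (hi - lo)
    return samples

# e.g. 10 design points over two hypothetical ranges, such as a fibre
# modulus in [200, 300] and an interphase parameter in [0.2, 0.4]
design = latin_hypercube(10, [(200.0, 300.0), (0.2, 0.4)])
```

Compared with plain random sampling, this guarantees that even a small number of finite element runs spreads evenly across every parameter's range, which is why it is the standard choice for building surrogate-model training sets.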


2021
Author(s): Fida Dankar, Mahmoud K. Ibrahim, Leila Ismail

BACKGROUND: Synthetic datasets are gradually emerging as a solution for fast and inclusive health data sharing. Multiple synthetic data generators have been introduced in the last decade, fueled by advances in machine learning, yet their utility is not well understood. A few recent papers have tried to compare the utility of synthetic data generators, each focusing on different evaluation metrics and presenting conclusions targeted at specific analyses.

OBJECTIVE: This work aims to understand the overall utility (referred to as quality) of four recent synthetic data generators by identifying multiple criteria for high-utility synthetic data.

METHODS: We investigate commonly used utility metrics for masked data evaluation and classify them into categories according to the function they attempt to preserve: attribute fidelity, bivariate fidelity, population fidelity, and application fidelity. We then choose a representative metric from each category based on popularity and consistency. Together, this set of metrics, referred to as the quality criteria, is used to evaluate the overall utility of four recent synthetic data generators across 19 datasets of different sizes and feature counts. Moreover, correlations between the identified metrics are investigated in an attempt to streamline synthetic data utility evaluation.

RESULTS: Our results indicate that a non-parametric machine learning synthetic data generator (Synthpop) provides the best utility values across all quality criteria, along with the highest stability. It displays the best overall accuracy in supervised machine learning and often agrees with the real dataset on the learning model with the highest accuracy. On another front, our results suggest no strong correlation between the different metrics, which implies that all categories are required when evaluating the overall utility of synthetic data.

CONCLUSIONS: The paper used four quality criteria to identify the synthesizer with the best overall utility. The results are promising, with only small decreases in accuracy observed for the winning synthesizer when tested on real datasets (in comparison with models trained on real data). Further research into a single overall quality measure would greatly help data holders optimize the utility of released datasets.
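Of the four fidelity categories, bivariate fidelity is easy to illustrate: it asks how well the synthetic data preserves pairwise relationships between attributes. The paper's exact metric is not reproduced here; the sketch below uses a common choice, the mean absolute difference between pairwise Pearson correlations:

```python
def mean(xs):
    return sum(xs) / len(xs)

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def bivariate_fidelity(real_cols, synth_cols):
    """Mean absolute difference between the pairwise correlations of the
    real and synthetic datasets; 0 means relationships are perfectly
    preserved, larger values mean worse fidelity."""
    diffs = []
    n = len(real_cols)
    for i in range(n):
        for j in range(i + 1, n):
            diffs.append(abs(pearson(real_cols[i], real_cols[j])
                             - pearson(synth_cols[i], synth_cols[j])))
    return sum(diffs) / len(diffs)
```

A score of zero on this metric alone would not certify a synthesizer, which is the abstract's point: the categories are weakly correlated, so each must be checked separately.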


Author(s): Qiaokang Liang, Qiao Ge, Wei Sun, Dan Zhang, ...

In the food and beverage industry, existing recognition of code characters on the surface of complex packaging usually suffers from low accuracy and low speed. This work presents an efficient and accurate inkjet code recognition system combining deep learning and traditional image processing methods. The proposed system consists of three sequential modules: character region extraction by a modified YOLOv3-tiny network; character processing by traditional image processing methods such as binarization and a modified character projection segmentation; and character recognition by a convolutional recurrent neural network (CRNN) model based on a modified version of MobileNetV3. In this system, only a small amount of tagged data needs to be created, and an effective character data generator is designed to randomly generate varied training data for the CRNN model. To the best of our knowledge, this is the first report of deep learning applied to the recognition of codes on complex backgrounds in a real-life industrial application. Experimental results verify the accuracy and effectiveness of the proposed model, demonstrating a recognition accuracy of 0.986 and a processing speed of 100 ms per bottle in the end-to-end character recognition system.
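The classic projection-segmentation step in the middle module can be sketched simply. This is a generic version of the technique, not the paper's modified variant: sum the ink pixels in each column of the binarized strip, and split wherever the profile drops to zero:

```python
def projection_segment(binary):
    """Split a binarized character strip into character spans using the
    vertical projection profile: columns with no ink separate glyphs.

    binary: 2-D list of 0/1 rows. Returns a list of (start, end) column
    ranges, end-exclusive, one per detected character.
    """
    profile = [sum(col) for col in zip(*binary)]  # ink count per column
    segments, start = [], None
    for x, ink in enumerate(profile):
        if ink > 0 and start is None:
            start = x                 # character begins
        elif ink == 0 and start is not None:
            segments.append((start, x))  # character ends at blank column
            start = None
    if start is not None:
        segments.append((start, len(profile)))
    return segments
```

On clean binarized inkjet codes this zero-gap rule is fast and deterministic, which is why the hybrid design reserves the neural network only for the harder detection and recognition stages.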


2021
Author(s): Henrique Matheus F. da Silva, Rafael S. Pereira Silva, Fábio Porto

The accuracy of machine learning models implementing classification tasks is strongly dependent on the quality of the training dataset. This is a challenge for domains where data is not abundant, such as personalized medicine, or is unbalanced, as in the case of images of plant species, where some species have very few samples while others offer a large number. In both scenarios, the resulting models tend to perform poorly. In this paper we present two techniques to face this challenge. First, we present a data augmentation method called SAGAD, based on conditional entropy. SAGAD can balance minority classes while increasing the overall size of the training set. In our experiments, applying SAGAD to small-data problems with different machine learning algorithms yielded significant improvements in performance. We additionally present an extension of SAGAD for iterative learning algorithms, called DABEL, which generates new samples for each epoch using an optimization approach that continuously improves the model's performance. The adoption of SAGAD and DABEL consistently extends the training dataset towards improved target classification performance.
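The conditional entropy underlying SAGAD measures how much uncertainty remains in the class label once a feature is known. How SAGAD uses it internally is not spelled out in the abstract, but the quantity itself is standard and can be computed from paired samples:

```python
from collections import Counter
from math import log2

def conditional_entropy(xs, ys):
    """H(Y | X) in bits over paired samples: the expected uncertainty
    left in the label Y once the feature value X is observed."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of (x, y) pairs
    margin = Counter(xs)           # counts of x alone
    return -sum(c / n * log2(c / margin[x])
                for (x, y), c in joint.items())
```

When Y is fully determined by X the result is 0 bits; when X tells us nothing about Y it rises to H(Y). An augmentation method can use such a measure to decide which feature combinations are most informative to synthesize for minority classes.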


2021, Vol 7 (1), pp. 13
Author(s): Rubén Pérez-Jove, Roberto R. Expósito, Juan Touriño

This paper presents RGen, a parallel data generator for benchmarking Big Data workloads, which integrates existing features and new functionalities in a standalone tool. The main functionalities developed in this work were the generation of text and graphs that meet the characteristics defined by the 4 Vs of Big Data. On the one hand, the LDA model has been used for text generation, which extracts topics or themes covered in a series of documents. On the other hand, graph generation is based on the Kronecker model. The experimental evaluation carried out on a 16-node cluster has shown that RGen provides very good weak and strong scalability results. RGen is publicly available to download at https://github.com/rubenperez98/RGen, accessed on 30 September 2021.
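The Kronecker model used for graph generation grows a large, self-similar adjacency matrix from a small seed by repeated Kronecker products. RGen's actual (parallel, stochastic) implementation is more involved; the deterministic core of the model looks like this:

```python
def kronecker_graph(seed_matrix, iterations):
    """Grow an adjacency matrix by repeatedly taking the Kronecker
    product of the seed with itself (deterministic Kronecker model).

    After k iterations, a seed of size s x s yields an s**k x s**k
    adjacency matrix with the seed's structure repeated at every scale.
    """
    def kron(a, b):
        n, m = len(b), len(b[0])
        return [[a[i // n][j // m] * b[i % n][j % m]
                 for j in range(len(a[0]) * m)]
                for i in range(len(a) * n)]

    g = seed_matrix
    for _ in range(iterations - 1):
        g = kron(g, seed_matrix)
    return g

# a 2x2 seed grown for 2 iterations gives a 4x4 self-similar graph
g = kronecker_graph([[1, 1], [1, 0]], 2)
```

The recursive structure reproduces heavy-tailed degree distributions seen in real networks, which is why the Kronecker model is a popular choice for Big Data graph benchmarks.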


Geophysics, 2021, Vol 86 (6), pp. KS151-KS160
Author(s): Claire Birnie, Haithem Jarraya, Fredrik Hansteen

Deep learning applications are progressing rapidly in seismic processing and interpretation tasks. However, most approaches subsample data volumes and restrict model sizes to minimize computational requirements. Subsampling the data risks losing vital spatiotemporal information that could aid training, whereas restricting model sizes can impact model performance, or in extreme cases render more complicated tasks such as segmentation impossible. We show how to tackle the two main issues in training large neural networks (NNs): memory limitations and impractically long training times. Typically, training data are preloaded into memory prior to training, a particular challenge for seismic applications in which the data format is typically four times larger than that used for standard image processing tasks (float32 versus uint8). Based on an example from microseismic monitoring, we evaluate how more than 750 GB of data can be used to train a model by using a data generator approach, which only stores in memory the data required for the current training batch. Furthermore, efficient training over large models is illustrated through the training of a seven-layer U-Net with input data dimensions of [Formula: see text] (approximately [Formula: see text] million parameters). Through a batch-splitting distributed training approach, training times are reduced by a factor of four. The combination of data generators and distributed training removes any necessity for data subsampling or restriction of NN sizes, offering the opportunity to use larger networks, higher-resolution input data, or a move from 2D to 3D problem spaces.
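The data-generator idea at the heart of this approach is a lazy loader: rather than preloading 750 GB into memory, it materializes only one batch at a time. The sketch below is a framework-agnostic simplification with a hypothetical `load` callback standing in for reading one seismic example from disk:

```python
def batch_generator(file_paths, batch_size, load):
    """Yield training batches lazily: only `batch_size` examples are
    ever held in memory, regardless of the size of the dataset on disk.

    file_paths: iterable of per-example sources; load: callable that
    reads one example (e.g. a float32 seismic gather) from its source.
    """
    batch = []
    for path in file_paths:
        batch.append(load(path))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller, batch
```

A training loop consumes this generator one batch per step, so peak memory is set by the batch size rather than the dataset size; the paper's distributed batch-splitting then shards each batch across workers to cut wall-clock training time.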



