SynSys: A Synthetic Data Generation System for Healthcare Applications

Sensors ◽  
2019 ◽  
Vol 19 (5) ◽  
pp. 1181 ◽  
Author(s):  
Jessamyn Dahmen ◽  
Diane Cook

Creation of realistic synthetic behavior-based sensor data is an important aspect of testing machine learning techniques for healthcare applications. Many existing approaches for generating synthetic data are limited in terms of complexity and realism. We introduce SynSys, a machine learning-based synthetic data generation method, to improve upon these limitations. We use this method to generate synthetic time series data composed of nested sequences, using hidden Markov models and regression models that are initially trained on real datasets. We test our synthetic data generation technique on a real annotated smart home dataset. We use time series distance measures as a baseline to determine how realistic the generated data is compared to real data, and demonstrate that SynSys produces more realistic data in terms of distance than random data generation, data from another home, and data from another time period. Finally, we apply our synthetic data generation technique to the problem of generating data when only a small amount of ground truth data is available. Using semi-supervised learning, we demonstrate that SynSys is able to improve activity recognition accuracy compared to using the small amount of real data alone.
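A minimal sketch of the kind of pipeline the abstract describes, assuming the hmmlearn and numpy libraries: fit a Gaussian HMM on a real feature sequence, sample a synthetic sequence from it, and compare both against random data with a simple time-series distance. This is illustrative only and not the SynSys implementation; the input series here is a random stand-in for real smart-home features.

```python
# Illustrative sketch (not SynSys): HMM-based synthetic sequence generation
# plus a distance-based realism check.
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(0)

# Stand-in for a real smart-home feature sequence (e.g., hourly activity counts).
real_series = np.cumsum(rng.normal(size=(500, 1)), axis=0)

# 1. Train an HMM on the real data (number of hidden states is a modelling choice).
hmm = GaussianHMM(n_components=5, covariance_type="diag", n_iter=100, random_state=0)
hmm.fit(real_series)

# 2. Sample a synthetic sequence of the same length.
synthetic_series, hidden_states = hmm.sample(len(real_series), random_state=1)

# 3. Baseline realism check: Euclidean distance to the real series,
#    compared against purely random data of the same scale.
random_series = rng.normal(real_series.mean(), real_series.std(), real_series.shape)

def distance(a, b):
    return float(np.linalg.norm(a - b))

print("synthetic vs real:", distance(synthetic_series, real_series))
print("random    vs real:", distance(random_series, real_series))
```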

2020 ◽  
Author(s):  
David Meyer

The use of real data for training machine learning (ML) models is often a cause of major limitations. For example, real data may be (a) representative of only a subset of situations and domains, (b) expensive to produce, or (c) limited to specific individuals due to licensing restrictions. Although the use of synthetic data is becoming increasingly popular in computer vision, ML models used in weather and climate models still rely on large real datasets. Here we present some recent work towards the generation of synthetic data for weather and climate applications and outline some of the major challenges and limitations encountered.


Sensors ◽  
2021 ◽  
Vol 21 (11) ◽  
pp. 3694
Author(s):  
Fernando-Juan Pérez-Porras ◽  
Paula Triviño-Tarradas ◽  
Carmen Cima-Rodríguez ◽  
Jose-Emilio Meroño-de-Larriva ◽  
Alfonso García-Ferrer ◽  
...  

Wildfires are becoming more frequent in different parts of the globe, and predicting when and where they will occur is a complex process. Identifying wildfire events with a high probability of becoming a large wildfire is an important task for supporting initial attack planning. Different methods, including physics-based, statistical, and machine learning (ML) approaches, are used in wildfire analysis. Among these, the machine learning-based methods are relatively novel. In addition, because the number of wildfires is much greater than the number of large wildfires, the dataset to be used in an ML model is imbalanced, which can result in overfitting or underfitting. In this manuscript, we propose generating synthetic data from variables of interest, together with ML models, for the prediction of large wildfires. Specifically, five synthetic data generation methods have been evaluated, and their results are analyzed with four ML methods. The results show an improvement in predictive power when synthetic data are used, offering a new method to be taken into account in Decision Support Systems (DSS) when managing wildfires.
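A hedged sketch of the general idea, assuming scikit-learn and imbalanced-learn: oversample the rare "large wildfire" class with SMOTE (one common synthetic-data method; the paper evaluates five, which are not necessarily this one) and compare a classifier trained with and without the synthetic samples. The features below are randomly generated stand-ins for real fire variables.

```python
# Illustrative only: synthetic minority oversampling for an imbalanced wildfire-style dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE

# Imbalanced toy data: ~5% "large wildfire" positives.
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Baseline: train on the imbalanced data as-is.
base = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# With synthetic minority samples added by SMOTE.
X_res, y_res = SMOTE(random_state=0).fit_resample(X_tr, y_tr)
aug = RandomForestClassifier(random_state=0).fit(X_res, y_res)

print("balanced accuracy, real only:  ", balanced_accuracy_score(y_te, base.predict(X_te)))
print("balanced accuracy, +synthetic:", balanced_accuracy_score(y_te, aug.predict(X_te)))
```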


2018 ◽  
Author(s):  
Elijah Bogart ◽  
Richard Creswell ◽  
Georg K. Gerber

Abstract. Longitudinal studies are crucial for discovering causal relationships between the microbiome and human disease. We present the Microbiome Interpretable Temporal Rule Engine (MITRE), the first machine learning method specifically designed for predicting host status from microbiome time-series data. Our method maintains interpretability by learning predictive rules over automatically inferred time periods and phylogenetically related microbes. We validate MITRE's performance on semi-synthetic data and on five real datasets measuring microbiome composition over time in infant and adult cohorts. Our results demonstrate that MITRE performs on par with or outperforms "black box" machine learning approaches, providing a powerful new tool enabling discovery of biologically interpretable relationships between the microbiome and the human host.
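A toy illustration of the kind of rule such a method learns, not the MITRE implementation: a detector of the form "the average abundance of a group of related taxa within a time window exceeds a threshold", evaluated over a cohort. The clade, window, and threshold below are hypothetical, and the data are random.

```python
# Toy "temporal rule" detector over microbiome time-series data (numpy only).
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_taxa, n_timepoints = 40, 20, 30
abundances = rng.random((n_subjects, n_taxa, n_timepoints))

# Hypothetical clade (set of phylogenetically related taxa) and time window.
clade = [2, 3, 5]
window = slice(10, 20)
threshold = 0.55

def rule_fires(subject_abundance):
    """True if the clade's mean abundance within the window exceeds the threshold."""
    return subject_abundance[clade, window].mean() > threshold

predictions = np.array([rule_fires(a) for a in abundances])
print("subjects for which the rule fires:", predictions.sum(), "of", n_subjects)
```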


Symmetry ◽  
2021 ◽  
Vol 13 (7) ◽  
pp. 1176
Author(s):  
Aleksei Boikov ◽  
Vladimir Payor ◽  
Roman Savelev ◽  
Alexandr Kolesnikov

The paper presents a methodology for training neural networks for vision tasks on synthesized data, using steel defect recognition in automated production control systems as an example. The article describes the process of procedurally generating a dataset of steel slab defects with a symmetrical distribution. The results of training two neural networks, U-Net and Xception, on the generated data and testing them on real data are presented. The performance of these neural networks was assessed using real data from the Severstal: Steel Defect Detection set. In both cases, the neural networks showed good results in the classification and segmentation of surface defects of steel workpieces in the image. The Dice score on synthetic data reaches 0.62, and the accuracy reaches 0.81.
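A minimal sketch of procedural defect-image generation in the spirit of the abstract, not the authors' pipeline: synthesize a noisy "steel surface" texture, draw a scratch-like defect onto it, and emit the matching binary segmentation mask that a segmentation network such as U-Net would train on. All parameters are illustrative assumptions.

```python
# Illustrative procedural generation of a (surface image, defect mask) training pair.
import numpy as np

def make_sample(h=128, w=128, rng=None):
    rng = rng or np.random.default_rng()
    surface = rng.normal(0.5, 0.05, (h, w))            # base steel texture
    mask = np.zeros((h, w), dtype=np.uint8)

    # Scratch defect: a random line with small jitter and a few pixels of width.
    x = np.arange(w)
    y0 = rng.integers(10, h - 10)
    slope = rng.uniform(-0.3, 0.3)
    y = np.clip((y0 + slope * x + rng.normal(0, 1, w)).astype(int), 0, h - 1)
    for thickness in range(-1, 2):
        rows = np.clip(y + thickness, 0, h - 1)
        surface[rows, x] += rng.uniform(0.2, 0.4)       # brighter scratch pixels
        mask[rows, x] = 1
    return np.clip(surface, 0, 1), mask

image, mask = make_sample(rng=np.random.default_rng(0))
print(image.shape, int(mask.sum()), "defect pixels")
```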


2021 ◽  
Vol 3 ◽  
Author(s):  
Peter Goodin ◽  
Andrew J. Gardner ◽  
Nasim Dokani ◽  
Ben Nizette ◽  
Saeed Ahmadizadeh ◽  
...  

Background: Exposure to thousands of head and body impacts during a career in contact and collision sports may contribute to current or later-life issues related to brain health. Wearable technology enables the measurement of impact exposure. Validation of impact detection is required for accurate exposure monitoring. In this study, we present a method for automatic identification (classification) of head and body impacts using an instrumented mouthguard, video-verified impacts, and machine-learning algorithms.

Methods: Time series data were collected via the Nexus A9 mouthguard from 60 elite-level men (mean age = 26.33; SD = 3.79) and four women (mean age = 25.50; SD = 5.91), all Australian Rules Football players from eight clubs, participating in 119 games during the 2020 season. Ground truth labeling of the captures used in this machine learning study was performed through analysis of game footage by two expert video reviewers using SportCode and Catapult Vision. The visual labeling process occurred independently of the mouthguard time series data. True positive captures (captures where the reviewer directly observed contact between the mouthguard wearer and another player, the ball, or the ground) were defined as hits. Spectral and convolutional kernel-based features were extracted from the time series data. The performance of untuned classification algorithms from scikit-learn, in addition to XGBoost, was assessed to select the best performing baseline method for tuning.

Results: Based on performance, XGBoost was selected as the classifier algorithm for tuning. A total of 13,712 video-verified captures were collected and used to train and validate the classifier. True positive detection ranged from 94.67% in the test set to 100% in the hold-out set. True negatives ranged from 95.65% to 96.83% in the test and hold-out sets, respectively.

Discussion and conclusion: This study suggests the potential for high-performing impact classification models to be used for Australian Rules Football and highlights the importance of frequencies <150 Hz for the identification of these impacts.
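A hedged sketch of this kind of feature/classifier pipeline, not the study code: band-limited spectral features (<150 Hz) computed from short signal windows and fed to an untuned XGBoost classifier. The sampling rate, window length, and data below are illustrative assumptions standing in for mouthguard captures.

```python
# Illustrative only: spectral features below 150 Hz + XGBoost impact classifier.
import numpy as np
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
fs = 1000                                   # assumed sampling rate in Hz
n_windows, win_len = 400, 256
signals = rng.normal(size=(n_windows, win_len))
labels = rng.integers(0, 2, n_windows)      # 1 = video-verified hit, 0 = not

# Spectral features: magnitude spectrum restricted to frequencies below 150 Hz.
freqs = np.fft.rfftfreq(win_len, d=1 / fs)
keep = freqs < 150
features = np.abs(np.fft.rfft(signals, axis=1))[:, keep]

clf = XGBClassifier(n_estimators=100, eval_metric="logloss")
clf.fit(features[:300], labels[:300])
print("held-out accuracy:", (clf.predict(features[300:]) == labels[300:]).mean())
```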


Author(s):  
V. V. Danilov ◽  
O. M. Gerget ◽  
D. Y. Kolpashchikov ◽  
N. V. Laptev ◽  
R. A. Manakov ◽  
...  

Abstract. In the era of data-driven machine learning algorithms, data is the new oil. Applying machine learning algorithms shows that they need large, heterogeneous datasets that are, crucially, correctly labeled. However, data collection and labeling are time-consuming and labor-intensive processes. A particular task we solve using machine learning is the segmentation of medical devices in echocardiographic images during minimally invasive surgery. The lack of data motivated us to develop an algorithm that generates synthetic samples based on real datasets. The concept of this algorithm is to place a medical device (catheter) in an empty cavity of an anatomical structure, for example, in a heart chamber, and then transform it. To create random transformations of the catheter, the algorithm uses a coordinate system that uniquely identifies each point regardless of the bend and the shape of the object. It is proposed to take a cylindrical coordinate system as a basis, modifying it by replacing the Z-axis with a spline along which the h-coordinate is measured. Using the proposed algorithm, we generated new images with the catheter inserted into different heart cavities while varying its location and shape. Afterward, we compared the results of deep neural networks trained on datasets comprised of real and synthetic data. The network trained on both real and synthetic data performed more accurate segmentation than the model trained only on real data. For instance, a modified U-net trained on the combined dataset performed segmentation with a Dice similarity coefficient of 92.6±2.2%, while the same model trained only on real samples achieved 86.5±3.6%. Using a synthetic dataset allowed us to decrease the accuracy spread and improve the generalization of the model. It is worth noting that the proposed algorithm reduces subjectivity, minimizes the labeling routine, increases the number of samples, and improves dataset heterogeneity.
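A toy sketch of the coordinate idea described above, not the authors' code: a cylindrical-like system whose axis is a curve, so a catheter point is given by (h, r) = (arc length along the axis curve, offset along the local normal). The example is 2D for brevity and uses scipy for the spline; the control points and offsets are hypothetical.

```python
# Illustrative mapping from spline-axis coordinates (h, r) to Cartesian points.
import numpy as np
from scipy.interpolate import CubicSpline

# Axis curve ("spline Z-axis") through a few control points inside a cavity.
control_t = np.array([0.0, 1.0, 2.0, 3.0])
control_xy = np.array([[0, 0], [10, 4], [20, 2], [30, 8]], dtype=float)
axis = CubicSpline(control_t, control_xy)        # maps t -> (x, y)

def point_from_h_r(h, r, n_samples=2000):
    """Walk arc length h along the axis curve, then step r along the local normal."""
    t = np.linspace(control_t[0], control_t[-1], n_samples)
    xy = axis(t)
    seg = np.diff(xy, axis=0)
    arc = np.concatenate([[0.0], np.cumsum(np.linalg.norm(seg, axis=1))])
    i = min(max(np.searchsorted(arc, h), 1), n_samples - 1)
    tangent = seg[i - 1] / np.linalg.norm(seg[i - 1])
    normal = np.array([-tangent[1], tangent[0]])
    return xy[i] + r * normal

# A bent catheter: points spaced along h with a fixed radial offset.
catheter = np.array([point_from_h_r(h, r=1.5) for h in np.linspace(0, 25, 20)])
print(catheter.shape)
```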

