Machine Learning Methods and Synthetic Data Generation to Predict Large Wildfires

Sensors ◽  
2021 ◽  
Vol 21 (11) ◽  
pp. 3694
Author(s):  
Fernando-Juan Pérez-Porras ◽  
Paula Triviño-Tarradas ◽  
Carmen Cima-Rodríguez ◽  
Jose-Emilio Meroño-de-Larriva ◽  
Alfonso García-Ferrer ◽  
...  

Wildfires are becoming more frequent in different parts of the globe, and predicting when and where they will occur is a complex task. Identifying wildfire events with a high probability of becoming a large wildfire is important for supporting initial attack planning. Different methods, including physics-based, statistical, and machine learning (ML) approaches, are used in wildfire analysis; of these, the ML-based methods are relatively novel. In addition, because the number of wildfires is much greater than the number of large wildfires, the dataset used to train an ML model is imbalanced, which can lead to overfitting or underfitting. In this manuscript, we propose generating synthetic data from variables of interest and combining it with ML models for the prediction of large wildfires. Specifically, five synthetic data generation methods are evaluated, and their results are analyzed with four ML methods. The results show an improvement in predictive power when synthetic data are used, offering a new method to be taken into account in Decision Support Systems (DSS) when managing wildfires.
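
The abstract above does not name the five synthetic data generation methods, so the following is only a minimal sketch of one widely used idea for imbalanced data: creating new minority-class points by interpolating between existing ones (SMOTE-style). The function name, the feature values, and the choice of k are hypothetical and are not taken from the paper.

```python
import random

def interpolate_minority(samples, n_new, k=3, seed=0):
    """Create n_new synthetic points, each lying on the segment between a
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(samples)
        # Rank the other minority samples by squared Euclidean distance.
        others = [s for s in samples if s is not base]
        others.sort(key=lambda s: sum((a - b) ** 2 for a, b in zip(base, s)))
        neighbour = rng.choice(others[:k])
        t = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

# Hypothetical feature vectors (temperature, wind speed) for large wildfires.
large_fires = [(31.0, 40.0), (33.5, 55.0), (29.0, 48.0), (35.0, 60.0)]
new_points = interpolate_minority(large_fires, n_new=6)
```

Because each synthetic point is a convex combination of two real minority samples, it stays inside the range spanned by the observed large-wildfire records, which is one reason this family of methods is a common first attempt at rebalancing a training set.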

2020 ◽  
Author(s):  
David Meyer

The use of real data for training machine learning (ML) models is often a cause of major limitations. For example, real data may be (a) representative of only a subset of situations and domains, (b) expensive to produce, or (c) limited to specific individuals due to licensing restrictions. Although the use of synthetic data is becoming increasingly popular in computer vision, ML models used in weather and climate models still rely on large real datasets. Here we present some recent work towards the generation of synthetic data for weather and climate applications and outline some of the major challenges and limitations encountered.


Sensors ◽  
2019 ◽  
Vol 19 (5) ◽  
pp. 1181 ◽  
Author(s):  
Jessamyn Dahmen ◽  
Diane Cook

Creation of realistic synthetic behavior-based sensor data is an important aspect of testing machine learning techniques for healthcare applications. Many of the existing approaches for generating synthetic data are often limited in terms of complexity and realism. We introduce SynSys, a machine learning-based synthetic data generation method, to improve upon these limitations. We use this method to generate synthetic time series data that is composed of nested sequences using hidden Markov models and regression models which are initially trained on real datasets. We test our synthetic data generation technique on a real annotated smart home dataset. We use time series distance measures as a baseline to determine how realistic the generated data is compared to real data and demonstrate that SynSys produces more realistic data in terms of distance compared to random data generation, data from another home, and data from another time period. Finally, we apply our synthetic data generation technique to the problem of generating data when only a small amount of ground truth data is available. Using semi-supervised learning we demonstrate that SynSys is able to improve activity recognition accuracy compared to using the small amount of real data alone.
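
As a rough illustration of the hidden Markov model component mentioned above, the sketch below samples a sequence of (activity, sensor event) pairs from a small HMM. This is not SynSys itself: the states, sensor names, and probabilities are hypothetical and hand-written, whereas the abstract describes models trained on real smart home data and combined with regression models.

```python
import random

def sample_hmm(start, trans, emit, length, seed=0):
    """Sample `length` (hidden state, observation) pairs from an HMM whose
    distributions are given as {outcome: probability} mappings."""
    rng = random.Random(seed)

    def draw(dist):
        # Draw one outcome according to its probability weight.
        outcomes = list(dist)
        return rng.choices(outcomes, weights=[dist[o] for o in outcomes])[0]

    state = draw(start)
    sequence = []
    for _ in range(length):
        sequence.append((state, draw(emit[state])))  # emit from current state
        state = draw(trans[state])                   # then move to next state
    return sequence

# Hypothetical activities (hidden states) and sensor events (observations).
start_probs = {"sleep": 0.6, "cook": 0.2, "relax": 0.2}
trans_probs = {
    "sleep": {"sleep": 0.7, "cook": 0.2, "relax": 0.1},
    "cook":  {"sleep": 0.1, "cook": 0.5, "relax": 0.4},
    "relax": {"sleep": 0.3, "cook": 0.2, "relax": 0.5},
}
emit_probs = {
    "sleep": {"bedroom_motion": 0.9, "kitchen_motion": 0.1},
    "cook":  {"kitchen_motion": 0.8, "livingroom_motion": 0.2},
    "relax": {"livingroom_motion": 0.7, "bedroom_motion": 0.3},
}
events = sample_hmm(start_probs, trans_probs, emit_probs, length=10)
```

In a trained setting the three probability tables would be estimated from annotated sensor logs rather than written by hand.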


Author(s):  
Simon Fahle ◽  
Thomas Glaser ◽  
Andreas Kneißler ◽  
Bernd Kuhlenkötter

As artificial intelligence, and especially machine learning, has gained a lot of attention during the last few years, methods and models have improved and are becoming easily applicable. This possibility was used to develop a quality prediction system using supervised machine learning methods in the form of time series classification models to predict ovality in radial-axial ring rolling. Different preprocessing steps and model implementations have been used to improve quality prediction. A semi-supervised approach is used to improve the prediction and to analyze to what extent it can improve current research in machine learning for quality prediction. Moreover, first research steps are taken towards synthetic data generation within the radial-axial ring rolling domain using generative adversarial networks.


2021 ◽  
Vol 11 (5) ◽  
pp. 2158
Author(s):  
Fida K. Dankar ◽  
Mahmoud Ibrahim

Synthetic data provides a privacy-protecting mechanism for the broad usage and sharing of healthcare data for secondary purposes. It is considered a safe approach for the sharing of sensitive data as it generates an artificial dataset that contains no identifiable information. Synthetic data is increasing in popularity, with multiple synthetic data generators developed in the past decade, yet its utility is still a subject of research. This paper is concerned with evaluating the effect of various synthetic data generation and usage settings on the utility of the generated synthetic data and its derived models. Specifically, we investigate (i) the effect of data pre-processing on the utility of the synthetic data generated, (ii) whether tuning should be applied to the synthetic datasets when generating supervised machine learning models, and (iii) whether sharing preliminary machine learning results can improve the synthetic data models. Lastly, (iv) we investigate whether one utility measure (propensity score) can predict the accuracy of the machine learning models generated from the synthetic data when employed in real life. We use two popular measures of synthetic data utility, propensity score and classification accuracy, to compare the different settings. We adopt a recent mechanism for the calculation of propensity, which looks carefully into the choice of model for the propensity score calculation. Accordingly, this paper takes a new direction by investigating the effect of various data generation and usage settings on the quality of the generated data and its ensuing models. The goal is to inform on the best strategies to follow when generating and using synthetic data.
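
The abstract does not state which formula or model the authors use for the propensity score, so the following is only a sketch of one common formulation, the propensity mean squared error (pMSE). A classifier is trained to tell real records from synthetic ones, and the score is the mean squared distance between its predicted probabilities and the synthetic share c of the combined data. The plain gradient-descent logistic regression and the toy one-feature data are assumptions made for illustration.

```python
import math

def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=1000):
    """Fit logistic regression by batch gradient descent; returns (w, b)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = _sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def pmse(real, synthetic):
    """Propensity mean squared error: 0 means the classifier cannot tell
    real from synthetic rows; larger values mean lower utility."""
    X = real + synthetic
    y = [0] * len(real) + [1] * len(synthetic)  # 1 marks a synthetic row
    w, b = fit_logistic(X, y)
    c = len(synthetic) / len(X)                 # share of synthetic rows
    scores = [_sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]
    return sum((p - c) ** 2 for p in scores) / len(X)

# Toy one-feature datasets (hypothetical values).
real_rows = [(0.0,), (0.1,), (0.2,), (0.3,)]
good_synthetic = [(0.0,), (0.1,), (0.2,), (0.3,)]  # mimics the real data
poor_synthetic = [(2.0,), (2.1,), (2.2,), (2.3,)]  # clearly shifted
pmse_good = pmse(real_rows, good_synthetic)
pmse_poor = pmse(real_rows, poor_synthetic)
```

Under this formulation, `pmse_good` is expected to be close to zero and `pmse_poor` noticeably larger, with 0.25 as the upper bound when real and synthetic rows are equally numerous.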


2021 ◽  
Vol 171 ◽  
pp. 112578
Author(s):  
Niharika Dalsania ◽  
Zeel Patel ◽  
Shishir Purohit ◽  
Bhaskar Chaudhury

2007 ◽  
Author(s):  
Marek K. Jakubowski ◽  
David Pogorzala ◽  
Timothy J. Hattenberger ◽  
Scott D. Brown ◽  
John R. Schott
