BOOSTING SEGMENTATION ACCURACY OF THE DEEP LEARNING MODELS BASED ON THE SYNTHETIC DATA GENERATION

Author(s):  
V. V. Danilov ◽  
O. M. Gerget ◽  
D. Y. Kolpashchikov ◽  
N. V. Laptev ◽  
R. A. Manakov ◽  
...  

Abstract. In the era of data-driven machine learning algorithms, data is the new oil. Practical applications of machine learning show that algorithms need large, heterogeneous datasets that, crucially, are correctly labeled. However, data collection and labeling are time-consuming and labor-intensive processes. The particular task we solve with machine learning is the segmentation of medical devices in echocardiographic images during minimally invasive surgery. A lack of data motivated us to develop an algorithm that generates synthetic samples from real datasets. The concept of this algorithm is to place a medical device (a catheter) in an empty cavity of an anatomical structure, for example a heart chamber, and then transform it. To create random transformations of the catheter, the algorithm uses a coordinate system that uniquely identifies each point regardless of the bend and shape of the object. We propose to take a cylindrical coordinate system as a basis, modifying it by replacing the Z-axis with a spline along which the h-coordinate is measured. Using the proposed algorithm, we generated new images with the catheter inserted into different heart cavities while varying its location and shape. We then compared the results of deep neural networks trained on datasets composed of real and synthetic data. The network trained on both real and synthetic data performed more accurate segmentation than the model trained only on real data. For instance, a modified U-net trained on the combined dataset achieved a Dice similarity coefficient of 92.6±2.2%, while the same model trained only on real samples reached 86.5±3.6%. Using the synthetic dataset reduced the accuracy spread and improved the generalization of the model.
It is worth noting that the proposed algorithm reduces subjectivity, minimizes the labeling routine, increases the number of samples, and improves dataset heterogeneity.
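The spline-based coordinate system described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the centerline curve, the frame construction, and all function names are hypothetical stand-ins for the paper's modified cylindrical coordinates (h measured along the spline, with r and φ offsetting around it).

```python
import numpy as np

def curve(t):
    # Hypothetical centerline spline: a gentle 3D bend standing in for a
    # catheter path fitted to a heart cavity.
    return np.array([t, 0.3 * t**2, 0.1 * np.sin(2 * t)])

def frame(t, eps=1e-5):
    # Tangent via central differences, plus two normals completing an
    # orthonormal frame around the curve at parameter t.
    tangent = (curve(t + eps) - curve(t - eps)) / (2 * eps)
    tangent /= np.linalg.norm(tangent)
    ref = np.array([0.0, 0.0, 1.0])        # arbitrary reference direction
    n1 = np.cross(tangent, ref)
    n1 /= np.linalg.norm(n1)
    n2 = np.cross(tangent, n1)
    return tangent, n1, n2

def spline_cylindrical_to_cartesian(h, r, phi):
    # (h, r, phi): position along the spline, radial offset, and angle
    # around the local spline axis -- the modified cylindrical system.
    _, n1, n2 = frame(h)
    return curve(h) + r * (np.cos(phi) * n1 + np.sin(phi) * n2)
```

With r = 0 a point lies exactly on the spline, and any (r, φ) offset lands at distance r from the centerline, so each point of a bent catheter keeps the same (h, r, φ) address regardless of the bend.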

2008 ◽  
Vol 47 (01) ◽  
pp. 70-75 ◽  
Author(s):  
V. Jakkula ◽  
D. J. Cook

Summary Objectives: To many people, home is a sanctuary. With the maturing of smart home technologies, many people with cognitive and physical disabilities can lead independent lives in their own homes for extended periods of time. In this paper, we investigate the design of machine learning algorithms that support this goal. We hypothesize that machine learning algorithms can be designed to automatically learn models of resident behavior in a smart home, and that the results can be used to perform automated health monitoring and to detect anomalies. Methods: Specifically, our algorithms draw upon the temporal nature of sensor data collected in a smart home to build a model of expected activities and to detect unexpected, and possibly health-critical, events in the home. Results: We validate our algorithms using synthetic data and real activity data collected from volunteers in an automated smart environment. Conclusions: The results from our experiments support our hypothesis that a model can be learned from observed smart home data and used to report anomalies, as they occur, in a smart home.
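As a rough illustration of the kind of temporal model described (not the authors' actual algorithm), one could learn each activity's typical hour of day from normal behavior and flag events that deviate strongly; the names and the z-score rule below are assumptions:

```python
import statistics

def learn_model(events):
    # events: list of (activity, hour) pairs observed during normal behavior.
    by_activity = {}
    for activity, hour in events:
        by_activity.setdefault(activity, []).append(hour)
    # Store each activity's mean hour and spread (guarding against zero std).
    return {
        activity: (statistics.mean(hours), statistics.pstdev(hours) or 1.0)
        for activity, hours in by_activity.items()
    }

def is_anomalous(model, activity, hour, z_threshold=3.0):
    if activity not in model:
        return True  # a never-observed activity is itself unexpected
    mean, std = model[activity]
    return abs(hour - mean) / std > z_threshold
```

A medication event at a wildly unusual hour, or an activity never seen in training, would then be reported as a possibly health-critical anomaly.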


Water ◽  
2021 ◽  
Vol 13 (23) ◽  
pp. 3461
Author(s):  
Panagiotis Christias ◽  
Mariana Mocanu

Agricultural systems are under constant stress from rising demand for products, and consequently the water resources consumed for irrigation are increasing. Combined with climate change, these pressures are major obstacles to sustainable development, especially in semi-arid lands. This paper presents an end-to-end Machine Learning framework for predicting the potential profit from olive farms. The objective is to estimate the optimal economic gain while preserving water resources used for irrigation, by considering various related factors such as climatic conditions, crop management practices, soil characteristics, and crop yield. The case study focuses on olive tree farms located on the Hellenic island of Crete; real data from the farms and from local weather observations are used. The target is to build a framework that preprocesses input data, compares the results of a group of Machine Learning algorithms, and proposes the best-predicted value of economic profit. Various aspects of this process are thoroughly examined, such as the bias-variance tradeoff and the problem of overfitting, data transforms, feature engineering and selection, ensemble methods, and the pursuit of optimal resampling towards better model accuracy. Results indicate that data preprocessing and resampling enhance the performance of the Machine Learning algorithms. Ultimately, prediction accuracy and reliability are greatly improved compared to the algorithms' performance without the framework.
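The preprocess-then-compare step of such a framework can be sketched with scikit-learn; the features, target, candidate models, and scoring below are illustrative stand-ins, not the paper's actual setup:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Stand-in features (e.g. rainfall, temperature, soil index) and a profit
# target; a real framework would load farm and weather records instead.
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

candidates = {
    "linear": LinearRegression(),
    "forest": RandomForestRegressor(n_estimators=50, random_state=0),
}
# Each candidate gets the same preprocessing (scaling) inside its pipeline,
# and models are compared via cross-validated R^2.
scores = {
    name: cross_val_score(make_pipeline(StandardScaler(), model), X, y, cv=5).mean()
    for name, model in candidates.items()
}
best = max(scores, key=scores.get)
```

Wrapping the preprocessing inside each pipeline keeps the cross-validation honest: the scaler is refit on every training fold, so no information from the held-out fold leaks into model selection.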


2021 ◽  
Author(s):  
Omar Alfarisi ◽  
Zeyar Aung ◽  
Mohamed Sassi

Choosing the optimal machine learning algorithm is not an easy decision. To help future researchers, this paper describes the best-performing of the candidate algorithms. We built a synthetic dataset and performed supervised machine learning runs for five different algorithms. For heterogeneity, we identified Random Forest, among others, as the best algorithm.


2020 ◽  
Vol 7 (1) ◽  
pp. 1718821 ◽  
Author(s):  
Yan I. Kuchin ◽  
Ravil I. Mukhamediev ◽  
Kirill O. Yakunin ◽  
Duc Pham

2020 ◽  
Vol 18 (2) ◽  
pp. 62-75
Author(s):  
Andrey O. Matveev ◽  
Alexander V. Bystrov ◽  
Vitaly I. Bibaev ◽  
Nikita I. Povarov

Auto-completion is an essential feature of any popular code editor. It spares users the tedious typing of long expressions in their projects. There is a wide variety of work in this direction in both scientific research and commercial products; these works either apply special features and heuristics to improve code completion or use machine learning techniques. Most of these approaches rely on synthetic data and do not take into account the behavior of real users. This article proposes an approach to improving the automatic code completion mechanism for the Python language by collecting information about how real users interact with this mechanism. The obtained data is used to train a model that ranks completion variants with machine learning algorithms. To train the model, two types of features are used: contextual and elemental. Contextual features describe the code around the cursor position in the editor. Elemental features describe the characteristics of the proposed variant, for example, the length of the matching prefix. When building such a model, it is important to respect limits on the model's response time and size. The paper also considers various approaches to assessing the quality of the final model.
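A toy sketch of feature-based ranking in this spirit (the features, weights, and hand-set linear scoring below are hypothetical; the described system learns its ranking from real usage data instead):

```python
def prefix_match_len(prefix, candidate):
    # Elemental feature: length of the matching prefix between what the
    # user has typed and a completion candidate.
    n = 0
    for a, b in zip(prefix, candidate):
        if a != b:
            break
        n += 1
    return n

def rank(prefix, candidates, usage_counts, w_prefix=1.0, w_usage=0.1):
    # Combine an elemental feature (prefix match) with a usage-derived one
    # (how often this symbol was chosen before) into a linear score, then
    # sort candidates best-first.
    def score(c):
        return w_prefix * prefix_match_len(prefix, c) + w_usage * usage_counts.get(c, 0)
    return sorted(candidates, key=score, reverse=True)
```

A trained ranker would replace the fixed weights with ones learned from collected user selections, under the response-time and model-size limits the article mentions.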


2020 ◽  
Author(s):  
David Meyer

<p>The use of real data for training machine learning (ML) models is often a cause of major limitations. For example, real data may be (a) representative of only a subset of situations and domains, (b) expensive to produce, (c) limited to specific individuals due to licensing restrictions. Although the use of synthetic data is becoming increasingly popular in computer vision, ML models used in weather and climate still rely on large real datasets. Here we present some recent work towards the generation of synthetic data for weather and climate applications and outline some of the major challenges and limitations encountered.</p>


2021 ◽  
Author(s):  
David Meyer ◽  
Thomas Nagler ◽  
Robin J. Hogan

Abstract. Can we improve machine learning (ML) emulators with synthetic data? The use of real data for training ML models is often the cause of major limitations. For example, real data may be (a) only representative of a subset of situations and domains, (b) expensive to source, (c) limited to specific individuals due to licensing restrictions. Although the use of synthetic data is becoming increasingly popular in computer vision, the training of ML emulators in weather and climate still relies on real datasets. Here we investigate whether the use of copula-based synthetically augmented datasets improves the prediction of ML emulators for estimating the downwelling longwave radiation. Results show that bulk errors are cut by up to 75 % for the mean bias error (from 0.08 to −0.02 W m−2) and by up to 62 % (from 1.17 to 0.44 W m−2) for the mean absolute error, thus showing potential for improving the generalization of future ML emulators.
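A minimal Gaussian-copula augmentation sketch in the spirit of this approach (the paper's copula model and data differ; the function and its steps below are illustrative assumptions):

```python
import numpy as np
from scipy import stats

def gaussian_copula_sample(data, n_samples, seed=0):
    # Capture the dependence structure of `data` in normal-scores space,
    # draw new points there, and map each margin back through the
    # empirical quantiles of the original column.
    rng = np.random.default_rng(seed)
    n, d = data.shape
    # 1. Transform each margin to normal scores via ranks.
    ranks = stats.rankdata(data, axis=0) / (n + 1)
    z = stats.norm.ppf(ranks)
    # 2. Fit the correlation of the normal scores and sample from it.
    corr = np.corrcoef(z, rowvar=False)
    z_new = rng.multivariate_normal(np.zeros(d), corr, size=n_samples)
    # 3. Map samples back to the data scale via empirical quantiles.
    u_new = stats.norm.cdf(z_new)
    return np.column_stack(
        [np.quantile(data[:, j], u_new[:, j]) for j in range(d)]
    )
```

Because the margins are mapped back through empirical quantiles, each synthetic column follows the original column's distribution while the copula preserves the cross-variable dependence, which is what lets the augmented dataset enlarge training data without distorting the physics encoded in the correlations.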

