BOOSTING SEGMENTATION ACCURACY OF THE DEEP LEARNING MODELS BASED ON THE SYNTHETIC DATA GENERATION

Author(s):  
V. V. Danilov ◽  
O. M. Gerget ◽  
D. Y. Kolpashchikov ◽  
N. V. Laptev ◽  
R. A. Manakov ◽  
...  

Abstract. In the era of data-driven machine learning algorithms, data is the new oil. Practical applications of machine learning show that algorithms need large, heterogeneous datasets that, crucially, are correctly labeled. However, data collection and labeling are time-consuming and labor-intensive processes. The particular task we solve with machine learning is the segmentation of medical devices in echocardiographic images during minimally invasive surgery. A lack of data motivated us to develop an algorithm that generates synthetic samples from real datasets. The concept of this algorithm is to place a medical device (a catheter) in an empty cavity of an anatomical structure, for example a heart chamber, and then transform it. To create random transformations of the catheter, the algorithm uses a coordinate system that uniquely identifies each point regardless of the bend and shape of the object. We propose to take a cylindrical coordinate system as a basis, modifying it by replacing the Z-axis with a spline along which the h-coordinate is measured. Using the proposed algorithm, we generated new images with the catheter inserted into different heart cavities while varying its location and shape. We then compared the results of deep neural networks trained on datasets composed of real and synthetic data. The network trained on both real and synthetic data performed more accurate segmentation than the model trained only on real data. For instance, a modified U-net trained on the combined dataset achieved a Dice similarity coefficient of 92.6±2.2%, while the same model trained only on real samples reached 86.5±3.6%. Using the synthetic dataset reduced the accuracy spread and improved the generalization of the model.
It is worth noting that the proposed algorithm reduces subjectivity, minimizes the labeling routine, increases the number of samples, and improves dataset heterogeneity.
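The spline-based coordinate system described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the centerline curve, the frame construction, and all function names are hypothetical stand-ins for the paper's modified cylindrical coordinates (h measured along the spline, with r and φ offsetting around it).

```python
import numpy as np

def curve(t):
    # Hypothetical centerline spline: a gentle 3D bend standing in for a
    # catheter path fitted to a heart cavity.
    return np.array([t, 0.3 * t**2, 0.1 * np.sin(2 * t)])

def frame(t, eps=1e-5):
    # Tangent via central differences, plus two normals completing an
    # orthonormal frame around the curve at parameter t.
    tangent = (curve(t + eps) - curve(t - eps)) / (2 * eps)
    tangent /= np.linalg.norm(tangent)
    ref = np.array([0.0, 0.0, 1.0])        # arbitrary reference direction
    n1 = np.cross(tangent, ref)
    n1 /= np.linalg.norm(n1)
    n2 = np.cross(tangent, n1)
    return tangent, n1, n2

def spline_cylindrical_to_cartesian(h, r, phi):
    # (h, r, phi): position along the spline, radial offset, and angle
    # around the local spline axis -- the modified cylindrical system.
    _, n1, n2 = frame(h)
    return curve(h) + r * (np.cos(phi) * n1 + np.sin(phi) * n2)
```

With r = 0 a point lies exactly on the spline, and any (r, φ) offset lands at distance r from the centerline, so each point of a bent catheter keeps the same (h, r, φ) address regardless of the bend.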

2008 ◽  
Vol 47 (01) ◽  
pp. 70-75 ◽  
Author(s):  
V. Jakkula ◽  
D. J. Cook

Summary Objectives: To many people, home is a sanctuary. With the maturing of smart home technologies, many people with cognitive and physical disabilities can lead independent lives in their own homes for extended periods of time. In this paper, we investigate the design of machine learning algorithms that support this goal. We hypothesize that machine learning algorithms can be designed to automatically learn models of resident behavior in a smart home, and that the results can be used to perform automated health monitoring and to detect anomalies. Methods: Specifically, our algorithms draw upon the temporal nature of sensor data collected in a smart home to build a model of expected activities and to detect unexpected, and possibly health-critical, events in the home. Results: We validate our algorithms using synthetic data and real activity data collected from volunteers in an automated smart environment. Conclusions: The results from our experiments support our hypothesis that a model can be learned from observed smart home data and used to report anomalies, as they occur, in a smart home.
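As a rough illustration of the kind of temporal model described (not the authors' actual algorithm), one could learn each activity's typical hour of day from normal behavior and flag events that deviate strongly; the names and the z-score rule below are assumptions:

```python
import statistics

def learn_model(events):
    # events: list of (activity, hour) pairs observed during normal behavior.
    by_activity = {}
    for activity, hour in events:
        by_activity.setdefault(activity, []).append(hour)
    # Store each activity's mean hour and spread (guarding against zero std).
    return {
        activity: (statistics.mean(hours), statistics.pstdev(hours) or 1.0)
        for activity, hours in by_activity.items()
    }

def is_anomalous(model, activity, hour, z_threshold=3.0):
    if activity not in model:
        return True  # a never-observed activity is itself unexpected
    mean, std = model[activity]
    return abs(hour - mean) / std > z_threshold
```

A medication event at a wildly unusual hour, or an activity never seen in training, would then be reported as a possibly health-critical anomaly.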


Water ◽  
2021 ◽  
Vol 13 (23) ◽  
pp. 3461
Author(s):  
Panagiotis Christias ◽  
Mariana Mocanu

Agricultural systems are under constant stress from rising demand for products, and consequently the water resources consumed for irrigation are increasing. Combined with climate change, these pressures are major obstacles to sustainable development, especially in semi-arid lands. This paper presents an end-to-end Machine Learning framework for predicting the potential profit from olive farms. The objective is to estimate the optimal economic gain while preserving water resources used for irrigation, by considering various related factors such as climatic conditions, crop management practices, soil characteristics, and crop yield. The case study focuses on olive tree farms located on the Hellenic island of Crete; real data from the farms and from local weather observations are used. The target is to build a framework that preprocesses input data, compares the results of a group of Machine Learning algorithms, and proposes the best-predicted value of economic profit. Various aspects of this process are thoroughly examined, such as the bias-variance tradeoff and the problem of overfitting, data transforms, feature engineering and selection, ensemble methods, and the pursuit of optimal resampling towards better model accuracy. Results indicate that data preprocessing and resampling enhance the performance of the Machine Learning algorithms. Ultimately, prediction accuracy and reliability are greatly improved compared to the algorithms' performance without the framework.
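The preprocess-then-compare step of such a framework can be sketched with scikit-learn; the features, target, candidate models, and scoring below are illustrative stand-ins, not the paper's actual setup:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Stand-in features (e.g. rainfall, temperature, soil index) and a profit
# target; a real framework would load farm and weather records instead.
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

candidates = {
    "linear": LinearRegression(),
    "forest": RandomForestRegressor(n_estimators=50, random_state=0),
}
# Each candidate gets the same preprocessing (scaling) inside its pipeline,
# and models are compared via cross-validated R^2.
scores = {
    name: cross_val_score(make_pipeline(StandardScaler(), model), X, y, cv=5).mean()
    for name, model in candidates.items()
}
best = max(scores, key=scores.get)
```

Wrapping the preprocessing inside each pipeline keeps the cross-validation honest: the scaler is refit on every training fold, so no information from the held-out fold leaks into model selection.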


2021 ◽  
Author(s):  
Omar Alfarisi ◽  
Zeyar Aung ◽  
Mohamed Sassi

Choosing the optimal machine learning algorithm is not an easy decision. To help future researchers, this paper describes the best-performing of the candidate algorithms. We built a synthetic dataset and performed supervised machine learning runs for five different algorithms. For heterogeneity, we identified Random Forest, among others, as the best algorithm.


2020 ◽  
Vol 7 (1) ◽  
pp. 1718821 ◽  
Author(s):  
Yan I. Kuchin ◽  
Ravil I. Mukhamediev ◽  
Kirill O. Yakunin ◽  
Duc Pham

2020 ◽  
Vol 18 (2) ◽  
pp. 62-75
Author(s):  
Andrey O. Matveev ◽  
Alexander V. Bystrov ◽  
Vitaly I. Bibaev ◽  
Nikita I. Povarov

Auto-completion is an essential feature of any popular code editor. It spares users the tedious typing of long expressions in their projects. There is a wide variety of work in this direction in both scientific research and commercial products; these works either apply special features and heuristics to improve code completion or use machine learning techniques. Most of these approaches rely on synthetic data and do not take into account the behavior of real users. This article proposes an approach to improving the automatic code completion mechanism for the Python language by collecting information about how real users interact with this mechanism. The obtained data is used to train a model that ranks completion variants with machine learning algorithms. To train the model, two types of features are used: contextual and elemental. Contextual features describe the code around the cursor position in the editor. Elemental features describe the characteristics of the proposed variant, for example, the length of the matching prefix. When building such a model, it is important to respect limits on the model's response time and size. The paper also considers various approaches to assessing the quality of the final model.
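A toy sketch of feature-based ranking in this spirit (the features, weights, and hand-set linear scoring below are hypothetical; the described system learns its ranking from real usage data instead):

```python
def prefix_match_len(prefix, candidate):
    # Elemental feature: length of the matching prefix between what the
    # user has typed and a completion candidate.
    n = 0
    for a, b in zip(prefix, candidate):
        if a != b:
            break
        n += 1
    return n

def rank(prefix, candidates, usage_counts, w_prefix=1.0, w_usage=0.1):
    # Combine an elemental feature (prefix match) with a usage-derived one
    # (how often this symbol was chosen before) into a linear score, then
    # sort candidates best-first.
    def score(c):
        return w_prefix * prefix_match_len(prefix, c) + w_usage * usage_counts.get(c, 0)
    return sorted(candidates, key=score, reverse=True)
```

A trained ranker would replace the fixed weights with ones learned from collected user selections, under the response-time and model-size limits the article mentions.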


2020 ◽  
Author(s):  
David Meyer

<p>The use of real data for training machine learning (ML) models is often a cause of major limitations. For example, real data may be (a) representative of only a subset of situations and domains, (b) expensive to produce, (c) limited to specific individuals due to licensing restrictions. Although the use of synthetic data is becoming increasingly popular in computer vision, ML models used in weather and climate still rely on large real datasets. Here we present some recent work towards the generation of synthetic data for weather and climate applications and outline some of the major challenges and limitations encountered.</p>


2021 ◽  
Author(s):  
David Meyer ◽  
Thomas Nagler ◽  
Robin J. Hogan

Abstract. Can we improve machine learning (ML) emulators with synthetic data? The use of real data for training ML models is often the cause of major limitations. For example, real data may be (a) only representative of a subset of situations and domains, (b) expensive to source, (c) limited to specific individuals due to licensing restrictions. Although the use of synthetic data is becoming increasingly popular in computer vision, the training of ML emulators in weather and climate still relies on real datasets. Here we investigate whether the use of copula-based synthetically augmented datasets improves the prediction of ML emulators for estimating the downwelling longwave radiation. Results show that bulk errors are cut by up to 75 % for the mean bias error (from 0.08 to −0.02 W m−2) and by up to 62 % (from 1.17 to 0.44 W m−2) for the mean absolute error, thus showing potential for improving the generalization of future ML emulators.
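A minimal Gaussian-copula augmentation sketch in the spirit of this approach (the paper's copula model and data differ; the function and its steps below are illustrative assumptions):

```python
import numpy as np
from scipy import stats

def gaussian_copula_sample(data, n_samples, seed=0):
    # Capture the dependence structure of `data` in normal-scores space,
    # draw new points there, and map each margin back through the
    # empirical quantiles of the original column.
    rng = np.random.default_rng(seed)
    n, d = data.shape
    # 1. Transform each margin to normal scores via ranks.
    ranks = stats.rankdata(data, axis=0) / (n + 1)
    z = stats.norm.ppf(ranks)
    # 2. Fit the correlation of the normal scores and sample from it.
    corr = np.corrcoef(z, rowvar=False)
    z_new = rng.multivariate_normal(np.zeros(d), corr, size=n_samples)
    # 3. Map samples back to the data scale via empirical quantiles.
    u_new = stats.norm.cdf(z_new)
    return np.column_stack(
        [np.quantile(data[:, j], u_new[:, j]) for j in range(d)]
    )
```

Because the margins are mapped back through empirical quantiles, each synthetic column follows the original column's distribution while the copula preserves the cross-variable dependence, which is what lets the augmented dataset enlarge training data without distorting the physics encoded in the correlations.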

