Synthetic Data: How AI Is Transitioning From Data Consumer to Data Producer... and Why That's Important

Computer ◽  
2019 ◽  
Vol 52 (10) ◽  
pp. 89-91 ◽  
Author(s):  
Mark Campbell


Author(s):
P.L. Nikolaev

This article deals with a method for binary classification of images containing small text. The classification relies on the fact that the text can have one of two orientations: it is either positioned horizontally and read from left to right, or turned 180 degrees, so that the image must be rotated before the text can be read. Text of this kind is found on the covers of a variety of books, so when recognizing covers it is necessary to determine the orientation of the text before recognizing the text itself. The article proposes a deep neural network for determining text orientation in the context of book cover recognition. The results of training and testing a convolutional neural network on synthetic data are presented, together with examples of the network operating on real data.
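A minimal sketch of the kind of binary orientation classifier the abstract describes, not the authors' architecture: the `OrientationNet` name, layer sizes, and 64x256 grayscale input are assumptions. It also illustrates how synthetic training pairs are cheap to produce, since any upright text image plus its 180-degree rotation yields a labeled pair.

```python
# A small CNN that classifies whether cover text is upright or rotated 180°.
# Architecture and input size are illustrative assumptions, not the paper's.
import torch
import torch.nn as nn

class OrientationNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, 2)  # class 0: upright, class 1: rotated 180°

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

# Synthetic pairs: any upright crop plus its 180° rotation (a double flip).
img = torch.rand(8, 1, 64, 256)           # batch of grayscale cover crops
rotated = torch.flip(img, dims=[2, 3])    # 180° rotation
logits = OrientationNet()(torch.cat([img, rotated]))
```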


1989 ◽  
Vol 21 (6-7) ◽  
pp. 593-602 ◽  
Author(s):  
Andrew T. Watkin ◽  
W. Wesley Eckenfelder

A technique for rapidly determining Monod and inhibition kinetic parameters in activated sludge is evaluated. The method studied is known as the fed-batch reactor technique and requires approximately three hours to complete. The technique allows for a gradual build-up of substrate in the test reactor by introducing the substrate at a feed rate greater than the maximum substrate utilization rate. Both inhibitory and non-inhibitory substrate responses are modeled using a nonlinear numerical curve-fitting technique. The responses of both glucose and 2,4-dichlorophenol (DCP) are studied using activated sludges with various acclimation histories. Statistically different inhibition constants, KI, for DCP inhibition of glucose utilization were found for the various sludges studied. The curve-fitting algorithm was verified in its ability to accurately retrieve two kinetic parameters from synthetic data generated by superimposing normally distributed random error onto the two-parameter numerical solution generated by the algorithm.
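A minimal sketch of the verification step described in the last sentence: generate a synthetic Monod response, superimpose normally distributed error, and check that nonlinear curve fitting recovers the two parameters. The parameter values (q_max = 0.9, K_s = 25.0), noise level, and substrate range are illustrative assumptions.

```python
# Verify a two-parameter Monod fit against synthetic data with Gaussian noise.
import numpy as np
from scipy.optimize import curve_fit

def monod(S, q_max, K_s):
    """Monod rate: q = q_max * S / (K_s + S)."""
    return q_max * S / (K_s + S)

rng = np.random.default_rng(0)
S = np.linspace(1, 200, 40)                     # substrate concentrations (mg/L)
q_true = monod(S, 0.9, 25.0)                    # noise-free two-parameter solution
q_obs = q_true + rng.normal(0, 0.02, S.size)    # superimposed normal random error

(q_max_fit, K_s_fit), _ = curve_fit(monod, S, q_obs, p0=[1.0, 10.0])
print(q_max_fit, K_s_fit)   # should lie close to the true 0.9 and 25.0
```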


2020 ◽  
Vol 38 (2) ◽  
Author(s):  
Razec Cezar Sampaio Pinto da Silva Torres ◽  
Leandro Di Bartolo

ABSTRACT. Reverse time migration (RTM) is one of the most powerful methods used to generate images of the subsurface. RTM was proposed in the early 1980s, but only recently has it been routinely used in exploratory projects involving complex geology, such as the Brazilian pre-salt. Because the method uses the two-way wave equation, RTM can correctly image any kind of geological environment, simple or complex, including those with anisotropy. On the other hand, RTM is computationally expensive and requires the use of computer clusters. This paper investigates the influence of anisotropy on seismic imaging through the application of RTM for tilted transversely isotropic (TTI) media to pre-stack synthetic data. The work presents in detail how to implement RTM for TTI media, addressing the main issues and specific details, e.g., the computational resources required. Results for a couple of simple models are presented, including an application to the BP TTI 2007 benchmark model.

Keywords: finite differences, wave numerical modeling, seismic anisotropy.
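A minimal sketch of the finite-difference building block underlying RTM: one second-order time step of the 2D acoustic two-way wave equation. This is the isotropic simplification; the paper's TTI implementation extends the spatial operator with anisotropy parameters and tilt. Grid sizes, velocity, and source placement are illustrative assumptions.

```python
# One explicit time-stepping loop for the 2D acoustic wave equation,
# the isotropic core that TTI RTM generalizes.
import numpy as np

nx = nz = 201
dx = dz = 10.0                   # grid spacing (m)
dt = 1e-3                        # time step (s); v*dt/dx = 0.25 satisfies CFL
v = np.full((nz, nx), 2500.0)    # velocity model (m/s), here homogeneous

p_prev = np.zeros((nz, nx))      # wavefield at t - dt
p_curr = np.zeros((nz, nx))      # wavefield at t
p_curr[nz // 2, nx // 2] = 1.0   # impulsive source at the model center

def laplacian(p):
    lap = np.zeros_like(p)
    lap[1:-1, 1:-1] = ((p[2:, 1:-1] - 2 * p[1:-1, 1:-1] + p[:-2, 1:-1]) / dz**2
                       + (p[1:-1, 2:] - 2 * p[1:-1, 1:-1] + p[1:-1, :-2]) / dx**2)
    return lap

for _ in range(500):
    # two-way wave equation: p_tt = v^2 * (p_xx + p_zz)
    p_next = 2 * p_curr - p_prev + (v * dt)**2 * laplacian(p_curr)
    p_prev, p_curr = p_curr, p_next
```

In RTM, the same stepper propagates the source wavefield forward and the recorded data backward in time; correlating the two wavefields forms the subsurface image.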


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
João Lobo ◽  
Rui Henriques ◽  
Sara C. Madeira

Abstract Background Three-way data have gained popularity due to their increasing capacity to describe inherently multivariate and temporal events, such as biological responses, social interactions over time, urban dynamics, or complex geophysical phenomena. Triclustering, the subspace clustering of three-way data, enables the discovery of patterns corresponding to data subspaces (triclusters) with values correlated across the three dimensions (observations × features × contexts). With an increasing number of algorithms being proposed, effectively comparing them with the state of the art is paramount. These comparisons are usually performed on real data without a known ground truth, which limits the assessments. In this context, we propose a synthetic data generator, G-Tric, allowing the creation of synthetic datasets with configurable properties and the ability to plant triclusters. The generator is prepared to create datasets resembling real three-way data from biomedical and social domains, with the additional advantage of providing the ground truth (the triclustering solution) as output. Results G-Tric can replicate real-world datasets and create new ones matching researchers' needs across several properties, including data type (numeric or symbolic), dimensions, and background distribution. Users can tune the patterns and structure that characterize the planted triclusters (subspaces) and how they interact (overlap). Data quality can also be controlled by defining the amount of missing values, noise, or errors. Furthermore, a benchmark of datasets resembling real data is made available, together with the corresponding triclustering solutions (planted triclusters) and generating parameters. Conclusions Triclustering evaluation using G-Tric makes it possible to combine intrinsic and extrinsic metrics when comparing solutions, producing more reliable analyses. A set of predefined datasets, mimicking widely used three-way data and exploring crucial properties, was generated and made available, highlighting G-Tric's potential to advance the triclustering state of the art by easing the evaluation of new triclustering approaches.
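A minimal sketch of the core idea, not G-Tric itself: draw an observations × features × contexts array from a background distribution, plant a tricluster with correlated values in a chosen subspace, and keep the index sets as ground truth. All sizes, the N(0, 1) background, and the constant-pattern choice are illustrative assumptions.

```python
# Generate a numeric three-way dataset with one planted tricluster.
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(0.0, 1.0, size=(100, 50, 8))   # background: N(0, 1)

# The tricluster is a subspace defined by index sets on each dimension.
obs = rng.choice(100, size=10, replace=False)
feats = rng.choice(50, size=5, replace=False)
ctxs = rng.choice(8, size=3, replace=False)

# Constant pattern with mild noise; generators like G-Tric also support
# additive, multiplicative, and order-preserving patterns.
data[np.ix_(obs, feats, ctxs)] = 3.0 + rng.normal(0.0, 0.1, (10, 5, 3))

# Ground truth handed to extrinsic evaluation alongside the dataset.
ground_truth = {"observations": obs, "features": feats, "contexts": ctxs}
```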


2021 ◽  
Vol 40 (3) ◽  
pp. 1-12
Author(s):  
Hao Zhang ◽  
Yuxiao Zhou ◽  
Yifei Tian ◽  
Jun-Hai Yong ◽  
Feng Xu

Reconstructing hand-object interactions is a challenging task due to strong occlusions and complex motions. This article proposes a real-time system that uses a single depth stream to simultaneously reconstruct hand poses, object shape, and rigid/non-rigid motions. To achieve this, we first train a joint learning network to segment the hand and object in a depth image and to predict the 3D keypoints of the hand. With most layers shared between the two tasks, computation cost is reduced, preserving real-time performance. A hybrid dataset is constructed to train the network with real data (to learn real-world distributions) and synthetic data (to cover variations in objects, motions, and viewpoints). Next, the depths of the two targets and the keypoints are used in a unified optimization to reconstruct the interacting motions. Benefiting from a novel tangential contact constraint, the system not only resolves the remaining ambiguities but also maintains real-time performance. Experiments show that our system handles different hand and object shapes, various interactive motions, and moving cameras.
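A minimal sketch of the multi-task idea described above: one shared encoder over the depth image feeds both a hand/object segmentation head and a 3D hand-keypoint regression head, so most computation is reused across tasks. Layer sizes, the 21-keypoint count, and the three segmentation classes are assumptions, not the paper's architecture.

```python
# Shared-backbone network with a segmentation head and a keypoint head.
import torch
import torch.nn as nn

class JointNet(nn.Module):
    def __init__(self, n_keypoints=21):
        super().__init__()
        self.shared = nn.Sequential(            # layers shared by both tasks
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.seg_head = nn.Conv2d(64, 3, 1)     # background / hand / object
        self.kp_head = nn.Sequential(           # 3D keypoints, (x, y, z) each
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, n_keypoints * 3),
        )

    def forward(self, depth):
        f = self.shared(depth)                  # computed once, used twice
        return self.seg_head(f), self.kp_head(f)

seg, kp = JointNet()(torch.rand(1, 1, 128, 128))   # one 128x128 depth frame
```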


2021 ◽  
Vol 15 (4) ◽  
pp. 1-46
Author(s):  
Kui Yu ◽  
Lin Liu ◽  
Jiuyong Li

In this article, we aim to develop a unified view of causal and non-causal feature selection methods. The unified view fills a gap in research on the relation between the two types of methods. Based on the Bayesian network framework and information theory, we first show that causal and non-causal feature selection methods share the same objective: to find the Markov blanket of a class attribute, the theoretically optimal feature set for classification. We then examine the assumptions made by causal and non-causal feature selection methods when searching for the optimal feature set, and unify these assumptions by mapping them to restrictions on the structure of the Bayesian network model of the studied problem. We further analyze in detail how the structural assumptions lead to the different levels of approximation employed by the methods in their search, which in turn result in approximations in the feature sets found by the methods with respect to the optimal feature set. With the unified view, we can interpret the output of non-causal methods from a causal perspective and derive error bounds for both types of methods. Finally, we present a practical understanding of the relation between causal and non-causal methods using extensive experiments with synthetic data and various types of real-world data.
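A minimal sketch of the structural notion the article builds on: in a Bayesian network, the Markov blanket of a class attribute T consists of its parents, its children, and its children's other parents (spouses), and it renders T conditionally independent of all remaining variables. The example DAG is an illustrative assumption.

```python
# Compute the Markov blanket of a target node in a known DAG.
import networkx as nx

def markov_blanket(dag, target):
    parents = set(dag.predecessors(target))
    children = set(dag.successors(target))
    spouses = {p for c in children for p in dag.predecessors(c)} - {target}
    return parents | children | spouses

# A -> T -> C <- B, C -> D: the blanket of T is {A, C, B}; D is screened off.
dag = nx.DiGraph([("A", "T"), ("T", "C"), ("B", "C"), ("C", "D")])
print(markov_blanket(dag, "T"))   # {'A', 'C', 'B'}: parent, child, spouse
```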


Author(s):  
Raul E. Avelar ◽  
Karen Dixon ◽  
Boniphace Kutela ◽  
Sam Klump ◽  
Beth Wemple ◽  
...  

The calibration of safety performance functions (SPFs) is a mechanism included in the Highway Safety Manual (HSM) to adjust its SPFs for use in a target jurisdiction. Critically, the quality of the calibration procedure must be assessed before using the calibrated SPFs. Multiple resources to aid practitioners in calibrating SPFs have been developed in the years following the publication of the HSM 1st edition. Similarly, the literature suggests multiple ways to assess the goodness-of-fit (GOF) of a calibrated SPF to a data set from a given jurisdiction. This paper uses the results of calibrating multiple intersection SPFs to a large Mississippi safety database to examine the relations among multiple GOF metrics. The goal is to develop a sensible single index that leverages the joint information from multiple GOF metrics to assess the overall quality of calibration. A factor analysis applied to the calibration results revealed three underlying factors explaining 76% of the variability in the data. From these results, the authors developed an index and performed a sensitivity analysis. The key metrics were found to be, in descending order: the deviation of the cumulative residual (CURE) plot from the 95% confidence area, the mean absolute deviation, the modified R-squared, and the value of the calibration factor. This paper also presents comparisons between the index and alternative scoring strategies, as well as an effort to verify the results using synthetic data. The developed index is recommended for comprehensively assessing the quality of calibrated intersection SPFs.
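A minimal sketch of two pieces of the machinery discussed above: the HSM calibration factor (the ratio of total observed to total predicted crashes) and the CURE ordinates (cumulative residuals along a sorted covariate). The paper's composite index uses factor-analysis weights not reproduced here, and the crash counts below are illustrative assumptions.

```python
# Calibration factor and CURE values for a toy set of intersections.
import numpy as np

observed = np.array([3, 0, 5, 2, 7, 1], dtype=float)   # observed crashes
predicted = np.array([2.4, 0.8, 4.1, 2.9, 5.5, 1.6])   # uncalibrated SPF output

C = observed.sum() / predicted.sum()    # HSM calibration factor
calibrated = C * predicted

# CURE ordinates: cumulative residuals sorted by a covariate (here, simply
# by predicted value); sustained excursions flag regions of poor fit.
order = np.argsort(calibrated)
cure = np.cumsum(observed[order] - calibrated[order])
print(C, cure)
```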


2021 ◽  
Vol 11 (9) ◽  
pp. 3863
Author(s):  
Ali Emre Öztürk ◽  
Ergun Erçelebi

A large amount of training image data is required to solve image classification problems with deep learning (DL) networks. In this study, we aimed to train DL networks on synthetic images generated with a game engine and to determine how the networks then perform on real-image classification problems. The study presents the results of using corner detection and nearest three-point selection (CDNTS) layers to classify bird and rotary-wing unmanned aerial vehicle (RW-UAV) images, provides a comprehensive comparison of two experimental setups, and highlights the significant performance improvements that a CDNTS layer brings to deep learning-based networks. Experiment 1 trains commonly used deep learning-based networks on synthetic data and tests image classification on real data. Experiment 2 trains the CDNTS layer together with commonly used deep learning-based networks on synthetic data and tests image classification on real data. In experiment 1, the best area under the curve (AUC) value for image classification test accuracy was 72%. In experiment 2, using the CDNTS layer, the AUC value was 88.9%. A total of 432 training combinations were investigated across the experimental setups: various DL networks were trained with four different optimizers over all combinations of batch size, learning rate, and dropout hyperparameters. The test-accuracy AUC values for the networks in experiment 1 ranged from 55% to 74%, whereas those for the experiment 2 networks with a CDNTS layer ranged from 76% to 89.9%. The CDNTS layer was thus observed to have a considerable effect on the image classification accuracy of deep learning-based networks. AUC, F-score, and test accuracy were used to validate the success of the networks.
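A minimal sketch of the experimental grid described above: every combination of optimizer, batch size, learning rate, and dropout is enumerated and each trained model is scored by AUC on real test data. The specific grid values and the `train_and_evaluate` stub are assumptions; the full study also varies the network architecture to reach 432 combinations.

```python
# Enumerate a hyperparameter grid and score each configuration by AUC.
import itertools
from sklearn.metrics import roc_auc_score

optimizers = ["adam", "sgd", "rmsprop", "adagrad"]   # four optimizers, per the paper
batch_sizes = [16, 32, 64]                           # illustrative values
learning_rates = [1e-2, 1e-3, 1e-4]
dropouts = [0.2, 0.5]

grid = list(itertools.product(optimizers, batch_sizes, learning_rates, dropouts))

def train_and_evaluate(config):
    """Hypothetical stub: train on synthetic images, return (labels, scores) on real data."""
    ...

# for config in grid:
#     y_true, y_score = train_and_evaluate(config)
#     print(config, roc_auc_score(y_true, y_score))
```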


Energies ◽  
2021 ◽  
Vol 14 (10) ◽  
pp. 2839
Author(s):  
Artvin-Darien Gonzalez-Abreu ◽  
Miguel Delgado-Prieto ◽  
Roque-Alfredo Osornio-Rios ◽  
Juan-Jose Saucedo-Dorantes ◽  
Rene-de-Jesus Romero-Troncoso

Monitoring electrical power quality has become a priority in the industrial sector, where avoiding unwanted effects that degrade the overall performance of industrial facilities is a key aim. Commercial equipment capable of detecting such disturbances is demonstrably lacking, and research on these types of grid behavior still requires contributions. Although research has been conducted on disturbance detection, most methodologies consider only a few standardized disturbance combinations. This paper proposes an innovative deep learning-based diagnosis method for power quality disturbances, organized in three stages. First, a domain-fusion approach is used in a feature extraction stage to characterize the electrical power grid. Second, an adaptive pattern characterization is carried out by a stacked autoencoder. Finally, a neural network structure is applied to identify the disturbances. The proposed approach relies on training and validating the diagnosis system with synthetic data covering single, double, and triple disturbance combinations at different noise levels, and is also validated with available experimental measurements provided by the IEEE 1159.2 Working Group. The proposed method achieves nearly a 100% hit rate, allowing a far more practical application thanks to its pattern characterization capability.
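A minimal sketch of the second and third stages of the pipeline described above: a stacked autoencoder compresses extracted features into an adaptive pattern representation, and a small classifier identifies the disturbance. The feature dimension (128), layer widths, and nine-class output are illustrative assumptions.

```python
# Stacked autoencoder for pattern characterization plus a classifier head.
import torch
import torch.nn as nn

encoder = nn.Sequential(           # stacked autoencoder, encoding half
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
)
decoder = nn.Sequential(           # decoding half, used for reconstruction pretraining
    nn.Linear(32, 64), nn.ReLU(),
    nn.Linear(64, 128),
)
classifier = nn.Linear(32, 9)      # e.g., sag, swell, harmonics, and combinations

x = torch.rand(16, 128)                                      # domain-fusion feature vectors
recon_loss = nn.functional.mse_loss(decoder(encoder(x)), x)  # stage-2 objective
logits = classifier(encoder(x))                              # stage-3 diagnosis
```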

