Revisiting Dead Leaves Model: Training with Synthetic Data

Universal domain adaptation (UDA) is a crucial research topic for efficiently training deep learning models with data from various imaging sensors. However, its development is hampered by the absence of labels for the target data. Moreover, the lack of prior knowledge about the source and target domains makes it even harder for UDA to train models. I hypothesize that trained models degrade in the target domain because no training loss directly improves the discriminative power of the target-domain features. As a result, the target data adapted to the source representations is biased toward the source domain. I found that the degradation was more pronounced when I used synthetic data for the source domain and real data for the target domain. In this paper, I propose a UDA method with target-domain contrastive learning. The proposed method enables models to leverage synthetic data for the source domain and to learn discriminative target features in an unsupervised manner. In addition, the target-domain feature extraction network is shared with the source-domain classification task, which avoids unnecessary computational growth. Extensive experiments on VisDA-2017 and MNIST-to-SVHN show that the proposed method significantly outperforms the baseline by 2.7% and 5.1%, respectively.
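The abstract does not say which contrastive objective is used on the target domain. As a rough illustration only, the sketch below assumes an InfoNCE-style loss over two augmented views of the same unlabeled target samples; the function names, the temperature value, and the toy feature vectors are all hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(view_a, view_b, temperature=0.1):
    """Mean InfoNCE loss (assumed objective, not confirmed by the abstract).

    view_a[i] and view_b[i] are features of two augmentations of the same
    unlabeled target sample; every other row of view_b acts as a negative.
    """
    losses = []
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy features: matching views give a low loss, mismatched views a high one.
features = [[1.0, 0.0], [0.0, 1.0]]
matched_loss = info_nce(features, features)
mismatched_loss = info_nce(features, features[::-1])
```

No labels appear anywhere in the loss, which is what would let such a term be applied to the target domain in an unsupervised manner.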


Evaluation of Synthetic Datasets Generation for Intent Classification Tasks in Portuguese

2021. DOI: 10.5753/stil.2021.17806

Author(s): Robson T. Paula, Décio G. Aguiar Neto, Davi Romero, Paulo T. Guerra

Keyword(s): Information Service, Synthetic Data, Text Generation, Initial Model, Viable Option, Domain Experts, Classification Tasks, Model Training, Synthetic Datasets, Artificial Datasets

A chatbot is an artificial-intelligence-based system for chatting with users, commonly used as a virtual assistant to help people or answer questions. Intent classification is an essential task for chatbots: it aims to identify what the user wants in a given dialogue. However, for many domains, little data is available to train those systems properly. In this work, we evaluate the performance of two methods for generating synthetic data for chatbots, one based on template questions and another based on neural text generation. We build four datasets that are used for training chatbot components in the intent classification task. We intend to simulate the task of migrating a search-based portal to an interactive dialogue-based information service by using artificial datasets for initial model training. Our results show that template-based datasets are slightly superior to neural-generated ones in our application domain. However, neural-generated datasets give good results and are a viable option when one has limited access to domain experts to hand-code text templates.
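The template-based approach described above can be sketched as slot filling: each intent has a few hand-written question patterns, and every pattern is expanded with every slot value. The templates, intents, and slot values below are hypothetical examples written in English for readability; the paper's actual Portuguese templates are not given in the abstract.

```python
import random

# Hypothetical templates per intent; "{place}" is the slot to be filled.
TEMPLATES = {
    "opening_hours": [
        "When does the {place} open?",
        "What are the opening hours of the {place}?",
    ],
    "location": [
        "Where is the {place}?",
        "How do I get to the {place}?",
    ],
}

# Hypothetical slot values a domain expert might supply.
PLACES = ["library", "cafeteria", "admissions office"]


def generate_examples(templates, places, seed=0):
    """Expand every template with every slot value into (text, intent) pairs."""
    examples = []
    for intent, patterns in templates.items():
        for pattern in patterns:
            for place in places:
                examples.append((pattern.format(place=place), intent))
    # Shuffle with a fixed seed so the dataset order is reproducible.
    random.Random(seed).shuffle(examples)
    return examples


dataset = generate_examples(TEMPLATES, PLACES)
```

Each pair can then be fed to any intent classifier as a labeled training example. The dataset size grows as templates times slot values, which is why this approach depends on domain experts to write enough varied patterns.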


Synthetic data generation for deep learning model training to understand livestock behavior

2020. DOI: 10.31274/etd-20200902-98

Author(s): Armin Maraghehmoghaddam

Keyword(s): Deep Learning, Synthetic Data, Learning Model, Data Generation, Synthetic Data Generation, Model Training, Livestock Behavior, Deep Learning Model


Method of determination of the text direction on the image with the use of convolutional neural network

Informatization and communication, 2020, pp. 96-99. DOI: 10.34219/2078-8320-2020-11-2-96-99

Author(s): P.L. Nikolaev

Keyword(s): Neural Network, Convolutional Neural Network, Deep Neural Network, Binary Classification, Synthetic Data, Real Data, Method Of Determination, Classification Of Images

This article deals with a method for the binary classification of images containing small text. The classification is based on the fact that the text can have two directions: it can be positioned horizontally and read from left to right, or it can be turned 180 degrees, in which case the image must be rotated to read it. This type of text can be found on the covers of a variety of books, so when recognizing covers it is necessary to determine the direction of the text before recognizing the text itself. The article proposes a deep neural network for determining the text position in the context of book cover recognition. The results of training and testing a convolutional neural network on synthetic data are presented, along with examples of the network operating on real data.
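The abstract does not describe how the synthetic training data was produced. One plausible construction, shown here purely as an assumption, is to take upright text images and add a copy rotated by 180 degrees, which yields both classes with labels for free. Images are represented as plain nested lists of pixel values, and the function names are hypothetical.

```python
def rotate_180(image):
    """Rotate a 2D image (list of rows) by 180 degrees.

    Reversing the row order and then each row is equivalent to the rotation.
    """
    return [row[::-1] for row in image[::-1]]


def make_labeled_pairs(upright_images):
    """Build a binary dataset: label 0 for upright text, 1 for rotated text."""
    samples = []
    for image in upright_images:
        samples.append((image, 0))
        samples.append((rotate_180(image), 1))
    return samples


# Tiny stand-in for a rendered text image.
toy_image = [
    [1, 2, 3],
    [4, 5, 6],
]
samples = make_labeled_pairs([toy_image])
```

Because rotating twice restores the original image, a classifier's prediction of label 1 can be acted on by applying `rotate_180` once before text recognition.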


A Technique to Determine Unsteady-State Inhibition Kinetics in the Activated Sludge Process

Water Science & Technology, 1989, Vol 21 (6-7), pp. 593-602. DOI: 10.2166/wst.1989.0261. Cited by ~3

Author(s): Andrew T. Watkin, W. Wesley Eckenfelder

Keyword(s): Activated Sludge, Kinetic Parameters, Curve Fitting, Glucose Utilization, Batch Reactor, Synthetic Data, Utilization Rate, Fitting Algorithm, Test Reactor, Two Parameter

A technique for rapidly determining Monod and inhibition kinetic parameters in activated sludge is evaluated. The method studied is known as the fed-batch reactor technique and requires approximately three hours to complete. The technique allows for a gradual build-up of substrate in the test reactor by introducing the substrate at a feed rate greater than the maximum substrate utilization rate. Both inhibitory and non-inhibitory substrate responses are modeled using a nonlinear numerical curve-fitting technique. The responses of both glucose and 2,4-dichlorophenol (DCP) are studied using activated sludges with various acclimation histories. Statistically different inhibition constants, K_I, for DCP inhibition of glucose utilization were found for the various sludges studied. The curve-fitting algorithm was verified by its ability to accurately retrieve two kinetic parameters from synthetic data generated by superimposing normally distributed random error onto the two-parameter numerical solution generated by the algorithm.
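The verification step in the last sentence can be illustrated with a much simplified stand-in. The sketch below assumes the plain Monod rate expression q = q_max * S / (K_s + S) rather than the paper's fed-batch numerical solution, and it swaps in a grid search over K_s (with q_max solved in closed form by linear least squares) for the paper's nonlinear curve-fitting algorithm. The parameter values, noise level, and concentration range are hypothetical.

```python
import random


def monod(substrate, q_max, k_s):
    """Monod utilization rate for a given substrate concentration."""
    return q_max * substrate / (k_s + substrate)


def fit_monod(substrates, rates, k_s_grid):
    """Recover (q_max, k_s) by least squares.

    For each candidate k_s the model is linear in q_max, so the best q_max
    has a closed form; the k_s with the smallest squared error wins.
    """
    best = None
    for k_s in k_s_grid:
        shape = [s / (k_s + s) for s in substrates]
        q_max = sum(r * f for r, f in zip(rates, shape)) / sum(f * f for f in shape)
        sse = sum((r - q_max * f) ** 2 for r, f in zip(rates, shape))
        if best is None or sse < best[0]:
            best = (sse, q_max, k_s)
    return best[1], best[2]


# Hypothetical "true" parameters used to generate the synthetic data.
TRUE_Q_MAX, TRUE_K_S = 2.0, 15.0
rng = random.Random(42)
substrates = [2.0 * i for i in range(1, 51)]   # concentrations 2 to 100
k_s_grid = [0.1 * i for i in range(1, 1001)]   # candidates 0.1 to 100

# Superimpose normally distributed random error on the exact solution.
noisy_rates = [monod(s, TRUE_Q_MAX, TRUE_K_S) + rng.gauss(0.0, 0.01) for s in substrates]
q_max_hat, k_s_hat = fit_monod(substrates, noisy_rates, k_s_grid)
```

Comparing `q_max_hat` and `k_s_hat` with the known true values is the point of using synthetic data: with real reactor data the true parameters are unknown, so the fitting routine could not be checked this way.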


Privacy-preserving Collaborative Training for Medical Image Analysis Based on Multi-Blockchain

Combinatorial Chemistry & High Throughput Screening, 2020, Vol 23. DOI: 10.2174/1386207323666201022110616

Author(s): Wanlu Zhang, Qigang Wang, Mei Li

Keyword(s): Medical Image, Data Privacy, Medical Image Analysis, Auxiliary Information, Training Process, Private Data, Medical Institutions, Model Training, Collaborative Training, Similar Task

Background: As artificial intelligence and big data analysis develop rapidly, data privacy, especially the privacy of patient medical data, is receiving more and more attention. Objective: To strengthen the protection of private data while still supporting the model training process, this article introduces a multi-Blockchain-based decentralized collaborative machine learning training method for medical image analysis. In this way, researchers from different medical institutions are able to collaborate to train models without exchanging sensitive patient data. Method: A partial parameter update method is applied to prevent indirect privacy leakage during model propagation. With the peer-to-peer communication in the multi-Blockchain system, a machine learning task can leverage auxiliary information from a similar task in another Blockchain. In addition, after the collaborative training process, personalized models are trained for the different medical institutions. Results: The experimental results show that our method achieves performance similar to that of the centralized model-training method, which collects the data sets of all participants, while preventing private data leakage. Transferring auxiliary information from a similar task on another Blockchain has also been shown to effectively accelerate model convergence and improve model accuracy, especially when data is lacking. The personalization training process further improves model performance. Conclusion: Our approach can effectively help researchers from different organizations to achieve collaborative training without disclosing their private data.


Reverse time migration in tilted transversely isotropic media

Brazilian Journal of Geophysics, 2020, Vol 38 (2). DOI: 10.22564/rbgf.v38i2.2041

Author(s): Razec Cezar Sampaio Pinto da Silva Torres, Leandro Di Bartolo

Keyword(s): Seismic Anisotropy, Synthetic Data, Transversely Isotropic, Reverse Time, Reverse Time Migration, Computer Clusters, Time Migration, Transversely Isotropic Media, Isotropic Media, TTI Media

ABSTRACT. Reverse time migration (RTM) is one of the most powerful methods used to generate images of the subsurface. RTM was proposed in the early 1980s, but only recently has it been routinely used in exploratory projects involving complex geology, such as the Brazilian pre-salt. Because the method uses the two-way wave equation, RTM is able to correctly image any kind of geological environment (simple or complex), including those with anisotropy. On the other hand, RTM is computationally expensive and requires the use of computer clusters. This paper investigates the influence of anisotropy on seismic imaging through the application of RTM for tilted transversely isotropic (TTI) media to pre-stack synthetic data. This work presents in detail how to implement RTM for TTI media, addressing the main issues and specific details, e.g., the computational resources required. Results for a couple of simple models are presented, including the application to the BP TTI 2007 benchmark model.

Keywords: finite differences, wave numerical modeling, seismic anisotropy.


G-Tric: generating three-way synthetic datasets with triclustering solutions

BMC Bioinformatics, 2021, Vol 22 (1). DOI: 10.1186/s12859-020-03925-4

Author(s): João Lobo, Rui Henriques, Sara C. Madeira

Keyword(s): State Of The Art, Synthetic Data, Ground Truth, Real Data, Three Dimensions, Additional Advantage, Urban Dynamics, Data Generator, Real World Datasets, Synthetic Datasets

Background: Three-way data started to gain popularity due to their increasing capacity to describe inherently multivariate and temporal events, such as biological responses, social interactions along time, urban dynamics, or complex geophysical phenomena. Triclustering, the subspace clustering of three-way data, enables the discovery of patterns corresponding to data subspaces (triclusters) with values correlated across the three dimensions (observations × features × contexts). With an increasing number of algorithms being proposed, effectively comparing them with state-of-the-art algorithms is paramount. These comparisons are usually performed using real data, without a known ground truth, thus limiting the assessments. In this context, we propose a synthetic data generator, G-Tric, allowing the creation of synthetic datasets with configurable properties and the possibility to plant triclusters. The generator is prepared to create datasets resembling real three-way data from biomedical and social data domains, with the additional advantage of providing the ground truth (triclustering solution) as output. Results: G-Tric can replicate real-world datasets and create new ones that match researchers' needs across several properties, including data type (numeric or symbolic), dimensions, and background distribution. Users can tune the patterns and structure that characterize the planted triclusters (subspaces) and how they interact (overlapping). Data quality can also be controlled by defining the amount of missing values, noise, or errors. Furthermore, a benchmark of datasets resembling real data is made available, together with the corresponding triclustering solutions (planted triclusters) and generating parameters. Conclusions: Triclustering evaluation using G-Tric provides the possibility to combine both intrinsic and extrinsic metrics to compare solutions that produce more reliable analyses. A set of predefined datasets, mimicking widely used three-way data and exploring crucial properties, was generated and made available, highlighting G-Tric's potential to advance the triclustering state of the art by easing the process of evaluating the quality of new triclustering approaches.
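The idea of planting a tricluster and returning it as ground truth can be shown with a minimal sketch. This is not G-Tric's interface: the function name, its parameters, the uniform background, and the constant-valued pattern are assumptions chosen to keep the example short, and it covers only one numeric tricluster without overlap, noise, or missing values.

```python
import random


def generate_with_tricluster(shape, cluster_sizes, value=5.0, seed=0):
    """Create observations x features x contexts data with one planted tricluster.

    Background cells are drawn uniformly from [0, 1]; the planted subspace is
    overwritten with a constant value (a hypothetical pattern choice).
    Returns the data and the ground-truth index sets of the tricluster.
    """
    rng = random.Random(seed)
    n_obs, n_feat, n_ctx = shape
    data = [
        [[rng.uniform(0.0, 1.0) for _ in range(n_ctx)] for _ in range(n_feat)]
        for _ in range(n_obs)
    ]
    # Pick which observations, features, and contexts form the tricluster.
    observations = sorted(rng.sample(range(n_obs), cluster_sizes[0]))
    features = sorted(rng.sample(range(n_feat), cluster_sizes[1]))
    contexts = sorted(rng.sample(range(n_ctx), cluster_sizes[2]))
    for i in observations:
        for j in features:
            for k in contexts:
                data[i][j][k] = value
    truth = {"observations": observations, "features": features, "contexts": contexts}
    return data, truth


data, truth = generate_with_tricluster(shape=(20, 10, 5), cluster_sizes=(4, 3, 2))
```

A triclustering algorithm run on `data` can then be scored against `truth` with extrinsic metrics, which is not possible on real data where the planted solution is unknown.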


Single Depth View Based Real-Time Reconstruction of Hand-Object Interactions

ACM Transactions on Graphics, 2021, Vol 40 (3), pp. 1-12. DOI: 10.1145/3451341

Author(s): Hao Zhang, Yuxiao Zhou, Yifei Tian, Jun-Hai Yong, Feng Xu

Keyword(s): Real Time, Synthetic Data, Real Data, Depth Image, Real Time System, The Real, Time Performance, Contact Constraint, Object Shapes, Object Interactions

Reconstructing hand-object interactions is a challenging task due to strong occlusions and complex motions. This article proposes a real-time system that uses a single depth stream to simultaneously reconstruct hand poses, object shape, and rigid/non-rigid motions. To achieve this, we first train a joint learning network to segment the hand and object in a depth image, and to predict the 3D keypoints of the hand. With most layers shared by the two tasks, computation cost is saved for the real-time performance. A hybrid dataset is constructed here to train the network with real data (to learn real-world distributions) and synthetic data (to cover variations of objects, motions, and viewpoints). Next, the depth of the two targets and the keypoints are used in a uniform optimization to reconstruct the interacting motions. Benefitting from a novel tangential contact constraint, the system not only solves the remaining ambiguities but also keeps the real-time performance. Experiments show that our system handles different hand and object shapes, various interactive motions, and moving cameras.
