data generation Latest Research Papers

The Generative Models have gained considerable attention in unsupervised learning via a new and practical framework called Generative Adversarial Networks (GAN) due to their outstanding data generation capability. Many GAN models have been proposed, and several practical applications have emerged in various domains of computer vision and machine learning. Despite GANs excellent success, there are still obstacles to stable training. The problems are Nash equilibrium, internal covariate shift, mode collapse, vanishing gradient, and lack of proper evaluation metrics. Therefore, stable training is a crucial issue in different applications for the success of GANs. Herein, we survey several training solutions proposed by different researchers to stabilize GAN training. We discuss (I) the original GAN model and its modified versions, (II) a detailed analysis of various GAN applications in different domains, and (III) a detailed study about the various GAN training obstacles as well as training solutions. Finally, we reveal several issues as well as research outlines to the topic.

Download Full-text

High-Fidelity 3D Digital Human Head Creation from RGB-D Selfies

ACM Transactions on Graphics ◽

10.1145/3472954 ◽

2022 ◽

Vol 41 (1) ◽

pp. 1-21

Author(s):

Linchao Bao ◽

Xiangkai Lin ◽

Yajing Chen ◽

Haoxian Zhang ◽

Sheng Wang ◽

...

Keyword(s):

Hybrid Approach ◽

Three Dimensional ◽

Selection Procedure ◽

Human Head ◽

High Fidelity ◽

Data Generation ◽

High Quality ◽

Geometry Modeling ◽

Human Faces ◽

Digital Human

We present a fully automatic system that can produce high-fidelity, photo-realistic three-dimensional (3D) digital human heads with a consumer RGB-D selfie camera. The system only needs the user to take a short selfie RGB-D video while rotating his/her head and can produce a high-quality head reconstruction in less than 30 s. Our main contribution is a new facial geometry modeling and reflectance synthesis procedure that significantly improves the state of the art. Specifically, given the input video a two-stage frame selection procedure is first employed to select a few high-quality frames for reconstruction. Then a differentiable renderer-based 3D Morphable Model (3DMM) fitting algorithm is applied to recover facial geometries from multiview RGB-D data, which takes advantages of a powerful 3DMM basis constructed with extensive data generation and perturbation. Our 3DMM has much larger expressive capacities than conventional 3DMM, allowing us to recover more accurate facial geometry using merely linear basis. For reflectance synthesis, we present a hybrid approach that combines parametric fitting and Convolutional Neural Networks (CNNs) to synthesize high-resolution albedo/normal maps with realistic hair/pore/wrinkle details. Results show that our system can produce faithful 3D digital human faces with extremely realistic details. The main code and the newly constructed 3DMM basis is publicly available.

Download Full-text

Improvement of Airborne LiDAR Intensity Image Content with Shaded nDSM and Assessment of Its Utility in Geospatial Data Generation

Journal of the Indian Society of Remote Sensing ◽

10.1007/s12524-021-01468-6 ◽

2022 ◽

Author(s):

B. Sadasiva Rao ◽

G. Anil Kumar ◽

C. Runjhun ◽

C. V. K. V. P. Jagannadha Rao ◽

G. Varaprasad Babu

Keyword(s):

Airborne Lidar ◽

Geospatial Data ◽

Data Generation ◽

Image Content ◽

Intensity Image

Download Full-text

Artificial Neural Network Potentials for Mechanics and Fracture Dynamics of Materials

10.21203/rs.3.rs-1178290/v1 ◽

2022 ◽

Author(s):

Gang Seob Jung ◽

Hoon Joo Myung ◽

Stephan Irle

Keyword(s):

Neural Network ◽

Artificial Neural Network ◽

Crack Propagation ◽

Density Functional ◽

Tight Binding ◽

Training Data ◽

Data Generation ◽

Deformation And Failure ◽

Deformation And Fracture ◽

Artificial Neural

Abstract Atomistic understanding of mechanics and failure of materials is the key for engineering and applications. Modeling accurately brittle failure with crack propagation in covalent crystals requires a quantum mechanics-based description of individual bond-breaking events for large system sizes. Machine Learned (ML) potentials have emerged to overcome the traditional, physics-based modeling tradeoff between accuracy and accessible time and length scales. Previous studies have shown successful applications of ML potentials for describing the structure and dynamics of molecular systems and amorphous or liquid phases of materials. However, their application to deformation and failure processes in materials is yet uncommon. In this study, we discuss apparent limitations of ML potentials to describe deformation and fracture under loadings and propose a way to generate and select training data for their employment in simulations of deformation and fracture of crystals. We applied the proposed approach to 2D crystal graphene, utilizing the density-functional tight-binding (DFTB) method for more efficient and extensive data generation in place of density functional theory (DFT). Then, we explore how the data selection affects the accuracy of the developed artificial neural network potential (NNP), indicating that only the errors in total energies and atomic forces are insufficient to judge the NNP’s reliability. Therefore, we evaluate and select NNPs based on their performance in describing physical properties, e.g., stress-strain curves and geometric deformation. In sharp contrast to popular reactive bond order potentials, our optimized NNP predicts straight crack propagation in graphene along both armchair and zigzag lattice directions, as well as higher fracture toughness of zigzag edge direction. Our study provides significant insight into crack propagation mechanisms at atomic scales and highlights strategies for NNP developments of broader materials.

Download Full-text

An Exploration of Pathologies of Multilevel Principal Components Analysis in Statistical Models of Shape

10.20944/preprints202201.0160.v1 ◽

2022 ◽

Author(s):

Damian JJ Farnell

Keyword(s):

Small Sample ◽

Generation Model ◽

Data Generation ◽

Sample Sizes ◽

Facial Shape ◽

Group Variation ◽

Level Model ◽

Components Analysis ◽

Small Sample Sizes ◽

Component Scores

3D facial surface imaging is a useful tool in dentistry and in terms of diagnostics and treatment planning. Between-groups PCA (bgPCA) is a method that has been used to analyse shapes in biological morphometrics, although various “pathologies” of bgPCA have recently been proposed. Monte Carlo (MC) simulated datasets were created here in order to explore “pathologies” of multilevel PCA (mPCA), where mPCA with two levels is equivalent to bgPCA. The first set of MC experiments involved 300 uncorrelated normally distributed variables, whereas the second set of MC experiments used correlated multivariate MC data describing 3D facial shape. We confirmed previous results of other researchers that indicated that bgPCA (and so also mPCA) can give a false impression of strong differences in component scores between groups when there is none in reality. These spurious differences in component scores via mPCA reduced strongly as the sample sizes per group were increased. Eigenvalues via mPCA were also found to be strongly effected by imbalances in sample sizes per group, although this problem was removed by using weighted forms of covariance matrices suggested by the maximum likelihood solution of the two-level model. However, this did not solve problems of spurious differences between groups in these simulations, which was driven by very small sample sizes in one group here. As a “rule of thumb” only, all of our experiments indicate that reasonable results are obtained when sample sizes per group in all groups are at least equal to the number of variables. Interestingly, the sum of all eigenvalues over both levels via mPCA scaled approximately linearly with the inverse of the sample size per group in all experiments. Finally, between-group variation was added explicitly to the MC data generation model in two experiments considered here. Results for the sum of all eigenvalues via mPCA predicted the asymptotic amount for the total amount of variance correctly in this case, whereas standard “single-level” PCA underestimated this quantity.

Download Full-text

ImageMech: From Image to Particle Spring Network for Mechanical Characterization

Frontiers in Materials ◽

10.3389/fmats.2021.803875 ◽

2022 ◽

Vol 8 ◽

Author(s):

Yuan Chiang ◽

Ting-Wai Chiu ◽

Shu-Wei Chang

Keyword(s):

Machine Learning ◽

Computer Aided Design ◽

Functional Materials ◽

Coarse Grained ◽

Stability And Convergence ◽

Data Generation ◽

Modeling Tools ◽

Lattice Spring Model ◽

Sufficient Detail ◽

Aided Design

The emerging demand for advanced structural and biological materials calls for novel modeling tools that can rapidly yield high-fidelity estimation on materials properties in design cycles. Lattice spring model , a coarse-grained particle spring network, has gained attention in recent years for predicting the mechanical properties and giving insights into the fracture mechanism with high reproducibility and generalizability. However, to simulate the materials in sufficient detail for guaranteed numerical stability and convergence, most of the time a large number of particles are needed, greatly diminishing the potential for high-throughput computation and therewith data generation for machine learning frameworks. Here, we implement CuLSM, a GPU-accelerated compute unified device architecture C++ code realizing parallelism over the spring list instead of the commonly used spatial decomposition, which requires intermittent updates on the particle neighbor list. Along with the image-to-particle conversion tool Img2Particle, our toolkit offers a fast and flexible platform to characterize the elastic and fracture behaviors of materials, expediting the design process between additive manufacturing and computer-aided design. With the growing demand for new lightweight, adaptable, and multi-functional materials and structures, such tailored and optimized modeling platform has profound impacts, enabling faster exploration in design spaces, better quality control for 3D printing by digital twin techniques, and larger data generation pipelines for image-based generative machine learning models.

Download Full-text

Confronting Deep-Learning and Biodiversity Challenges for Automatic Video-Monitoring of Marine Ecosystems

Sensors ◽

10.3390/s22020497 ◽

2022 ◽

Vol 22 (2) ◽

pp. 497

Author(s):

Sébastien Villon ◽

Corina Iovan ◽

Morgan Mangeas ◽

Laurent Vigliola

Keyword(s):

Deep Learning ◽

Data Augmentation ◽

Low Cost ◽

Video Recording ◽

Species Abundance ◽

Automatic Monitoring ◽

Training Dataset ◽

Data Generation ◽

Automatic Assessment ◽

Open Set

With the availability of low-cost and efficient digital cameras, ecologists can now survey the world’s biodiversity through image sensors, especially in the previously rather inaccessible marine realm. However, the data rapidly accumulates, and ecologists face a data processing bottleneck. While computer vision has long been used as a tool to speed up image processing, it is only since the breakthrough of deep learning (DL) algorithms that the revolution in the automatic assessment of biodiversity by video recording can be considered. However, current applications of DL models to biodiversity monitoring do not consider some universal rules of biodiversity, especially rules on the distribution of species abundance, species rarity and ecosystem openness. Yet, these rules imply three issues for deep learning applications: the imbalance of long-tail datasets biases the training of DL models; scarce data greatly lessens the performances of DL models for classes with few data. Finally, the open-world issue implies that objects that are absent from the training dataset are incorrectly classified in the application dataset. Promising solutions to these issues are discussed, including data augmentation, data generation, cross-entropy modification, few-shot learning and open set recognition. At a time when biodiversity faces the immense challenges of climate change and the Anthropocene defaunation, stronger collaboration between computer scientists and ecologists is urgently needed to unlock the automatic monitoring of biodiversity.

Download Full-text

Base Models for Configural Frequency Analysis – Data Generation Processes

Integrative Psychological and Behavioral Science ◽

10.1007/s12124-021-09665-1 ◽

2022 ◽

Author(s):

Alexander von Eye ◽

Wolfgang Wiedermann ◽

Stefan von Weber

Keyword(s):

Frequency Analysis ◽

Analysis Data ◽

Data Generation ◽

Configural Frequency Analysis

Download Full-text

Machine learning guided batched design of a bacterial Ribosome Binding Site

10.1101/2022.01.05.475140 ◽

2022 ◽

Author(s):

Mengyan Zhang ◽

Maciej B Holowko ◽

Huw Hayman Zumpe ◽

Cheng Soon Ong

Keyword(s):

Machine Learning ◽

Binding Site ◽

Gaussian Process Regression ◽

Machine Learning Algorithms ◽

Ribosome Binding Site ◽

Quality Data ◽

Data Generation ◽

Translation Control ◽

Ribosome Binding

Optimisation of gene expression levels is an essential part of the organism design process. Fine control of this process can be achieved through engineering transcription and translation control elements, including the ribosome binding site (RBS). Unfortunately, design of specific genetic parts can still be challenging due to lack of reliable design methods. To address this problem, we have created a machine learning guided Design-Build-Test-Learn (DBTL) cycle for the experimental design of bacterial RBSs to show how small genetic parts can be reliably designed using relatively small, high-quality data sets. We used Gaussian Process Regression for the Learn phase of cycle and the Upper Confidence Bound multi-armed bandit algorithm for the Design of genetic variants to be tested in vivo. We have integrated these machine learning algorithms with laboratory automation and high-throughput processes for reliable data generation. Notably, by Testing a total of 450 RBS variants in four DBTL cycles, we experimentally validated RBSs with high translation initiation rates equalling or exceeding our benchmark RBS by up to 34%. Overall, our results show that machine learning is a powerful tool for designing RBSs, and they pave the way towards more complicated genetic devices.

Download Full-text

Raking and Relabeling for Imbalanced Data

10.36227/techrxiv.17712122.v1 ◽

2022 ◽

Author(s):

Seunghwan Park ◽

Hae-Wwan Lee ◽

Jongho Im

Keyword(s):

High Dimensional Data ◽

Imbalanced Data ◽

Sampling Strategy ◽

Classification Performance ◽

Mixed Data ◽

Categorical Variables ◽

High Dimensional ◽

Data Generation ◽

Minority Class ◽

Imbalanced Data Classification

<div>We consider the binary classification of imbalanced data. A dataset is imbalanced if the proportion of classes are heavily skewed. Imbalanced data classification is often challengeable, especially for high-dimensional data, because unequal classes deteriorate classifier performance. Under sampling the majority class or oversampling the minority class are popular methods to construct balanced samples, facilitating classification performance improvement. However, many existing sampling methods cannot be easily extended to high-dimensional data and mixed data, including categorical variables, because they often require approximating the attribute distributions, which becomes another critical issue. In this paper, we propose a new sampling strategy employing raking and relabeling procedures, such that the attribute values of the majority class are imputed for the values of the minority class in the construction of balanced samples. The proposed algorithms produce comparable performance as existing popular methods but are more flexible regarding the data shape and attribute size. The sampling algorithm is attractive in practice, considering that it does not require density estimation for synthetic data generation in oversampling and is not bothered by mixed-type variables. In addition, the proposed sampling strategy is robust to classifiers in the sense that classification performance is not sensitive to choosing the classifiers.</div>

Download Full-text

data generation
Recently Published Documents

TOTAL DOCUMENTS

H-INDEX

A Survey on Generative Adversarial Networks: Variants, Applications, and Training

High-Fidelity 3D Digital Human Head Creation from RGB-D Selfies

Improvement of Airborne LiDAR Intensity Image Content with Shaded nDSM and Assessment of Its Utility in Geospatial Data Generation

Artificial Neural Network Potentials for Mechanics and Fracture Dynamics of Materials

An Exploration of Pathologies of Multilevel Principal Components Analysis in Statistical Models of Shape

ImageMech: From Image to Particle Spring Network for Mechanical Characterization

Confronting Deep-Learning and Biodiversity Challenges for Automatic Video-Monitoring of Marine Ecosystems

Base Models for Configural Frequency Analysis – Data Generation Processes

Machine learning guided batched design of a bacterial Ribosome Binding Site

Raking and Relabeling for Imbalanced Data

Export Citation Format

data generationRecently Published Documents

TOTAL DOCUMENTS

H-INDEX

A Survey on Generative Adversarial Networks: Variants, Applications, and Training

High-Fidelity 3D Digital Human Head Creation from RGB-D Selfies

Improvement of Airborne LiDAR Intensity Image Content with Shaded nDSM and Assessment of Its Utility in Geospatial Data Generation

Artificial Neural Network Potentials for Mechanics and Fracture Dynamics of Materials

An Exploration of Pathologies of Multilevel Principal Components Analysis in Statistical Models of Shape

ImageMech: From Image to Particle Spring Network for Mechanical Characterization

Confronting Deep-Learning and Biodiversity Challenges for Automatic Video-Monitoring of Marine Ecosystems

Base Models for Configural Frequency Analysis – Data Generation Processes

Machine learning guided batched design of a bacterial Ribosome Binding Site

Raking and Relabeling for Imbalanced Data

data generation
Recently Published Documents