data generation
Recently Published Documents





2022 ◽  
Vol 54 (8) ◽  
pp. 1-49
Abdul Jabbar ◽  
Xi Li ◽  
Bourahla Omar

The Generative Models have gained considerable attention in unsupervised learning via a new and practical framework called Generative Adversarial Networks (GAN) due to their outstanding data generation capability. Many GAN models have been proposed, and several practical applications have emerged in various domains of computer vision and machine learning. Despite GANs excellent success, there are still obstacles to stable training. The problems are Nash equilibrium, internal covariate shift, mode collapse, vanishing gradient, and lack of proper evaluation metrics. Therefore, stable training is a crucial issue in different applications for the success of GANs. Herein, we survey several training solutions proposed by different researchers to stabilize GAN training. We discuss (I) the original GAN model and its modified versions, (II) a detailed analysis of various GAN applications in different domains, and (III) a detailed study about the various GAN training obstacles as well as training solutions. Finally, we reveal several issues as well as research outlines to the topic.

2022 ◽  
Vol 41 (1) ◽  
pp. 1-21
Linchao Bao ◽  
Xiangkai Lin ◽  
Yajing Chen ◽  
Haoxian Zhang ◽  
Sheng Wang ◽  

We present a fully automatic system that can produce high-fidelity, photo-realistic three-dimensional (3D) digital human heads with a consumer RGB-D selfie camera. The system only needs the user to take a short selfie RGB-D video while rotating his/her head and can produce a high-quality head reconstruction in less than 30 s. Our main contribution is a new facial geometry modeling and reflectance synthesis procedure that significantly improves the state of the art. Specifically, given the input video a two-stage frame selection procedure is first employed to select a few high-quality frames for reconstruction. Then a differentiable renderer-based 3D Morphable Model (3DMM) fitting algorithm is applied to recover facial geometries from multiview RGB-D data, which takes advantages of a powerful 3DMM basis constructed with extensive data generation and perturbation. Our 3DMM has much larger expressive capacities than conventional 3DMM, allowing us to recover more accurate facial geometry using merely linear basis. For reflectance synthesis, we present a hybrid approach that combines parametric fitting and Convolutional Neural Networks (CNNs) to synthesize high-resolution albedo/normal maps with realistic hair/pore/wrinkle details. Results show that our system can produce faithful 3D digital human faces with extremely realistic details. The main code and the newly constructed 3DMM basis is publicly available.

2022 ◽  
Gang Seob Jung ◽  
Hoon Joo Myung ◽  
Stephan Irle

Abstract Atomistic understanding of mechanics and failure of materials is the key for engineering and applications. Modeling accurately brittle failure with crack propagation in covalent crystals requires a quantum mechanics-based description of individual bond-breaking events for large system sizes. Machine Learned (ML) potentials have emerged to overcome the traditional, physics-based modeling tradeoff between accuracy and accessible time and length scales. Previous studies have shown successful applications of ML potentials for describing the structure and dynamics of molecular systems and amorphous or liquid phases of materials. However, their application to deformation and failure processes in materials is yet uncommon. In this study, we discuss apparent limitations of ML potentials to describe deformation and fracture under loadings and propose a way to generate and select training data for their employment in simulations of deformation and fracture of crystals. We applied the proposed approach to 2D crystal graphene, utilizing the density-functional tight-binding (DFTB) method for more efficient and extensive data generation in place of density functional theory (DFT). Then, we explore how the data selection affects the accuracy of the developed artificial neural network potential (NNP), indicating that only the errors in total energies and atomic forces are insufficient to judge the NNP’s reliability. Therefore, we evaluate and select NNPs based on their performance in describing physical properties, e.g., stress-strain curves and geometric deformation. In sharp contrast to popular reactive bond order potentials, our optimized NNP predicts straight crack propagation in graphene along both armchair and zigzag lattice directions, as well as higher fracture toughness of zigzag edge direction. Our study provides significant insight into crack propagation mechanisms at atomic scales and highlights strategies for NNP developments of broader materials.

Damian JJ Farnell

3D facial surface imaging is a useful tool in dentistry and in terms of diagnostics and treatment planning. Between-groups PCA (bgPCA) is a method that has been used to analyse shapes in biological morphometrics, although various “pathologies” of bgPCA have recently been proposed. Monte Carlo (MC) simulated datasets were created here in order to explore “pathologies” of multilevel PCA (mPCA), where mPCA with two levels is equivalent to bgPCA. The first set of MC experiments involved 300 uncorrelated normally distributed variables, whereas the second set of MC experiments used correlated multivariate MC data describing 3D facial shape. We confirmed previous results of other researchers that indicated that bgPCA (and so also mPCA) can give a false impression of strong differences in component scores between groups when there is none in reality. These spurious differences in component scores via mPCA reduced strongly as the sample sizes per group were increased. Eigenvalues via mPCA were also found to be strongly effected by imbalances in sample sizes per group, although this problem was removed by using weighted forms of covariance matrices suggested by the maximum likelihood solution of the two-level model. However, this did not solve problems of spurious differences between groups in these simulations, which was driven by very small sample sizes in one group here. As a “rule of thumb” only, all of our experiments indicate that reasonable results are obtained when sample sizes per group in all groups are at least equal to the number of variables. Interestingly, the sum of all eigenvalues over both levels via mPCA scaled approximately linearly with the inverse of the sample size per group in all experiments. Finally, between-group variation was added explicitly to the MC data generation model in two experiments considered here. Results for the sum of all eigenvalues via mPCA predicted the asymptotic amount for the total amount of variance correctly in this case, whereas standard “single-level” PCA underestimated this quantity.

2022 ◽  
Vol 8 ◽  
Yuan Chiang ◽  
Ting-Wai Chiu ◽  
Shu-Wei Chang

The emerging demand for advanced structural and biological materials calls for novel modeling tools that can rapidly yield high-fidelity estimation on materials properties in design cycles. Lattice spring model , a coarse-grained particle spring network, has gained attention in recent years for predicting the mechanical properties and giving insights into the fracture mechanism with high reproducibility and generalizability. However, to simulate the materials in sufficient detail for guaranteed numerical stability and convergence, most of the time a large number of particles are needed, greatly diminishing the potential for high-throughput computation and therewith data generation for machine learning frameworks. Here, we implement CuLSM, a GPU-accelerated compute unified device architecture C++ code realizing parallelism over the spring list instead of the commonly used spatial decomposition, which requires intermittent updates on the particle neighbor list. Along with the image-to-particle conversion tool Img2Particle, our toolkit offers a fast and flexible platform to characterize the elastic and fracture behaviors of materials, expediting the design process between additive manufacturing and computer-aided design. With the growing demand for new lightweight, adaptable, and multi-functional materials and structures, such tailored and optimized modeling platform has profound impacts, enabling faster exploration in design spaces, better quality control for 3D printing by digital twin techniques, and larger data generation pipelines for image-based generative machine learning models.

Sensors ◽  
2022 ◽  
Vol 22 (2) ◽  
pp. 497
Sébastien Villon ◽  
Corina Iovan ◽  
Morgan Mangeas ◽  
Laurent Vigliola

With the availability of low-cost and efficient digital cameras, ecologists can now survey the world’s biodiversity through image sensors, especially in the previously rather inaccessible marine realm. However, the data rapidly accumulates, and ecologists face a data processing bottleneck. While computer vision has long been used as a tool to speed up image processing, it is only since the breakthrough of deep learning (DL) algorithms that the revolution in the automatic assessment of biodiversity by video recording can be considered. However, current applications of DL models to biodiversity monitoring do not consider some universal rules of biodiversity, especially rules on the distribution of species abundance, species rarity and ecosystem openness. Yet, these rules imply three issues for deep learning applications: the imbalance of long-tail datasets biases the training of DL models; scarce data greatly lessens the performances of DL models for classes with few data. Finally, the open-world issue implies that objects that are absent from the training dataset are incorrectly classified in the application dataset. Promising solutions to these issues are discussed, including data augmentation, data generation, cross-entropy modification, few-shot learning and open set recognition. At a time when biodiversity faces the immense challenges of climate change and the Anthropocene defaunation, stronger collaboration between computer scientists and ecologists is urgently needed to unlock the automatic monitoring of biodiversity.

2022 ◽  
Mengyan Zhang ◽  
Maciej B Holowko ◽  
Huw Hayman Zumpe ◽  
Cheng Soon Ong

Optimisation of gene expression levels is an essential part of the organism design process. Fine control of this process can be achieved through engineering transcription and translation control elements, including the ribosome binding site (RBS). Unfortunately, design of specific genetic parts can still be challenging due to lack of reliable design methods. To address this problem, we have created a machine learning guided Design-Build-Test-Learn (DBTL) cycle for the experimental design of bacterial RBSs to show how small genetic parts can be reliably designed using relatively small, high-quality data sets. We used Gaussian Process Regression for the Learn phase of cycle and the Upper Confidence Bound multi-armed bandit algorithm for the Design of genetic variants to be tested in vivo. We have integrated these machine learning algorithms with laboratory automation and high-throughput processes for reliable data generation. Notably, by Testing a total of 450 RBS variants in four DBTL cycles, we experimentally validated RBSs with high translation initiation rates equalling or exceeding our benchmark RBS by up to 34%. Overall, our results show that machine learning is a powerful tool for designing RBSs, and they pave the way towards more complicated genetic devices.

2022 ◽  
Seunghwan Park ◽  
Hae-Wwan Lee ◽  
Jongho Im

<div>We consider the binary classification of imbalanced data. A dataset is imbalanced if the proportion of classes are heavily skewed. Imbalanced data classification is often challengeable, especially for high-dimensional data, because unequal classes deteriorate classifier performance. Under sampling the majority class or oversampling the minority class are popular methods to construct balanced samples, facilitating classification performance improvement. However, many existing sampling methods cannot be easily extended to high-dimensional data and mixed data, including categorical variables, because they often require approximating the attribute distributions, which becomes another critical issue. In this paper, we propose a new sampling strategy employing raking and relabeling procedures, such that the attribute values of the majority class are imputed for the values of the minority class in the construction of balanced samples. The proposed algorithms produce comparable performance as existing popular methods but are more flexible regarding the data shape and attribute size. The sampling algorithm is attractive in practice, considering that it does not require density estimation for synthetic data generation in oversampling and is not bothered by mixed-type variables. In addition, the proposed sampling strategy is robust to classifiers in the sense that classification performance is not sensitive to choosing the classifiers.</div>

Sign in / Sign up

Export Citation Format

Share Document